#+title: A Virtual Machine (AVM)
#+author: Aryadev Chavali
#+date: 2023-10-15

An exercise in making a virtual machine in C11 with both a stack and
registers.

This repository contains both a shared library ([[file:lib/][lib]]) to
(de)serialise bytecode and a program ([[file:vm/][vm]]) to execute
said bytecode.

* How to build
Requires =GNU make= and a compliant C11 compiler.  Look
[[file:Makefile::CC=gcc][here]] to change the compiler used.

To build a release version simply run ~make all~.  To build a debug
version run ~make all RELEASE=0 VERBOSE=2~, which enables most runtime
logs.  This will build:
+ [[file:lib/][instruction bytecode system]] which provides a shared
  library for serialising and deserialising bytecode
+ [[file:vm/][VM executable]] to execute bytecode
* Targeting the virtual machine
Link with the shared library =libavm.so=.  The general idea is to
construct a ~prog_t~ structure, which consists of:
1) A program header with some essential properties of the program
   (start address, count, etc)
2) An array of type ~inst_t~ which is an ordered set of instructions
   for execution

This structure can be executed in two ways.
** Compilation then separate execution
The ~prog_t~ structure, along with a sufficiently sized buffer of
bytes (~prog_bytecode_size~ gives the exact number of bytes
necessary), can be passed to ~prog_write_bytecode~, which will
populate the buffer with the corresponding bytecode.

The buffer is then written to a file and executed using the =avm=
executable.  This is the classical way I expect languages to target
the virtual machine.
** In memory virtual machine
This method works by introducing the virtual machine runtime into the
program that wishes to utilise the AVM itself.

After constructing a ~prog_t~ structure, it can be fitted into a
~vm_t~ structure.  The ~vm_t~ structure maintains various other
components such as the stack, heap and call stack.  It can then be
used with ~vm_execute_all~ to execute the program.
Look at [[file:vm/main.c]] to see this in practice.

Note that this skips the serialising process (i.e. the /compilation/)
by utilising the runtime directly.  I could see this approach being
used when writing an interpreted language such as Lisp, where code
should be executed immediately after parsing.  Furthermore,
introducing the runtime directly into the calling program gives much
greater control over parameters such as stack/heap size and
step-by-step execution, which can be useful in dynamic contexts.  The
~prog_t~ can also still be compiled into bytecode whenever required.
* Related projects
[[https://github.com/aryadev-software/aal][Assembler]] program which
can compile an assembly-like language to bytecode.
* Lines of code
#+begin_src sh :results table :exports results
echo 'Files Lines Words Characters'
wc -lwc $(find vm/ lib/ -regex ".*\.[ch]\(pp\)?") | awk '{print $4 "\t" $1 "\t" $2 "\t" $3}'
#+end_src

#+RESULTS:
| Files            | Lines | Words | Characters |
|------------------+-------+-------+------------|
| vm/runtime.h     |   327 |   872 |       9082 |
| vm/main.c        |   136 |   381 |       3517 |
| vm/runtime.c     |   735 |  2454 |      26742 |
| vm/struct.c      |   252 |   780 |       6805 |
| vm/struct.h      |    74 |   204 |       1564 |
| lib/inst.c       |   567 |  1369 |      14899 |
| lib/darr.h       |   149 |   709 |       4482 |
| lib/inst.h       |   277 |   547 |       5498 |
| lib/inst-macro.h |    71 |   281 |       2806 |
| lib/heap.h       |   125 |   453 |       3050 |
| lib/base.h       |   236 |   895 |       5868 |
| lib/heap.c       |    79 |   214 |       1647 |
| lib/base.c       |    82 |   288 |       2048 |
| lib/darr.c       |    76 |   219 |       1746 |
|------------------+-------+-------+------------|
| total            |  3186 |  9666 |      89754 |