#+title: Aryadev's Virtual Machine (AVM)
#+author: Aryadev Chavali
#+date: 2023-10-15

A virtual machine in C11, stack oriented with a dynamic register.
Deals primarily in bytes, doesn't make assertions about typing and is
very simple to target.

This repository contains both a library ([[file:lib/][lib]]) to
(de)serialize bytecode and a program ([[file:vm/][vm]]) to execute
said bytecode.

* How to build
Requires =GNU make= and a compliant C11 compiler.  The code base has
been tested against =gcc= and =clang=, but since the project has been
written without GNU-isms (that I'm aware of) it shouldn't be an issue
to compile using something like =tcc= or another compiler (look
[[file:Makefile::CC=gcc][here]] to change the compiler used).

To build a release version simply run ~make all RELEASE=1~.  To build
a debug version run ~make all VERBOSE=<n>~ where =n= can be 0, 1 or 2
depending on how verbose you want logs to standard output to be.  This
will build:
+ [[file:lib/][instruction bytecode system]] which provides a shared
  library for serialising and deserialising bytecode
+ [[file:vm/][VM executable]] to execute bytecode

You may also build each component individually through the
corresponding recipe:
+ ~make lib~
+ ~make vm~

* Targeting the virtual machine
Link with the shared library =libavm.so=, which should be located in
the =build= folder.  The general idea is to construct a ~prog_t~
structure, which consists of:
1) A program header with some essential properties of the program
   (start address, count, etc.)
2) An array of type ~inst_t~: the instructions, in execution order

This structure may be executed in two ways.
** Compilation then separate execution
The ~prog_t~ structure, along with a sufficiently sized buffer of
bytes (use ~prog_bytecode_size~ to get the size necessary), can be
passed to ~prog_write_bytecode~, which will populate the buffer with
the corresponding bytecode.  The buffer is then written to some file,
which is executed using the =avm= executable.
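
A minimal sketch of the size-then-write flow described above.  This
README does not show the real definitions of ~prog_t~ and ~inst_t~ or
the signatures of ~prog_bytecode_size~ and ~prog_write_bytecode~, so
everything prefixed =demo_= below is a hypothetical stand-in with an
invented byte layout (host byte order, fixed-width operands); consult
the headers in [[file:lib/][lib]] for the actual API.

#+begin_src c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL stand-ins: the real inst_t and prog_t will differ. */
typedef struct
{
  uint8_t opcode;
  uint64_t operand;
} demo_inst_t;

typedef struct
{
  uint64_t start_address;
  uint64_t count;
  const demo_inst_t *instructions;
} demo_prog_t;

/* Pass 1: how many bytes does the serialised program need?
   Assumed layout: 16 byte header, then 9 bytes per instruction. */
static size_t demo_bytecode_size(const demo_prog_t *prog)
{
  return 2 * sizeof(uint64_t) + prog->count * (1 + sizeof(uint64_t));
}

/* Pass 2: fill a caller-provided buffer, returning bytes written. */
static size_t demo_write_bytecode(const demo_prog_t *prog, uint8_t *buf)
{
  size_t off = 0;
  memcpy(buf + off, &prog->start_address, sizeof(uint64_t));
  off += sizeof(uint64_t);
  memcpy(buf + off, &prog->count, sizeof(uint64_t));
  off += sizeof(uint64_t);
  for (uint64_t i = 0; i < prog->count; ++i)
  {
    buf[off++] = prog->instructions[i].opcode;
    memcpy(buf + off, &prog->instructions[i].operand, sizeof(uint64_t));
    off += sizeof(uint64_t);
  }
  return off;
}

/* Size the buffer, serialise into it, then dump it to a file.
   Returns 0 on success and -1 on failure. */
static int demo_compile_to_file(const demo_prog_t *prog, const char *path)
{
  size_t size  = demo_bytecode_size(prog);
  uint8_t *buf = malloc(size);
  if (!buf)
    return -1;
  size_t written = demo_write_bytecode(prog, buf);
  assert(written == size); /* both passes must agree on the layout */

  FILE *fp = fopen(path, "wb");
  int ok   = fp != NULL && fwrite(buf, 1, size, fp) == size;
  if (fp != NULL && fclose(fp) != 0)
    ok = 0;
  free(buf);
  return ok ? 0 : -1;
}
#+end_src

Under this invented layout a two-instruction program needs 16 + 2 * 9
= 34 bytes.  Asking for the size first keeps allocation in the
caller's hands, which is presumably why the library splits the work
into two calls.
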
This is the classical way I expect languages to target the virtual
machine.
** In memory virtual machine
This method is more involved, introducing the virtual machine runtime
into the program itself.  After constructing a ~prog_t~ structure, it
can be loaded into a ~vm_t~ structure.  This ~vm_t~ structure must
also have a stack, heap and call stack (look at [[file:vm/main.c]] to
see this in practice).  The structure can then be passed to
~vm_execute_all~ to execute the program.

Note that this skips the serialising process (i.e. the /compilation/)
by utilising the runtime directly.  I could see this approach being
used when writing an interpreted language such as Lisp, where code
should be executed immediately after parsing.  Furthermore,
introducing the runtime directly into the calling program gives much
greater control over parameters such as stack/heap size and over
step-by-step execution, which can be useful in dynamic contexts.  The
~prog_t~ can still be compiled into bytecode whenever required.

* Related projects
[[https://github.com/aryadev-software/aal][Assembler]]: a program
which compiles an assembly-like language to bytecode.
* Lines of code
#+begin_src sh :results table :exports results
wc -lwc $(find vm/ lib/ -regex ".*\.[ch]\(pp\)?")
#+end_src

#+RESULTS:
|------------------+-------+-------+------------|
| File             | Lines | Words | Characters |
|------------------+-------+-------+------------|
| vm/runtime.h     |   266 |   699 |       7250 |
| vm/main.c        |   135 |   375 |       3448 |
| vm/runtime.c     |   802 |  2441 |      23634 |
| vm/struct.c      |   262 |   783 |       7050 |
| vm/struct.h      |    69 |   196 |       1531 |
| lib/inst.c       |   493 |  1215 |      13043 |
| lib/darr.h       |   149 |   709 |       4482 |
| lib/inst.h       |   248 |   519 |       4964 |
| lib/inst-macro.h |    71 |   281 |       2806 |
| lib/heap.h       |   125 |   453 |       3050 |
| lib/base.h       |   190 |   710 |       4633 |
| lib/heap.c       |    79 |   214 |       1647 |
| lib/base.c       |    61 |   226 |       1583 |
| lib/darr.c       |    76 |   219 |       1746 |
|------------------+-------+-------+------------|
| total            |  3026 |  9040 |      80867 |
|------------------+-------+-------+------------|