
#+title: A Virtual Machine (AVM)
#+author: Aryadev Chavali
#+date: 2023-10-15

An exercise in making a virtual machine in C11 with both a stack and
registers.

This repository contains both a shared library ([[file:lib/][lib]]) to
(de)serialise bytecode and a program ([[file:vm/][vm]]) to execute
said bytecode.

* How to build
Requires =GNU make= and a compliant C11 compiler.  To change the
compiler used, edit the =CC= variable in the
[[file:Makefile::CC=gcc][Makefile]].

To build a release version, run ~make all~.  To build a debug version
with most runtime logs enabled, run ~make all RELEASE=0 VERBOSE=2~.
Either command builds:
+ [[file:lib/][instruction bytecode system]] which provides a shared
  library for serialising and deserialising bytecode
+ [[file:vm/][VM executable]] to execute bytecode
* Targeting the virtual machine
Link with the shared library =libavm.so=.  The general idea is to
construct a ~prog_t~ structure, which consists of:
1) A program header with some essential properties of the program
   (start address, count, etc)
2) An array of type ~inst_t~ which is an ordered set of instructions
   for execution

This structure can be executed in two ways.
** Compilation then separate execution
The ~prog_t~ structure along with a sufficiently sized buffer of bytes
(~prog_bytecode_size~ gives the exact number of bytes necessary) can
be used in calling ~prog_write_bytecode~, which will populate the
buffer with the corresponding bytecode.

The buffer is then written to a file, which the =avm= executable can
run.  This is the classical way I expect languages to target the
virtual machine.
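
The size-then-write pattern described above can be sketched as
follows.  This is only an illustration under stated assumptions: every
~demo_~ name and the byte layout are hypothetical stand-ins invented
for this example, not the library's definitions.  Consult the headers
in [[file:lib/][lib]] for the actual ~prog_t~, ~prog_bytecode_size~
and ~prog_write_bytecode~ declarations.

#+begin_src c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for inst_t and prog_t; the real layouts differ. */
typedef struct { uint8_t opcode; uint64_t operand; } demo_inst_t;
typedef struct {
  uint64_t start_address;          /* header: where execution begins */
  uint64_t count;                  /* header: number of instructions */
  const demo_inst_t *instructions; /* ordered instructions */
} demo_prog_t;

#define DEMO_HEADER_BYTES 16 /* two 64-bit header fields */
#define DEMO_INST_BYTES 9    /* 1 opcode byte + 8 operand bytes */

/* Exact number of bytes the serialised program occupies. */
size_t demo_bytecode_size(const demo_prog_t *prog)
{
  return DEMO_HEADER_BYTES + (size_t)prog->count * DEMO_INST_BYTES;
}

/* Store x at dst as 8 little-endian bytes; return the next write position. */
static uint8_t *put_u64(uint8_t *dst, uint64_t x)
{
  for (int i = 0; i < 8; ++i)
    dst[i] = (uint8_t)(x >> (8 * i));
  return dst + 8;
}

/* Populate buf with bytecode.  Returns the bytes written, or 0 if the
   buffer capacity is too small. */
size_t demo_write_bytecode(const demo_prog_t *prog, uint8_t *buf, size_t cap)
{
  size_t need = demo_bytecode_size(prog);
  if (cap < need)
    return 0;
  uint8_t *p = put_u64(buf, prog->start_address);
  p = put_u64(p, prog->count);
  for (uint64_t i = 0; i < prog->count; ++i)
  {
    *p++ = prog->instructions[i].opcode;
    p = put_u64(p, prog->instructions[i].operand);
  }
  assert((size_t)(p - buf) == need);
  return need;
}

/* Query the size, allocate, serialise, then write the buffer to a file.
   Returns 0 on success and -1 on any failure. */
int demo_compile_to_file(const demo_prog_t *prog, const char *path)
{
  size_t size = demo_bytecode_size(prog);
  uint8_t *buf = malloc(size);
  if (!buf)
    return -1;
  int status = -1;
  if (demo_write_bytecode(prog, buf, size) == size)
  {
    FILE *fp = fopen(path, "wb");
    if (fp)
    {
      size_t written = fwrite(buf, 1, size, fp);
      if (fclose(fp) == 0 && written == size)
        status = 0;
    }
  }
  free(buf);
  return status;
}
#+end_src

Asking for the size first lets the caller own the allocation, so the
writer itself never needs to allocate or grow a buffer.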
** In memory virtual machine
This method works by embedding the virtual machine runtime in the
program that wishes to use the AVM.  After constructing a ~prog_t~
structure, load it into a ~vm_t~ structure, which maintains the other
runtime components such as the stack, heap and call stack.  The ~vm_t~
can then be passed to ~vm_execute_all~ to execute the program.

Look at [[file:vm/main.c]] to see this in practice.

Note that this skips the serialising process (i.e. the /compilation/)
by utilising the runtime directly.  I could see this approach being
used when writing an interpreted language such as Lisp, where code
should be executed immediately after parsing.  Embedding the runtime
in the calling program also gives much greater control over
parameters such as stack/heap size, and allows step-by-step execution,
which can be useful in dynamic contexts.  The ~prog_t~ can still be
compiled into bytecode whenever required.
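
To illustrate the kind of control an embedded runtime gives its host,
here is a minimal toy stack machine.  It is a sketch only: all ~mini_~
names, the three opcodes and the status codes are hypothetical and
invented for this example, and it does not reproduce the real ~vm_t~
or ~vm_execute_all~ from [[file:vm/][vm]].  The host supplies the
stack storage, so it chooses the stack size, and it can either step
one instruction at a time or run to completion.

#+begin_src c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction set for the sketch. */
typedef enum { MINI_PUSH, MINI_ADD, MINI_HALT } mini_op_t;
typedef struct { mini_op_t op; int64_t operand; } mini_inst_t;

typedef enum { MINI_OK, MINI_HALTED, MINI_ERR_STACK, MINI_ERR_PC } mini_status_t;

typedef struct {
  const mini_inst_t *program; /* instructions to execute */
  size_t count;               /* number of instructions */
  size_t pc;                  /* index of the next instruction */
  int64_t *stack;             /* caller-owned storage */
  size_t stack_cap;           /* capacity chosen by the host */
  size_t sp;                  /* number of values on the stack */
} mini_vm_t;

/* Attach a program and a caller-provided stack to the machine. */
void mini_vm_load(mini_vm_t *vm, const mini_inst_t *program, size_t count,
                  int64_t *stack, size_t stack_cap)
{
  assert(vm && program && stack);
  vm->program = program;
  vm->count = count;
  vm->pc = 0;
  vm->stack = stack;
  vm->stack_cap = stack_cap;
  vm->sp = 0;
}

/* Execute exactly one instruction and report what happened. */
mini_status_t mini_vm_step(mini_vm_t *vm)
{
  if (vm->pc >= vm->count)
    return MINI_ERR_PC;
  mini_inst_t inst = vm->program[vm->pc];
  switch (inst.op)
  {
  case MINI_PUSH:
    if (vm->sp >= vm->stack_cap)
      return MINI_ERR_STACK; /* overflow */
    vm->stack[vm->sp++] = inst.operand;
    break;
  case MINI_ADD:
    if (vm->sp < 2)
      return MINI_ERR_STACK; /* underflow */
    vm->stack[vm->sp - 2] += vm->stack[vm->sp - 1];
    vm->sp--;
    break;
  case MINI_HALT:
    return MINI_HALTED;
  }
  vm->pc++;
  return MINI_OK;
}

/* Step until the program halts or an error occurs. */
mini_status_t mini_vm_execute_all(mini_vm_t *vm)
{
  mini_status_t status;
  while ((status = mini_vm_step(vm)) == MINI_OK)
    ;
  return status;
}
#+end_src

A host such as a REPL could call ~mini_vm_step~ in its own loop to
inspect the stack between instructions, or call ~mini_vm_execute_all~
when it only needs the final result.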
* Related projects
+ [[https://github.com/aryadev-software/aal][Assembler]]: a program
  which compiles an assembly-like language to bytecode.
* Lines of code
#+begin_src sh :results table :exports results
echo 'Files     Lines    Words    Characters'
wc -lwc $(find vm/ lib/ -regex ".*\.[ch]\(pp\)?") | awk '{print $4 "\t" $1 "\t" $2 "\t" $3}'
#+end_src

#+RESULTS:
| Files            | Lines | Words | Characters |
|------------------+-------+-------+------------|
| vm/runtime.h     |   327 |   872 |       9082 |
| vm/main.c        |   136 |   381 |       3517 |
| vm/runtime.c     |   735 |  2454 |      26742 |
| vm/struct.c      |   252 |   780 |       6805 |
| vm/struct.h      |    74 |   204 |       1564 |
| lib/inst.c       |   567 |  1369 |      14899 |
| lib/darr.h       |   149 |   709 |       4482 |
| lib/inst.h       |   277 |   547 |       5498 |
| lib/inst-macro.h |    71 |   281 |       2806 |
| lib/heap.h       |   125 |   453 |       3050 |
| lib/base.h       |   236 |   895 |       5868 |
| lib/heap.c       |    79 |   214 |       1647 |
| lib/base.c       |    82 |   288 |       2048 |
| lib/darr.c       |    76 |   219 |       1746 |
|------------------+-------+-------+------------|
| total            |  3186 |  9666 |      89754 |