I've had a revelation: the virtual machine shouldn't cater to any specific dynamic or trend. What I need the VM to do is provide a base where very useful features, such as arithmetic on arbitrarily sized integers, basic control flow, memory management and file I/O, are a bit nicer than doing them myself in C. Past that it can be as barebones as necessary. Previous development was done in tandem with the assembler, which influenced how I designed the system. I will no longer do this. Here I describe the creation of more generic opcodes and instructions. These complicate the virtual machine's data model, but they are also extensible, can be generalised to a wider array of use cases and can be optimised by the virtual machine. The assembler, in turn, can use these generic instructions and build porcelains over them to provide a nicer environment. Take the new PUSH opcode: since it can take an arbitrary payload of bytes and push them all onto the stack, the assembler can provide porcelains for shorts, half words and words as well as more interesting macros.
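To make the PUSH idea concrete, here is a minimal sketch of one generic byte-payload push with fixed-width porcelains layered on top. Every name in it (`sketch_stack_t`, `sketch_push` and the `sketch_push_*` wrappers) is hypothetical and not part of the AVM API. The widths (16, 32 and 64 bits for shorts, half words and words) and the little-endian byte order are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the VM's data stack; not the AVM's real type. */
typedef struct {
  uint8_t data[256];
  size_t used;
} sketch_stack_t;

/* Generic PUSH: copy an arbitrary payload of n bytes onto the stack.
   Returns 0 on success, -1 if the payload does not fit. */
int sketch_push(sketch_stack_t *s, const uint8_t *payload, size_t n)
{
  assert(s != NULL && payload != NULL);
  if (n > sizeof(s->data) - s->used)
    return -1;
  memcpy(s->data + s->used, payload, n);
  s->used += n;
  return 0;
}

/* Porcelains: fixed-width pushes built purely on the generic one.
   Bytes are emitted least significant first (an assumption of this sketch). */
int sketch_push_short(sketch_stack_t *s, uint16_t v)
{
  uint8_t b[2];
  for (size_t i = 0; i < sizeof b; ++i)
    b[i] = (uint8_t)(v >> (8 * i));
  return sketch_push(s, b, sizeof b);
}

int sketch_push_hword(sketch_stack_t *s, uint32_t v)
{
  uint8_t b[4];
  for (size_t i = 0; i < sizeof b; ++i)
    b[i] = (uint8_t)(v >> (8 * i));
  return sketch_push(s, b, sizeof b);
}

int sketch_push_word(sketch_stack_t *s, uint64_t v)
{
  uint8_t b[8];
  for (size_t i = 0; i < sizeof b; ++i)
    b[i] = (uint8_t)(v >> (8 * i));
  return sketch_push(s, b, sizeof b);
}
```

The point of the design is that only `sketch_push` touches the stack. The wrappers are pure convenience, which is the role the assembler's porcelains would play over a single generic opcode.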
A Virtual Machine (AVM)
An exercise in making a virtual machine in C11 with both a stack and registers.
This repository contains both a shared library (lib) to (de)serialise bytecode and a program (vm) to execute said bytecode.
How to build
Requires GNU make and a compliant C11 compiler. Look
here to change the compiler used.
To build a release version simply run make all. To build a debug
version run make all RELEASE=0 VERBOSE=2 which has most runtime logs
on. This will build:
- instruction bytecode system which provides a shared library for serialising and deserialising bytecode
- VM executable to execute bytecode
Targeting the virtual machine
Link with the shared library libavm.so. The general idea is to
construct a prog_t structure, which consists of:
- A program header with some essential properties of the program (start address, count, etc)
- An array of type inst_t, which is an ordered set of instructions for execution
This structure can be executed in two ways.
Compilation then separate execution
The prog_t structure along with a sufficiently sized buffer of bytes
(prog_bytecode_size gives the exact number of bytes necessary) can
be used in calling prog_write_bytecode, which will populate the
buffer with the corresponding bytecode.
The buffer is written to some file then executed using the avm
executable. This is the classical way I expect languages to target
the virtual machine.
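The size-then-write flow described above might look roughly like the sketch below. It does not use the real library: `sketch_inst_t`, `sketch_prog_t`, `sketch_bytecode_size`, `sketch_write_bytecode` and `sketch_compile_to_file` are hypothetical stand-ins, and the byte layout (two 64-bit header fields in host byte order, then opcode, payload length and payload per instruction) is invented for illustration. The actual signatures of prog_bytecode_size and prog_write_bytecode are declared in the lib headers and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; the real inst_t and prog_t come from the library. */
typedef struct {
  uint8_t opcode;
  uint8_t payload_len; /* at most 8 in this sketch */
  uint8_t payload[8];
} sketch_inst_t;

typedef struct {
  uint64_t start_address; /* header: where execution begins */
  uint64_t count;         /* header: number of instructions */
  const sketch_inst_t *insts;
} sketch_prog_t;

/* Plays the role of prog_bytecode_size: exact number of bytes required. */
size_t sketch_bytecode_size(const sketch_prog_t *p)
{
  size_t n = 2 * sizeof(uint64_t); /* header */
  for (uint64_t i = 0; i < p->count; ++i)
    n += 2 + p->insts[i].payload_len; /* opcode + length + payload */
  return n;
}

/* Plays the role of prog_write_bytecode: fill a caller-supplied buffer.
   Returns the number of bytes written. */
size_t sketch_write_bytecode(const sketch_prog_t *p, uint8_t *buf, size_t cap)
{
  assert(cap >= sketch_bytecode_size(p));
  size_t off = 0;
  memcpy(buf + off, &p->start_address, sizeof p->start_address);
  off += sizeof p->start_address;
  memcpy(buf + off, &p->count, sizeof p->count);
  off += sizeof p->count;
  for (uint64_t i = 0; i < p->count; ++i) {
    const sketch_inst_t *in = &p->insts[i];
    assert(in->payload_len <= sizeof in->payload);
    buf[off++] = in->opcode;
    buf[off++] = in->payload_len;
    memcpy(buf + off, in->payload, in->payload_len);
    off += in->payload_len;
  }
  return off;
}

/* Size, allocate, serialise, then write to a file. Returns 0 on success. */
int sketch_compile_to_file(const sketch_prog_t *p, const char *path)
{
  size_t size = sketch_bytecode_size(p);
  uint8_t *buf = malloc(size);
  if (buf == NULL)
    return -1;
  sketch_write_bytecode(p, buf, size);
  FILE *fp = fopen(path, "wb");
  if (fp == NULL) {
    free(buf);
    return -1;
  }
  size_t written = fwrite(buf, 1, size, fp);
  int close_err = fclose(fp);
  free(buf);
  return (written == size && close_err == 0) ? 0 : -1;
}
```

Asking for the exact size first lets the caller own the allocation, so the serialiser itself never has to allocate or grow a buffer.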
In memory virtual machine
This method works by introducing the virtual machine runtime into
the program that wishes to utilise the AVM itself. After constructing
a prog_t structure, it can be fit into a vm_t structure. This
structure maintains various other components such as the stack, heap
and call stack. This structure can then be used with vm_execute_all
to execute the program.
Look at vm/main.c to see this in practice.
Note that this skips the serialising process (i.e. the compilation)
by utilising the runtime directly. I could see this approach being
used when writing an interpreted language such as Lisp where code
should be executed immediately after parsing. Furthermore,
introducing the runtime directly into the calling program gives much
greater control over parameters such as stack/heap size and step by
step execution, which can be useful in dynamic contexts. Moreover,
the prog_t can still be compiled into bytecode whenever required.
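As a rough illustration of what embedding the runtime buys you, the sketch below shows a toy VM whose stack size is chosen by the caller and which can be driven one instruction at a time or run to completion. None of it is the AVM runtime: `sketch_vm_t`, the three opcodes and all `sketch_vm_*` functions are hypothetical, and the real vm_t and vm_execute_all live in the vm sources (vm/main.c shows actual usage).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical opcodes and types; not the AVM's instruction set. */
enum { SK_OP_HALT, SK_OP_PUSH, SK_OP_ADD };

typedef struct {
  uint8_t opcode;
  uint64_t operand;
} sketch_inst_t;

typedef enum { SK_OK, SK_HALTED, SK_ERROR } sketch_status_t;

typedef struct {
  const sketch_inst_t *insts;
  size_t count;
  size_t pc;        /* index of the next instruction */
  uint64_t *stack;
  size_t sp;        /* number of values on the stack */
  size_t stack_cap; /* chosen by the embedding program */
} sketch_vm_t;

/* The caller decides the stack size, one of the knobs embedding exposes. */
int sketch_vm_init(sketch_vm_t *vm, const sketch_inst_t *insts, size_t count,
                   size_t stack_cap)
{
  assert(vm != NULL && stack_cap > 0);
  vm->insts = insts;
  vm->count = count;
  vm->pc = 0;
  vm->sp = 0;
  vm->stack_cap = stack_cap;
  vm->stack = malloc(stack_cap * sizeof *vm->stack);
  return vm->stack != NULL ? 0 : -1;
}

void sketch_vm_free(sketch_vm_t *vm)
{
  free(vm->stack);
  vm->stack = NULL;
}

/* Execute exactly one instruction, so the host can inspect state between steps. */
sketch_status_t sketch_vm_step(sketch_vm_t *vm)
{
  if (vm->pc >= vm->count)
    return SK_HALTED;
  const sketch_inst_t *in = &vm->insts[vm->pc];
  switch (in->opcode) {
  case SK_OP_HALT:
    return SK_HALTED;
  case SK_OP_PUSH:
    if (vm->sp == vm->stack_cap)
      return SK_ERROR; /* stack overflow */
    vm->stack[vm->sp++] = in->operand;
    break;
  case SK_OP_ADD:
    if (vm->sp < 2)
      return SK_ERROR; /* stack underflow */
    vm->stack[vm->sp - 2] += vm->stack[vm->sp - 1];
    vm->sp--;
    break;
  default:
    return SK_ERROR; /* unknown opcode */
  }
  vm->pc++;
  return SK_OK;
}

/* Run until the program halts or fails; built on top of the single step. */
sketch_status_t sketch_vm_execute_all(sketch_vm_t *vm)
{
  sketch_status_t st;
  while ((st = sketch_vm_step(vm)) == SK_OK) {
  }
  return st;
}
```

An interpreter for a language such as Lisp could build the instruction array straight from its parser, call the step function in a loop and inspect the stack between steps, with no bytecode file involved.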
Related projects
Assembler program which can compile an assembly-like language to bytecode.
Lines of code
| Files | Lines | Words | Characters |
|---|---|---|---|
| vm/runtime.h | 327 | 872 | 9082 |
| vm/main.c | 136 | 381 | 3517 |
| vm/runtime.c | 735 | 2454 | 26742 |
| vm/struct.c | 252 | 780 | 6805 |
| vm/struct.h | 74 | 204 | 1564 |
| lib/inst.c | 567 | 1369 | 14899 |
| lib/darr.h | 149 | 709 | 4482 |
| lib/inst.h | 277 | 547 | 5498 |
| lib/inst-macro.h | 71 | 281 | 2806 |
| lib/heap.h | 125 | 453 | 3050 |
| lib/base.h | 236 | 895 | 5868 |
| lib/heap.c | 79 | 214 | 1647 |
| lib/base.c | 82 | 288 | 2048 |
| lib/darr.c | 76 | 219 | 1746 |
| total | 3186 | 9666 | 89754 |