I've had a revelation: the virtual machine shouldn't cater to any
specific dynamic or trend. What I need the VM to do is provide a
basis where very useful features such as arbitrary-sized integer
arithmetic, basic control flow, memory management and file I/O are a
bit nicer than doing them myself in C. Past that it can be as
barebones as possible.
Previous development was done in tandem with the assembler, which
influenced how I designed the system. I will no longer do this. Here
I describe the creation of more generic opcodes and instructions.
These complicate the virtual machine's data model, but they are also
extensible, generalise to a wider array of use cases and can be
optimised by the virtual machine.
Instead, the assembler can use these generic instructions and make
porcelains over them to provide a nicer environment. Take the new
PUSH opcode: since it can take an arbitrary payload of bytes and push
them all onto the stack, the assembler can provide porcelains for
shorts, half words and words as well as more interesting macros.
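As a rough sketch of what those porcelains could look like (the
instruction encoding, the `avm_emit_push` helper and the type widths
here are all hypothetical, not the real AVM API):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding: OP_PUSH, a one-byte payload length, then the
   payload bytes themselves.  Endianness handling omitted for brevity. */
typedef struct { uint8_t buf[256]; size_t len; } prog_t;

static void avm_emit_push(prog_t *p, const void *data, size_t n)
{
  p->buf[p->len++] = 0x01;       /* illustrative OP_PUSH value */
  p->buf[p->len++] = (uint8_t)n; /* payload size in bytes */
  memcpy(p->buf + p->len, data, n);
  p->len += n;
}

/* Porcelains for fixed-width types, as the assembler might provide */
#define EMIT_PUSH_BYTE(p, v)  do { uint8_t  x = (v); avm_emit_push((p), &x, 1); } while (0)
#define EMIT_PUSH_SHORT(p, v) do { uint16_t x = (v); avm_emit_push((p), &x, 2); } while (0)
#define EMIT_PUSH_HWORD(p, v) do { uint32_t x = (v); avm_emit_push((p), &x, 4); } while (0)
#define EMIT_PUSH_WORD(p, v)  do { uint64_t x = (v); avm_emit_push((p), &x, 8); } while (0)
```

The point being that the VM only ever sees one generic opcode; all the
type-specific convenience lives in the assembler layer.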
Memory operations previously encoded positional/size arguments as
operands, with stack variants in case the user wanted to supply them
programmatically. In most large-scale cases I don't see the non-stack
variants being used; many cases require stack arithmetic (additions
and subtractions) to compute values such as indexes or sizes.
Therefore, it's better to be stack-first.
One counterpoint is inline optimisation of code at runtime: if a
compile-time-known value is pushed then immediately used in an
operation, we can instead encode the value directly into an
operand-based instruction, which speeds up execution because popping
the value off the stack is slower than having it available as part of
the instruction.
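A minimal sketch of that fusion pass, assuming an illustrative
instruction layout (the opcode names, the `inst_t` struct and the
fused JUMP-ADDR form are stand-ins, not the real AVM set):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical instruction stream for demonstration purposes */
enum { OP_NOOP, OP_PUSH_WORD, OP_JUMP_STACK, OP_JUMP_ADDR };

typedef struct { int opcode; uint64_t operand; } inst_t;

/* Fuse "push a known word, then jump to the popped address" into a
   single operand-based jump, removing the pop from the hot path. */
static void fuse_push_jump(inst_t *prog, size_t n)
{
  for (size_t i = 0; i + 1 < n; ++i)
    if (prog[i].opcode == OP_PUSH_WORD && prog[i + 1].opcode == OP_JUMP_STACK)
    {
      prog[i + 1].opcode  = OP_JUMP_ADDR;
      prog[i + 1].operand = prog[i].operand; /* value moves into the operand */
      prog[i].opcode = OP_NOOP;              /* or compact the stream */
    }
}
```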
Little Endian ordering is now ensured for stack-based and
register-based operations, e.g. PUSH now pushes datums onto the stack
in LE ordering, and POP pops data off the stack, converting from LE
ordering to host ordering.
More functions in the runtime are now macro defined, so there's less
code while still maintaining the lookup table.
--------------------------------------------------------------------------------
What LE ordering means for actual source code is that I utilise the
convert_*_to_* functions for PUSH and POP. All other data-oriented
instructions are completely internal to the system, and any data
storage must be in Little Endian. So MOV, PUSH-REGISTER and DUP can
all memcpy data directly around the system without needing to
consider endianness at all.
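A sketch of that boundary (these exact signatures are illustrative,
though they mirror the convert_*_to_* naming): shifting by multiples
of 8 builds the LE byte ordering regardless of host endianness, so the
same code is portable to big endian hosts.

```c
#include <stdint.h>

/* Host word -> LE bytes, byte 0 = least significant */
static void convert_word_to_bytes(uint64_t w, uint8_t out[8])
{
  for (int i = 0; i < 8; ++i)
    out[i] = (uint8_t)(w >> (8 * i));
}

/* LE bytes -> host word */
static uint64_t convert_bytes_to_word(const uint8_t in[8])
{
  uint64_t w = 0;
  for (int i = 0; i < 8; ++i)
    w |= (uint64_t)in[i] << (8 * i);
  return w;
}
```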
Macros were a necessity after I felt the redefinition of push for 4
different data types was too much. Most functions are essentially
copies for each datatype. Lisp macros would make this so easy :(
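The C version of the trick is less pleasant than a Lisp macro but
does the job: one template expands the whole family. (The `vm_t`
structure and function names below are illustrative stand-ins, with LE
conversion omitted for brevity.)

```c
#include <stdint.h>
#include <string.h>

/* Illustrative VM stack; the real structure will differ. */
typedef struct { uint8_t bytes[1024]; size_t ptr; } vm_t;

/* One template instead of four hand-written copies of push */
#define DEFINE_PUSH(NAME, TYPE)                    \
  static void vm_push_##NAME(vm_t *vm, TYPE v)     \
  {                                                \
    memcpy(vm->bytes + vm->ptr, &v, sizeof(TYPE)); \
    vm->ptr += sizeof(TYPE);                       \
  }

DEFINE_PUSH(byte,  uint8_t)
DEFINE_PUSH(short, uint16_t)
DEFINE_PUSH(hword, uint32_t)
DEFINE_PUSH(word,  uint64_t)
```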
OP_HALT = 1 now. This commit also adjusts the error checking in
inst_read_bytecode.
The main reasoning behind this concerns other platforms or
applications that target the AVM: whenever a new opcode is added, the
actual binary value of OP_HALT can change (as a result of how C enums
work).
Say your application targets commit alpha of AVM. OP_HALT is, say,
98. In commit beta, AVM is updated with a new opcode so OP_HALT is
changed to 99 (due to the new opcode being placed before OP_HALT). If
your application builds a binary for AVM version alpha and AVM version
beta is used instead, OP_HALT will be interpreted as another
instruction, which can lead to undefined behaviour.
This can be hard to debug, so here I've made the decision to try and
not place new opcodes in between old ones; new ones will always be
placed *before* NUMBER_OF_OPCODES.
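In enum form the rule looks like this (the opcodes between OP_HALT and
the sentinel are illustrative): appending before NUMBER_OF_OPCODES
means every existing opcode keeps its binary value across versions.

```c
/* Opcode names besides OP_HALT and NUMBER_OF_OPCODES are illustrative */
typedef enum
{
  OP_NOOP = 0,
  OP_HALT = 1, /* pinned: binaries built against old versions keep working */
  OP_PUSH,
  OP_POP,
  /* new opcodes are appended here, before the sentinel */
  NUMBER_OF_OPCODES
} opcode_t;
```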
Instead of using a linked list, which is incredibly fragmented, a
vector keeps all the page pointers together, so in theory we should
have fewer cache misses when deleting pages.
It does introduce the issue of fragmentation: if we allocate and
delete many times, a lot of the heap vector will be empty, so
traversal will walk over a ton of useless entries.
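One way to mitigate that fragmentation is to reuse empty slots before
growing the vector. A minimal sketch, with hypothetical structure
names (error checking on the allocations omitted):

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative page vector: deleting a page leaves a NULL slot */
typedef struct { uint8_t *data; size_t size; } page_t;
typedef struct { page_t **pages; size_t count; } heap_t;

static size_t heap_alloc_page(heap_t *h, size_t size)
{
  size_t i = 0;
  while (i < h->count && h->pages[i]) /* reuse a free slot if any */
    ++i;
  if (i == h->count)
  {
    h->pages = realloc(h->pages, (h->count + 1) * sizeof(page_t *));
    h->count += 1;
  }
  h->pages[i] = malloc(sizeof(page_t));
  h->pages[i]->data = calloc(size, 1);
  h->pages[i]->size = size;
  return i;
}

static void heap_delete_page(heap_t *h, size_t i)
{
  free(h->pages[i]->data);
  free(h->pages[i]);
  h->pages[i] = NULL; /* slot remains for reuse */
}
```

The linear scan for a free slot is itself the traversal cost mentioned
above; a free list of slot indexes would avoid it at the price of some
bookkeeping.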