VM Specification
WIP Data types
There are 3 main data types of the virtual machine. They are all unsigned. There exist signed versions of these data types, though there is no difference (internally) between them. For an unsigned type <T> the signed version is simply S_<T>.
| Name | Bits |
| Byte | 8 |
| HWord | 32 |
| Word | 64 |
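As a rough illustration of the claim that a type and its signed version share one internal representation, the following Python sketch reinterprets the same bits both ways. The helper names `as_signed` and `as_unsigned` are hypothetical, and the use of two's complement is an assumption: the specification does not state how signed values are encoded.

```python
# Bit widths taken from the table above.
BITS = {"Byte": 8, "HWord": 32, "Word": 64}


def as_unsigned(value: int, type_name: str) -> int:
    """Keep only the low bits that fit in the unsigned type <T>."""
    return value & ((1 << BITS[type_name]) - 1)


def as_signed(value: int, type_name: str) -> int:
    """Reinterpret the same bits as S_<T> (two's complement is an assumption)."""
    bits = BITS[type_name]
    value = as_unsigned(value, type_name)
    # If the top bit is set, the value is negative under two's complement.
    return value - (1 << bits) if value >> (bits - 1) else value


print(as_signed(0xFF, "Byte"))    # -1
print(as_unsigned(-1, "HWord"))   # 4294967295
```

Under this assumption the stored bits never change; only the interpretation at the point of use differs.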
Generally, the abbreviations B, H and W are used for Byte, HWord and Word respectively. The following table compares the data types: the entry in row A and column B ($A\times{B}$) answers "How many of A can I fit in B?".
| | Byte | HWord | Word |
| Byte | 1 | 4 | 8 |
| HWord | 1/4 | 1 | 2 |
| Word | 1/8 | 1/2 | 1 |
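The comparison table can be derived directly from the bit widths. The sketch below is only an illustration; the function name `fits` is hypothetical.

```python
from fractions import Fraction

# Bit widths of the three data types.
BITS = {"Byte": 8, "HWord": 32, "Word": 64}


def fits(a: str, b: str) -> Fraction:
    """How many values of type `a` fit in one value of type `b`."""
    return Fraction(BITS[b], BITS[a])


# Print one row per type, in the same row/column order as the table.
for a in BITS:
    print(a, [str(fits(a, b)) for b in BITS])
```

Each printed row corresponds to one row of the table, e.g. the HWord row gives 1/4, 1 and 2.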
WIP Instructions
An instruction for the virtual machine is composed of an opcode and, potentially, an operand. The opcode represents the behaviour of the instruction, i.e. what the instruction does. The operand is an element of one of the data types described previously.
Some instructions have operands while others do not. Instructions without an operand are called UNIT instructions, while those that take an operand are called MULTI instructions[1].
All opcodes (with very few exceptions[2]) have two components:
the root and the type specifier. The root represents the
general behaviour of the instruction: PUSH, POP, MOV, etc. The
type specifier specifies what data type it manipulates. A
complete opcode will be a combination of these two e.g. PUSH_BYTE,
POP_WORD, etc. Some opcodes may have more type specifiers than
others.
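The root plus type specifier naming scheme can be sketched as simple string composition. This is an illustration only: the function names are hypothetical, and the spelling `HWORD` for the HWord specifier is an assumption, since the text only shows `PUSH_BYTE` and `POP_WORD`.

```python
ROOTS = ("PUSH", "POP", "MOV")                # roots named in the text
TYPE_SPECIFIERS = ("BYTE", "HWORD", "WORD")   # "HWORD" spelling is assumed


def make_opcode(root: str, type_specifier: str) -> str:
    """Combine a root and a type specifier into a complete opcode name."""
    if root not in ROOTS:
        raise ValueError(f"unknown root: {root}")
    if type_specifier not in TYPE_SPECIFIERS:
        raise ValueError(f"unknown type specifier: {type_specifier}")
    return f"{root}_{type_specifier}"


def split_opcode(opcode: str):
    """Split an opcode name into (root, type specifier or None)."""
    root, _, type_specifier = opcode.partition("_")
    return root, (type_specifier or None)


print(make_opcode("PUSH", "BYTE"))   # PUSH_BYTE
print(split_opcode("POP_WORD"))      # ('POP', 'WORD')
```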
TODO Bytecode format
Bytecode files are byte sequences which encode instructions for the virtual machine. Any instruction (even one with an operand) has one and only one byte sequence associated with it.
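Since the concrete format is not yet specified here, the following Python sketch only illustrates the "one instruction, one byte sequence" property under a made-up layout. Everything in it is an assumption: the opcode numbers, the `PUSH_HWORD` spelling, the single opcode byte, and the little-endian operand order are all hypothetical.

```python
# Hypothetical opcode numbers; the real values are not given in this document.
OPCODES = {"NOOP": 0, "PUSH_BYTE": 1, "PUSH_HWORD": 2, "PUSH_WORD": 3}
# Operand width in bytes for each opcode (0 means no operand).
OPERAND_SIZE = {"NOOP": 0, "PUSH_BYTE": 1, "PUSH_HWORD": 4, "PUSH_WORD": 8}


def encode(opcode: str, operand=None) -> bytes:
    """Encode one instruction as an opcode byte plus a fixed-width operand."""
    size = OPERAND_SIZE[opcode]
    if size == 0:
        if operand is not None:
            raise ValueError(f"{opcode} takes no operand")
        return bytes([OPCODES[opcode]])
    if operand is None:
        raise ValueError(f"{opcode} requires an operand")
    # Little-endian operand order is an assumption of this sketch.
    return bytes([OPCODES[opcode]]) + operand.to_bytes(size, "little")


print(encode("NOOP").hex())                    # 00
print(encode("PUSH_HWORD", 0xDEADBEEF).hex())  # 02efbeadde
```

Because the operand width is fixed per opcode, the same instruction always produces the same bytes, which is the property the paragraph above requires.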
TODO Storage
There are two types of storage:
- Data stack, which all core VM routines manipulate and work on (FILO). DS in shorthand, with indexing from 0 (referring to the top of the stack) up to n (referring to the bottom of the stack). B(DS) refers to the bytes in the stack (the default).
- Register space, which is generally reserved for user space code, i.e. other than mov, no other core VM routine manipulates the registers. R in shorthand, with indexing from 0 to $\infty$.
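The two storage areas and their indexing conventions can be modelled roughly as follows. This is a sketch under assumptions, not the VM's implementation: the class and method names are hypothetical, the register space is modelled as a byte array that grows on demand to stand in for the unbounded index range, and little-endian word layout is assumed.

```python
class DataStack:
    """Byte stack (FILO); index 0 is the top, as in the DS convention."""

    def __init__(self):
        self._bytes = bytearray()

    def push(self, byte: int) -> None:
        self._bytes.append(byte & 0xFF)

    def pop(self) -> int:
        return self._bytes.pop()

    def __len__(self) -> int:
        return len(self._bytes)

    def __getitem__(self, i: int) -> int:
        # B(DS)[0] is the most recently pushed byte.
        if not 0 <= i < len(self._bytes):
            raise IndexError("stack index out of range")
        return self._bytes[-1 - i]


class Registers:
    """Register space R, grown on demand (assumed zero-initialised)."""

    def __init__(self):
        self._bytes = bytearray()

    def _ensure(self, size: int) -> None:
        if len(self._bytes) < size:
            self._bytes.extend(bytes(size - len(self._bytes)))

    def write_word(self, index: int, value: int) -> None:
        # W(R)[index] occupies bytes 8*index .. 8*index + 7.
        self._ensure((index + 1) * 8)
        word = (value & ((1 << 64) - 1)).to_bytes(8, "little")
        self._bytes[index * 8:(index + 1) * 8] = word

    def read_word(self, index: int) -> int:
        self._ensure((index + 1) * 8)
        return int.from_bytes(self._bytes[index * 8:(index + 1) * 8], "little")

    def read_byte(self, index: int) -> int:
        self._ensure(index + 1)
        return self._bytes[index]


ds = DataStack()
for b in (1, 2, 3):
    ds.push(b)
print(ds[0], ds[2])        # 3 1

regs = Registers()
regs.write_word(1, 0x0102)
print(regs.read_byte(8))   # 2
```

In this model B(R)[8] is the first byte of W(R)[1], which is how the byte and word views of the same register space relate.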
TODO Standard library
Standard library subroutines reserve the first 16 words (128 bytes) of register space (W(R)[0] to W(R)[15]). The first 8 words (W(R)[0] to W(R)[7]) are generally considered "arguments" to the subroutine while the remaining 8 words (W(R)[8] to W(R)[15]) are considered additional space that the subroutine may access and mutate for internal purposes.
The stack may have additional bytes pushed, which act as the "return value" of the subroutine, but no bytes will be popped off (Stack Preservation).
If a subroutine requires more than 8 words for its arguments, then it will use the stack. This is the only case where the stack is mutated due to a subroutine call, as those arguments will always be popped off the stack.
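The calling convention described above can be sketched with a made-up subroutine. `sum_args` is hypothetical and not part of the standard library; registers are modelled as a plain list of 16 words, the stack as a list of bytes, and the byte order of the pushed return value is an assumption.

```python
ARG_WORDS = range(0, 8)        # W(R)[0]..W(R)[7]: arguments
SCRATCH_WORDS = range(8, 16)   # W(R)[8]..W(R)[15]: subroutine scratch space
WORD_MASK = (1 << 64) - 1


def sum_args(registers: list, stack: list, count: int) -> None:
    """Hypothetical subroutine: add the first `count` argument words.

    The result is pushed onto the stack as 8 bytes; nothing is popped,
    which is the Stack Preservation property.
    """
    if not 0 <= count <= len(ARG_WORDS):
        raise ValueError("more than 8 argument words would need the stack")
    acc = SCRATCH_WORDS[0]     # use W(R)[8] as an internal accumulator
    registers[acc] = 0
    for i in range(count):
        registers[acc] = (registers[acc] + registers[i]) & WORD_MASK
    # Push the return value (little-endian order is an assumption).
    stack.extend(registers[acc].to_bytes(8, "little"))


registers = [0] * 16
registers[0:3] = [10, 20, 12]
stack = [0xAA]                 # pre-existing byte that must survive the call
sum_args(registers, stack, 3)
print(stack[0] == 0xAA, int.from_bytes(bytes(stack[1:]), "little"))  # True 42
```

The argument words are left untouched, only scratch words are mutated, and the stack only grows.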
Subroutines must always end in RET. Therefore, they must always be
called via CALL, never via JUMP (which will always cause error-prone
behaviour).
Footnotes
[1] UNIT refers to the fact that the internal representation of these instructions is singular: two instances of the same UNIT instruction will be identical in terms of their binary. On the other hand, two instances of the same MULTI instruction may not be identical, since their operands may differ. Crucially, most if not all MULTI instructions have different versions for each data type.
[2] NOOP, HALT, MDELETE, MSIZE, JUMP_*