#+title: VM Specification
#+author: Aryadev Chavali
#+description: A specification of instructions for the virtual machine
#+date: 2023-11-02

* WIP Data types
The virtual machine has 3 main data types, all of them unsigned.
Signed versions of these data types exist, though internally there is
no difference between a type and its signed version.  For an unsigned
type <T>, the signed version is simply S_<T>.
|-------+------|
| Name  | Bits |
|-------+------|
| Byte  |    8 |
| HWord |   32 |
| Word  |   64 |
|-------+------|

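As a rough illustration of the table above, the following Python
sketch records the three widths and reads one bit pattern as either
unsigned or signed.  The two's complement reading of S_<T> and the
helper names are assumptions made for this example; the specification
only states that the signed and unsigned versions do not differ
internally.

#+begin_src python
# Bit widths taken from the table above.
BITS = {"BYTE": 8, "HWORD": 32, "WORD": 64}


def as_unsigned(value, type_name):
    """Truncate value to the width of the type and read it as unsigned."""
    return value & ((1 << BITS[type_name]) - 1)


def as_signed(value, type_name):
    """Read the same bit pattern as signed (ASSUMPTION: two's complement)."""
    bits = BITS[type_name]
    raw = as_unsigned(value, type_name)
    return raw - (1 << bits) if raw >= 1 << (bits - 1) else raw
#+end_src

Under that assumption, the byte ~0xFF~ reads as 255 through
~as_unsigned~ and as -1 through ~as_signed~, while the stored bits are
the same in both cases.
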
Generally, the abbreviations B, H and W are used for Byte, HWord and
Word respectively.  The following table shows a comparison between the
data types where an entry (row and column) $A\times{B}$ answers "how
many of A can I fit in B?".
|-------+------+-------+------|
|       | Byte | HWord | Word |
|-------+------+-------+------|
| Byte  |    1 |     4 |    8 |
| HWord |  1/4 |     1 |    2 |
| Word  |  1/8 |   1/2 |    1 |
|-------+------+-------+------|
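
The entries of the comparison table follow directly from the bit
widths: the entry for row A and column B is the width of B divided by
the width of A.  A minimal Python sketch that recomputes the table
(the function name ~fits~ is made up for this example):

#+begin_src python
from fractions import Fraction

# Bit widths from the data types table.
BITS = {"Byte": 8, "HWord": 32, "Word": 64}


def fits(a, b):
    """How many values of type a fit in one value of type b."""
    return Fraction(BITS[b], BITS[a])


# Print one row of the comparison table per data type.
for a in BITS:
    print(a, [str(fits(a, b)) for b in BITS])
#+end_src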
* WIP Instructions
An instruction for the virtual machine is composed of an *opcode* and,
potentially, an *operand*.  The /opcode/ represents the behaviour of
the instruction, i.e. what the instruction _is_.  The /operand/ is an
element of one of the /data types/ described previously.

Some instructions have /operands/ while others do not.  Instructions
without an operand are called *UNIT* instructions, while instructions
with an operand are called *MULTI* instructions[fn:1].

All /opcodes/ (with very few exceptions[fn:2]) have two components:
the *root* and the *type specifier*.  The /root/ represents the
general behaviour of the instruction: ~PUSH~, ~POP~, ~MOV~, etc.  The
/type specifier/ specifies what /data type/ it manipulates.  A
complete opcode is a combination of these two, e.g. ~PUSH_BYTE~,
~POP_WORD~, etc.  Some /opcodes/ may have more /type specifiers/ than
others.
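
The root and type specifier scheme can be sketched as a simple name
composition.  This is only an illustration: the list of roots is
limited to the three named above, the ~HWORD~ spelling is assumed by
analogy with ~PUSH_BYTE~ and ~POP_WORD~, and not every combination
produced here is guaranteed to be a real opcode.

#+begin_src python
ROOTS = ("PUSH", "POP", "MOV")               # roots named in the text
TYPE_SPECIFIERS = ("BYTE", "HWORD", "WORD")  # HWORD spelling is an assumption


def opcode_name(root, type_specifier):
    """Join a root and a type specifier into a complete opcode name."""
    if root not in ROOTS or type_specifier not in TYPE_SPECIFIERS:
        raise ValueError(f"unknown opcode part: {root}, {type_specifier}")
    return f"{root}_{type_specifier}"


# Every combination this sketch can form; the real opcode set may differ.
ALL_OPCODES = [opcode_name(r, t) for r in ROOTS for t in TYPE_SPECIFIERS]
#+end_src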
* TODO Bytecode format
Bytecode files are byte sequences which encode instructions for the
virtual machine.  Any instruction (even one with an operand) has one
and only one byte sequence associated with it.
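
Since this section does not yet define the format, the following
Python sketch only illustrates the "one and only one byte sequence"
property with an invented encoding.  The opcode numbers, the one byte
opcode prefix and the little-endian operand order are all hypothetical
and are not taken from this specification.

#+begin_src python
# HYPOTHETICAL opcode numbers and operand widths in bytes; the real
# values are not defined in this document.
OPCODES = {"HALT": (0x00, 0), "PUSH_BYTE": (0x01, 1), "PUSH_WORD": (0x02, 8)}


def encode(name, operand=None):
    """Encode one instruction as an opcode byte plus an optional operand."""
    number, width = OPCODES[name]
    if width == 0:
        if operand is not None:
            raise ValueError(f"{name} takes no operand")
        return bytes([number])
    if operand is None:
        raise ValueError(f"{name} needs an operand")
    # ASSUMPTION: operands are stored little-endian.
    return bytes([number]) + operand.to_bytes(width, "little")
#+end_src

Because ~encode~ is a pure function of the opcode and operand, the same
instruction always maps to the same bytes, which is the property the
paragraph above describes.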
* TODO Storage
Two types of storage:
+ Data stack which all core VM routines manipulate and work on (FILO)
  + ~DS~ in shorthand, with indexing from 0 (referring to the top of the
    stack) up to n (referring to the bottom of the stack).  B(DS)
    refers to the bytes in the stack (the default).
+ Register space which is generally reserved for user space code,
  i.e. other than ~mov~ no other core VM routine manipulates the
  registers
  + ~R~ in shorthand, with indexing from 0 to $\infty$.
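
The ~DS~ indexing convention (0 is the top of the stack) can be
sketched in Python as follows.  The class and method names are made up
for this example, and keeping the top of the stack at the end of a
~bytearray~ is an implementation choice of the sketch, not something
this specification prescribes.

#+begin_src python
class DataStack:
    """Byte stack where index 0 is the top, as in the DS shorthand."""

    def __init__(self):
        self._bytes = bytearray()  # the top of the stack is the end

    def push(self, byte):
        """Push one byte, keeping only its low 8 bits."""
        self._bytes.append(byte & 0xFF)

    def pop(self):
        """Remove and return the byte at the top of the stack."""
        return self._bytes.pop()

    def __getitem__(self, index):
        """B(DS)[index]: 0 is the top, the largest index is the bottom."""
        if not 0 <= index < len(self._bytes):
            raise IndexError("stack index out of range")
        return self._bytes[len(self._bytes) - 1 - index]
#+end_src

After pushing 1, 2 and 3 in that order, ~ds[0]~ is 3 (the top) and
~ds[2]~ is 1 (the bottom).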
* TODO Standard library
Standard library subroutines reserve the first 16 words (128 bytes) of
register space (W(R)[0] to W(R)[15]).  The first 8 words (W(R)[0] to
W(R)[7]) are generally considered "arguments" to the subroutine while
the remaining 8 words (W(R)[8] to W(R)[15]) are considered additional
space that the subroutine may access and mutate for internal purposes.

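The reserved register layout can be pictured as a word-indexed view
over the first 128 bytes of register space.  In this Python sketch the
helper names are hypothetical and the little-endian byte order within
a word is an assumption; only the sizes and index ranges come from the
paragraph above.

#+begin_src python
WORD_BYTES = 8
RESERVED_WORDS = 16   # W(R)[0] to W(R)[15]
ARGUMENT_WORDS = 8    # W(R)[0] to W(R)[7]

# The reserved part of register space: 16 words, i.e. 128 bytes.
registers = bytearray(RESERVED_WORDS * WORD_BYTES)


def write_word(index, value):
    """Store value in W(R)[index] (ASSUMPTION: little-endian byte order)."""
    start = index * WORD_BYTES
    registers[start:start + WORD_BYTES] = value.to_bytes(WORD_BYTES, "little")


def read_word(index):
    """Load the word stored in W(R)[index]."""
    start = index * WORD_BYTES
    return int.from_bytes(registers[start:start + WORD_BYTES], "little")


def is_argument(index):
    """True for the argument words W(R)[0] to W(R)[7], False otherwise."""
    return 0 <= index < ARGUMENT_WORDS
#+end_src
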
The stack may have additional bytes pushed, which act as the "return
value" of the subroutine, but no bytes will be popped off (*Stack
Preservation*).

If a subroutine requires more than 8 words for its arguments, then it
will use the stack.  This is the only exception to Stack Preservation,
as those arguments will always be popped off the stack.

Subroutines must always end in ~RET~.  Therefore, they must always be
called via ~CALL~, never by ~JUMP~ (which will always cause
error-prone behaviour).
* Footnotes
[fn:2] ~NOOP~, ~HALT~, ~MDELETE~, ~MSIZE~, ~JUMP_*~

[fn:1] /UNIT/ refers to the fact that the internal representation of
these instructions is singular: two instances of the same /UNIT/
instruction will be identical in terms of their binary.  On the other
hand, two instances of the same /MULTI/ instruction may not be
equivalent due to the operand they take.  Crucially, most if not all
/MULTI/ instructions have different versions for each /data type/.