aboutsummaryrefslogtreecommitdiff
path: root/spec.org
blob: 4a995d26f56a406e168a1107709f3fe7f889a278 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
#+title: VM Specification
#+author: Aryadev Chavali
#+description: A specification of instructions for the virtual machine
#+date: 2023-11-02

* WIP Data types
There are 3 main data types of the virtual machine.  They are all
unsigned.  There exist signed versions of these data types, though
there is no difference (internally) between them.  For an unsigned
type <T> the signed version is simply S_<T>.
|-------+------|
| Name  | Bits |
|-------+------|
| Byte  |    8 |
| HWord |   32 |
| Word  |   64 |
|-------+------|

Generally, the abbreviations B, H and W are used for Byte, HWord and
Word respectively.  The following table shows a comparison between the
data types where an entry (row and column) $A\times{B}$ refers to "How
many of A can I fit in B".
|-------+------+-------+------|
|       | Byte | Hword | Word |
|-------+------+-------+------|
| Byte  | 1    |     4 |    8 |
| HWord | 1/4  |     1 |    2 |
| Word  | 1/8  |   1/2 |    1 |
|-------+------+-------+------|
* WIP Instructions
An instruction for the virtual machine is composed of an *opcode* and,
potentially, an *operand*.  The /opcode/ represents the behaviour of
the instruction i.e. what _is_ the instruction.  The /operand/ is an
element of one of the /data types/ described previously.

Some instructions do have /operands/ while others do not.  The former
type of instructions are called *UNIT* instructions while the latter
type are called *MULTI* instructions[fn:1].

All /opcodes/ (with very few exceptions[fn:2]) have two components:
the *root* and the *type specifier*.  The /root/ represents the
general behaviour of the instruction: ~PUSH~, ~POP~, ~MOV~, etc.  The
/type specifier/ specifies what /data type/ it manipulates.  A
complete opcode will be a combination of these two e.g. ~PUSH_BYTE~,
~POP_WORD~, etc.  Some /opcodes/ may have more /type specifiers/ than
others.
* TODO Bytecode format
Bytecode files are byte sequence which encode instructions for the
virtual machine.  Any instruction (even with an operand) has one and
only one byte sequence associated with it.
* TODO Storage
Two types of storage:
+ Data stack which all core VM routines manipulate and work on (FILO)
  + ~DS~ in shorthand, with indexing from 0 (referring to the top of the
    stack) up to n (referring to the bottom of the stack). B(DS)
    refers to the bytes in the stack (the default).
+ Register space which is generally reserved for user space code
  i.e. other than ~mov~ no other core VM routine manipulates the
  registers
  + ~R~ in shorthand, with indexing from 0 to $\infty$.
* TODO Standard library
Standard library subroutines reserve the first 16 words (128 bytes) of
register space (W(R)[0] to W(R)[15]).  The first 8 words (W(R)[0] to
W(R)[7]) are generally considered "arguments" to the subroutine while
the remaining 8 words (W(R)[8] to W(R)[15]) are considered additional
space that the subroutine may access and mutate for internal purposes.

The stack may have additional bytes pushed, which act as the "return
value" of the subroutine, but no bytes will be popped off (*Stack
Preservation*).

If a subroutine requires more than 8 words for its arguments, then it
will use the stack.  This is the only case where the stack is mutated
due to a subroutine call, as those arguments will always be popped off
the stack.

Subroutines must always end in ~RET~.  Therefore, they must always be
called via ~CALL~, never by ~JUMP~ (which will always cause error
prone behaviour).
* Footnotes
[fn:2] ~NOOP~, ~HALT~, ~MDELETE~, ~MSIZE~, ~JUMP_*~

[fn:1] /UNIT/ refers to the fact that the internal representation of
these instructions are singular: two instances of the same /UNIT/
instruction will be identical in terms of their binary.  On the other
hand, two instances of the same /MULTI/ instruction may not be
equivalent due to the operand they take.  Crucially, most if not all
/MULTI/ instructions have different versions for each /data type/.