This repository has been archived on 2025-11-10. You can view files and clone it. You cannot open issues or pull requests or push a commit.
Files
avm/spec.org
2024-07-07 03:04:19 +01:00

11 KiB

VM Specification

Data types

There are 4 main data types of the virtual machine. They are all unsigned.

Name Bits
Byte 8
Short 16
HWord 32
Word 64

Generally, the abbreviations B, S, H and W are used for Byte, Short, HWord and Word respectively. The following table shows a comparison between the data types where an entry (row and column) $A\times{B}$ refers to "How many of A can I fit in B".

Byte Short HWord Word
Byte 1 2 4 8
Short 1/2 1 2 4
HWord 1/4 1/2 1 2
Word 1/8 1/4 1/2 1

These unsigned types can be trivially considered signed via 2s complement. The signed version of some unsigned type is abbreviated by prefixing the type with a S_. So the signed version of each type is S_B, S_S, S_H, S_W.

TODO Storage

There are 4 forms of storage available in the virtual machine: the stack, registers and heap. The stack, registers and call stack are considered fixed storage in that they have an exact fixed capacity within the virtual machine. The heap, on the other hand, can grow dynamically as it supports user requested allocations and is thus considered dynamic storage.

Stack

  • FILO data structure
  • S in shorthand
  • ptr represents the top of the stack at any one point during execution, 0 refers to the address for the bottom of the stack (aka the minimal value of ptr) and n refers to the address of the end of the usable stack space (aka the maximal value for ptr, MAX_STACK)

Registers

  • constant time read/write data structure
  • R in shorthand
  • Reserves m bytes of space (called the MAX_REG), where m must be a positive multiple of 8
  • May be indexed via a pointer in one of the 4 following forms:

    • b<i>: the ith byte, where i in [0, m)
    • s<i>: the ith short, where i in [0, m/2)
    • h<i>: the ith hword, where i in [0, m/4)
    • w<i>: the ith word, where i in [0, m/8)
  • w<i> refers to the 8 bytes between [8i, 8(i+1)), which implicitly refers to the:

    • 8 byte registers {b<j> | j in [8i, 8(i + 1))}
    • 4 short registers {s<j> | j in [4i, 4(i + 1))}
    • 2 hword registers {h<j> | j in [2i, 2(i + 1))}

TODO Heap

  • Random access storage which can be allocated into chunks
  • H in shorthand

Call stack

  • FILO data structure containing program addresses (indexes in the program)
  • C in shorthand
  • Is reserved for a very small subset of operations for control flow

WIP Instructions

An instruction for the virtual machine is composed of an opcode and, optionally, an operand. The opcode represents the specific behaviour of the instruction i.e. what the instruction does. The operand is an element of one of the data types described previously which the opcode uses as part of its function. The operand is optional based on the opcode: certain opcodes will never require an operand.

Operations: abstracting over opcodes

An operation is some generic behaviour, potentially involving data storage. Many operations are generic over data types i.e. they describe some behaviour that works for some subset of types. Opcodes are simply specialisations of operations over some data type. For example the generic behaviour of the operation PUSH, which pushes the operand onto the stack, is specialised into the opcode PUSH_WORD, which pushes the operand, a word, onto the stack. An operation may, thus, describe many opcodes and each opcode is a specialisation of exactly one operation.

The order of an operation is the number of specialisations it has i.e. the number of opcodes that specialise one operation.

Some operations may not be generic over data types in which case they are of order 1 i.e. the opcode describes the exact behaviour of only one operation.

There are only 3 possible orders for operations: 1, 4 and 8. They are given the names Nil, Unsigned and Signed for specialising over:

  • No types
  • The 4 unsigned data types described earlier
  • The 4 unsigned data types and their signed variants as well

Arity

The arity of an operation is the number of input data it takes. An operation can take input in two ways:

  • From the operand, encoded in the bytecode
  • From the stack by popping from the top

An operation that takes n input data from the stack pops n data from the stack to use as input.

Since there can only be at most one operand, an operation that takes input from the operand must have an arity of at least one.

Hence the arity is the sum of inputs taken from both. This can be 0, in which case the operation is nullary. An operation that takes one input, whether that be from the stack or operand, is unary. An operation that takes two inputs, whichever source either are from, is binary.

Orientation

An operation can be considered oriented around a data storage if it only takes input from that data storage. So an operation that only takes input from the stack is stack-oriented. Or an operation that only takes input from the operand is operand-oriented.

Categorisation of operations

With the notation done, we can now describe all operations that the virtual machine supports. Through describing all of these operations, including their orders and what operand they accept (if any), we can describe all opcodes.

Trivial nullary operations

These are NIL order operations which are super simple to describe.

  • NOOP: Doesn't do anything.
  • HALT: Stops execution at point

Moving data in fixed storage

There are 5 operations that move data through fixed storage in the virtual machine. They are of Unsigned order, unary and operand-oriented.

Name Behaviour
PUSH Pushes operand onto stack
POP Pops datum off stack
PUSH_REGISTER Pushes datum from (operand)th register onto stack
MOV Moves datum off stack to the (operand)th register
DUP Pushes the (operand)th datum in stack onto stack

Using the heap

The heap is utilised through a set of "helper" operations that safely abstract the underlying implementation. All of these operations are stack-oriented.

Name Behaviour Arity
MALLOC Allocate n amount of data in the heap, pushing a pointer 1
MSET Pop a value, set the nth datum of data in the heap 3
MGET Push the nth datum of data in the heap onto the stack 3
MDELETE Free data in the heap 1
MSIZE Get the size of allocation in the heap 1

MALLOC, MSET and MGET are of Unsigned order. Due to unsigned and signed types taking the same size, they can be used for signed data as well.

Boolean operations

There are 5 boolean operations. They are of Unsigned order, binary and stack-oriented. These are:

  • NOT
  • OR
  • AND
  • XOR
  • EQ

Though they are all of unsigned order they can be used for signed data trivially.

Comparison operations

There are 4 comparison operations. They are all signed operations, binary and stack-oriented. They are:

  • LT: Less Than
  • LTE: Less Than or Equal
  • GT: Greater Than
  • GTE: Greater Than or Equal

As EQ is an unsigned order operation and doesn't assert anything on the actual values, it can be used for comparing two signed inputs. It doesn't perform a cast when comparing and unsigned and signed input which may mean certain non equivalent values may be considered equal (e.g. 0xFAA9 is a negative number in 2s complement but a positive number in unsigned, considered the same under EQ).

Mathematical operations

There are 3 mathematical operations. They are of unsigned order, binary and stack-oriented. These are:

  • PLUS
  • SUB
  • MULT

Though they are unsigned, any overflowing operation is wrapped around. With some thought these operations can treat unsigned data and be used to generate them.

Control flow operations

There are 2 control flow operations. Each perform a "jump", changing the point of execution to a different point in the program.

Name Order Orientation Arity
JUMP_ABS NIL Operand 1
JUMP_IF UNSIGNED Operand+Stack 2
  • JUMP_ABS interprets the operand as an absolute program address and sets point of execution to that address
  • JUMP_IF pops a datum off the stack and compares it to 0. If true, the point of execution is set to the operand (interpreted as an absolute program address). If false, execution continues past it.

Subroutine operations

There are 2 subroutine operations. They are the only operations that can mutate the call stack. Through utilising reserved storage in the virtual machine that can only be altered through these methods, they abstract control flow to a higher degree than the jump operations.

Name Orientation Arity
CALL Operand 1
RET - 0

The CALL* operations take a program address as input (either from the operand or from the stack). They push the current program address onto the call stack and perform a jump to the input address.

The RET operation pops a program address off the call stack, performing a jump to that address.

These operations allow the implementation of subroutines: sequences of code that can be self contained and generic over a variety of call sites i.e. can return to the address where it was called without hard coding the address.

TODO IO

Currently IO is really bad: the PRINT_* routines are not a nice abstraction over what's really happening and programs cannot take input from stdin.

TODO Bytecode format

Bytecode files are byte sequence which encode instructions for the virtual machine. Any instruction (even with an operand) has one and only one byte sequence associated with it.

Footnotes