The preprocessor requires only one function to use: preprocess, which
takes tokens and gives back units.
A unit is a tree of tokens, where each unit is a node in that tree. A
unit has a "root" token (the value of the node) and an "expansion"
(the children of the node), where the root is some preprocessor token
(such as a reference or USE call) and the expansion is the tokens it
yields. In the case of a USE call this is the tokens of the file it
includes; in the case of a reference it's the tokens of the constant
it refers to. This means the leaves of the tree of units are the
completely preprocessed/expanded form of the source code.
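Roughly the shape I have in mind, as a sketch (the exact names and
the preprocess signature here are illustrative, not final):
#+begin_src c
#include <stddef.h>

typedef struct { const char *lexeme; } token_t; /* placeholder token */
typedef struct token_stream token_stream_t;     /* see notes below */

/* A unit: a node in the expansion tree. */
typedef struct unit
{
  token_t root;            /* value of the node: a preprocessor token
                              such as a reference or USE call */
  struct unit *expansion;  /* children: the tokens the root yields */
  size_t n_children;       /* 0 => leaf, i.e. fully expanded */
} unit_t;

/* The one entry point: takes tokens, gives back units. */
unit_t *preprocess(token_stream_t *stream, size_t *n_units);
#+end_src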
The function has many working components, which may need to be
extracted. In particular, the function uses a hash map to ensure we
don't include a source twice, and ensures that constants are not
redefined in inner include scopes if they're already defined in outer
scopes (i.e. if compiling a.asm, which defines constant N, then
including b.asm, which also defines constant N, N uses the definition
from a.asm rather than b.asm).
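Sketched out, the two checks look something like this (the hash-map
API names are placeholders for whatever the project ends up with):
#+begin_src c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder hash-map API for the sketch. */
typedef struct hash_map hash_map_t;
bool hash_map_contains(hash_map_t *map, const char *key);
void hash_map_insert(hash_map_t *map, const char *key, void *value);

/* Include-once: only expand a USE call if we haven't seen the file. */
void use_file(hash_map_t *included, const char *filename)
{
  if (hash_map_contains(included, filename))
    return; /* already included somewhere, so yield nothing */
  hash_map_insert(included, filename, NULL);
  /* ... lex the file and expand the USE call into its tokens ... */
}

/* Constant shadowing: outer definitions win, so an inner include
   must not overwrite a constant an outer scope already defined. */
void define_const(hash_map_t *constants, const char *name, void *tokens)
{
  if (!hash_map_contains(constants, name))
    hash_map_insert(constants, name, tokens);
}
#+end_src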
I need to make a spec for this.
I've decided to split the module parsing into two modules: one for
the preprocessing stage, which only deals with tokens, and one for
the parsing stage, which generates bytecode.
It's the best language to use, as it's already compatible with the
headers I'm using, fits pretty neatly into the build system, and can
reuse the functions I've built for converting to and from bytecode!
A token_stream being constructed on the spot has different
used/available semantics from a fully constructed one: a fully
constructed token stream uses available to hold the total number of
tokens and used as an internal iterator, while one that is still
being constructed uses the semantics of a standard darr.
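In struct form, with both readings of each field spelled out (the
field names are from my notes; ~data~ and the byte counting are my
reading of why the ~sizeof(token_t)~ divisions below matter):
#+begin_src c
#include <stddef.h>

typedef struct token token_t; /* whatever the lexer produces */

typedef struct
{
  token_t *data;
  size_t used;      /* constructing: bytes written (standard darr)
                       constructed:  iterator, index of the next token */
  size_t available; /* constructing: bytes allocated (standard darr)
                       constructed:  total number of tokens */
} token_stream_t;

/* Iterating a fully constructed stream:
   while (s->used < s->available)
     consume(&s->data[s->used++]); */
#+end_src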
Furthermore, some loops didn't divide by ~sizeof(token_t)~, which led
to iterating past the bounds of the stream.
This happened because we weren't printing all relevant words: the
division naturally floors its result. Here I ceil the division
instead, to ensure we get the maximal number of words necessary.
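The usual integer trick covers this; for example, 9 bytes at a 4-byte
word size needs 3 words, but 9 / 4 floors to 2:
#+begin_src c
#include <stddef.h>

/* Number of w-byte words needed to cover n bytes: plain n / w
   floors, dropping any trailing partial word, so bias the numerator
   up by w - 1 to ceil instead. */
size_t words_needed(size_t n, size_t w)
{
  return (n + w - 1) / w; /* ceil(n / w) for positive integers */
}
#+end_src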
A page is a flexibly allocated structure of bytes, with a count of
the number of bytes already allocated (used), the number of bytes
available overall (available), and a pointer to the next page, if
any.
heap_t is a linked list of pages. One may allocate a requested size
off the heap, which causes one of two things (sketched below):
1) Either a page already exists with enough space for the requested
size, in which case that page's pointer is used as the base for the
requested pointer, or
2) no pages satisfy the requested size, so a new page is allocated
which becomes the new end of the heap.
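Both cases in a sketch (the struct layout and the policy of sizing
new pages exactly to the request are assumptions for illustration):
#+begin_src c
#include <stddef.h>
#include <stdlib.h>

typedef struct page
{
  struct page *next;    /* next page, if any */
  size_t available;     /* bytes available overall */
  size_t used;          /* bytes already allocated */
  unsigned char data[]; /* the bytes themselves */
} page_t;

/* A heap: a linked list of pages. */
typedef struct
{
  page_t *beg, *end;
} heap_t;

void *heap_alloc(heap_t *heap, size_t size)
{
  /* Case 1: a page already exists with enough space, so its data is
     used as the base for the requested pointer. */
  for (page_t *page = heap->beg; page; page = page->next)
    if (page->available - page->used >= size)
    {
      void *ptr = page->data + page->used;
      page->used += size;
      return ptr;
    }
  /* Case 2: no page satisfies the request, so a new page is
     allocated which becomes the new end of the heap. */
  page_t *page = malloc(sizeof(*page) + size);
  if (!page)
    return NULL;
  page->next      = NULL;
  page->available = size;
  page->used      = size;
  if (heap->end)
    heap->end->next = page;
  else
    heap->beg = page;
  heap->end = page;
  return page->data;
}
#+end_src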
Dependencies are just the ASM_OUT binary and the corresponding
assembly program for the bytecode output file. It actually works very
well, with changes triggering a recompilation. An `exec` recipe is
also introduced to do the task of compiling an assembly program and
executing the corresponding bytecode all at once.
As it has no dependencies on vm specifically, and it's more necessary
for any vendors who wish to target the virtual machine, it makes more
sense for inst to be a lib module rather than a vm module.
This is simply a program with an embedded set of instructions which
indefinitely computes and prints Fibonacci numbers, computing them in
pairs. It does this entirely through the virtual machine rather than
just hard-coded C instructions.
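For reference, the pairwise recurrence the embedded program
implements, written as plain C (this is just to show the arithmetic;
the actual program is VM bytecode):
#+begin_src c
#include <stdio.h>

int main(void)
{
  /* From the pair (a, b) = (F(n), F(n+1)), the next pair is
     (F(n+2), F(n+3)) = (a + b, a + 2b), so each iteration computes
     and prints two Fibonacci numbers. */
  unsigned long long a = 1, b = 1;
  for (;;)
  {
    printf("%llu %llu\n", a, b);
    unsigned long long next_a = a + b;
    unsigned long long next_b = a + 2 * b;
    a = next_a;
    b = next_b;
  }
}
#+end_src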
Also amended the Makefile to compile it. Required moving the main.c
object file into the dependencies of $(DIST)/$(OUT).
I should track the dependencies for fib.c and main.c as well.
Uses some bit hacks to quickly check what data type an opcode may
have, by shifting down to the units then casting to a data_type_t.
Not very well tested yet; we'll need to see.
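Something like the following, assuming opcodes are laid out so a
fixed bit field holds the data type (the shift, mask and enum values
here are illustrative rather than the actual encoding):
#+begin_src c
typedef enum
{
  DATA_TYPE_BYTE,
  DATA_TYPE_HWORD,
  DATA_TYPE_WORD,
} data_type_t;

#define OPCODE_TYPE_SHIFT 0
#define OPCODE_TYPE_MASK  0x3

/* Shift the type field down to the units, mask, and cast: no lookup
   table needed. */
static inline data_type_t opcode_data_type(unsigned opcode)
{
  return (data_type_t)((opcode >> OPCODE_TYPE_SHIFT) & OPCODE_TYPE_MASK);
}
#+end_src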
This is a from-the-ground-up rework of an old project of the same
name. I'm hoping to be more concerned with runtime efficiency,
bytecode size and all those things that should actually matter for
something that may host time/space critical code.