Copied the code from stack overflow without thinking about it. The
first byte in little endian order should always be LSB so I construct
a more contrived example (0xFFFF0000) which should make it easier to
detect what the first byte is considered on the machine. If it's 0
then the LSB is the first byte hence little endian, otherwise it's big
endian.
On a greater note: Don't never copy no code from stack overflow, bro.
I went up there at 11 o'clock last night trynna get me some code.
Bro, I copied that shit, woke up, my motherfucking LITTLE_ENDIAN
detection don't work. Explain, bro.
While it helped with understanding to use unions as a safe way to
access the underlying bits, this shift based mechanism actually makes
more sense at a glance, particularly by utilising WORD_NTH_BYTE
In particular, __LITTLE_ENDIAN__ was not a functioning macro.
Instead, I implemented a version by hand (copied from IBM) that
actually figures out if the machine is little endian or not.
Thank you unit testing!
No longer relying on darr_t or anything other than the C runtime and
aliases. This means it should be *even easier* to target this via FFI
from other languages without having to initialise my custom made
structures! Furthermore I've removed any form of allocation in the
library so FFI callers don't need to manage memory in any way.
Instead we rely on the caller allocating the correct amount of memory
for the functions to work, with basic error handling if that doesn't
happen.
In the case of inst_read_bytecode, error reporting occurs by making
the return of a function an integer. If the integer is positive it is
the number of bytes read from the buffer. If negative it flags a
possible error, which is a member of read_err_t.
prog_read_bytecode has been split into two functions: prog_read_header
and prog_read_instructions. prog_read_instructions works under the
assumption that the program's header has been filled, e.g. via
prog_read_header. prog_read_header returns 0 if there's not enough
space in the buffer or if the start_address is greater than the count.
prog_read_instructions returns a custom structure which contains an
byte position as well as an error enum, allowing for finer error
reporting.
In the case of inst_write_bytecode via the assumption that the caller
allocated the correct memory there is no need for error reporting.
For prog_write_bytecode if an error occurs due to
In the case of inst_read_bytecode we return the number
Due to reordering I need to have two macros for checking if an opcode
is of a type. If the type is signed then the upper bound must be
OP_<type>_LONG whereas if it is unsigned then the upper bound must be
OP_<type>_WORD.
Moved all opcodes that use unsigned types before the signed types AND
ordered signed types into BYTE, CHAR, HWORD, INT, WORD, LONG. This is
not only logically consistent but also looks prettier.
This simple fix made the routine for OP_POP not require an additional
dispatch step on top of the conditional due to OPCODE_DATA_TYPE,
instead using data_type_t as a map from an opcode's base type to a
specific type.
This "header" is now embedded directly into the struct. The semantic
of a header never really matters in the actual runtime anyway, it's
only for bytecode (de)serialising.
Instead of using endian.h that is not portable AND doesn't work with
C++, I'll just write my own using a forced union based type punning
trick.
I've decided to use little endian for the format as well: it seems to
be used by most desktop computers so it should make these functions
faster to run for most CPUs.
This means if I write the new assembler in another language I only
need to FFI this header rather than all the functions as well which
may not be as useful.
Have to define _DEFAULT_SOURCE before you can use the endian
conversion functions. As most standard library headers use
features.h, and _DEFAULT_SOURCE must be defined before features.h is
included, we have to include base.h before other headers.
Essentially a "program header", followed by a count, followed by
instructions.
Provides a stronger format for bytecode files and allows for better
bounds checking on instructions.
Essentially you may "call" an absolute program address, which pushes
the current address onto the call stack. CALL_STACK does the same
thing but the absolute program address is taken from the data stack.
RET pops an address off the call stack then jumps to that address.
Essentially they use the stack for their one and only operand. This
allows user level control, in particular it allows for loops to work
correctly while using these operands.
This will allow for more library level code to be written. For
example, say you wanted to write a generic byte level reversal
algorithm for dynamically sized allocations. Getting the size of the
allocation would be fundamental to this operation.
One may allocate any number of (bytes|hwords|words), set or get some
index from allocated memory, and delete heap memory.
The idea is that all the relevant datums will be on the stack, so no
register usage. This means no instructions should use register space
at all (other than POP, which I'm debating about currently). Register
space is purely for users.
Instead of having each page be an area of memory, where multiple
pointers to differing data may lie, we instead have each page being
one allocation. This ensures that a deletion algorithm, as provided,
would actually work without destroying older pointers which may have
been allocated. Great!
A page is a flexibly allocated structure of bytes, with a count of the
number of bytes already allocated (used) and number of bytes available
overall (available), with a pointer to the next page, if any.
heap_t is a linked list of pages. One may allocate a requested size
off the heap which causes one of two things:
1) Either a page already exists with enough space for the requested
size, in which case that page's pointer is used as the base for the
requested pointer
2) No pages satisfy the requested size, so a new page is allocated
which is the new end of the heap.
Thankfully multiplication, like addition, is the same under 2s
complement as it is for unsigned numbers. So I just need to implement
those versions to be fine.