Age | Commit message (Collapse) | Author |
|
Note that this is basically the same as the previous version,
excluding the fact that it uses C++ idioms more and does a bit better
in error checking.
|
|
One thing I've realised is that even methods such as this require
error tracking. I won't implement it in the tokenise method as it's
not related to consuming the string per se but instead in the main method.
|
|
I made the escape sequence parsing occur here instead of leaving it to
the main tokenise_buffer function as I think it's better suited here.
|
|
Note the overall size of this function in comparison to the C version,
as well as its clarity.
Of course, it is doing allocations in the background through
std::string which requires more profiling if I want to make this super
efficientâ„¢ but honestly the assembler just needs to work, whereas the
runtime needs to be fast.
|
|
|
|
The implementation for tokenise_symbol is already a lot nicer to look
at and add to due to string/string_view operator overloading of ==.
Furthermore, error handling through pair<> instead of making some
custom structure which essentially does the same thing is already
making me happy for this rewrite.
|
|
Essentially a refactor of the C formed lexer into C++ style. I can
already see some benefits from doing this, in particular speed of
prototyping.
|
|
|
|
Instead of using endian.h that is not portable AND doesn't work with
C++, I'll just write my own using a forced union based type punning
trick.
I've decided to use little endian for the format as well: it seems to
be used by most desktop computers so it should make these functions
faster to run for most CPUs.
|
|
|
|
Best language to use as it's already compatible with the headers I'm
using and can pretty neatly enter the build system while also using
the functions I've built for converting to and from bytecode!
|
|
|
|
This means if I write the new assembler in another language I only
need to FFI this header rather than all the functions as well which
may not be as useful.
|
|
|
|
This means if I write the new assembler in another language I only
need to FFI this header rather than all the functions as well which
may not be as useful.
|
|
|
|
|
|
A token_stream being constructed on the spot has different
used/available properties to a fully constructed one: a fully
constructed token stream uses available to hold the total number of
tokens and used as an internal iterator, while one that is still being
constructed uses the semantics of a standard darr.
Furthermore, some loops didn't divide by ~sizeof(token_t)~ which lead
to iteration over bound errors.
|
|
Basically ensure we're converting to big endian when writing bytecode
and converting from big endian when reading bytecode.
|
|
Have to define _DEFAULT_SOURCE before you can use the endian
conversion functions. As most standard library headers use
features.h, and _DEFAULT_SOURCE must be defined before features.h is
included, we have to include base.h before other headers.
|
|
|
|
|
|
|
|
|
|
Pretty simple
|
|
Though practically this would work, as the storage for the half word is
not limited in any way, nevertheless it isn't syntactically right and
it's better to fix now.
|
|
Just used the endian.h functions to convert host endian to and from
big endian.
|
|
I've flip flopped a bit on this but I believe the virtual machine
bytecode format must have a convention on endianness. This is because
of the issue stated in the TODO which may very well happen.
|
|
|
|
Did some analysis and found that 21! takes above 64 bit integers to
store hence set the limit to 20 instead.
|
|
|
|
Because I was using the hword macros instead of word macros, this
causes truncation of bytes when I didn't want it.
|
|
Just changing some messages and the format of heap printing
|
|
|
|
|
|
Also error now points to the correct place in the file.
|
|
|
|
We have distinct functions for the use blocks and the macro blocks,
which each generate wholesale new token streams via `token_copy` so we
don't run into weird errors around ownership of the internal strings
of each token.
Furthermore, process_presults now uses the stream index in each
presult to report errors when stuff goes wrong.
|
|
So when a presult_t is constructed it holds an index to where it was
constructed in terms of the token stream. This will be useful when
implementing an error checker in the preprocessing or result parsing
stages.
|
|
So %USE <STRING> is the expected call pattern, so there's an error if
there isn't a string after %USE.
The other two errors are file I/O errors i.e. nonexistent files or
errors in parsing the other file. We don't report specifics about the
other file, that should be up to the user to check themselves.
|
|
Forgot to increment buffer->used and memcpy call was just incorrect.
|
|
This essentially just copies the internal string of the token into a
new buffer.
|
|
|
|
|
|
This is different to "%CONST" in that it can take token parameters and
use them. This allows the construction of user code at compile time,
which can be very useful for a variety of use cases.
|
|
Essentially importing another file *literally* into the file. This
would happen before parse results are gathered, similar to how
"%CONST" is implemented currently.
|
|
|
|
Doesn't do much, invalid for most operations.
|
|
Stuff like numeric limits can be codified in constants which act self
documenting.
|
|
|