Age | Commit message | Author |
|
|
|
|
|
Pretty simple fix, stupid error in hindsight.
|
|
Ensures that iteration over vec_out by the caller doesn't occur (such
as in a loop to free the memory).
|
|
Name assigned to %CONST is the next symbol in stream, not the symbol
attached to it.
|
|
Instead of %const(<name>) ... %end it will now be %const <name>
... %end i.e. the first symbol after %const will be considered the
name of the constant similar to %use.
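
A minimal sketch of the new rule, assuming tokens are plain strings
and using a hypothetical helper collect_consts (the real preprocesser
works on token structures, which are not shown here): the symbol right
after %const is taken as the name, and everything up to %end is the
body.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplification: a token is just its text.
using tokens_t = std::vector<std::string>;

// Collects every `%const <name> ... %end` block into a name -> body map.
// Returns false if a block has no name or is never closed by %end.
bool collect_consts(const tokens_t &tokens,
                    std::map<std::string, tokens_t> &consts_out)
{
  for (std::size_t i = 0; i < tokens.size(); ++i)
  {
    if (tokens[i] != "%const")
      continue;
    if (i + 1 >= tokens.size())
      return false; // nothing after %const, so there is no name
    // The first symbol after %const is the name (same idea as %use).
    const std::string &name = tokens[++i];
    tokens_t body;
    bool closed = false;
    for (++i; i < tokens.size(); ++i)
    {
      if (tokens[i] == "%end")
      {
        closed = true;
        break;
      }
      body.push_back(tokens[i]);
    }
    if (!closed)
      return false; // ran out of tokens before %end
    consts_out[name] = std::move(body);
  }
  return true;
}
```

With the stream `%const n 1 2 %end`, this stores the body `1 2` under
the name `n`.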
|
|
The preprocess_* functions are now privately contained within the
implementation file to help the preprocesser outer function.
Furthermore I've simplified the API of the preprocess_* functions by
making them only return pp_err_t and store their results in a vector
parameter taken by reference.
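
The shape of that API, as a rough sketch: the names token_t,
pp_err_type_t, EMPTY_TOKEN and preprocess_example are stand-ins I made
up for illustration, and tokens are held by value here purely to keep
the example short.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real token and error types.
struct token_t
{
  std::string content;
};

enum class pp_err_type_t
{
  OK,
  EMPTY_TOKEN,
};

struct pp_err_t
{
  pp_err_type_t type = pp_err_type_t::OK;
};

// Only the error is returned; the result goes into vec_out.
// The input is taken by const reference and never modified.
pp_err_t preprocess_example(const std::vector<token_t> &tokens,
                            std::vector<token_t> &vec_out)
{
  std::vector<token_t> result;
  for (const token_t &tok : tokens)
  {
    if (tok.content.empty())
      return pp_err_t{pp_err_type_t::EMPTY_TOKEN};
    result.push_back(tok);
  }
  // Commit only on success, so vec_out is untouched on error.
  vec_out = std::move(result);
  return pp_err_t{};
}
```

The caller checks the returned pp_err_t first and only reads vec_out
when the type is OK.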
|
|
|
|
|
|
We leave the parameter tokens alone, considering it constant, while
the parameter vec_out is used to hold the new stream of tokens. This
allows the caller to have a before and after view on the token stream
and reduces the worry of double frees.
|
|
|
|
|
|
|
|
This is actually an improvement on the older lexer.
|
|
Similar principle to pp_err_t in that a structure provides the
opportunity for more information about the error such as location.
|
|
Not much of an actual performance change, more semantic meaning for me.
|
|
|
|
|
|
|
|
|
|
|
|
vec.clear() doesn't delete pointers (unless they're smart) so I need
to do it myself.
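
For illustration, a small sketch of the cleanup, assuming a vector of
raw owning pointers. The token_t type, the live_tokens counter and the
free_tokens helper are hypothetical and only exist to make the effect
visible.

```cpp
#include <cassert>
#include <vector>

// Hypothetical counter to observe how many tokens are alive.
int live_tokens = 0;

struct token_t
{
  token_t() { ++live_tokens; }
  ~token_t() { --live_tokens; }
};

// clear() only drops the pointers; the pointed-to objects must be
// deleted by hand first.
void free_tokens(std::vector<token_t *> &vec)
{
  for (token_t *tok : vec)
    delete tok;
  vec.clear();
}
```

Calling vec.clear() alone would leave live_tokens unchanged, which is
exactly the leak described above.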
|
|
|
|
|
|
They copy and construct new token vectors and just read the token
inputs.
|
|
|
|
Once again quite similar to preprocess_macro_blocks but shorter,
easier to use and easier to read. (76 lines vs 109)
|
|
Another great thing for C++: the ability to tell it how to print
structures the way I want. In C it's either:
1) Write a function to print the structure out (preferably to a file
pointer)
2) Write a function to return a string (allocated on the heap) which
represents it
Both are not fun to write, whereas it's much easier to write this.
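
Roughly what that looks like, as a sketch with a made-up token_t
layout and output format (neither is taken from the real code):

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical token layout, for illustration only.
struct token_t
{
  std::string type;
  std::string content;
  std::size_t line;
  std::size_t column;
};

// Teach any std::ostream (std::cout, files, string streams) how to
// print a token.
std::ostream &operator<<(std::ostream &os, const token_t &tok)
{
  return os << tok.type << "(`" << tok.content << "`)@" << tok.line << ", "
            << tok.column;
}
```

The same overload covers both C options: stream to a file for (1), or
to a std::ostringstream and call str() for (2), with no manual heap
management.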
|
|
|
|
While being very similar in style to the C version, it takes 27 fewer
lines of code to implement due to the niceties of C++ (41 lines vs
68).
|
|
|
|
|
|
This removes the problem of possibly expensive copies occurring due to
working with tokens produced from the lexer (that C++ just... does):
now we hold pointers where the copy operator is a lot easier to use.
I want expensive stuff to be done by me and for a reason: I want to
be holding the shotgun.
|
|
This C++ rewrite allows me to rewrite the actual API of the system.
In particular, I'm no longer restricting myself to just using enums
then figuring out a way to get proper error logging later down the
line (through tracking tokens in the buffer internally, for example).
Instead I can now design error structures which hold references to the
token they occurred on as well as possible lexical errors (if they're
a FILE_LEXICAL_ERROR which occurs due to the ~%USE~ macro). This
means it's a lot easier to write error logging now at the top level.
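
A sketch of what such an error structure could look like. Apart from
FILE_LEXICAL_ERROR, every name, field and message below is an
assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical token with a source location.
struct token_t
{
  std::string content;
  std::size_t line = 0;
  std::size_t column = 0;
};

// Hypothetical lexer error carrying its own location.
enum class lerr_type_t { OK, INVALID_CHAR_LITERAL };
struct lerr_t
{
  lerr_type_t type = lerr_type_t::OK;
  std::size_t line = 0;
  std::size_t column = 0;
};

enum class pp_err_type_t { OK, FILE_NONEXISTENT, FILE_LEXICAL_ERROR };
struct pp_err_t
{
  pp_err_type_t type = pp_err_type_t::OK;
  const token_t *reference = nullptr; // token the error occurred on (not owned)
  lerr_t lerr{};                      // only meaningful for FILE_LEXICAL_ERROR
};

// Top-level logging can now be built from the error alone.
std::string describe(const pp_err_t &err)
{
  if (err.type == pp_err_type_t::OK)
    return "ok";
  std::string msg;
  if (err.reference)
    msg += std::to_string(err.reference->line) + ":" +
           std::to_string(err.reference->column) + ": ";
  if (err.type == pp_err_type_t::FILE_LEXICAL_ERROR)
    msg += "lexical error in used file at " + std::to_string(err.lerr.line) +
           ":" + std::to_string(err.lerr.column);
  else
    msg += "file does not exist";
  return msg;
}
```

No internal token tracking is needed: the error carries both the
offending token and, where relevant, the nested lexical error.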
|
|
I've decided to split the module parsing into two modules: one for
the preprocessing stage, which only deals with tokens, and one for
the parsing stage, which generates bytecode.
|
|
This makes enum elements scoped which is actually quite useful as I
prefer the namespacing that enums give in C++.
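
A tiny sketch of why the scoping helps; the enum and function names
here are hypothetical. With plain C enums the two OK enumerators
would clash, whereas with enum class each lives inside its own type.

```cpp
#include <cassert>

// Both enums can have an OK member without colliding.
enum class lerr_type_t { OK, INVALID_CHAR_LITERAL };
enum class pp_err_type_t { OK, UNKNOWN_NAME };

bool lexed_ok(lerr_type_t err)
{
  return err == lerr_type_t::OK;
}

bool preprocessed_ok(pp_err_type_t err)
{
  return err == pp_err_type_t::OK;
}
```

Scoped enums also don't convert to int implicitly, so mixing up the
two error types is a compile error rather than a silent bug.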
|
|
|
|
With error checking!
|
|
Uses std::optional in case the file doesn't exist.
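
Something along these lines, as a sketch; the function name read_file
and its signature are assumptions, not necessarily what the code
uses.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Returns the whole file as a string, or std::nullopt if it could
// not be opened (for example, because it doesn't exist).
std::optional<std::string> read_file(const char *path)
{
  std::ifstream input(path);
  if (!input.is_open())
    return std::nullopt;
  std::stringstream buffer;
  buffer << input.rdbuf();
  return buffer.str();
}
```

The caller tests the result with has_value() (or `if (auto src =
read_file(path))`) before touching the contents.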
|
|
|
|
|
|
|
|
Note that this is basically the same as the previous version,
excluding the fact that it uses C++ idioms more and does a bit better
in error checking.
|
|
One thing I've realised is that even methods such as this require
error tracking. I won't implement it in the tokenise method, as it's
not related to consuming the string per se, but in the main method
instead.
|
|
I made the escape sequence parsing occur here instead of leaving it to
the main tokenise_buffer function as I think it's better suited here.
|
|
Note the overall size of this function in comparison to the C version,
as well as its clarity.
Of course, it is doing allocations in the background through
std::string which requires more profiling if I want to make this super
efficient™ but honestly the assembler just needs to work, whereas the
runtime needs to be fast.
|
|
|
|
The implementation for tokenise_symbol is already a lot nicer to look
at and add to due to string/string_view operator overloading of ==.
Furthermore, error handling through pair<> instead of making some
custom structure which essentially does the same thing is already
making me happy for this rewrite.
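
To give a flavour of both points, a sketch under assumptions: the
token types, the lerr_t values and the lowercase directive spellings
are made up for illustration, and the real tokenise_symbol surely
handles far more cases.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical token and error types.
enum class token_type_t { SYMBOL, PP_CONST, PP_USE, PP_END };
enum class lerr_t { OK, EMPTY_SYMBOL };

struct token_t
{
  token_type_t type;
  std::string content;
};

// Consumes one whitespace-delimited symbol from the front of source.
// The pair carries the token and the error together.
std::pair<token_t, lerr_t> tokenise_symbol(std::string_view &source)
{
  std::size_t end = 0;
  while (end < source.size() &&
         !std::isspace(static_cast<unsigned char>(source[end])))
    ++end;
  if (end == 0)
    return {token_t{token_type_t::SYMBOL, ""}, lerr_t::EMPTY_SYMBOL};

  std::string_view sym = source.substr(0, end);
  source.remove_prefix(end);

  // string_view's operator== replaces strcmp/strncmp chains.
  token_type_t type = token_type_t::SYMBOL;
  if (sym == "%const")
    type = token_type_t::PP_CONST;
  else if (sym == "%use")
    type = token_type_t::PP_USE;
  else if (sym == "%end")
    type = token_type_t::PP_END;
  return {token_t{type, std::string(sym)}, lerr_t::OK};
}
```

Adding a new directive is then one more `else if` comparison, and the
caller just inspects `.second` before using `.first`.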
|
|
Essentially a refactor of the C-style lexer into C++ style. I can
already see some benefits from doing this, in particular speed of
prototyping.
|
|
Best language to use as it's already compatible with the headers I'm
using and can pretty neatly enter the build system while also using
the functions I've built for converting to and from bytecode!
|