Age | Commit message (Collapse) | Author |
|
|
|
While being very similar in style to the C version, it takes 27 lines
of code less to implement it due to the niceties of C++ (41 lines vs
68).
|
|
|
|
|
|
This removes the problem of possibly expensive copies occurring due to
working with tokens produced from the lexer (that C++ just... does):
now we hold pointers where the copy operator is a lot easier to use.
I want expensive stuff to be done by me and for a reason: I want to
be holding the shotgun.
|
|
This C++ rewrite allows me to rewrite the actual API of the system.
In particular, I'm no longer restricting myself to just using enums
then figuring out a way to get proper error logging later down the
line (through tracking tokens in the buffer internally, for example).
Instead I can now design error structures which hold references to the
token they occurred on as well as possible lexical errors (if they're
a FILE_LEXICAL_ERROR which occurs due to the ~%USE~ macro). This
means it's a lot easier to write error logging now at the top level.
|
|
I've decided to split the module parsing into two modules, one for the
preprocessing stage which only deals with tokens and the parsing stage
which generates bytecode.
|
|
This makes enum elements scoped which is actually quite useful as I
prefer the namespacing that enum's give in C++.
|
|
|
|
With error checking!
|
|
Uses std::optional in case file doesn't exist.
|
|
|
|
|
|
|
|
Note that this is basically the same as the previous version,
excluding the fact that it uses C++ idioms more and does a bit better
in error checking.
|
|
One thing I've realised is that even methods such as this require
error tracking. I won't implement it in the tokenise method as it's
not related to consuming the string per se but instead in the main method.
|
|
I made the escape sequence parsing occur here instead of leaving it to
the main tokenise_buffer function as I think it's better suited here.
|
|
Note the overall size of this function in comparison to the C version,
as well as its clarity.
Of course, it is doing allocations in the background through
std::string which requires more profiling if I want to make this super
efficientâ„¢ but honestly the assembler just needs to work, whereas the
runtime needs to be fast.
|
|
|
|
The implementation for tokenise_symbol is already a lot nicer to look
at and add to due to string/string_view operator overloading of ==.
Furthermore, error handling through pair<> instead of making some
custom structure which essentially does the same thing is already
making me happy for this rewrite.
|
|
Essentially a refactor of the C formed lexer into C++ style. I can
already see some benefits from doing this, in particular speed of
prototyping.
|
|
Best language to use as it's already compatible with the headers I'm
using and can pretty neatly enter the build system while also using
the functions I've built for converting to and from bytecode!
|
|
A token_stream being constructed on the spot has different
used/available properties to a fully constructed one: a fully
constructed token stream uses available to hold the total number of
tokens and used as an internal iterator, while one that is still being
constructed uses the semantics of a standard darr.
Furthermore, some loops didn't divide by ~sizeof(token_t)~ which lead
to iteration over bound errors.
|
|
|
|
Also error now points to the correct place in the file.
|
|
|
|
We have distinct functions for the use blocks and the macro blocks,
which each generate wholesale new token streams via `token_copy` so we
don't run into weird errors around ownership of the internal strings
of each token.
Furthermore, process_presults now uses the stream index in each
presult to report errors when stuff goes wrong.
|
|
So when a presult_t is constructed it holds an index to where it was
constructed in terms of the token stream. This will be useful when
implementing an error checker in the preprocessing or result parsing
stages.
|
|
So %USE <STRING> is the expected call pattern, so there's an error if
there isn't a string after %USE.
The other two errors are file I/O errors i.e. nonexistent files or
errors in parsing the other file. We don't report specifics about the
other file, that should be up to the user to check themselves.
|
|
Forgot to increment buffer->used and memcpy call was just incorrect.
|
|
This essentially just copies the internal string of the token into a
new buffer.
|
|
|
|
Doesn't do much, invalid for most operations.
|
|
Preprocessor handles macros and macro blocks by working at the token
level, not doing any high level parsing or instruction making.
Essentially every macro is recorded in a registry, recording the name
and the tokens assigned to it. Then for every caller it just inserts
the tokens inline, creating a new stream and freeing the old one. It
leaves actual high level parsing to `parse_next` and
`process_presults`.
|
|
|
|
|
|
|
|
Lots to refactor and test
|
|
This is mostly so labels get to have digits. This won't affect number
tokens as that happens before symbols.
|
|
|
|
Was used in a previous fix but not necessary anymore
|
|
Set the program structure correctly with a header using the parsed
global instruction.
|
|
Creates a jump address to the label delegated by "global" so program
starts at that point.
|
|
|
|
|
|
Lexer now will straight away attempt to eat up any type or later
portions of an opcode rather than leaving everything but the root.
This means checking for type in the parser is a direct check against
the name rather than prefixed with a dot.
Checks are a bit more strong to cause more tokens to go straight to
symbol rather than getting checked after one routine in at on the
parser side.
|
|
Makes more sense, don't need to fiddle around with strings as much in
the parser due to this!
|
|
Not necessary when you can just push the relevant word onto the stack
then just do OP_JUMP_STACK.
|
|
Essentially a presult_t contains one of these:
1) A label construction, which stores the label symbol into
`label` (PRES_LABEL)
2) An instruction that calls upon a label, storing the instruction
in `instruction` and the label name in `label` (PRES_LABEL_ADDRESS)
3) An instruction that uses a relative address offset, storing the
instruction in `instruction` and the offset wanted into
`relative_address` (PRES_RELATIVE_ADDRESS)
4) An instruction that requires no further processing, storing the
instruction into `instruction` (PRES_COMPLETE_INSTRUCTION)
In the processing stage, we resolve all calls by iterating one by one
and maintaining an absolute instruction address. Pretty nice, lots
more machinery involved in parsing now.
|
|
|