path: root/asm
Age | Commit message | Author
2024-04-14 | Implemented tokenise_literal_char (tokenise_char_literal) | Aryadev Chavali
I made the escape sequence parsing occur here instead of leaving it to the main tokenise_buffer function as I think it's better suited here.
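A minimal sketch of what decoding escapes inside the character-literal tokeniser could look like. The names `decode_escape` and `tokenise_literal_char_sketch`, the `std::optional` return type, and the set of supported escapes are assumptions for illustration, not the repository's actual code.

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper: maps the character after a backslash to its value.
// The escapes listed here are an assumed set.
std::optional<char> decode_escape(char c)
{
  switch (c)
  {
  case 'n':  return '\n';
  case 't':  return '\t';
  case 'r':  return '\r';
  case '0':  return '\0';
  case '\\': return '\\';
  case '\'': return '\'';
  default:   return std::nullopt;
  }
}

// Hypothetical sketch: `source` starts at the opening quote, e.g. 'a' or '\n'.
// Returns the decoded character, or nullopt if the literal is malformed.
std::optional<char> tokenise_literal_char_sketch(std::string_view source)
{
  if (source.size() < 3 || source[0] != '\'')
    return std::nullopt;
  if (source[1] == '\\')
  {
    // Escaped form needs four characters: quote, backslash, code, quote.
    if (source.size() < 4 || source[3] != '\'')
      return std::nullopt;
    return decode_escape(source[2]);
  }
  // Plain form: quote, one non-quote character, quote.
  if (source[1] == '\'' || source[2] != '\'')
    return std::nullopt;
  return source[1];
}
```

Keeping the escape table next to the literal tokeniser means the main buffer loop only has to recognise the opening quote and delegate.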
2024-04-14 | Implemented tokenise_literal_hex | Aryadev Chavali
Note the overall size of this function in comparison to the C version, as well as its clarity. Of course, it is doing allocations in the background through std::string, which requires more profiling if I want to make this super efficient™, but honestly the assembler just needs to work, whereas the runtime needs to be fast.
2024-04-14 | Implemented tokenise_literal_number (tokenise_number) | Aryadev Chavali
2024-04-14 | Started implementing lexer in lexer.cpp | Aryadev Chavali
The implementation for tokenise_symbol is already a lot nicer to look at and add to due to string/string_view operator overloading of ==. Furthermore, error handling through pair<> instead of making some custom structure which essentially does the same thing is already making me happy for this rewrite.
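As a rough illustration of the two points above (comparing with `==` on `std::string_view`, and returning a `std::pair` of token and error), here is a hedged sketch. The enum members, the `token` layout, the `lerr` codes and the function name are all hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>
#include <utility>

enum class token_type { SYMBOL, NOOP, HALT };  // hypothetical subset
enum class lerr { OK, INVALID_SYMBOL };        // hypothetical error codes

struct token
{
  token_type type;
  std::string_view content;
};

// Reads one symbol from the front of `source` and classifies it.
// The error travels alongside the token in a pair, so no custom
// result structure is needed.
std::pair<token, lerr> tokenise_symbol_sketch(std::string_view source)
{
  std::size_t end = 0;
  while (end < source.size() &&
         (std::isalnum(static_cast<unsigned char>(source[end])) ||
          source[end] == '_'))
    ++end;
  if (end == 0)
    return {token{token_type::SYMBOL, {}}, lerr::INVALID_SYMBOL};

  const std::string_view name = source.substr(0, end);
  // string_view overloads ==, so keyword checks read like plain comparisons.
  if (name == "noop")
    return {token{token_type::NOOP, name}, lerr::OK};
  if (name == "halt")
    return {token{token_type::HALT, name}, lerr::OK};
  return {token{token_type::SYMBOL, name}, lerr::OK};
}
```

A caller can unpack the result with structured bindings, e.g. `auto [tok, err] = tokenise_symbol_sketch(buffer);`, and branch on `err` before using `tok`.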
2024-04-14 | Wrote a new lexer API in C++ | Aryadev Chavali
Essentially a refactor of the C formed lexer into C++ style. I can already see some benefits from doing this, in particular speed of prototyping.
2024-04-14 | Start writing assembler in C++ | Aryadev Chavali
Best language to use as it's already compatible with the headers I'm using and can pretty neatly enter the build system while also using the functions I've built for converting to and from bytecode!
2024-04-14 | fix! loops in preprocess_use_blocks iterate to the wrong bound (tag: 0.0.1) | Aryadev Chavali
A token_stream being constructed on the spot has different used/available properties to a fully constructed one: a fully constructed token stream uses available to hold the total number of tokens and used as an internal iterator, while one that is still being constructed uses the semantics of a standard darr. Furthermore, some loops didn't divide by ~sizeof(token_t)~, which led to out-of-bounds iteration.
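The two sets of semantics can be sketched as follows. This assumes that a darr under construction tracks `used` and `available` in bytes (which is what the missing division by `sizeof(token_t)` suggests); the struct layouts and function names are hypothetical.

```cpp
#include <cstddef>

struct token_t  // hypothetical layout
{
  int type;
  const char *str;
};

// Hypothetical stand-in for the darr-backed stream.
struct token_stream_t
{
  token_t *data;
  std::size_t used;       // building: bytes written; finalised: cursor
  std::size_t available;  // building: bytes allocated; finalised: token count
};

// Loop bound for a stream that is still being built: convert bytes to tokens.
std::size_t building_count(const token_stream_t &s)
{
  return s.used / sizeof(token_t);
}

// Loop bound for a fully constructed stream: `available` already counts tokens.
std::size_t finalised_count(const token_stream_t &s)
{
  return s.available;
}

// Switches a stream from building semantics to finalised semantics.
token_stream_t finalise(token_stream_t s)
{
  s.available = s.used / sizeof(token_t);
  s.used = 0;  // reset so it can act as the internal iterator
  return s;
}
```

Using the wrong bound in either direction is the class of bug described above: iterating to `used` bytes on a building stream overruns the array by a factor of `sizeof(token_t)`.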
2023-11-29 | Cleaned up logs in assembler/parser | Aryadev Chavali
2023-11-29 | Fixed incorrect free of tokens in error for preprocess_use_blocks | Aryadev Chavali
Also error now points to the correct place in the file.
2023-11-29 | Report some stats of the actual program when working | Aryadev Chavali
2023-11-29 | Refactored preprocessor to preprocess_(use|macro)_blocks and process_presults | Aryadev Chavali
We have distinct functions for the use blocks and the macro blocks, which each generate wholesale new token streams via `token_copy` so we don't run into weird errors around ownership of the internal strings of each token. Furthermore, process_presults now uses the stream index in each presult to report errors when stuff goes wrong.
2023-11-29 | Refactored presult_t to include a stream pointer | Aryadev Chavali
So when a presult_t is constructed it holds an index to where it was constructed in terms of the token stream. This will be useful when implementing an error checker in the preprocessing or result parsing stages.
2023-11-29 | Added parse errors for %USE calls | Aryadev Chavali
%USE <STRING> is the expected call pattern, so it is an error if there isn't a string after %USE. The other two errors are file I/O errors, i.e. nonexistent files or errors in parsing the other file. We don't report specifics about the other file; that should be up to the user to check themselves.
2023-11-29 | Fixed tokenise_string_literal | Aryadev Chavali
Forgot to increment buffer->used and memcpy call was just incorrect.
2023-11-29 | Added function to copy tokens | Aryadev Chavali
This essentially just copies the internal string of the token into a new buffer.
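A minimal sketch of such a deep copy, assuming a token owns a heap-allocated string together with its length. The `token_t` layout and the name `token_copy_sketch` are hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct token_t  // hypothetical layout
{
  int type;
  char *str;
  std::size_t str_size;
};

// Returns a token whose string lives in a freshly allocated buffer, so the
// copy and the original can be freed independently.
token_t token_copy_sketch(const token_t &t)
{
  token_t copy = t;
  copy.str = static_cast<char *>(std::malloc(t.str_size + 1));
  if (copy.str == nullptr)
  {
    copy.str_size = 0;  // allocation failed: hand back an empty token
    return copy;
  }
  if (t.str_size > 0)
    std::memcpy(copy.str, t.str, t.str_size);
  copy.str[t.str_size] = '\0';
  return copy;
}
```

The caller owns the returned buffer and is expected to release it with `std::free`.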
2023-11-29 | Added TOKEN_PP_USE to lexer with implementation | Aryadev Chavali
2023-11-11 | Added string literals in tokeniser | Aryadev Chavali
Doesn't do much, invalid for most operations.
2023-11-08 | Added a preprocessing routine in assembler | Aryadev Chavali
Preprocessor handles macros and macro blocks by working at the token level, not doing any high level parsing or instruction making. Essentially every macro is recorded in a registry, recording the name and the tokens assigned to it. Then for every caller it just inserts the tokens inline, creating a new stream and freeing the old one. It leaves actual high level parsing to `parse_next` and `process_presults`.
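The token-level approach described above can be sketched roughly like this. The token kinds (`PP_CONST`, `PP_END`, `PP_REFERENCE`), the `token` struct and the function name are assumptions for illustration; the real preprocessor works on the C token stream and frees the old one.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical token kinds: a macro definition opener, its terminator,
// a reference to a macro, and everything else.
enum class tok { PP_CONST, PP_END, PP_REFERENCE, OTHER };

struct token
{
  tok type;
  std::string text;
};

// Builds a new stream in which every macro reference is replaced inline by
// the tokens recorded for that macro. No instruction-level parsing happens.
std::vector<token> expand_macros(const std::vector<token> &in)
{
  std::unordered_map<std::string, std::vector<token>> registry;
  std::vector<token> out;
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    const token &t = in[i];
    if (t.type == tok::PP_CONST)
    {
      // Record every token up to PP_END under the macro's name.
      std::vector<token> &body = registry[t.text];
      body.clear();
      for (++i; i < in.size() && in[i].type != tok::PP_END; ++i)
        body.push_back(in[i]);
    }
    else if (t.type == tok::PP_REFERENCE)
    {
      auto it = registry.find(t.text);
      if (it != registry.end())
        out.insert(out.end(), it->second.begin(), it->second.end());
      else
        out.push_back(t);  // unknown macro: left for a later stage to report
    }
    else
    {
      out.push_back(t);
    }
  }
  return out;
}
```

Because the output is a fresh vector, the expanded stream never shares storage with the input, which mirrors the "new stream, free the old one" design.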
2023-11-08 | Added log in assembler for reading a certain number of bytes | Aryadev Chavali
2023-11-08 | Lexer symbols now recognise macro constants and references | Aryadev Chavali
2023-11-06 | Current work on preprocessor | Aryadev Chavali
2023-11-05 | Current work on preprocessor implementation | Aryadev Chavali
Lots to refactor and test
2023-11-03 | Symbols may now include digits in lexer | Aryadev Chavali
This is mostly so labels get to have digits. This won't affect number tokens as that happens before symbols.
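The ordering argument can be illustrated with a small dispatch sketch: the number check runs first, so allowing digits inside symbols cannot change how a word that starts with a digit is classified. The `kind` enum, the function names and the decision to allow underscores are assumptions.

```cpp
#include <cctype>
#include <string_view>

enum class kind { NUMBER, SYMBOL, UNKNOWN };  // hypothetical categories

// Characters permitted after the first character of a symbol.
bool is_symbol_char(char c)
{
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Decides which tokeniser a word would be handed to.
kind dispatch(std::string_view word)
{
  if (word.empty())
    return kind::UNKNOWN;
  const unsigned char first = static_cast<unsigned char>(word[0]);
  // Numbers are tried first, so anything starting with a digit stops here.
  if (std::isdigit(first))
    return kind::NUMBER;
  if (std::isalpha(first) || first == '_')
  {
    for (char c : word)
      if (!is_symbol_char(c))
        return kind::UNKNOWN;
    return kind::SYMBOL;  // digits are fine after the first character
  }
  return kind::UNKNOWN;
}
```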
2023-11-03 | Removed tabs from VERBOSE logs in asm/main.c | Aryadev Chavali
2023-11-03 | Fixed bug where labels were off by one | Aryadev Chavali
Was used in a previous fix but not necessary anymore
2023-11-03 | Refactor assembler to use prog_t structure | Aryadev Chavali
Set the program structure correctly with a header using the parsed global instruction.
2023-11-03 | Added a start address (equivalent to `main`) to assembler | Aryadev Chavali
Creates a jump address to the label designated by "global" so the program starts at that point.
2023-11-02 | Better logs for assembler | Aryadev Chavali
2023-11-02 | Implemented CALL(_STACK) and RET on the assembler | Aryadev Chavali
2023-11-02 | Made lexer more error prone so parser is less | Aryadev Chavali
Lexer will now straight away attempt to eat up any type or later portions of an opcode rather than leaving everything but the root. This means checking for type in the parser is a direct check against the name rather than one prefixed with a dot. Checks are a bit stronger, so more tokens go straight to symbol rather than getting checked one routine later on the parser side.
2023-11-02 | Made separate tokens for JUMP_ABS and JUMP_STACK | Aryadev Chavali
Makes more sense, don't need to fiddle around with strings as much in the parser due to this!
2023-11-02 | Removed instruction OP_JUMP_REGISTER | Aryadev Chavali
Not necessary when you can just push the relevant word onto the stack then just do OP_JUMP_STACK.
2023-11-02 | Created a preprocessing unit presult_t and a function to process them | Aryadev Chavali
Essentially a presult_t contains one of these:
1) A label construction, which stores the label symbol into `label` (PRES_LABEL)
2) An instruction that calls upon a label, storing the instruction in `instruction` and the label name in `label` (PRES_LABEL_ADDRESS)
3) An instruction that uses a relative address offset, storing the instruction in `instruction` and the offset wanted into `relative_address` (PRES_RELATIVE_ADDRESS)
4) An instruction that requires no further processing, storing the instruction into `instruction` (PRES_COMPLETE_INSTRUCTION)
In the processing stage, we resolve all calls by iterating one by one and maintaining an absolute instruction address. Pretty nice, lots more machinery involved in parsing now.
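The four cases and the resolution step can be sketched as below. This is an illustration under assumptions: the `inst` struct, the use of a first pass to collect label addresses (so forward references resolve), and the function name are hypothetical; only the four PRES_* kinds and the field names come from the message above.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

enum class pres_type { LABEL, LABEL_ADDRESS, RELATIVE_ADDRESS, COMPLETE_INSTRUCTION };

struct inst  // hypothetical instruction: an opcode name and one operand
{
  std::string opcode;
  long operand = 0;
};

struct presult
{
  pres_type type;
  inst instruction;
  std::string label;
  long relative_address = 0;
};

// Resolves labels and relative offsets into absolute instruction addresses.
// Returns nullopt if an instruction refers to a label that was never defined.
std::optional<std::vector<inst>> process_presults_sketch(const std::vector<presult> &results)
{
  // Pass 1: labels emit nothing, so only instructions advance the address.
  std::unordered_map<std::string, long> labels;
  long address = 0;
  for (const presult &r : results)
  {
    if (r.type == pres_type::LABEL)
      labels[r.label] = address;
    else
      ++address;
  }

  // Pass 2: emit instructions, patching operands where needed.
  std::vector<inst> program;
  for (const presult &r : results)
  {
    const long here = static_cast<long>(program.size());
    switch (r.type)
    {
    case pres_type::LABEL:
      break;
    case pres_type::LABEL_ADDRESS:
    {
      auto it = labels.find(r.label);
      if (it == labels.end())
        return std::nullopt;
      program.push_back({r.instruction.opcode, it->second});
      break;
    }
    case pres_type::RELATIVE_ADDRESS:
      program.push_back({r.instruction.opcode, here + r.relative_address});
      break;
    case pres_type::COMPLETE_INSTRUCTION:
      program.push_back(r.instruction);
      break;
    }
  }
  return program;
}
```

In this sketch a jump to a label defined later in the file resolves correctly because pass 1 has already seen every label before any operand is patched.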
2023-11-02 | Started work on preprocessing jump addresses | Aryadev Chavali
2023-11-01 | Implemented MALLOC_STACK and SUB in the assembler | Aryadev Chavali
2023-11-01 | Implemented stack versions of MGET and MSET in assembler | Aryadev Chavali
2023-11-01 | Implemented OP_MSIZE into lexer/parser of ASM | Aryadev Chavali
2023-11-01 | Implemented lexer and parser for new memory management instructions | Aryadev Chavali
2023-11-01 | Add MULT to lexer and parser for assembler | Aryadev Chavali
2023-11-01 | Fixed bug where comparators wouldn't be parsed correctly | Aryadev Chavali
This is because comparators may apply to signed types, so I need to use the right parsing function.
2023-11-01 | Clearer VERBOSE messages | Aryadev Chavali
2023-11-01 | Parser now uses updated lexer | Aryadev Chavali
Much simpler, uses a switch case which is a much faster method of doing the parsing. Though roughly equivalent in terms of LOC, I feel that this is more extensible.
2023-11-01 | Lexer now returns more descriptive tokens | Aryadev Chavali
More useful tokens, in particular one for each possible opcode. This makes parsing a simpler task to reason about, as now we're just checking against an enum rather than doing a string check in linear time. It makes more sense to do this at the tokeniser, as the local data from the buffer will most likely be in the cache since the buffer is contiguously allocated. While it will always be slow to do linear time checks on strings, when doing it at the parser we're having to check strings that may be allocated in a variety of different places. This means caching becomes a harder task, but with this approach we're less likely to have cache misses as long as the buffer stays there.
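A hedged sketch of the split described above: the string comparison happens once in the lexer, and the parser only switches on the resulting enum. The opcode names, enum members and operand counts are illustrative assumptions, not the assembler's real tables.

```cpp
#include <string_view>

enum class token_type { NOOP, HALT, PUSH, SYMBOL };  // hypothetical subset

// Lexer side: the only place where the opcode text is compared.
token_type classify_opcode(std::string_view word)
{
  if (word == "noop")
    return token_type::NOOP;
  if (word == "halt")
    return token_type::HALT;
  if (word == "push")
    return token_type::PUSH;
  return token_type::SYMBOL;
}

// Parser side: a switch on the enum, with no string work at all.
// Returns the number of operands expected, or -1 if this is not an opcode.
int operand_count(token_type type)
{
  switch (type)
  {
  case token_type::NOOP:
  case token_type::HALT:
    return 0;
  case token_type::PUSH:
    return 1;
  case token_type::SYMBOL:
    return -1;
  }
  return -1;
}
```

Adding an opcode then means one new enum member, one comparison in the lexer and one `case` in the parser.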
2023-10-31 | Allow hex literals for numbers | Aryadev Chavali
As strto(ul|ll) allow the parsing of hex literals of the form `0x`, we allow lexing of hex literals which start with `x`. They're lexed into C hex literals which work for strtol.
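A small sketch of that rewrite step, assuming the assembler-side literal looks like `xFF`. The function name `lex_hex_literal` and the `std::optional` return are hypothetical; `std::strtoul` accepting a `0x` prefix with base 16 (or base 0) is standard C library behaviour.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

// Turns an assembler hex literal such as "xFF" into the C form "0xFF".
// Returns nullopt if `source` does not start with 'x' followed by a hex digit.
std::optional<std::string> lex_hex_literal(std::string_view source)
{
  if (source.size() < 2 || source[0] != 'x')
    return std::nullopt;
  std::string out = "0x";
  std::size_t i = 1;
  while (i < source.size() &&
         std::isxdigit(static_cast<unsigned char>(source[i])))
    out.push_back(source[i++]);
  if (out.size() == 2)
    return std::nullopt;  // no digits followed the 'x'
  return out;
}

// Parser side: the rewritten literal goes straight to the C library.
unsigned long parse_hex(const std::string &literal)
{
  return std::strtoul(literal.c_str(), nullptr, 16);
}
```

For example, `parse_hex(*lex_hex_literal("xFF"))` yields 255.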
2023-10-31 | Use standardised signed version of word type from base.h | Aryadev Chavali
2023-10-31 | Moved inst module to lib | Aryadev Chavali
As it has no dependencies on vm specifically, and it's more necessary for any vendors who wish to target the virtual machine, it makes more sense for inst to be a lib module rather than a vm module.
2023-10-31 | asm/main logs are now indented and look prettier | Aryadev Chavali
2023-10-31 | Lexer now returns errors on failure | Aryadev Chavali
Currently only for invalid character literals, but still a possible problem.
2023-10-31 | parse_word deals with characters now | Aryadev Chavali
Just takes the character literally as a number.
2023-10-31 | Changed asm/parser instruction push-reg->push.reg | Aryadev Chavali