ovm - ARCHIVED - A stack based virtual machine to act as a target for other programming languages

Age	Commit message	Author
2024-04-14	asm/main now tokenises and prints the tokens of a given file	Aryadev Chavali
	With error checking!
2024-04-14	Implemented a function to read a file in full	Aryadev Chavali
	Uses std::optional in case file doesn't exist.
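A minimal sketch of what such a helper could look like, assuming the file is read through `std::ifstream`. The name `read_file` and its signature are hypothetical and not taken from the repository.

```cpp
#include <fstream>
#include <iterator>
#include <optional>
#include <string>

// Hypothetical helper: returns the whole file as one string, or
// std::nullopt if the file could not be opened (e.g. it doesn't exist).
std::optional<std::string> read_file(const std::string &path)
{
  std::ifstream input(path, std::ios::binary);
  if (!input)
    return std::nullopt;

  // Copy every byte from the stream's buffer into the string.
  return std::string(std::istreambuf_iterator<char>(input),
                     std::istreambuf_iterator<char>());
}
```

A caller would check `has_value()` before using the contents, which keeps the "file doesn't exist" case out of the happy path without a separate error code.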
2024-04-14	asm/main now prints usage	Aryadev Chavali

2024-04-14	Implemented cstr functions.	Aryadev Chavali

2024-04-14	Implemented overload for ostream and token as well as constructors for token	Aryadev Chavali

2024-04-14	Implemented tokenise_buffer	Aryadev Chavali
	Note that this is basically the same as the previous version, excluding the fact that it uses C++ idioms more and does a bit better in error checking.
2024-04-14	Implemented tokenise_literal_string	Aryadev Chavali
	One thing I've realised is that even methods such as this require error tracking. I won't implement it in the tokenise method as it's not related to consuming the string per se but instead in the main method.
2024-04-14	Implemented tokenise_literal_char (tokenise_char_literal)	Aryadev Chavali
	I made the escape sequence parsing occur here instead of leaving it to the main tokenise_buffer function as I think it's better suited here.
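As an illustration of keeping escape handling inside the character literal routine, here is a rough sketch. The signature, the use of `std::optional` for failure and the set of supported escapes (`\n`, `\t`, `\r`, `\\`, `\'`) are assumptions, not the repository's actual code.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical sketch: consume a literal such as 'a' or '\n' from the front
// of `source`. On success the literal is removed from `source` and the
// character it denotes is returned; on failure `source` is left untouched.
std::optional<char> tokenise_char_literal(std::string_view &source)
{
  // Smallest valid literal is quote, character, quote.
  if (source.size() < 3 || source[0] != '\'' || source[1] == '\'')
    return std::nullopt;

  char value = source[1];
  std::size_t consumed = 3;
  if (value == '\\')
  {
    // Escaped literals need one more character: quote, backslash, code, quote.
    if (source.size() < 4)
      return std::nullopt;
    switch (source[2])
    {
    case 'n': value = '\n'; break;
    case 't': value = '\t'; break;
    case 'r': value = '\r'; break;
    case '\\': value = '\\'; break;
    case '\'': value = '\''; break;
    default: return std::nullopt; // unknown escape sequence
    }
    consumed = 4;
  }

  // The literal must end with a closing quote.
  if (source[consumed - 1] != '\'')
    return std::nullopt;

  source.remove_prefix(consumed);
  return value;
}
```

With this shape the main tokenising loop only has to detect the opening quote and delegate, rather than knowing anything about escapes itself.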
2024-04-14	Implemented tokenise_literal_hex	Aryadev Chavali
	Note the overall size of this function in comparison to the C version, as well as its clarity. Of course, it is doing allocations in the background through std::string which requires more profiling if I want to make this super efficient™ but honestly the assembler just needs to work, whereas the runtime needs to be fast.
2024-04-14	Implemented tokenise_literal_number (tokenise_number)	Aryadev Chavali

2024-04-14	Started implementing lexer in lexer.cpp	Aryadev Chavali
	The implementation for tokenise_symbol is already a lot nicer to look at and add to due to string/string_view operator overloading of ==. Furthermore, error handling through pair<> instead of making some custom structure which essentially does the same thing is already making me happy for this rewrite.
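The two points in this message (comparing with `==` on `string_view`, and returning a `pair<>` of token and error) might look roughly like the sketch below. The enum values, the `token` layout and the opcode names are all illustrative assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical token and error types, for illustration only.
enum class token_type { SYMBOL, NOOP, HALT, PUSH };
enum class lex_err { OK, EMPTY_SYMBOL };

struct token
{
  token_type type;
  std::string content;
};

// Consume a run of [A-Za-z0-9_] from the front of `source` and classify it.
// The error travels in the second half of the pair instead of a custom struct.
std::pair<token, lex_err> tokenise_symbol(std::string_view &source)
{
  std::size_t end = 0;
  while (end < source.size() &&
         (std::isalnum(static_cast<unsigned char>(source[end])) ||
          source[end] == '_'))
    ++end;

  if (end == 0)
    return {token{token_type::SYMBOL, ""}, lex_err::EMPTY_SYMBOL};

  std::string_view name = source.substr(0, end);
  source.remove_prefix(end);

  // string_view's operator== replaces strncmp-style length bookkeeping.
  token_type type = token_type::SYMBOL;
  if (name == "noop")
    type = token_type::NOOP;
  else if (name == "halt")
    type = token_type::HALT;
  else if (name == "push")
    type = token_type::PUSH;

  return {token{type, std::string(name)}, lex_err::OK};
}
```

Callers can unpack the result with structured bindings, e.g. `auto [tok, err] = tokenise_symbol(source);`, and branch on `err` before touching `tok`.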
2024-04-14	Wrote a new lexer API in C++	Aryadev Chavali
	Essentially a refactor of the C formed lexer into C++ style. I can already see some benefits from doing this, in particular speed of prototyping.
2024-04-14	Start writing assembler in C++	Aryadev Chavali
	Best language to use as it's already compatible with the headers I'm using and can pretty neatly enter the build system while also using the functions I've built for converting to and from bytecode!
2024-04-14	fix! loops in preprocess_use_blocks iterate to the wrong bound (tag: 0.0.1)	Aryadev Chavali
	A token_stream being constructed on the spot has different used/available properties to a fully constructed one: a fully constructed token stream uses available to hold the total number of tokens and used as an internal iterator, while one that is still being constructed uses the semantics of a standard darr. Furthermore, some loops didn't divide by ~sizeof(token_t)~ which led to out-of-bounds iteration errors.
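To illustrate why the division matters, the sketch below models a buffer whose `used` field counts bytes, which is the assumption made here about "the semantics of a standard darr". The `byte_buffer` type, the `token_t` fields and the helper names are stand-ins, not the project's definitions.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in token; the real token_t has different fields.
struct token_t
{
  int type;
  int column;
};

// Stand-in for a byte-oriented dynamic array: `used` counts BYTES, not tokens.
struct byte_buffer
{
  std::vector<unsigned char> data;
  std::size_t used = 0;
};

// Append one token's bytes to the buffer.
void append_token(byte_buffer &buffer, const token_t &token)
{
  buffer.data.resize(buffer.used + sizeof(token_t));
  std::memcpy(buffer.data.data() + buffer.used, &token, sizeof(token_t));
  buffer.used += sizeof(token_t);
}

// The loop bound for a stream under construction: bytes divided by token size.
// Using `buffer.used` directly as the bound would overrun by a factor of
// sizeof(token_t).
std::size_t token_count(const byte_buffer &buffer)
{
  return buffer.used / sizeof(token_t);
}

// Read the token at a given index back out of the raw bytes.
token_t token_at(const byte_buffer &buffer, std::size_t index)
{
  token_t token;
  std::memcpy(&token, buffer.data.data() + index * sizeof(token_t),
              sizeof(token_t));
  return token;
}
```

A loop over such a buffer would then run `for (std::size_t i = 0; i < token_count(buffer); ++i)`.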
2023-11-29	Cleaned up logs in assembler/parser	Aryadev Chavali

2023-11-29	Fixed incorrect free of tokens in error for preprocess_use_blocks	Aryadev Chavali
	Also error now points to the correct place in the file.
2023-11-29	Report some stats of the actual program when working	Aryadev Chavali

2023-11-29	Refactored preprocessor to preprocess_(use\|macro)_blocks and process_presults	Aryadev Chavali
	We have distinct functions for the use blocks and the macro blocks, which each generate wholesale new token streams via `token_copy` so we don't run into weird errors around ownership of the internal strings of each token. Furthermore, process_presults now uses the stream index in each presult to report errors when stuff goes wrong.
2023-11-29	Refactored presult_t to include a stream pointer	Aryadev Chavali
	So when a presult_t is constructed it holds an index to where it was constructed in terms of the token stream. This will be useful when implementing an error checker in the preprocessing or result parsing stages.
2023-11-29	Added parse errors for %USE calls	Aryadev Chavali
	%USE <STRING> is the expected call pattern, so it is an error if there isn't a string after %USE. The other two errors are file I/O errors, i.e. nonexistent files or errors in parsing the other file. We don't report specifics about the other file; that is left for the user to check themselves.
2023-11-29	Fixed tokenise_string_literal	Aryadev Chavali
	Forgot to increment buffer->used and memcpy call was just incorrect.
2023-11-29	Added function to copy tokens	Aryadev Chavali
	This essentially just copies the internal string of the token into a new buffer.
2023-11-29	Added TOKEN_PP_USE to lexer with implementation	Aryadev Chavali

2023-11-11	Added string literals in tokeniser	Aryadev Chavali
	Doesn't do much, invalid for most operations.
2023-11-08	Added a preprocessing routine in assembler	Aryadev Chavali
	Preprocessor handles macros and macro blocks by working at the token level, not doing any high level parsing or instruction making. Essentially every macro is recorded in a registry, recording the name and the tokens assigned to it. Then for every caller it just inserts the tokens inline, creating a new stream and freeing the old one. It leaves actual high level parsing to `parse_next` and `process_presults`.
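The registry-and-inline approach described above can be sketched as follows. This is an illustration in C++ rather than the C of this commit, and the token kinds (`PP_CONST`, `PP_END`, `PP_REFERENCE`) and the function name `expand_macros` are assumed names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical token kinds: a macro definition start (content = macro name),
// a macro definition end, a reference to a macro, and anything else.
enum class token_type { PP_CONST, PP_END, PP_REFERENCE, OTHER };

struct token
{
  token_type type;
  std::string content;
};

// Build a new stream in which every macro reference is replaced by the
// tokens recorded for that macro. No instruction-level parsing happens here.
std::vector<token> expand_macros(const std::vector<token> &input)
{
  std::map<std::string, std::vector<token>> registry;
  std::vector<token> output;

  for (std::size_t i = 0; i < input.size(); ++i)
  {
    const token &current = input[i];
    if (current.type == token_type::PP_CONST)
    {
      // Record every token up to the matching end marker under the name.
      std::vector<token> body;
      for (++i; i < input.size() && input[i].type != token_type::PP_END; ++i)
        body.push_back(input[i]);
      registry[current.content] = body;
    }
    else if (current.type == token_type::PP_REFERENCE)
    {
      auto found = registry.find(current.content);
      if (found != registry.end())
        output.insert(output.end(), found->second.begin(),
                      found->second.end());
      else
        output.push_back(current); // unknown macro: left for a later stage
    }
    else
    {
      output.push_back(current);
    }
  }
  return output;
}
```

Because the output is a fresh vector of copied tokens, the old stream can be discarded afterwards without any shared ownership of token strings.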
2023-11-08	Added log in assembler for reading a certain number of bytes	Aryadev Chavali

2023-11-08	Lexer symbols now recognise macro constants and references	Aryadev Chavali

2023-11-06	Current work on preprocessor	Aryadev Chavali

2023-11-05	Current work on preprocessor implementation	Aryadev Chavali
	Lots to refactor and test
2023-11-03	Symbols may now include digits in lexer	Aryadev Chavali
	This is mostly so labels get to have digits. This won't affect number tokens as that happens before symbols.
2023-11-03	Removed tabs from VERBOSE logs in asm/main.c	Aryadev Chavali

2023-11-03	Fixed bug where labels were off by one	Aryadev Chavali
	Was used in a previous fix but not necessary anymore
2023-11-03	Refactor assembler to use prog_t structure	Aryadev Chavali
	Set the program structure correctly with a header using the parsed global instruction.
2023-11-03	Added a start address (equivalent to `main`) to assembler	Aryadev Chavali
	Creates a jump address to the label delegated by "global" so program starts at that point.
2023-11-02	Better logs for assembler	Aryadev Chavali

2023-11-02	Implemented CALL(_STACK) and RET on the assembler	Aryadev Chavali

2023-11-02	Made lexer more error prone so parser is less	Aryadev Chavali
	The lexer now immediately attempts to consume any type suffix or later portion of an opcode, rather than leaving everything but the root for the parser. This means checking for a type in the parser is a direct check against the name rather than one prefixed with a dot. Checks are a bit stricter, so more tokens go straight to symbol rather than getting checked one routine later on the parser side.
2023-11-02	Made separate tokens for JUMP_ABS and JUMP_STACK	Aryadev Chavali
	Makes more sense, don't need to fiddle around with strings as much in the parser due to this!
2023-11-02	Removed instruction OP_JUMP_REGISTER	Aryadev Chavali
	Not necessary when you can just push the relevant word onto the stack then just do OP_JUMP_STACK.
2023-11-02	Created a preprocessing unit presult_t and a function to process them	Aryadev Chavali
	Essentially a presult_t contains one of these:
	1) A label construction, which stores the label symbol into `label` (PRES_LABEL)
	2) An instruction that calls upon a label, storing the instruction in `instruction` and the label name in `label` (PRES_LABEL_ADDRESS)
	3) An instruction that uses a relative address offset, storing the instruction in `instruction` and the offset wanted into `relative_address` (PRES_RELATIVE_ADDRESS)
	4) An instruction that requires no further processing, storing the instruction into `instruction` (PRES_COMPLETE_INSTRUCTION)
	In the processing stage, we resolve all calls by iterating one by one and maintaining an absolute instruction address. Pretty nice, lots more machinery involved in parsing now.
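The four cases and the resolution step could be sketched like this. It is written in C++ for illustration (the commit itself is C), addresses are counted in whole instructions, and labels are collected in a separate first pass so forward references work. All of those choices, along with the type and field names, are assumptions rather than the project's implementation.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

enum class pres_type { LABEL, LABEL_ADDRESS, RELATIVE_ADDRESS, COMPLETE_INSTRUCTION };

// Simplified instruction: an opcode name and a single operand.
struct instruction
{
  std::string opcode;
  std::int64_t operand;
};

struct presult
{
  pres_type type;
  instruction inst;              // unused for LABEL
  std::string label;             // used by LABEL and LABEL_ADDRESS
  std::int64_t relative_address; // used by RELATIVE_ADDRESS
};

// Resolve every presult into a finished instruction, or return nullopt if
// an instruction refers to a label that was never defined.
std::optional<std::vector<instruction>>
process_presults(const std::vector<presult> &results)
{
  // Pass 1: a label emits nothing; it names the address of whatever follows.
  std::map<std::string, std::int64_t> labels;
  std::int64_t address = 0;
  for (const presult &r : results)
  {
    if (r.type == pres_type::LABEL)
      labels[r.label] = address;
    else
      ++address;
  }

  // Pass 2: emit instructions; program.size() is the absolute address.
  std::vector<instruction> program;
  for (const presult &r : results)
  {
    instruction inst = r.inst;
    const std::int64_t current = static_cast<std::int64_t>(program.size());
    switch (r.type)
    {
    case pres_type::LABEL:
      continue; // moves on to the next presult
    case pres_type::LABEL_ADDRESS:
    {
      auto found = labels.find(r.label);
      if (found == labels.end())
        return std::nullopt;
      inst.operand = found->second;
      break;
    }
    case pres_type::RELATIVE_ADDRESS:
      inst.operand = current + r.relative_address;
      break;
    case pres_type::COMPLETE_INSTRUCTION:
      break;
    }
    program.push_back(inst);
  }
  return program;
}
```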
2023-11-02	Started work on preprocessing jump addresses	Aryadev Chavali

2023-11-01	Implemented MALLOC_STACK and SUB in the assembler	Aryadev Chavali

2023-11-01	Implemented stack versions of MGET and MSET in assembler	Aryadev Chavali

2023-11-01	Implemented OP_MSIZE into lexer/parser of ASM	Aryadev Chavali

2023-11-01	Implemented lexer and parser for new memory management instructions	Aryadev Chavali

2023-11-01	Add MULT to lexer and parser for assembler	Aryadev Chavali

2023-11-01	Fixed bug where comparators wouldn't be parsed correctly	Aryadev Chavali
	This is because comparators may apply to signed types, so I need to use the right parsing function.
2023-11-01	Clearer VERBOSE messages	Aryadev Chavali

2023-11-01	Parser now uses updated lexer	Aryadev Chavali
	Much simpler, uses a switch case, which is a much faster method of doing the parsing. Though roughly equivalent in terms of LOC, I feel that this is more extensible.
2023-11-01	Lexer now returns more descriptive tokens	Aryadev Chavali
	More useful tokens, in particular one for each possible opcode. This makes parsing a simpler task to reason about, as now we're just checking against an enum rather than doing a string check in linear time. It makes more sense to do this at the tokeniser, as the local data from the buffer will most likely be in the cache because the buffer is contiguously allocated. While it will always be slow to do linear time checks on strings, when doing it at the parser we're having to check strings that may be allocated in a variety of different places. This means caching becomes a harder task, but with this approach we're less likely to have cache misses as long as the buffer stays there.
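A small sketch of the split described here: the lexer does the string comparison once and hands back an enum, and the parser then only switches on that enum. It is written in C++ for illustration, and the opcode names, enum values and operand counts are assumptions rather than the project's tables.

```cpp
#include <string_view>

// Hypothetical token kinds: one per opcode, plus a catch-all symbol.
enum class token_type { NOOP, HALT, PUSH, JUMP_ABS, SYMBOL };

struct opcode_entry
{
  std::string_view name;
  token_type type;
};

// Lexer side: the only place that compares strings, done while the source
// buffer is being scanned.
token_type classify(std::string_view word)
{
  static constexpr opcode_entry opcodes[] = {
      {"noop", token_type::NOOP},
      {"halt", token_type::HALT},
      {"push", token_type::PUSH},
      {"jump.abs", token_type::JUMP_ABS},
  };
  for (const opcode_entry &entry : opcodes)
    if (word == entry.name)
      return entry.type;
  return token_type::SYMBOL;
}

// Parser side: a switch on the enum, with no string handling at all.
// Returns the number of operands to read, or -1 if this is not an opcode.
int operand_count(token_type type)
{
  switch (type)
  {
  case token_type::PUSH:
  case token_type::JUMP_ABS:
    return 1;
  case token_type::NOOP:
  case token_type::HALT:
    return 0;
  case token_type::SYMBOL:
    break;
  }
  return -1;
}
```

Adding an opcode then means one new table entry in the lexer and one new `case` in the parser.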