Commit Graph

334 Commits

Author SHA1 Message Date
Aryadev Chavali
940dd2021e Fix issue with use_blocks not being preprocessed 2024-04-15 05:34:41 +06:30
Aryadev Chavali
7a6275c0a1 fix memory leak through vec.clear
vec.clear() doesn't delete pointers (unless they're smart) so I need
to do it myself.
2024-04-15 05:34:02 +06:30
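
The leak fixed in 7a6275c0a1 can be illustrated with a minimal sketch. It assumes a vector of raw owning pointers; `token`, `live_tokens`, `make_tokens` and `free_tokens` are hypothetical names for illustration and not taken from the repository.

```cpp
#include <cstddef>
#include <vector>

// Counts live token objects so a leak is observable (illustration only).
int live_tokens = 0;

// Hypothetical stand-in for the lexer's token type.
struct token
{
  token() { ++live_tokens; }
  ~token() { --live_tokens; }
};

// Build a vector of n heap-allocated tokens, as the lexer is described doing.
std::vector<token *> make_tokens(std::size_t n)
{
  std::vector<token *> tokens;
  for (std::size_t i = 0; i < n; ++i)
    tokens.push_back(new token{});
  return tokens;
}

// clear() only destroys the pointers themselves, not the objects they
// point to, so each pointee is deleted by hand first.
void free_tokens(std::vector<token *> &tokens)
{
  for (token *t : tokens)
    delete t;
  tokens.clear();
}
```

Calling `tokens.clear()` alone would leave `live_tokens` unchanged; a vector of `std::unique_ptr<token>` would avoid the manual loop, at the cost of changing the ownership model the commits describe.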
Aryadev Chavali
8d3951a871 Implemented preprocesser function. 2024-04-15 05:08:55 +06:30
Aryadev Chavali
1e1a13e741 Default constructor for pp_err_t 2024-04-15 05:08:40 +06:30
Aryadev Chavali
0e5c934072 preprocess_* now uses const references to tokens
They copy and construct new token vectors and just read the token
inputs.
2024-04-15 05:08:07 +06:30
Aryadev Chavali
9ca93786af Updated main.cpp for changes to lexer 2024-04-15 05:07:16 +06:30
Aryadev Chavali
ec87245724 Implemented preprocess_const_blocks
Once again quite similar to preprocess_macro_blocks but shorter,
easier to use and easier to read. (76 vs 109)
2024-04-15 04:55:51 +06:30
Aryadev Chavali
81efc9006d Implement printing of pp_err_t
Another great thing for C++: the ability to tell it how to print
structures the way I want.  In C it's either:
1) Write a function to print the structure out (preferably to a file
pointer)
2) Write a function to return a string (allocated on the heap) which
represents it

Neither is fun to write, whereas it's much easier to write this.
2024-04-15 04:55:51 +06:30
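
As a rough sketch of what 81efc9006d describes, overloading `operator<<` lets a structure be printed to any `std::ostream`. The fields and enumerators of `pp_err_t` below are assumptions made for illustration; the real structure is not shown in the log.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical error kinds; the real enumerators may differ.
enum class pp_err_type { OK, UNKNOWN_NAME };

// Hypothetical layout of pp_err_t: a kind plus some detail text.
struct pp_err_t
{
  pp_err_type type;
  std::string detail;
};

// Teach ostream how to print a pp_err_t.
std::ostream &operator<<(std::ostream &os, const pp_err_t &err)
{
  switch (err.type)
  {
  case pp_err_type::OK:
    return os << "OK";
  case pp_err_type::UNKNOWN_NAME:
    return os << "UNKNOWN_NAME(" << err.detail << ")";
  }
  return os;
}

// The "return a string" variant falls out of the overload for free.
std::string describe(const pp_err_t &err)
{
  std::ostringstream ss;
  ss << err;
  return ss.str();
}
```

With this in place, `std::cerr << err` works directly, which covers both of the C approaches listed in the commit message with one function.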
Aryadev Chavali
929e5a3d0d Implement constructors for pp_err_t 2024-04-15 04:55:51 +06:30
Aryadev Chavali
0a93ad5a8a Implement preprocess_use_blocks
While being very similar in style to the C version, it takes 27 fewer
lines of code to implement thanks to the niceties of C++ (41 lines vs
68).
2024-04-15 04:55:51 +06:30
Aryadev Chavali
f661438c93 Moved read_file to a general base library 2024-04-15 04:55:51 +06:30
Aryadev Chavali
0385d4bb8d Fix some off by one errors in lexer 2024-04-15 04:43:58 +06:30
Aryadev Chavali
f01d64b5f4 lexer now produces a vector of heap allocated tokens
This removes the problem of possibly expensive copies occurring when
working with tokens produced by the lexer (that C++ just... does):
now we hold pointers, which are cheap to copy.

I want expensive stuff to be done by me and for a reason: I want to
be holding the shotgun.
2024-04-15 04:42:24 +06:30
Aryadev Chavali
062ed12278 Rewrote preprocesser API
This C++ rewrite allows me to rewrite the actual API of the system.
In particular, I'm no longer restricting myself to just using enums
then figuring out a way to get proper error logging later down the
line (through tracking tokens in the buffer internally, for example).

Instead I can now design error structures which hold references to the
token they occurred on as well as possible lexical errors (if they're
a FILE_LEXICAL_ERROR which occurs due to the ~%USE~ macro).  This
means it's a lot easier to write error logging now at the top level.
2024-04-15 04:37:43 +06:30
Aryadev Chavali
72ef40e671 parser -> preprocesser + parser
I've decided to split the module parsing into two modules, one for the
preprocessing stage which only deals with tokens and the parsing stage
which generates bytecode.
2024-04-14 17:25:28 +06:30
Aryadev Chavali
86e9d51ab0 enum -> enum class in lexer
This makes enum elements scoped, which is actually quite useful as I
prefer the namespacing that enums give in C++.
2024-04-14 17:17:51 +06:30
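
The scoping gained in 86e9d51ab0 can be sketched as follows. The enum names and enumerators here are hypothetical; only the `enum` to `enum class` change itself comes from the commit.

```cpp
#include <string>
#include <type_traits>

// Two scoped enums may reuse an enumerator name without clashing,
// which plain enums in the same scope cannot do.
enum class token_type { SYMBOL, NUMBER };
enum class pp_state { SYMBOL, DONE };

// Scoped enumerators do not convert to int implicitly.
static_assert(!std::is_convertible<token_type, int>::value,
              "enum class values need an explicit cast");

// Enumerators must be qualified with the enum's name.
std::string token_type_name(token_type t)
{
  switch (t)
  {
  case token_type::SYMBOL:
    return "SYMBOL";
  case token_type::NUMBER:
    return "NUMBER";
  }
  return "UNKNOWN";
}
```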
Aryadev Chavali
86aca9a596 Added static assert to lexer in case of opcode changes 2024-04-14 17:12:24 +06:30
Aryadev Chavali
e5ef0292e7 asm/main now tokenises and prints the tokens of a given file
With error checking!
2024-04-14 17:11:48 +06:30
Aryadev Chavali
d368a49f56 Implemented a function to read a file in full
Uses std::optional in case file doesn't exist.
2024-04-14 17:10:23 +06:30
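
A minimal sketch of the idea in d368a49f56, assuming C++17. The name `read_file` appears later in the log, but the signature and body below are assumptions rather than the repository's code.

```cpp
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Read a whole file into a string; an empty optional means the file
// could not be opened (for example because it does not exist).
std::optional<std::string> read_file(const std::string &path)
{
  std::ifstream in(path, std::ios::binary);
  if (!in)
    return std::nullopt;

  std::ostringstream contents;
  contents << in.rdbuf(); // copy the entire stream buffer
  return contents.str();
}
```

A caller can then branch on `has_value()` instead of checking a sentinel such as a null pointer or an empty string, which would be ambiguous for an empty file.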
Aryadev Chavali
98d4f73134 asm/main now prints usage 2024-04-14 17:09:49 +06:30
Aryadev Chavali
44305138b0 Implemented cstr functions. 2024-04-14 17:05:56 +06:30
Aryadev Chavali
e55871195a Implemented overload for ostream and token as well as constructors for token 2024-04-14 17:05:52 +06:30
Aryadev Chavali
a8f605c89b Implemented tokenise_buffer
Note that this is basically the same as the previous version,
excluding the fact that it uses C++ idioms more and does a bit better
in error checking.
2024-04-14 17:04:15 +06:30
Aryadev Chavali
7a9e646d39 Implemented tokenise_literal_string
One thing I've realised is that even methods such as this require
error tracking.  I won't implement it in the tokenise method as it's
not related to consuming the string per se but instead in the main method.
2024-04-14 17:02:45 +06:30
Aryadev Chavali
50e9a4eef5 Implemented tokenise_literal_char (tokenise_char_literal)
I made the escape sequence parsing occur here instead of leaving it to
the main tokenise_buffer function as I think it's better suited here.
2024-04-14 17:01:35 +06:30
Aryadev Chavali
3c46fde66a Implemented tokenise_literal_hex
Note the overall size of this function in comparison to the C version,
as well as its clarity.

Of course, it is doing allocations in the background through
std::string which requires more profiling if I want to make this super
efficient™ but honestly the assembler just needs to work, whereas the
runtime needs to be fast.
2024-04-14 16:57:46 +06:30
Aryadev Chavali
4f8f511168 Implemented tokenise_literal_number (tokenise_number) 2024-04-14 16:56:58 +06:30
Aryadev Chavali
585aff1cbb Started implementing lexer in lexer.cpp
The implementation for tokenise_symbol is already a lot nicer to look
at and add to due to string/string_view operator overloading of ==.
Furthermore, error handling through pair<> instead of making some
custom structure which essentially does the same thing is already
making me happy for this rewrite.
2024-04-14 16:56:43 +06:30
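
The pair-based error handling mentioned in 585aff1cbb might look roughly like the sketch below. The error enum `lerr_t`, its enumerators, and the rule for what counts as a symbol character are all assumptions; only the name `tokenise_symbol` and the use of `pair<>` and `string_view` come from the commit message.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical lexer error codes.
enum class lerr_t { OK, EMPTY_SYMBOL };

// Consume a leading run of symbol characters and return it together
// with an error code, instead of defining a custom result struct.
std::pair<std::string, lerr_t> tokenise_symbol(std::string_view source)
{
  std::size_t n = 0;
  while (n < source.size() &&
         (std::isalnum(static_cast<unsigned char>(source[n])) ||
          source[n] == '_'))
    ++n;

  if (n == 0)
    return {std::string{}, lerr_t::EMPTY_SYMBOL};
  return {std::string(source.substr(0, n)), lerr_t::OK};
}
```

The `==` overloads on `std::string` and `std::string_view` then allow comparisons such as `result.first == "push"` without `strcmp`.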
Aryadev Chavali
e7a09c0de4 Wrote a new lexer API in C++
Essentially a refactor of the C formed lexer into C++ style.  I can
already see some benefits from doing this, in particular speed of
prototyping.
2024-04-14 16:52:58 +06:30
Aryadev Chavali
1e6f60a869 Added C++ dir locals 2024-04-14 16:49:37 +06:30
Aryadev Chavali
3b912495de Created custom functions to convert (h)words to and from bytecode format
Instead of using endian.h, which is not portable AND doesn't work with
C++, I'll just write my own using a forced union based type punning
trick.

I've decided to use little endian for the format as well: it seems to
be used by most desktop computers so it should make these functions
faster to run for most CPUs.
2024-04-14 03:54:54 +06:30
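
For illustration, a little endian word conversion can be sketched as below. Note that this uses byte shifts rather than the union based punning the commit describes, since reading an inactive union member is not well defined in standard C++. The 64 bit `word_t` and both function names are assumptions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: a word is 64 bits wide.
using word_t = std::uint64_t;

// Serialise a word as 8 bytes, least significant byte first.
std::array<std::uint8_t, 8> word_to_bytecode(word_t w)
{
  std::array<std::uint8_t, 8> bytes{};
  for (std::size_t i = 0; i < bytes.size(); ++i)
    bytes[i] = static_cast<std::uint8_t>((w >> (8 * i)) & 0xFF);
  return bytes;
}

// Rebuild a word from 8 little endian bytes.
word_t word_from_bytecode(const std::array<std::uint8_t, 8> &bytes)
{
  word_t w = 0;
  for (std::size_t i = 0; i < bytes.size(); ++i)
    w |= static_cast<word_t>(bytes[i]) << (8 * i);
  return w;
}
```

Because the shifts operate on values rather than memory layout, the result is the same on any host, whatever its native byte order.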
Aryadev Chavali
e12f364669 Merge branch 'master' into asm-rewrite-cpp 2024-04-14 03:54:17 +06:30
Aryadev Chavali
0ebbf3ca75 Start writing assembler in C++
Best language to use as it's already compatible with the headers I'm
using and can pretty neatly enter the build system while also using
the functions I've built for converting to and from bytecode!
2024-04-14 02:45:48 +06:30
Aryadev Chavali
aa78a66e7b Documented lib/darr.h 2024-04-14 02:45:30 +06:30
Aryadev Chavali
340eb164cf Moved struct definitions lib/inst.h -> lib/prog.h
This means if I write the new assembler in another language I only
need to FFI this header rather than all the functions as well which
may not be as useful.
2024-04-14 02:45:30 +06:30
Aryadev Chavali
b7a40f4ab0 Documented lib/darr.h 2024-04-14 02:36:30 +06:30
Aryadev Chavali
1cd31a2702 Moved struct definitions lib/inst.h -> lib/prog.h
This means if I write the new assembler in another language I only
need to FFI this header rather than all the functions as well which
may not be as useful.
2024-04-14 02:35:09 +06:30
Aryadev Chavali
99a81ce95d Added todo to rewrite assembler in a different language 2024-04-14 02:34:40 +06:30
Aryadev Chavali
38a24f172f Finished todo on importing another file 2024-04-14 02:34:28 +06:30
Aryadev Chavali
4e9eb0a42e fix! loops in preprocess_use_blocks iterate to the wrong bound
A token_stream being constructed on the spot has different
used/available properties to a fully constructed one: a fully
constructed token stream uses available to hold the total number of
tokens and used as an internal iterator, while one that is still being
constructed uses the semantics of a standard darr.

Furthermore, some loops didn't divide by ~sizeof(token_t)~, which led
to out-of-bounds iteration errors.
2024-04-14 02:00:17 +06:30
Aryadev Chavali
e2667eda65 Fix problems with running programs due to mismatched endian
Basically ensure we're converting to big endian when writing bytecode
and converting from big endian when reading bytecode.
2024-04-12 17:34:17 +06:30
Aryadev Chavali
72585772ef Fixing build problems due to endian.h
Have to define _DEFAULT_SOURCE before you can use the endian
conversion functions.  As most standard library headers use
features.h, and _DEFAULT_SOURCE must be defined before features.h is
included, we have to include base.h before other headers.
2024-04-12 17:32:58 +06:30
Aryadev Chavali
a8a2c50a8f Reworking todos on library linking 2024-04-09 21:24:46 +06:30
Aryadev Chavali
d478522d60 Some rewording of spec.org 2024-04-09 21:23:54 +06:30
Aryadev Chavali
33e1d2ab72 Added some TODOs to lib/inst.c to enforce endian 2024-04-09 21:23:30 +06:30
Aryadev Chavali
d256e06f51 Mid-work through documenting darr.h 2024-04-09 21:21:12 +06:30
Aryadev Chavali
84028dab79 Done TODO: Comment coverage > lib > base.h
Pretty simple
2024-04-09 15:15:00 +06:30
Aryadev Chavali
9d4e56c441 Fixed code in vm_pop_hword DWORD -> DHWORD
Though practically this would work, as the storage for the half word is
not limited in any way, nevertheless it isn't syntactically right and
it's better to fix now.
2024-04-09 15:13:51 +06:30
Aryadev Chavali
afb48b65b9 Completed TODO: Rigid Endian
Just used the endian.h functions to convert host endian to and from
big endian.
2024-04-09 15:11:42 +06:30
Aryadev Chavali
6df6dce153 Added todo to force an endian convention
I've flip flopped a bit on this but I believe the virtual machine
bytecode format must have a convention on endianness.  This is because
of the issue stated in the TODO which may very well happen.
2024-04-09 15:10:26 +06:30