461 Commits

Author SHA1 Message Date
Aryadev Chavali
7a9e646d39 Implemented tokenise_literal_string
One thing I've realised is that even methods such as this require
error tracking.  I won't implement it in the tokenise method, as it's
not related to consuming the string per se, but in the main method instead.
2024-04-14 17:02:45 +06:30
Aryadev Chavali
50e9a4eef5 Implemented tokenise_literal_char (tokenise_char_literal)
I made the escape sequence parsing occur here instead of leaving it to
the main tokenise_buffer function as I think it's better suited here.
2024-04-14 17:01:35 +06:30
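A minimal sketch of what escape handling inside a char literal tokeniser could look like. The name `parse_char_literal`, its signature, and the set of supported escapes are assumptions for illustration, not the code in this commit.

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper, not the project's tokenise_literal_char.
// Parses a complete char literal such as 'a' or '\n' (quotes included)
// and returns the character it denotes, or nullopt if it is malformed.
std::optional<char> parse_char_literal(std::string_view src)
{
  // The shortest valid literal is 'x': an opening quote, a body, a closing quote.
  if (src.size() < 3 || src.front() != '\'' || src.back() != '\'')
    return std::nullopt;

  std::string_view body = src.substr(1, src.size() - 2);

  // Plain character: exactly one non-backslash character.
  if (body.size() == 1 && body[0] != '\\')
    return body[0];

  // Escape sequence: a backslash followed by one selector character.
  if (body.size() == 2 && body[0] == '\\')
  {
    switch (body[1])
    {
    case 'n': return '\n';
    case 't': return '\t';
    case 'r': return '\r';
    case '0': return '\0';
    case '\\': return '\\';
    case '\'': return '\'';
    default: return std::nullopt; // unknown escape (assumed to be an error)
    }
  }
  return std::nullopt;
}
```

Resolving the escape in the same place that recognises the literal means the caller only ever sees a finished character or a failure.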
Aryadev Chavali
3c46fde66a Implemented tokenise_literal_hex
Note the overall size of this function in comparison to the C version,
as well as its clarity.

Of course, it is doing allocations in the background through
std::string which requires more profiling if I want to make this super
efficient™ but honestly the assembler just needs to work, whereas the
runtime needs to be fast.
2024-04-14 16:57:46 +06:30
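For a sense of scale, consuming a run of hex digits with `std::string_view` might be sketched as below. `take_hex_digits` is a hypothetical name and shape, not the function from this commit; it copies the lexeme into a `std::string`, which is where the background allocation mentioned above would come from.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: splits src into its leading run of hex digits
// (copied into an owning std::string) and the unconsumed remainder.
std::pair<std::string, std::string_view> take_hex_digits(std::string_view src)
{
  std::size_t end = 0;
  // The cast to unsigned char avoids undefined behaviour for negative chars.
  while (end < src.size() &&
         std::isxdigit(static_cast<unsigned char>(src[end])))
    ++end;
  return {std::string(src.substr(0, end)), src.substr(end)};
}
```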
Aryadev Chavali
4f8f511168 Implemented tokenise_literal_number (tokenise_number) 2024-04-14 16:56:58 +06:30
Aryadev Chavali
585aff1cbb Started implementing lexer in lexer.cpp
The implementation for tokenise_symbol is already a lot nicer to look
at and add to due to string/string_view operator overloading of ==.
Furthermore, error handling through pair<> instead of making some
custom structure which essentially does the same thing is already
making me happy for this rewrite.
2024-04-14 16:56:43 +06:30
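A rough illustration of the two points above: keyword comparison through `std::string_view`'s `operator==`, and returning a result together with an error code in a `std::pair`. The enum names, the keywords `NOOP` and `HALT`, and `classify_symbol` itself are assumptions made for this sketch.

```cpp
#include <string_view>
#include <utility>

// Hypothetical token and error kinds, for illustration only.
enum class TokenType { Symbol, Noop, Halt };
enum class LexError { Ok, EmptySymbol };

// Classify a symbol by comparing it against keywords with operator==,
// handing back the token type and an error code together in a pair.
std::pair<TokenType, LexError> classify_symbol(std::string_view sym)
{
  if (sym.empty())
    return {TokenType::Symbol, LexError::EmptySymbol};
  if (sym == "NOOP")
    return {TokenType::Noop, LexError::Ok};
  if (sym == "HALT")
    return {TokenType::Halt, LexError::Ok};
  return {TokenType::Symbol, LexError::Ok};
}
```

A caller can unpack the result with `auto [type, err] = classify_symbol(text);` and check `err` before using `type`.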
Aryadev Chavali
e7a09c0de4 Wrote a new lexer API in C++
Essentially a refactor of the C formed lexer into C++ style.  I can
already see some benefits from doing this, in particular speed of
prototyping.
2024-04-14 16:52:58 +06:30
Aryadev Chavali
1e6f60a869 Added C++ dir locals 2024-04-14 16:49:37 +06:30
Aryadev Chavali
3b912495de Created custom functions to convert (h)words to and from bytecode format
Instead of using endian.h, which is not portable AND doesn't work with
C++, I'll just write my own using a forced union based type punning
trick.

I've decided to use little endian for the format as well: it seems to
be used by most desktop computers so it should make these functions
faster to run for most CPUs.
2024-04-14 03:54:54 +06:30
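One way to get a fixed little endian layout without endian.h is sketched below. Note that it uses shifts rather than the union based type punning described in this commit, so it is a different technique with the same goal. The names `word_bytes`, `word_to_bytecode` and `word_from_bytecode` are hypothetical, and a word is assumed to be 64 bits.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: a word is 64 bits, so it serialises to 8 bytes.
using word_bytes = std::array<std::uint8_t, 8>;

// Serialise a word as little endian: least significant byte first.
// Shifts operate on values, so the result does not depend on host endianness.
word_bytes word_to_bytecode(std::uint64_t w)
{
  word_bytes out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = static_cast<std::uint8_t>((w >> (8 * i)) & 0xFF);
  return out;
}

// Rebuild a word from its little endian byte representation.
std::uint64_t word_from_bytecode(const word_bytes &in)
{
  std::uint64_t w = 0;
  for (std::size_t i = 0; i < in.size(); ++i)
    w |= static_cast<std::uint64_t>(in[i]) << (8 * i);
  return w;
}
```

On a little endian host an optimising compiler can often reduce these loops to a plain load or store.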
Aryadev Chavali
e12f364669 Merge branch 'master' into asm-rewrite-cpp 2024-04-14 03:54:17 +06:30
Aryadev Chavali
0ebbf3ca75 Start writing assembler in C++
Best language to use as it's already compatible with the headers I'm
using and can pretty neatly enter the build system while also using
the functions I've built for converting to and from bytecode!
2024-04-14 02:45:48 +06:30
Aryadev Chavali
aa78a66e7b Documented lib/darr.h 2024-04-14 02:45:30 +06:30
Aryadev Chavali
340eb164cf Moved struct definitions lib/inst.h -> lib/prog.h
This means if I write the new assembler in another language I only
need to FFI this header rather than all the functions as well which
may not be as useful.
2024-04-14 02:45:30 +06:30
Aryadev Chavali
b7a40f4ab0 Documented lib/darr.h 2024-04-14 02:36:30 +06:30
Aryadev Chavali
1cd31a2702 Moved struct definitions lib/inst.h -> lib/prog.h
This means if I write the new assembler in another language I only
need to FFI this header rather than all the functions as well which
may not be as useful.
2024-04-14 02:35:09 +06:30
Aryadev Chavali
99a81ce95d Added todo to rewrite assembler in a different language 2024-04-14 02:34:40 +06:30
Aryadev Chavali
38a24f172f Finished todo on importing another file 2024-04-14 02:34:28 +06:30
Aryadev Chavali
4e9eb0a42e fix! loops in preprocess_use_blocks iterate to the wrong bound
A token_stream being constructed on the spot has different
used/available properties to a fully constructed one: a fully
constructed token stream uses available to hold the total number of
tokens and used as an internal iterator, while one that is still being
constructed uses the semantics of a standard darr.

Furthermore, some loops didn't divide by ~sizeof(token_t)~, which led
to out-of-bounds iteration errors.
2024-04-14 02:00:17 +06:30
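The bound issue can be sketched as follows, assuming a dynamic array whose `used` field counts bytes rather than elements. `token_t` and `byte_buffer` here are simplified stand-ins for illustration, not the project's definitions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the project's token_t and darr types.
struct token_t
{
  int type;
  int column;
};

struct byte_buffer
{
  std::vector<std::uint8_t> data; // raw storage
  std::size_t used = 0;           // bytes in use, NOT the number of tokens
};

// Append one token by copying its bytes onto the end of the buffer.
void push_token(byte_buffer &buf, token_t tok)
{
  buf.data.resize(buf.used + sizeof(token_t));
  std::memcpy(buf.data.data() + buf.used, &tok, sizeof(token_t));
  buf.used += sizeof(token_t);
}

// The token count is the byte count divided by the element size;
// looping up to buf.used itself would run past the last token.
std::size_t token_count(const byte_buffer &buf)
{
  return buf.used / sizeof(token_t);
}

// Read the i-th token back out of the raw bytes.
token_t token_at(const byte_buffer &buf, std::size_t i)
{
  token_t tok;
  std::memcpy(&tok, buf.data.data() + i * sizeof(token_t), sizeof(token_t));
  return tok;
}
```

A loop over the tokens then reads `for (std::size_t i = 0; i < token_count(buf); ++i)`.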
Aryadev Chavali
e2667eda65 Fix problems with running programs due to mismatched endian
Basically ensure we're converting to big endian when writing bytecode
and converting from big endian when reading bytecode.
2024-04-12 17:34:17 +06:30
Aryadev Chavali
72585772ef Fixing build problems due to endian.h
Have to define _DEFAULT_SOURCE before you can use the endian
conversion functions.  As most standard library headers use
features.h, and _DEFAULT_SOURCE must be defined before features.h is
included, we have to include base.h before other headers.
2024-04-12 17:32:58 +06:30
Aryadev Chavali
a8a2c50a8f Reworking todos on library linking 2024-04-09 21:24:46 +06:30
Aryadev Chavali
d478522d60 Some rewording of spec.org 2024-04-09 21:23:54 +06:30
Aryadev Chavali
33e1d2ab72 Added some TODOs to lib/inst.c to enforce endian 2024-04-09 21:23:30 +06:30
Aryadev Chavali
d256e06f51 Mid-work through documenting darr.h 2024-04-09 21:21:12 +06:30
Aryadev Chavali
84028dab79 Done TODO: Comment coverage > lib > base.h
Pretty simple
2024-04-09 15:15:00 +06:30
Aryadev Chavali
9d4e56c441 Fixed code in vm_pop_hword DWORD -> DHWORD
Though practically this would work, as the storage for the half word is
not limited in any way, it isn't syntactically right and it's better to
fix now.
2024-04-09 15:13:51 +06:30
Aryadev Chavali
afb48b65b9 Completed TODO: Rigid Endian
Just used the endian.h functions to convert host endian to and from
big endian.
2024-04-09 15:11:42 +06:30
Aryadev Chavali
6df6dce153 Added todo to force an endian convention
I've flip flopped a bit on this but I believe the virtual machine
bytecode format must have a convention on endianness.  This is because
of the issue stated in the TODO which may very well happen.
2024-04-09 15:10:26 +06:30
Aryadev Chavali
9250a2a838 Added better documentation to TODO list 2024-04-08 04:44:10 +06:30
Aryadev Chavali
9bf1d123b8 Changed limit for examples/factorial.asm
Did some analysis and found that 21! does not fit in a 64 bit integer,
hence set the limit to 20 instead.
2024-04-07 03:40:19 +06:30
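The limit can be checked with a short sketch that finds the largest n whose factorial still fits in 64 bits. The function name is hypothetical, and an unsigned 64 bit integer is assumed; the answer is the same for a signed one, since 20! is also below 2^63.

```cpp
#include <cstdint>
#include <limits>

// Largest n such that n! still fits in an unsigned 64 bit integer.
unsigned max_factorial_u64()
{
  std::uint64_t acc = 1; // holds n!
  unsigned n = 1;
  // Multiplying acc by (n + 1) overflows exactly when acc > max / (n + 1),
  // so test with a division before doing the multiplication.
  while (acc <= std::numeric_limits<std::uint64_t>::max() / (n + 1))
  {
    ++n;
    acc *= n;
  }
  return n;
}
```

Here 20! = 2432902008176640000 fits, while 21! = 51090942171709440000 exceeds 2^64 - 1 = 18446744073709551615.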
Aryadev Chavali
04a27bcfec Use a limit on $I rather than on $B for examples/fib.asm 2023-11-29 23:14:01 +00:00
Aryadev Chavali
af142e71ff Fixed issues with getting and setting words for heap pages
I was using the hword macros instead of the word macros, which
truncated bytes when I didn't want it to.
2023-11-29 23:10:32 +00:00
Aryadev Chavali
70c8a03939 Fixed logs in vm/runtime
Just changing some messages and the format of heap printing
2023-11-29 23:10:17 +00:00
Aryadev Chavali
60588129b4 Cleaned up logs in assembler/parser 2023-11-29 23:09:51 +00:00
Aryadev Chavali
fa3ecc0073 Easier to read documentation in examples 2023-11-29 17:00:39 +00:00
Aryadev Chavali
6a34fd2d2e Fixed incorrect free of tokens in error for preprocess_use_blocks
Also error now points to the correct place in the file.
2023-11-29 16:58:26 +00:00
Aryadev Chavali
fd1e6d96f6 Report some stats of the actual program when working 2023-11-29 15:46:44 +00:00
Aryadev Chavali
16dcc88a53 Refactored preprocessor to preprocess_(use|macro)_blocks and process_presults
We have distinct functions for the use blocks and the macro blocks,
which each generate wholesale new token streams via `token_copy` so we
don't run into weird errors around ownership of the internal strings
of each token.

Furthermore, process_presults now uses the stream index in each
presult to report errors when stuff goes wrong.
2023-11-29 15:43:53 +00:00
Aryadev Chavali
48d304056a Refactored presult_t to include a stream pointer
So when a presult_t is constructed it holds an index to where it was
constructed in terms of the token stream.  This will be useful when
implementing an error checker in the preprocessing or result parsing
stages.
2023-11-29 15:43:41 +00:00
Aryadev Chavali
4cee61fc9e Added parse errors for %USE calls
%USE <STRING> is the expected call pattern, so there's an error if
there isn't a string after %USE.

The other two errors are file I/O errors i.e. nonexistent files or
errors in parsing the other file.  We don't report specifics about the
other file; that should be up to the user to check themselves.
2023-11-29 15:40:14 +00:00
Aryadev Chavali
9b8936ea02 Fixed tokenise_string_literal
Forgot to increment buffer->used and memcpy call was just incorrect.
2023-11-29 15:39:37 +00:00
Aryadev Chavali
ac70d4031c Added function to copy tokens
This essentially just copies the internal string of the token into a
new buffer.
2023-11-29 15:38:57 +00:00
Aryadev Chavali
1cba5ccd8d Added TOKEN_PP_USE to lexer with implementation 2023-11-29 15:38:41 +00:00
Aryadev Chavali
cad92bf3ba Moved preprocessor>Constants to Completed and started work on %USE 2023-11-29 15:37:57 +00:00
Aryadev Chavali
691069fa45 Added todo for preprocessor "%MACRO"
This is different to "%CONST" in that it can take token parameters and
use them.  This allows the construction of user code at compile time,
which can be very useful for a variety of use cases.
2023-11-29 15:36:52 +00:00
Aryadev Chavali
f1fde81b82 Added todo for preprocessor "%USE" blocks
Essentially importing another file *literally* into the file.  This
would happen before parse results are gathered, similar to how
"%CONST" is implemented currently.
2023-11-29 15:36:02 +00:00
Aryadev Chavali
456b9f38f2 Cleaned up todos standard library a bit more 2023-11-29 15:35:44 +00:00
Aryadev Chavali
c9f684cc7d Added string literals in tokeniser
Doesn't do much, invalid for most operations.
2023-11-11 10:16:37 +00:00
Aryadev Chavali
bd6fb54e31 Use constants in examples where possible
Stuff like numeric limits can be codified in constants, which are self
documenting.
2023-11-09 08:52:28 +00:00
Aryadev Chavali
f896ad2cb7 Mark off constants as done in TODO.org 2023-11-09 08:52:07 +00:00
Aryadev Chavali
1935277716 Makefile now assembles and interprets instruction-test.asm example first 2023-11-08 18:16:53 +00:00