111 Commits

Author SHA1 Message Date
Aryadev Chavali
0ebbf3ca75 Start writing assembler in C++
Best language to use as it's already compatible with the headers I'm
using and can pretty neatly enter the build system while also using
the functions I've built for converting to and from bytecode!
2024-04-14 02:45:48 +06:30
Aryadev Chavali
4e9eb0a42e fix! loops in preprocess_use_blocks iterate to the wrong bound
A token_stream being constructed on the spot has different
used/available properties to a fully constructed one: a fully
constructed token stream uses available to hold the total number of
tokens and used as an internal iterator, while one that is still being
constructed uses the semantics of a standard darr.

Furthermore, some loops didn't divide by ~sizeof(token_t)~, which led
to out-of-bounds iteration.
2024-04-14 02:00:17 +06:30
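The bound fix above can be illustrated with a minimal sketch. It assumes, as the message suggests, that a token stream under construction is a byte-oriented dynamic array, so `used` counts bytes rather than tokens. The `darr_t` and `token_t` layouts and both function names are hypothetical stand-ins, not the project's definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the project's token type. */
typedef struct { int type; } token_t;

/* Hypothetical byte-oriented dynamic array (assumed darr semantics). */
typedef struct {
  unsigned char *data;
  size_t used;      /* bytes written so far */
  size_t available; /* bytes allocated */
} darr_t;

/* Number of whole tokens in a stream that is still being constructed. */
size_t tokens_in_progress(const darr_t *stream)
{
  return stream->used / sizeof(token_t);
}

/* Sum the type field of every token.  The loop bound is a token count,
   not the raw byte count held in `used`. */
int sum_token_types(const darr_t *stream)
{
  const token_t *tokens = (const token_t *)stream->data;
  int total = 0;
  for (size_t i = 0; i < tokens_in_progress(stream); ++i)
    total += tokens[i].type;
  return total;
}
```

Looping to `stream->used` directly would read `sizeof(token_t)` times too many elements, which is the kind of overrun the commit describes.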
Aryadev Chavali
60588129b4 Cleaned up logs in assembler/parser 2023-11-29 23:09:51 +00:00
Aryadev Chavali
6a34fd2d2e Fixed incorrect free of tokens in error for preprocess_use_blocks
Also error now points to the correct place in the file.
2023-11-29 16:58:26 +00:00
Aryadev Chavali
fd1e6d96f6 Report some stats of the actual program when working 2023-11-29 15:46:44 +00:00
Aryadev Chavali
16dcc88a53 Refactored preprocessor to preprocess_(use|macro)_blocks and process_presults
We have distinct functions for the use blocks and the macro blocks,
which each generate wholesale new token streams via `token_copy` so we
don't run into weird errors around ownership of the internal strings
of each token.

Furthermore, process_presults now uses the stream index in each
presult to report errors when stuff goes wrong.
2023-11-29 15:43:53 +00:00
Aryadev Chavali
48d304056a Refactored presult_t to include a stream pointer
So when a presult_t is constructed it holds an index to where it was
constructed in terms of the token stream.  This will be useful when
implementing an error checker in the preprocessing or result parsing
stages.
2023-11-29 15:43:41 +00:00
Aryadev Chavali
4cee61fc9e Added parse errors for %USE calls
%USE <STRING> is the expected call pattern, so it is an error if
there isn't a string after %USE.

The other two errors are file I/O errors, i.e. nonexistent files or
errors in parsing the other file.  We don't report specifics about the
other file; that should be up to the user to check themselves.
2023-11-29 15:40:14 +00:00
Aryadev Chavali
9b8936ea02 Fixed tokenise_string_literal
Forgot to increment buffer->used and memcpy call was just incorrect.
2023-11-29 15:39:37 +00:00
Aryadev Chavali
ac70d4031c Added function to copy tokens
This essentially just copies the internal string of the token into a
new buffer.
2023-11-29 15:38:57 +00:00
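A minimal sketch of what such a copy might look like, assuming a token owns a heap-allocated string. The field names and the exact signature of `token_copy` are assumptions here, not taken from the repository.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token layout: a type tag plus an owned string. */
typedef struct {
  int type;
  char *str;       /* owned, NUL-terminated */
  size_t str_size; /* length without the terminator */
} token_t;

/* Return a copy of `t` whose string lives in a freshly allocated buffer,
   so the original and the copy can be freed independently. */
token_t token_copy(token_t t)
{
  token_t copy = t;
  copy.str = malloc(t.str_size + 1);
  if (copy.str == NULL) {
    copy.str_size = 0; /* allocation failed: hand back an empty token */
    return copy;
  }
  if (t.str_size > 0)
    memcpy(copy.str, t.str, t.str_size);
  copy.str[t.str_size] = '\0';
  return copy;
}
```

Because each copy owns its buffer, freeing one token stream cannot leave dangling strings in another.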
Aryadev Chavali
1cba5ccd8d Added TOKEN_PP_USE to lexer with implementation 2023-11-29 15:38:41 +00:00
Aryadev Chavali
c9f684cc7d Added string literals in tokeniser
Doesn't do much, invalid for most operations.
2023-11-11 10:16:37 +00:00
Aryadev Chavali
cb2416554b Added a preprocessing routine in assembler
Preprocessor handles macros and macro blocks by working at the token
level, not doing any high level parsing or instruction making.
Essentially every macro is recorded in a registry, recording the name
and the tokens assigned to it.  Then for every caller it just inserts
the tokens inline, creating a new stream and freeing the old one.  It
leaves actual high level parsing to `parse_next` and
`process_presults`.
2023-11-08 18:15:26 +00:00
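The token-level expansion described above can be sketched roughly as follows. Tokens are reduced to plain strings for brevity, and `macro_t` and `expand_macros` are hypothetical names; the real preprocessor builds a new token stream and frees the old one rather than writing into a caller-supplied array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry: a macro name and the tokens assigned to it. */
typedef struct {
  const char *name;
  const char *const *body;
  size_t body_len;
} macro_t;

/* Copy tokens from `in` to `out`, replacing each macro name with its body.
   Returns the number of tokens written, or (size_t)-1 if `out` is too small. */
size_t expand_macros(const char *const *in, size_t in_len,
                     const macro_t *registry, size_t registry_len,
                     const char **out, size_t out_cap)
{
  size_t written = 0;
  for (size_t i = 0; i < in_len; ++i) {
    /* Look the token up in the registry. */
    const macro_t *hit = NULL;
    for (size_t m = 0; m < registry_len && hit == NULL; ++m)
      if (strcmp(in[i], registry[m].name) == 0)
        hit = &registry[m];

    /* A macro call contributes its body; anything else contributes itself. */
    const char *const *src = hit ? hit->body : &in[i];
    size_t count = hit ? hit->body_len : 1;
    if (count > out_cap - written)
      return (size_t)-1;
    for (size_t k = 0; k < count; ++k)
      out[written++] = src[k];
  }
  return written;
}
```

No parsing happens here: the expander never looks at what the tokens mean, which matches the split of responsibilities with `parse_next` and `process_presults`.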
Aryadev Chavali
253bebb467 Added log in assembler for reading a certain number of bytes 2023-11-08 18:14:59 +00:00
Aryadev Chavali
642a8ae944 Lexer symbols now recognise macro constants and references 2023-11-08 18:14:41 +00:00
Aryadev Chavali
6e524569c3 Current work on preprocessor 2023-11-06 08:16:15 +00:00
Aryadev Chavali
4ae6c05276 Current work on preprocessor implementation
Lots to refactor and test
2023-11-05 16:21:09 +00:00
Aryadev Chavali
e9eead1177 Symbols may now include digits in lexer
This is mostly so labels get to have digits.  This won't affect number
tokens as that happens before symbols.
2023-11-03 21:50:55 +00:00
Aryadev Chavali
e6f580ba56 Removed tabs from VERBOSE logs in asm/main.c 2023-11-03 21:50:44 +00:00
Aryadev Chavali
3fde04e1d2 Fixed bug where labels were off by one
Was used in a previous fix but not necessary anymore
2023-11-03 21:22:02 +00:00
Aryadev Chavali
b8f6232bb2 Refactor assembler to use prog_t structure
Set the program structure correctly with a header using the parsed
global instruction.
2023-11-03 21:15:30 +00:00
Aryadev Chavali
b5a1582976 Added a start address (equivalent to main) to assembler
Creates a jump address to the label designated by "global" so the program
starts at that point.
2023-11-03 19:01:31 +00:00
Aryadev Chavali
6dfc4ceaeb Better logs for assembler 2023-11-02 23:29:43 +00:00
Aryadev Chavali
6c4469958e Implemented CALL(_STACK) and RET on the assembler 2023-11-02 23:29:23 +00:00
Aryadev Chavali
bd39c2b283 Made lexer more error prone so parser is less
Lexer now will straight away attempt to eat up any type or later
portions of an opcode rather than leaving everything but the root.
This means checking for type in the parser is a direct check against
the name rather than prefixed with a dot.

Checks are also a bit stricter, so more tokens go straight to symbol
rather than being checked later by a routine on the parser side.
2023-11-02 23:29:07 +00:00
Aryadev Chavali
9afeed6d61 Made separate tokens for JUMP_ABS and JUMP_STACK
Makes more sense, don't need to fiddle around with strings as much in
the parser due to this!
2023-11-02 20:54:26 +00:00
Aryadev Chavali
114fb82990 Removed instruction OP_JUMP_REGISTER
Not necessary when you can just push the relevant word onto the stack
then just do OP_JUMP_STACK.
2023-11-02 20:41:36 +00:00
Aryadev Chavali
4990d93a1c Created a preprocessing unit presult_t and a function to process them
Essentially a presult_t contains one of these:

1) A label construction, which stores the label symbol into
`label` (PRES_LABEL)

2) An instruction that calls upon a label, storing the instruction
in `instruction` and the label name in `label` (PRES_LABEL_ADDRESS)

3) An instruction that uses a relative address offset, storing the
instruction in `instruction` and the offset wanted into
`relative_address` (PRES_RELATIVE_ADDRESS)

4) An instruction that requires no further processing, storing the
instruction into `instruction` (PRES_COMPLETE_INSTRUCTION)

In the processing stage, we resolve all calls by iterating one by one
and maintaining an absolute instruction address.  Pretty nice, lots
more machinery involved in parsing now.
2023-11-02 20:31:55 +00:00
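The four cases and the resolution step above might be sketched like this. Only the `PRES_*` names and the `label`, `instruction` and `relative_address` fields come from the message; `inst_t`, the helper `label_address`, the return convention, and the choice to treat a relative offset as relative to the current instruction's address are all assumptions.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
  PRES_LABEL,
  PRES_LABEL_ADDRESS,
  PRES_RELATIVE_ADDRESS,
  PRES_COMPLETE_INSTRUCTION
} pres_type_t;

/* Hypothetical, heavily simplified instruction: an opcode and one operand. */
typedef struct { int opcode; long operand; } inst_t;

typedef struct {
  pres_type_t type;
  const char *label;
  inst_t instruction;
  long relative_address;
} presult_t;

/* Absolute address of `name`: the number of instructions before its label. */
static long label_address(const presult_t *r, size_t n, const char *name)
{
  long address = 0;
  for (size_t i = 0; i < n; ++i) {
    if (r[i].type != PRES_LABEL) { ++address; continue; }
    if (strcmp(r[i].label, name) == 0) return address;
  }
  return -1;
}

/* Resolve every presult into a finished instruction in `out`.
   Returns the instruction count, or -1 if a label is unknown. */
long process_presults(const presult_t *r, size_t n, inst_t *out)
{
  long address = 0; /* absolute address of the next instruction */
  for (size_t i = 0; i < n; ++i) {
    inst_t inst = r[i].instruction;
    switch (r[i].type) {
    case PRES_LABEL:
      continue; /* labels emit nothing */
    case PRES_LABEL_ADDRESS:
      inst.operand = label_address(r, n, r[i].label);
      if (inst.operand < 0) return -1;
      break;
    case PRES_RELATIVE_ADDRESS:
      inst.operand = address + r[i].relative_address;
      break;
    case PRES_COMPLETE_INSTRUCTION:
      break;
    }
    out[address++] = inst;
  }
  return address;
}
```

Looking labels up by rescanning keeps the sketch short and lets forward references work; a real implementation would more likely record label addresses in a table during a first pass.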
Aryadev Chavali
d5e311c9d4 Started work on preprocessing jump addresses 2023-11-02 20:31:22 +00:00
Aryadev Chavali
740627b12d Implemented MALLOC_STACK and SUB in the assembler 2023-11-01 22:56:40 +00:00
Aryadev Chavali
90e04542a2 Implemented stack versions of MGET and MSET in assembler 2023-11-01 22:09:39 +00:00
Aryadev Chavali
44125d7ad9 Implemented OP_MSIZE into lexer/parser of ASM 2023-11-01 21:47:19 +00:00
Aryadev Chavali
7564938113 Implemented lexer and parser for new memory management instructions 2023-11-01 21:40:25 +00:00
Aryadev Chavali
83678ad29a Add MULT to lexer and parser for assembler 2023-11-01 18:09:00 +00:00
Aryadev Chavali
57e6923279 Fixed bug where comparators wouldn't be parsed correctly
This is because comparators may apply to signed types, so I need to
use the right parsing function.
2023-11-01 17:55:54 +00:00
Aryadev Chavali
6d35283ef0 Clearer VERBOSE messages 2023-11-01 15:22:47 +00:00
Aryadev Chavali
6a270eda1e Parser now uses updated lexer
Much simpler: it uses a switch statement, which is a much faster way of
doing the parsing.  Though roughly equivalent in terms of LOC, I feel
that this is more extensible.
2023-11-01 15:09:56 +00:00
Aryadev Chavali
93d234cd48 Lexer now returns more descriptive tokens
More useful tokens, in particular one for each possible opcode.  This
makes parsing a simpler task to reason about, as now we're just checking
against an enum rather than doing a string check in linear time.

It makes more sense to do this at the tokeniser as the local data from
the buffer will be in the cache most likely as the buffer is
contiguously allocated.  While it will always be slow to do linear
time checks on strings, when doing it at the parser we're having to
check strings that may be allocated in a variety of different places.
This means caching becomes a harder task, but with this approach we're
less likely to have cache misses as long as the buffer stays there.
2023-11-01 15:09:47 +00:00
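A rough sketch of the idea: the string comparison happens once, in the lexer, while the text is still sitting in the source buffer, and the parser then dispatches on an enum. The token names, the spellings `sub`, `mult` and `ret`, and both function names are illustrative assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds: one per opcode, plus a fallback for symbols. */
typedef enum { TOKEN_SYMBOL, TOKEN_SUB, TOKEN_MULT, TOKEN_RET } token_type_t;

/* Lexer side: the only place where text is compared against opcode names. */
token_type_t classify(const char *text, size_t len)
{
  static const struct { const char *name; token_type_t type; } table[] = {
    {"sub", TOKEN_SUB}, {"mult", TOKEN_MULT}, {"ret", TOKEN_RET},
  };
  for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); ++i)
    if (strlen(table[i].name) == len && memcmp(text, table[i].name, len) == 0)
      return table[i].type;
  return TOKEN_SYMBOL;
}

/* Parser side: a plain switch on the enum, no string work at all. */
const char *describe(token_type_t type)
{
  switch (type) {
  case TOKEN_SUB:
  case TOKEN_MULT:
    return "arithmetic";
  case TOKEN_RET:
    return "control flow";
  case TOKEN_SYMBOL:
    break;
  }
  return "symbol";
}
```

Adding an opcode then means one table row and one `case`, which is where the extensibility argument comes from.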
Aryadev Chavali
0f0a1c7699 Allow hex literals for numbers
As strto(ul|ll) allow the parsing of hex literals of the form `0x`, we
allow lexing of hex literals which start with `x`.

They're lexed into C hex literals which work for strtol.
2023-10-31 22:27:53 +00:00
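One way to read the rewrite step, as a sketch: prefix the lexeme with `0` so that `x1F` becomes the C literal `0x1F`, then hand it to `strtoull`. The function name `parse_hex_literal` and its return convention are hypothetical, and overflow is not checked.

```c
#include <stdio.h>
#include <stdlib.h>

/* Turn a lexeme such as "x1F" into the C literal "0x1F" and parse it.
   Returns 0 on success and stores the value in *out, -1 otherwise. */
int parse_hex_literal(const char *lexeme, unsigned long long *out)
{
  char buffer[32];
  if (lexeme[0] != 'x' || lexeme[1] == '\0')
    return -1;

  /* Prepend the missing '0'; reject lexemes too long for the buffer. */
  int n = snprintf(buffer, sizeof buffer, "0%s", lexeme);
  if (n < 0 || (size_t)n >= sizeof buffer)
    return -1;

  /* Base 16 accepts an optional "0x" prefix. */
  char *end = NULL;
  *out = strtoull(buffer, &end, 16);
  return *end == '\0' ? 0 : -1;
}
```

Checking that `end` reached the terminator rejects input such as `xZZ`, where `strtoull` would otherwise stop after the leading `0` and report success.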
Aryadev Chavali
7817b5acc9 Use standardised signed version of word type from base.h 2023-10-31 21:24:50 +00:00
Aryadev Chavali
5d800d4366 Moved inst module to lib
As it has no dependencies on vm specifically, and it's more necessary
for any vendors who wish to target the virtual machine, it makes more
sense for inst to be a lib module rather than a vm module.
2023-10-31 21:14:14 +00:00
Aryadev Chavali
7ca8f2c644 asm/main logs are now indented and look prettier 2023-10-31 20:39:49 +00:00
Aryadev Chavali
75dc36cd19 Lexer now returns errors on failure
Currently only for invalid character literals, but still a possible
problem.
2023-10-31 20:39:26 +00:00
Aryadev Chavali
fa640f13e8 parse_word deals with characters now
Just takes the character literally as a number.
2023-10-31 20:38:03 +00:00
Aryadev Chavali
228f548bd9 Changed asm/parser instruction push-reg->push.reg 2023-10-31 20:37:11 +00:00
Aryadev Chavali
157c79d53c Added a "usage" message and colours for assembler
Prints useful and pretty messages when verbose is at least 1.
2023-10-29 16:59:31 +00:00
Aryadev Chavali
1c0bd20cba Introduce error reporting in asm/main
Pretty simple implementation, I've stopped printing the tokens cos I
think the lexer is done.
2023-10-28 18:22:18 +01:00
Aryadev Chavali
eac8cbf1da asm/parser supports all opcodes, introduced parse errors
Introduced some functions to parse differing types of opcodes.  Use
the same style of a.b.c... for namespacing or type specification for
certain opcodes.  Bit hacky and not tested, but does work.

Parse errors can be reported with an exact location using the token
column, line.
2023-10-28 18:21:09 +01:00
Aryadev Chavali
191fe5c6b8 Ignore comments (using semicolons) in lexer
Easier to do it here than at the parser.
2023-10-28 18:19:33 +01:00
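Skipping a comment in the lexer can be as small as the sketch below, which assumes a comment runs from a semicolon to the end of the line. `skip_comment` is a hypothetical helper, not a function from the repository.

```c
#include <stddef.h>

/* If `pos` sits on a ';', return the index of the newline that ends the
   comment (or `len` if the buffer ends first).  Otherwise return `pos`. */
size_t skip_comment(const char *src, size_t len, size_t pos)
{
  if (pos < len && src[pos] == ';')
    while (pos < len && src[pos] != '\n')
      ++pos;
  return pos;
}
```

Since the comment never becomes a token, the parser needs no special case for it.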
Aryadev Chavali
d2429aa549 Introduced a column and line for each token
Accurate error reporting can be introduced using this.
2023-10-28 18:19:30 +01:00
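A small sketch of how a line and column could be derived for a token. It recomputes the position from a byte offset for clarity, whereas a lexer would more likely update the counters as it advances. `position_t`, `position_of`, and the choice of 1-based lines with 0-based columns are assumptions.

```c
#include <stddef.h>

/* Hypothetical source position attached to each token. */
typedef struct { size_t line, column; } position_t;

/* Line (1-based) and column (0-based) of the character at `offset`. */
position_t position_of(const char *src, size_t offset)
{
  position_t pos = {1, 0};
  for (size_t i = 0; i < offset && src[i] != '\0'; ++i) {
    if (src[i] == '\n') {
      ++pos.line;
      pos.column = 0;
    } else {
      ++pos.column;
    }
  }
  return pos;
}
```

Storing the pair on the token lets a later stage report an error location without rescanning the file.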