Much simpler, uses a switch case which is a much faster method of
doing the parsing. Though roughly equivalent in terms of LOC, I feel
that this is more extensible
More useful tokens, in particular for each opcode possible. This
makes parsing a simpler task to reason as now we're just checking
against an enum rather than doing a string check in linear time.
It makes more sense to do this at the tokeniser as the local data from
the buffer will be in the cache most likely as the buffer is
contiguously allocated. While it will always be slow to do linear
time checks on strings, when doing it at the parser we're having to
check strings that may be allocated in a variety of different places.
This means caching becomes a harder task, but with this approach we're
less likely to have cache misses as long as the buffer stays there.
A negative number under 2s complement can never be equal to its
positive as the top bit *must* be on. If two numbers are equivalent
bit-by-bit then they are equal for both signed and unsigned numbers.
Anything other than char (which can just use print.byte to print the
hex) and byte (which prints hexes anyway), all other types may be
forced to print a hex rather than a number if PRINT_HEX is 1.
As strto(ul|ll) allow the parsing of hex literals of the form `0x`, we
allow lexing of hex literals which start with `x`.
They're lexed into C hex literals which work for strtol.
I've made a single macro which defines a function through some common
metric, removing code duplication. Not particularly readable per se,
but using a macro expansion in your IDE allows one to inspect the code.
These new members are just signed versions of the previous members.
This makes type punning and usage for signed versions easier than
before (no need for memcpy).
As it has no dependencies on vm specifically, and it's more necessary
for any vendors who wish to target the virtual machine, it makes more
sense for inst to be a lib module rather than a vm module.
Comparing signed and unsigned versions of numbers. Same for EQ as
well.
Notice the irregular pattern of BYTE, CHAR, INT, HWORD,LONG,WORD as
OPCODE_IS_TYPE requires the subcodes to be surrounded by BYTE and
WORD.
Provides calling conventions, ensures parser and lexer are working
correctly. Will be updated as more instructions are introduced and
supported in the assembler.
Introduced some functions to parse differing types of opcodes. Use
the same style of a.b.c... for namespacing or type specification for
certain opcodes. Bit hacky and not tested, but does work.
Parse errors can be reported with an exact location using the token
column, line.
Though we deal with unsigned numbers internally, it should be possible
to read and manipulate negative numbers through 2s complement. Later
on we'll add support for signed operations via 2s complement, so this
should be allowed.