Cleaned up todos standard library a bit more
134
todo.org
@@ -2,39 +2,6 @@
|
|||||||
#+author: Aryadev Chavali
|
#+author: Aryadev Chavali
|
||||||
#+date: 2023-11-02
|
#+date: 2023-11-02
|
||||||
|
|
||||||
* TODO Standard library :ASM:VM:
|
|
||||||
I should start considering this and how a user may use it. Should it
|
|
||||||
be an option in the VM and/or assembler binaries (i.e. a flag) or
|
|
||||||
something the user has to specify in their source files?
|
|
||||||
|
|
||||||
Something to consider is /static/ and /dynamic/ "linking" i.e.:
|
|
||||||
+ Static linking: assembler inserts all used library definitions into
|
|
||||||
the bytecode output directly
|
|
||||||
+ We could insert all of it at the start of the bytecode file, and
|
|
||||||
with [[*Start points][Start points]] this won't interfere with
|
|
||||||
user code
|
|
||||||
+ 2023-11-03: Finishing the Start point feature has made these
|
|
||||||
features more tenable. A program header which is compiled and
|
|
||||||
interpreted in bytecode works wonders.
|
|
||||||
+ Furthermore library code will have fixed program addresses (always
|
|
||||||
at the start) so we'll know at start of assembler runtime where to
|
|
||||||
resolve standard library subroutine calls
|
|
||||||
+ Virtual machine needs no changes to do this
|
|
||||||
+ Virtual machine has fixed program storage for library code, and
|
|
||||||
assembler makes jump references specifically for this program
|
|
||||||
storage (dynamic linking)
|
|
||||||
+ When assembling subroutine calls, just need to put references to
|
|
||||||
this library storage (some kind of shared state between VM and
|
|
||||||
assembler to know what these references are)
|
|
||||||
+ VM needs to manage a ROM of some kind for library code
|
|
||||||
+ How do we ensure assembled links to subroutine calls don't
|
|
||||||
conflict with user code jumps?
|
|
||||||
+ Possibility: most significant bit of a program address is
|
|
||||||
reserved such that if 0 it refers to user code and if 1 it
|
|
||||||
refers to library code
|
|
||||||
+ 63 bit references user code (not a lot of loss in precision)
|
|
||||||
+ Easy to check if a reference is a library reference or a user
|
|
||||||
code reference by checking "sign bit" (negativity)
|
|
||||||
* TODO Preprocessing directives :ASM:
|
* TODO Preprocessing directives :ASM:
|
||||||
Like in FASM or NASM where we can give certain helpful instructions to
|
Like in FASM or NASM where we can give certain helpful instructions to
|
||||||
the assembler. I'd use the ~%~ symbol to designate preprocessor
|
the assembler. I'd use the ~%~ symbol to designate preprocessor
|
||||||
@@ -70,6 +37,107 @@ constant potentially
|
|||||||
#+end_src
|
#+end_src
|
||||||
which when referred to (by ~$print-1~) would insert the bytecode given
|
which when referred to (by ~$print-1~) would insert the bytecode given
|
||||||
inline.
|
inline.
|
||||||
|
* TODO Standard library :ASM:VM:
|
||||||
|
I should start considering this and how a user may use it. Should it
|
||||||
|
be an option in the VM and/or assembler binaries (i.e. a flag) or
|
||||||
|
something the user has to specify in their source files?
|
||||||
|
|
||||||
|
Something to consider is /static/ and /dynamic/ "linking" i.e.:
|
||||||
|
+ Static linking: assembler inserts all used library definitions into
|
||||||
|
the bytecode output directly
|
||||||
|
+ We could insert all of it at the start of the bytecode file, and
|
||||||
|
with [[*Start points][Start points]] this won't interfere with
|
||||||
|
user code
|
||||||
|
+ 2023-11-03: Finishing the Start point feature has made these
|
||||||
|
features more tenable. A program header which is compiled and
|
||||||
|
interpreted in bytecode works wonders.
|
||||||
|
+ Furthermore library code will have fixed program addresses (always
|
||||||
|
at the start) so we'll know at start of assembler runtime where to
|
||||||
|
resolve standard library subroutine calls
|
||||||
|
+ Virtual machine needs no changes to do this
|
||||||
|
+ Dynamic linking: virtual machine has fixed program storage for
|
||||||
|
library code (a ROM), and assembler makes jump references
|
||||||
|
specifically for this program storage
|
||||||
|
+ When assembling subroutine calls, just need to put references to
|
||||||
|
this library storage (some kind of shared state between VM and
|
||||||
|
assembler to know what these references are)
|
||||||
|
+ VM needs to manage a ROM of some kind for library code
|
||||||
|
+ How do we ensure assembled links to subroutine calls don't
|
||||||
|
conflict with user code jumps?
|
||||||
|
+ Possibility: most significant bit of a program address is
|
||||||
|
reserved such that if 0 it refers to user code and if 1 it
|
||||||
|
refers to library code
|
||||||
|
+ 63 bit references user code (not a lot of loss in precision)
|
||||||
|
+ Easy to check if a reference is a library reference or a user
|
||||||
|
code reference by checking "sign bit" (negativity)
|
||||||
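A minimal sketch of the static-linking option from the list above, assuming the library is inserted at the start of the bytecode and the start point sits immediately after it. The symbol names, offsets and ~LIB_SIZE~ are made up for illustration and are not taken from the actual assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical library symbol: a name and its fixed address, measured
 * from the start of the bytecode (the library always comes first). */
typedef struct {
  const char *name;
  uint64_t address;
} lib_symbol_t;

/* Illustrative table only: names and offsets are assumptions. */
static const lib_symbol_t LIB_SYMBOLS[] = {
  {"print-int", 0},
  {"print-str", 24},
};

/* Assumed total size of the inserted library code. */
#define LIB_SIZE UINT64_C(64)

/* Resolve a library subroutine call to its fixed address.
 * Returns 1 and writes *out on success, 0 if the name is unknown. */
int resolve_lib_call(const char *name, uint64_t *out)
{
  assert(name != NULL && out != NULL);
  size_t count = sizeof(LIB_SYMBOLS) / sizeof(LIB_SYMBOLS[0]);
  for (size_t i = 0; i < count; ++i) {
    if (strcmp(LIB_SYMBOLS[i].name, name) == 0) {
      *out = LIB_SYMBOLS[i].address;
      return 1;
    }
  }
  return 0;
}

/* User labels are shifted past the library block. */
uint64_t resolve_user_label(uint64_t offset_in_user_code)
{
  return LIB_SIZE + offset_in_user_code;
}

/* Under the assumption above, execution starts right after the library. */
uint64_t start_point(void)
{
  return LIB_SIZE;
}
```

Because every library address is known before any user code is parsed, the assembler can resolve calls in a single lookup and the VM needs no changes.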
|
** TODO Dynamic Linking
|
||||||
|
The address operand of every program control instruction (~CALL~,
|
||||||
|
~JUMP~, ~JUMP.IF~) has a specific encoding if the standard library is
|
||||||
|
dynamically linked:
|
||||||
|
+ If the most significant bit is 0, the remaining 63 bits encode an
|
||||||
|
absolute address within the program
|
||||||
|
+ Otherwise, the address encodes a standard library subroutine. The
|
||||||
|
bits within the address follow this schema:
|
||||||
|
+ The next 15 bits (7 from the most significant byte, then 8 from
|
||||||
|
the next byte) represent the specific module where the subroutine
|
||||||
|
is defined (32768 possible modules)
|
||||||
|
+ The remaining 48 bits (6 bytes) encode the absolute program
|
||||||
|
address in the bytecode of that specific module for the start of
|
||||||
|
the subroutine (over 281 *trillion* values)
|
||||||
|
|
||||||
|
The assembler will automatically encode this based on ~%USE~ directives and
|
||||||
|
the name of the subroutines called.
|
||||||
|
|
||||||
|
On the virtual machine, there is a storage location (similar to the
|
||||||
|
ROM of real machines) which stores the bytecode for modules of the
|
||||||
|
standard library, indexed by the module number. This means, on
|
||||||
|
deserialising the address into the proper components, the VM can refer
|
||||||
|
to the module bytecode then jump to the correct address.
|
||||||
|
|
||||||
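The address schema above can be sketched as a pair of encode/decode helpers. This is only an illustration of the bit layout described in these notes (1 flag bit, 15 module bits, 48 offset bits). All identifiers are hypothetical and none of this is the runtime's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the layout follows the notes above. */
#define LIB_FLAG    (UINT64_C(1) << 63)               /* most significant bit */
#define MODULE_BITS 15
#define OFFSET_BITS 48
#define MODULE_MAX  ((UINT64_C(1) << MODULE_BITS) - 1) /* 32767 */
#define OFFSET_MASK ((UINT64_C(1) << OFFSET_BITS) - 1)

typedef struct {
  bool is_library;  /* true if the MSB was set */
  uint16_t module;  /* module index, only meaningful for library addresses */
  uint64_t offset;  /* absolute address within the program or the module */
} decoded_addr_t;

/* User code: MSB must be 0, the remaining 63 bits are the address. */
uint64_t encode_user_addr(uint64_t addr)
{
  assert((addr & LIB_FLAG) == 0);
  return addr;
}

/* Library code: MSB set, then 15 bits of module, then 48 bits of offset. */
uint64_t encode_lib_addr(uint16_t module, uint64_t offset)
{
  assert(module <= MODULE_MAX);
  assert(offset <= OFFSET_MASK);
  return LIB_FLAG | ((uint64_t)module << OFFSET_BITS) | offset;
}

/* What the VM would do before a CALL/JUMP: split the operand into parts. */
decoded_addr_t decode_addr(uint64_t word)
{
  decoded_addr_t d = {0};
  if (word & LIB_FLAG) {
    d.is_library = true;
    d.module = (uint16_t)((word >> OFFSET_BITS) & MODULE_MAX);
    d.offset = word & OFFSET_MASK;
  } else {
    d.offset = word;
  }
  return d;
}
```

For example, module 3 at offset 0x10 encodes to 0x8003000000000010. Decoding that word gives back the module index to look up in the ROM and the offset to jump to.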
|
2023-11-09: I'll need a way to run library code in the current program
|
||||||
|
system in the runtime. It currently doesn't support jumps or work in
|
||||||
|
programs outside of the main one unfortunately. Any proper work done
|
||||||
|
in this area requires some proper refactoring.
|
||||||
|
|
||||||
|
2023-11-09: Constants or inline macros need to be reconfigured for
|
||||||
|
this to work: at parse time, we work out the inlines directly which
|
||||||
|
means compiling bytecode with "standard library" macros will not work
|
||||||
|
as they won't be in the token stream. Either we don't allow
|
||||||
|
preprocessor work in the standard library at all (which is bad cos we
|
||||||
|
can't then set standard limits or other useful things) or we insert
|
||||||
|
them into the registries at parse time for use in program parsing
|
||||||
|
(which not only requires assembler refactoring to figure out what
|
||||||
|
libraries are used (to pull definitions from) but also requires making
|
||||||
|
macros "recognisable" in bytecode because they're essentially
|
||||||
|
invisible).
|
||||||
|
|
||||||
|
* TODO Explicit symbols in bytecode :VM:ASM:
|
||||||
|
A problem, arising mainly from the standard library, is that symbols
|
||||||
|
such as constants/macros or subroutines aren't explicit in the
|
||||||
|
bytecode: the assembler parses them away into absolute addresses and
|
||||||
|
standard bytecode. They aren't exposed at all in the bytecode, which
|
||||||
|
means any resolution for "linking" with other assembled objects
|
||||||
|
becomes a hassle.
|
||||||
|
|
||||||
|
Constants and macros currently compile down to just base instructions,
|
||||||
|
which means the symbols representing them (the "names") are compiled
|
||||||
|
down to an absolute equivalent:
|
||||||
|
+ macros and constants compile to the tokens supplied, feeding the
|
||||||
|
parser
|
||||||
|
+ labels and relative addresses are compiled to absolute program
|
||||||
|
addresses, dealt with in the parser, constructing tokens
|
||||||
|
|
||||||
|
In either case once the code has been compiled, there is no memory of
|
||||||
|
symbols within it.
|
||||||
|
|
||||||
|
For user space programs one could figure out a way to decompose the
|
||||||
|
bytecode into "symbols", as they must currently be present in the
|
||||||
|
bytecode, which means they have an absolute address in the program,
|
||||||
|
hence it's pretty easy to figure out when a program control
|
||||||
|
instruction uses a label.
|
||||||
|
|
||||||
|
However, for something like "using multiple files" or the standard
|
||||||
|
library some further thought is needed. Therefore
|
||||||
* Completed
|
* Completed
|
||||||
** DONE Write a label/jump system :ASM:
|
** DONE Write a label/jump system :ASM:
|
||||||
Essentially a user should be able to write arbitrary labels (maybe
|
Essentially a user should be able to write arbitrary labels (maybe
|
||||||