Cleaned up todos standard library a bit more

2023-11-29 15:35:44 +00:00
parent c9f684cc7d
commit 456b9f38f2
1 changed file with 101 additions and 33 deletions
--- a/todo.org
+++ b/todo.org
@@ -2,39 +2,6 @@
 #+author: Aryadev Chavali
 #+date: 2023-11-02

-* TODO Standard library :ASM:VM:
-I should start considering this and how a user may use it.  Should it
-be an option in the VM and/or assembler binaries (i.e. a flag) or
-something the user has to specify in their source files?
-
-Something to consider is /static/ and /dynamic/ "linking" i.e.:
-+ Static linking: assembler inserts all used library definitions into
-  the bytecode output directly
-  + We could insert all of it at the start of the bytecode file, and
-    with [[*Start points][Start points]] this won't interfere with
-    user code
-    + 2023-11-03: Finishing the Start point feature has made these
-      features more tenable.  A program header which is compiled and
-      interpreted in bytecode works wonders.
-  + Furthermore library code will have fixed program addresses (always
-    at the start) so we'll know at start of assembler runtime where to
-    resolve standard library subroutine calls
-  + Virtual machine needs no changes to do this
-+ Virtual machine has fixed program storage for library code, and
-  assembler makes jump references specifically for this program
-  storage (dynamic linking)
-  + When assembling subroutine calls, just need to put references to
-    this library storage (some kind of shared state between VM and
-    assembler to know what these references are)
-  + VM needs to manage a ROM of some kind for library code
-  + How do we ensure assembled links to subroutine calls don't
-    conflict with user code jumps?
-    + Possibility: most significant bit of a program address is
-      reserved such that if 0 it refers to user code and if 1 it
-      refers to library code
-    + 63 bit references user code (not a lot of loss in precision)
-    + Easy to check if a reference is a library reference or a user
-      code reference by checking "sign bit" (negativity)
 * TODO Preprocessing directives :ASM:
 Like in FASM or NASM where we can give certain helpful instructions to
 the assembler.  I'd use the ~%~ symbol to designate preprocessor
@@ -70,6 +37,107 @@ constant potentially
 #+end_src
 which when referred to (by ~$print-1~) would insert the bytecode given
 inline.
+* TODO Standard library :ASM:VM:
+I should start considering this and how a user may use it.  Should it
+be an option in the VM and/or assembler binaries (i.e. a flag) or
+something the user has to specify in their source files?
+
+Something to consider is /static/ and /dynamic/ "linking" i.e.:
++ Static linking: the assembler inserts all used library definitions into
+  the bytecode output directly
+  + We could insert all of it at the start of the bytecode file, and
+    with [[*Start points][Start points]] this won't interfere with
+    user code
+    + 2023-11-03: Finishing the Start point feature has made these
+      features more tenable.  A program header which is compiled and
+      interpreted in bytecode works wonders.
+  + Furthermore library code will have fixed program addresses (always
+    at the start) so we'll know at start of assembler runtime where to
+    resolve standard library subroutine calls
+  + Virtual machine needs no changes to do this
++ Dynamic linking: the virtual machine has fixed program storage for
+  library code (a ROM), and the assembler makes jump references
+  specifically into this program storage
+  + When assembling subroutine calls, just need to put references to
+    this library storage (some kind of shared state between VM and
+    assembler to know what these references are)
+  + VM needs to manage a ROM of some kind for library code
+  + How do we ensure assembled links to subroutine calls don't
+    conflict with user code jumps?
+    + Possibility: most significant bit of a program address is
+      reserved such that if 0 it refers to user code and if 1 it
+      refers to library code
+    + The remaining 63 bits address user code (little loss of range)
+    + Easy to check if a reference is a library reference or a user
+      code reference by checking "sign bit" (negativity)
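
A minimal sketch of the static option in C, assuming a hypothetical in-memory instruction type (~inst_t~ with an ~op~, an absolute ~operand~ and a ~lib_ref~ flag the assembler would set on library calls). The library is copied to the start of the output and only user-to-user jump targets are shifted by the library's length, while library calls keep their fixed addresses. None of these names come from the VM itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; only the program control ones matter here. */
typedef enum { OP_NOOP, OP_CALL, OP_JUMP, OP_JUMP_IF, OP_RET } opcode_t;

/* Hypothetical instruction: control ops hold an absolute address. */
typedef struct {
  opcode_t op;
  uint64_t operand;
  bool lib_ref; /* assumed: set by the assembler on library calls */
} inst_t;

static bool is_control(opcode_t op) {
  return op == OP_CALL || op == OP_JUMP || op == OP_JUMP_IF;
}

/* Copy the library to the start of `out`, followed by the user code.
   User-to-user control targets are shifted past the library; library
   references already point at fixed addresses inside it.
   `out` must hold lib_len + user_len instructions. Returns the total. */
size_t link_static(const inst_t *lib, size_t lib_len, const inst_t *user,
                   size_t user_len, inst_t *out) {
  for (size_t i = 0; i < lib_len; ++i)
    out[i] = lib[i];
  for (size_t i = 0; i < user_len; ++i) {
    inst_t inst = user[i];
    if (is_control(inst.op)) {
      if (inst.lib_ref)
        assert(inst.operand < lib_len); /* must land inside the library */
      else
        inst.operand += lib_len; /* relocate past the library */
    }
    out[lib_len + i] = inst;
  }
  return lib_len + user_len;
}
```

With start points, the program header would then set the start to index ~lib_len~ so that execution begins in user code rather than in the prepended library.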
+** TODO Dynamic Linking
+The address operand of every program control instruction (~CALL~,
+~JUMP~, ~JUMP.IF~) has a specific encoding if the standard library is
+dynamically linked:
++ If the most significant bit is 0, the remaining 63 bits encode an
+  absolute address within the program
++ Otherwise, the address encodes a standard library subroutine.  The
+  bits within the address follow this schema:
+  + The next 15 bits (7 from the most significant byte, then 8 from
+    the next byte) represent the specific module where the subroutine
+    is defined (32768 possible modules)
+  + The remaining 48 bits (6 bytes) encode the absolute program
+    address in the bytecode of that specific module for the start of
+    the subroutine (over 281 *trillion* values)
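
The bit layout above can be sketched as a handful of C helpers. This is an illustrative encoding only, and the macro and function names are hypothetical: the library flag sits at bit 63, the module number in bits 48 to 62 and the offset in bits 0 to 47, which adds up to 1 + 15 + 48 = 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LIB_FLAG    (UINT64_C(1) << 63) /* most significant bit */
#define MODULE_BITS 15
#define OFFSET_BITS 48
#define MODULE_MASK ((UINT64_C(1) << MODULE_BITS) - 1) /* 0x7FFF */
#define OFFSET_MASK ((UINT64_C(1) << OFFSET_BITS) - 1)

/* True if the address refers to standard library code. */
bool addr_is_library(uint64_t addr) { return (addr & LIB_FLAG) != 0; }

/* User addresses are stored as-is; only 63 bits are available. */
uint64_t addr_encode_user(uint64_t address) {
  assert((address & LIB_FLAG) == 0);
  return address;
}

/* Pack a module number and an offset within that module. */
uint64_t addr_encode_library(uint16_t module, uint64_t offset) {
  assert(module <= MODULE_MASK);
  assert(offset <= OFFSET_MASK);
  return LIB_FLAG | ((uint64_t)module << OFFSET_BITS) | offset;
}

/* Extract the 15-bit module number (bits 48..62). */
uint16_t addr_module(uint64_t addr) {
  return (uint16_t)((addr >> OFFSET_BITS) & MODULE_MASK);
}

/* Extract the 48-bit offset within the module (bits 0..47). */
uint64_t addr_offset(uint64_t addr) { return addr & OFFSET_MASK; }
```

Because the flag is the top bit, ~addr_is_library~ amounts to the "sign bit" check: on two's-complement targets, ~(int64_t)addr < 0~ gives the same answer.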
+
+The assembler will automatically encode this based on ~%USE~ directives
+and the names of the subroutines called.
+
+On the virtual machine, there is a storage location (similar to the
+ROM of real machines) which stores the bytecode for modules of the
+standard library, indexed by the module number.  This means that,
+after decoding an address into its components, the VM can look up the
+module's bytecode and then jump to the correct address within it.
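
One way the VM-side lookup could look, as a sketch only: ~rom_t~, ~module_t~ and ~resolve_jump~ are hypothetical, modules are treated as opaque bytecode with a size, and an out-of-range module or offset is reported rather than jumped to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LIB_FLAG    (UINT64_C(1) << 63)
#define OFFSET_BITS 48
#define MODULE_MASK UINT64_C(0x7FFF)
#define OFFSET_MASK ((UINT64_C(1) << OFFSET_BITS) - 1)

/* Hypothetical ROM entry: one standard library module's bytecode. */
typedef struct {
  const uint8_t *bytecode;
  size_t size;
} module_t;

/* Hypothetical ROM: modules indexed by module number. */
typedef struct {
  const module_t *modules;
  size_t count;
} rom_t;

/* Where a control instruction should transfer execution. */
typedef struct {
  const module_t *module; /* NULL means the user program */
  uint64_t address;       /* address within that program */
  bool ok;                /* false if the library reference is invalid */
} jump_target_t;

jump_target_t resolve_jump(const rom_t *rom, uint64_t addr) {
  assert(rom != NULL);
  jump_target_t target = {NULL, addr, true};
  if ((addr & LIB_FLAG) == 0)
    return target; /* user code: the low 63 bits are the address */

  uint64_t module = (addr >> OFFSET_BITS) & MODULE_MASK;
  uint64_t offset = addr & OFFSET_MASK;
  if (module >= rom->count || offset >= rom->modules[module].size) {
    target.ok = false; /* unknown module or offset past its bytecode */
    return target;
  }
  target.module = &rom->modules[module];
  target.address = offset;
  return target;
}
```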
+
+2023-11-09: I'll need a way to run library code in the runtime's
+current program system.  Unfortunately, it currently doesn't support
+jumps into, or execution of, programs other than the main one.  Any
+proper work in this area requires some refactoring first.
+
+2023-11-09: Constants and inline macros need to be reworked to
+support this.  At parse time we expand inlines directly, which means
+compiling bytecode that uses "standard library" macros will not work,
+as they won't be in the token stream.  Either we don't allow
+preprocessor work in the standard library at all (which is bad because
+we then can't set standard limits or other useful things) or we insert
+them into the registries at parse time for use in program parsing.
+The latter not only requires assembler refactoring to figure out which
+libraries are used (to pull definitions from) but also requires making
+macros "recognisable" in bytecode, because they're currently
+essentially invisible.
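
A rough sketch of the second option, where definitions exported by the used libraries are loaded into a macro registry before the user's source is parsed. Everything here is hypothetical: the registry is a fixed-size array with linear lookup, bodies are stored as raw text rather than tokens, and ~stack-limit~ is an invented example name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MACROS 64

/* Hypothetical registry entry: a macro name and what it expands to. */
typedef struct {
  const char *name;
  const char *body; /* raw text here; the assembler would store tokens */
} macro_t;

typedef struct {
  macro_t entries[MAX_MACROS];
  size_t count;
} registry_t;

/* Define or redefine a macro. Returns 0 on success, -1 if full. */
int registry_define(registry_t *reg, const char *name, const char *body) {
  assert(reg != NULL && name != NULL && body != NULL);
  for (size_t i = 0; i < reg->count; ++i) {
    if (strcmp(reg->entries[i].name, name) == 0) {
      reg->entries[i].body = body; /* later definitions win */
      return 0;
    }
  }
  if (reg->count == MAX_MACROS)
    return -1;
  reg->entries[reg->count].name = name;
  reg->entries[reg->count].body = body;
  reg->count++;
  return 0;
}

/* Returns the body of `name`, or NULL if it isn't registered. */
const char *registry_lookup(const registry_t *reg, const char *name) {
  for (size_t i = 0; i < reg->count; ++i)
    if (strcmp(reg->entries[i].name, name) == 0)
      return reg->entries[i].body;
  return NULL;
}

/* Load a used library's exported macros before parsing user code. */
int registry_preload(registry_t *reg, const macro_t *lib, size_t n) {
  for (size_t i = 0; i < n; ++i)
    if (registry_define(reg, lib[i].name, lib[i].body) != 0)
      return -1;
  return 0;
}
```

Letting a later ~registry_define~ overwrite an earlier entry is one possible policy for user code redefining a library macro; rejecting the redefinition would be equally reasonable.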
+
+* TODO Explicit symbols in bytecode :VM:ASM:
+A problem, arising mainly from the standard library, is that symbols
+such as constants/macros or subroutines aren't explicit in the
+bytecode: the assembler parses them away into absolute addresses and
+plain bytecode.  Since they aren't exposed at all, any resolution for
+"linking" with other assembled objects becomes a hassle.
+
+Constants, macros and labels currently compile down to just base
+instructions, which means the symbols representing them (the "names")
+are replaced by an absolute equivalent:
++ macros and constants compile to the tokens supplied, feeding the
+  parser
++ labels and relative addresses are compiled to absolute program
+  addresses by the parser, which constructs tokens from them
+
+In either case once the code has been compiled, there is no memory of
+symbols within it.
+
+For user-space programs one could currently find a way to decompose
+the bytecode back into "symbols": every label must be present in the
+bytecode, so it has an absolute address in the program, which makes it
+fairly easy to tell when a program control instruction uses a label.
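
A sketch of that decomposition, assuming a hypothetical ~inst_t~ where control instructions carry an absolute address operand: every distinct target of ~CALL~, ~JUMP~ or ~JUMP.IF~ marks a place where the source had a label (or a relative address). Only the addresses can be recovered; the original names are gone, which is exactly the problem described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; only CALL/JUMP/JUMP_IF take address operands. */
typedef enum { OP_NOOP, OP_PUSH, OP_CALL, OP_JUMP, OP_JUMP_IF, OP_HALT } opcode_t;

typedef struct {
  opcode_t op;
  uint64_t operand; /* absolute address for control instructions */
} inst_t;

static bool is_control(opcode_t op) {
  return op == OP_CALL || op == OP_JUMP || op == OP_JUMP_IF;
}

/* Collect each distinct address targeted by a control instruction, in
   order of first use. Writes at most `max` addresses to `labels` and
   returns how many were found. */
size_t recover_labels(const inst_t *prog, size_t n, uint64_t *labels,
                      size_t max) {
  size_t count = 0;
  for (size_t i = 0; i < n; ++i) {
    if (!is_control(prog[i].op))
      continue; /* other operands (e.g. PUSH values) are not addresses */
    bool seen = false;
    for (size_t j = 0; j < count; ++j) {
      if (labels[j] == prog[i].operand) {
        seen = true;
        break;
      }
    }
    if (!seen) {
      assert(count < max);
      labels[count++] = prog[i].operand;
    }
  }
  return count;
}
```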
+
+However, for something like "using multiple files" or the standard
+library, some further thought is needed.  Therefore
 * Completed
 ** DONE Write a label/jump system :ASM:
 Essentially a user should be able to write arbitrary labels (maybe