arl.org: Lots of thinking

2026-01-28 07:03:12 +00:00
parent 85f5502681
commit 42447e5bd8

arl.org

@@ -13,20 +13,19 @@ world!". Should be relatively straightforward.
Before we get into generating C code and then compiling it, it might
be worth translating the parsed ARL code into a generic IR.
The IR should be much more primitive in its semantics, and force clear
requirements of the platform we're compiling to. This way, at the
code generator stage we can figure out:
The IR should be primitive in its semantics but should still
encapsulate the intention behind the original ARL code. This should
allow us to find a set of minimum requirements for target compilation:
- what can we reasonably use from the target platform to satisfy
requirements?
supporting the primitive IR?
- what do we need to hand-roll on the target in order to make this
work?
Essentially, we want to write a virtual machine, and translate ARL
code into bytecode for that VM. Goals:
- Easier to optimise IR bytecode than the AST of our original program
- Easier to imagine translations from that IR bytecode into target
platform code
*** TODO Minimal IR representation
- Type checking
- Optimiser (stretch)
We need the following clear items in our IR:
- Static type values
- Static type variables (possibly De Bruijn numbering or other such
@@ -35,29 +34,117 @@ We need the following clear items in our IR:
- Strongly typed primitive operators (numeric, strings, I/O) with
packed arguments
Read about [[https://en.wikipedia.org/wiki/Three-address_code][TAC]].
*** TODO IR Compiler
We should have a rough grouping between AST objects and this IR. As
ARL is Forth-like, we can use the stack semantics to generate this IR
as we walk the AST in a linear manner.
as we walk the AST in a linear manner. In practice this should almost
look like emulating a really small subset of the ARL language itself
and executing the program in that small subset.
Looking at how
[[https://en.wikipedia.org/wiki/Three-address_code][TAC]] works, I
think it may be a good idea to do something like that for our IR.
Essentially we should translate our AST into a sequence of really simple
bindings, with the final expression being a reference to some binding.
This also simplifies type checking to just verifying each little
binding and operation.
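As a rough illustration of "a sequence of simple bindings plus a final reference", here is a minimal Python sketch. The names ~Ref~, ~Binding~ and ~well_formed~ are hypothetical and not part of ARL; the only property checked is that a binding may refer only to names bound before it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Reference to an earlier binding by name (hypothetical helper)."""
    name: str


@dataclass(frozen=True)
class Binding:
    """One TAC-like step: name = op(args...)."""
    name: str
    op: str
    args: tuple  # each argument is a literal or a Ref


def well_formed(bindings, result):
    """True if every Ref points at an earlier binding and result is bound."""
    seen = set()
    for binding in bindings:
        for arg in binding.args:
            if isinstance(arg, Ref) and arg.name not in seen:
                return False
        seen.add(binding.name)
    return result.name in seen


# Two bindings in definition order; the program's value is the last one.
bindings = [
    Binding("v1", "prim-add", (34, 35)),
    Binding("v2", "prim-sub", (70, Ref("v1"))),
]
result = Ref("v2")
```

Because each binding is a single operation over literals or earlier names, checking the program reduces to one pass over the list, which is the simplification to type checking mentioned above.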
*** Examples
**** Basic example
Consider the following ARL code:
#+begin_src text
34 35 +
#+end_src
When we walk through this code:
When we walk through the above code:
- 34 (an integer) is pushed onto the stack
- 35 (an integer) is pushed onto the stack
- + is encountered
- Pop two values off the stack and verify their type against the
contract for "+" (something like (-> i32 i32 i32))
- Generate IR, something like ~prim-add(34, 35)~
*** TODO Consider optimisers
Certainly we should perform optimisations on the IR itself before
passing it over to the code generator. Currently we haven't got much
in the way of optimisations to consider, but it may be worth
considering.
- ~+~ primitive is encountered
- Type check the top two values of the stack; they should be
integral.
- ~a b +~ should correspond to ~a + b~ so the IR expression should
pack the arguments in that order: ~prim-add(34,35)~.
- Bind the generated IR expression to some unique name, say ~v1~.
- Ensure this works with type checking; looking up ~v1~'s type
should give you the output type of the "+" operator (integer).
- Push ~v1~ onto the stack.
The final state of the stack should be something like ~[v1]~ where
~v1=prim-add(34,35)~. The final state of the stack, along with the
bindings we form, is the IR, to pass over to the later stages of the
compiler.
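The walk above can be sketched in a few lines of Python. This is only an illustration under assumptions: tokens are split on whitespace, the type name ~"int"~ and the function ~compile_tokens~ are made up for the sketch, and only integer literals and ~+~ are handled.

```python
def compile_tokens(tokens):
    """Walk tokens left to right, returning (stack, bindings)."""
    stack = []     # entries are (value-or-name, type)
    bindings = {}  # name -> (IR expression, type)
    for tok in tokens:
        if tok.lstrip("-").isdigit():
            # Integer literal: push it with its type.
            stack.append((tok, "int"))
        elif tok == "+":
            if len(stack) < 2:
                raise TypeError("+ needs two values on the stack")
            b, type_b = stack.pop()
            a, type_a = stack.pop()
            if type_a != "int" or type_b != "int":
                raise TypeError("+ expects two integers")
            # `a b +` corresponds to a + b, so pack arguments as (a, b).
            name = f"v{len(bindings) + 1}"
            bindings[name] = (f"prim-add({a},{b})", "int")
            # Push the name so later primitives can look up its type.
            stack.append((name, "int"))
        else:
            raise ValueError(f"unknown token: {tok}")
    return stack, bindings


stack, bindings = compile_tokens("34 35 +".split())
```

Storing the type next to each stack entry is one way to make "looking up ~v1~'s type" trivial; a separate type environment keyed by name would work just as well.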
**** Slightly more complex example
Let's look at a slightly more complex program:
#+begin_src text
34 35 + 70 swap -
#+end_src
- 34 (integer) pushed
- 35 (integer) pushed
- ~+~ primitive:
- As stated previously, the final state of this primitive gives us
the name ~v1~ on the stack with the association
~v1=prim-add(34,35)~.
- 70 (integer) pushed
- ~swap~ primitive:
- Requires two values on the stack, but we care little about their
types. Just swaps their order on the stack.
- We /could/ introduce generics here to make the input/output
relationship explicit (forall T, U swap:-(-> (T U) (U T))), but
at the same time we can just as easily get away with a type hole
(essentially some kind of ~any~). Up to debate.
- We do not generate IR for this primitive as it simply isn't
necessary. Instead we perform the swap on our IR stack and
continue. The ~swap~ primitive is "transparent" in the final IR.
- In this situation, the stack goes from ~[v1, 70]~ to
~[70, v1]~
- ~-~ primitive:
- Type checks the top two values of the stack (which are both
integers)
- ~a b -~ should correspond to ~a - b~, thus the corresponding IR
expression should be ~prim-sub(70,v1)~
- Associate IR expression with name ~v2~.
- Push ~v2~ onto the stack.
The final state of the IR should be:
- Stack: ~[v2]~
- Bindings:
- ~v1=prim-add(34,35)~
- ~v2=prim-sub(70,v1)~
Notice how some primitives generate IR, while others manipulate IR
themselves? They almost seem like macros!
Another thing of note is how the final state of the stack is a single
item in this case; an IR expression representing the entire program.
When we introduce code level bindings we won't have such nice outputs,
but it is certainly something to consider.
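One way to picture the split between primitives that generate IR and primitives that only rearrange the IR stack is a table of handlers. The Python sketch below is hypothetical (~emit~, ~swap~, ~PRIMITIVES~ and ~compile_source~ are names invented here) and leaves out type checking to keep the focus on that split.

```python
def emit(op):
    """Build a handler that pops two values and binds a new IR expression."""
    def handler(stack, bindings):
        b = stack.pop()
        a = stack.pop()
        name = f"v{len(bindings) + 1}"
        bindings.append((name, f"{op}({a},{b})"))
        stack.append(name)
    return handler


def swap(stack, bindings):
    """Transparent primitive: reorders the stack and emits no IR."""
    stack[-1], stack[-2] = stack[-2], stack[-1]


PRIMITIVES = {
    "+": emit("prim-add"),
    "-": emit("prim-sub"),
    "swap": swap,
}


def compile_source(source):
    """Compile whitespace-separated ARL tokens into (stack, bindings)."""
    stack, bindings = [], []
    for tok in source.split():
        if tok in PRIMITIVES:
            PRIMITIVES[tok](stack, bindings)
        else:
            stack.append(int(tok))  # only integer literals in this sketch
    return stack, bindings


stack, bindings = compile_source("34 35 + 70 swap -")
```

Every handler has the same signature, so the compiler loop does not need to know which primitives leave a trace in the bindings and which, like ~swap~, do not.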
**** Hello world! example
For our hello world:
#+begin_src text
"Hello, world!\n" putstr
#+end_src
- "Hello, world!\n" (string) pushed
- "putstr" primitive:
- Type check the top of the stack (should be a string)
- Generate IR ~prim-putstr("Hello, world!\n")~
- Associate with name ~v1~ and push it onto the stack
Much simpler than our previous examples.
*** TODO IR level type checking
During IR compilation, the following should be type checked:
- use of callables (primitives, user defined when implemented)
- variable assignment (when implemented)
- variable use (when implemented)
- definition of callables (when implemented)
We want to ensure no statement is unsound.
**** TODO Primitive types
Define the primitive types of the IR. Remember, simplicity is key,
but we need to mirror what we're getting on the ARL side.
**** TODO Type contracts for callables
Define how we can type check arguments on the stack against the types
a callable expects for its inputs. In the same vein, we also need to
figure out the type of whatever is pushed onto the stack by the
callable.
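As a starting point for discussion, a contract could be a pair of expected input types and an output type, checked against the types on top of the stack. Everything in this Python sketch is an assumption: the type names, the ~CONTRACTS~ table, and in particular the ~"unit"~ result for ~putstr~, which these notes have not decided.

```python
# Hypothetical contract table: primitive -> (input types, output type).
CONTRACTS = {
    "+": (("int", "int"), "int"),
    "-": (("int", "int"), "int"),
    "putstr": (("string",), "unit"),  # output type is a placeholder
}


def apply_contract(name, type_stack):
    """Check the top of type_stack against a contract.

    Returns the new type stack with the inputs replaced by the output.
    Raises TypeError if there are too few values or the types differ.
    """
    inputs, output = CONTRACTS[name]
    split = len(type_stack) - len(inputs)
    if split < 0:
        raise TypeError(f"{name} needs {len(inputs)} value(s) on the stack")
    actual = tuple(type_stack[split:])
    if actual != inputs:
        raise TypeError(f"{name} expects {inputs}, got {actual}")
    return type_stack[:split] + [output]
```

For example, ~apply_contract("+", ["int", "int"])~ yields ~["int"]~, while ~apply_contract("+", ["int", "string"])~ raises ~TypeError~. Generic primitives such as ~swap~ do not fit this exact-match scheme and would need either type variables or the ~any~ hole discussed earlier.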
** TODO Code generator
[[file:src/arl/target-c/]]