arl.org: Lots of thinking
@@ -13,20 +13,19 @@ world!". Should be relatively straightforward.
|
|||||||
Before we get into generating C code and then compiling it, it might
|
Before we get into generating C code and then compiling it, it might
|
||||||
be worth translating the parsed ARL code into a generic IR.
|
be worth translating the parsed ARL code into a generic IR.
|
||||||
|
|
||||||
The IR should be much more primitive in its semantics, and force clear
|
The IR should be primitive in its semantics but should still
|
||||||
requirements of the platform we're compiling to. This way, at the
|
encapsulate the intention behind the original ARL code. This should
|
||||||
code generator stage we can figure out:
|
allow us to find a set of minimum requirements for target compilation:
|
||||||
- what can we reasonably use from the target platform to satisfy
|
- what can we reasonably use from the target platform to satisfy
|
||||||
requirements?
|
supporting the primitive IR?
|
||||||
- what do we need to hand-roll on the target in order to make this
|
- what do we need to hand-roll on the target in order to make this
|
||||||
work?
|
work?
|
||||||
|
|
||||||
Essentially, we want to write a virtual machine, and translate ARL
|
Essentially, we want to write a virtual machine, and translate ARL
|
||||||
code into bytecode for that VM. Goals:
|
code into bytecode for that VM. Goals:
|
||||||
- Easier to optimise IR bytecode than the AST of our original program
|
- Type checking
|
||||||
- Easier to imagine translations from that IR bytecode into target
|
- Optimiser (stretch)
|
||||||
platform code
|
|
||||||
*** TODO Minimal IR representation
|
|
||||||
We need the following clear items in our IR:
|
We need the following clear items in our IR:
|
||||||
- Static type values
|
- Static type values
|
||||||
- Static type variables (possible De Bruijn numbering or other such
|
- Static type variables (possible De Bruijn numbering or other such
|
||||||
@@ -35,29 +34,117 @@ We need the following clear items in our IR:
|
|||||||
- Strongly typed primitive operators (numeric, strings, I/O) with
|
- Strongly typed primitive operators (numeric, strings, I/O) with
|
||||||
packed arguments
|
packed arguments
|
||||||
|
|
||||||
Read about [[https://en.wikipedia.org/wiki/Three-address_code][TAC]].
|
|
||||||
*** TODO IR Compiler
|
|
||||||
We should have a rough grouping between AST objects and this IR. As
|
We should have a rough grouping between AST objects and this IR. As
|
||||||
ARL is Forth-like, we can use the stack semantics to generate this IR
|
ARL is Forth-like, we can use the stack semantics to generate this IR
|
||||||
as we walk the AST in a linear manner.
|
as we walk the AST in a linear manner. In practice this should almost
|
||||||
|
look like emulating a really small subset of the ARL language itself
|
||||||
|
and executing the program in that small subset.
|
||||||
|
|
||||||
|
Looking at how
|
||||||
|
[[https://en.wikipedia.org/wiki/Three-address_code][TAC]] works, I
|
||||||
|
think it may be a good idea to do something like that for our IR.
|
||||||
|
Essentially we should convert our AST into a sequence of really simple
|
||||||
|
bindings, with the final expression being a reference to some binding.
|
||||||
|
|
||||||
|
This also simplifies type checking to just verifying each little
|
||||||
|
binding and operation.
|
||||||
|
*** Examples
|
||||||
|
**** Basic example
|
||||||
Consider the following ARL code:
|
Consider the following ARL code:
|
||||||
#+begin_src text
|
#+begin_src text
|
||||||
34 35 +
|
34 35 +
|
||||||
#+end_src
|
#+end_src
|
||||||
|
|
||||||
When we walk through this code:
|
When we walk through the above code:
|
||||||
- 34 (an integer) is pushed onto the stack
|
- 34 (an integer) is pushed onto the stack
|
||||||
- 35 (an integer) is pushed onto the stack
|
- 35 (an integer) is pushed onto the stack
|
||||||
- + is encountered
|
- ~+~ primitive is encountered
|
||||||
- Pop two values off the stack and verify their type against the
|
- Type check the top two values of the stack; they should be
|
||||||
contract for "+" (something like (-> i32 i32 i32))
|
integral.
|
||||||
- Generate IR, something like ~prim-add(34, 35)~
|
- ~a b +~ should correspond to ~a + b~ so the IR expression should
|
||||||
*** TODO Consider optimisers
|
pack the arguments in that order: ~prim-add(34,35)~.
|
||||||
Certainly we should perform optimisations on the IR itself before
|
- Bind the generated IR expression to some unique name, say ~v1~.
|
||||||
passing it over to the code generator. Currently we haven't got much
|
- Ensure this works with type checking; looking up ~v1~'s type
|
||||||
in the way of optimisations to consider, but it may be worth
|
should give you the output type of the "+" operator (integer).
|
||||||
considering.
|
- Push ~v1~ onto the stack.
|
||||||
|
|
||||||
|
The final state of the stack should be something like ~[v1]~ where
|
||||||
|
~v1=prim-add(34,35)~. The final state of the stack, along with the
|
||||||
|
bindings we form, is the IR, to pass over to the later stages of the
|
||||||
|
compiler.
|
||||||
|
**** Slightly more complex example
|
||||||
|
Let's look at a slightly more complex program:
|
||||||
|
#+begin_src text
|
||||||
|
34 35 + 70 swap -
|
||||||
|
#+end_src
|
||||||
|
- 34 (integer) pushed
|
||||||
|
- 35 (integer) pushed
|
||||||
|
- ~+~ primitive:
|
||||||
|
- As stated previously, the final state of this primitive gives us
|
||||||
|
the name ~v1~ on the stack with the association
|
||||||
|
~v1=prim-add(34,35)~.
|
||||||
|
- 70 (integer) pushed
|
||||||
|
- ~swap~ primitive:
|
||||||
|
- Requires two values on the stack, but we care little about their
|
||||||
|
types. Just swaps their order on the stack.
|
||||||
|
- We /could/ introduce generics here to make the input/output
|
||||||
|
relationship explicit (forall T, U swap:-(-> (T U) (U T))), but
|
||||||
|
at the same time we can just as easily get away with a type hole
|
||||||
|
(essentially some kind of ~any~). Up to debate.
|
||||||
|
- We do not generate IR for this primitive as it simply isn't
|
||||||
|
necessary. Instead we perform the swap on our IR stack and
|
||||||
|
continue. The ~swap~ primitive is "transparent" in the final IR.
|
||||||
|
- In this situation, the stack goes from ~[v1, 70]~ to
|
||||||
|
~[70, v1]~
|
||||||
|
- ~-~ primitive:
|
||||||
|
- Type checks the top two values of the stack (which are both
|
||||||
|
integers)
|
||||||
|
- ~a b -~ should correspond to ~a - b~, thus the corresponding IR
|
||||||
|
expression should be ~prim-sub(70,v1)~
|
||||||
|
- Associate IR expression with name ~v2~.
|
||||||
|
- Push ~v2~ onto the stack.
|
||||||
|
|
||||||
|
The final state of the IR should be:
|
||||||
|
- Stack: ~[v2]~
|
||||||
|
- Bindings:
|
||||||
|
- ~v1=prim-add(34,35)~
|
||||||
|
- ~v2=prim-sub(70,v1)~
|
||||||
|
|
||||||
|
Notice how some primitives generate IR, while others manipulate IR
|
||||||
|
themselves? They almost seem like macros!
|
||||||
|
|
||||||
|
Another thing of note is how the final state of the stack is a single
|
||||||
|
item in this case; an IR expression representing the entire program.
|
||||||
|
When we introduce code level bindings we won't have such nice outputs,
|
||||||
|
but it is certainly something to consider.
|
||||||
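The walk described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ARL implementation: the names ~IRState~, ~emit_binary~, ~PRIMITIVES~ and ~compile_ir~ are hypothetical, tokens are split on whitespace, only integer literals are handled, and type checking is left out entirely. It shows the two kinds of primitive side by side: ~+~ and ~-~ generate a binding, while ~swap~ only rearranges the IR stack.

```python
from dataclasses import dataclass, field


@dataclass
class IRState:
    """Hypothetical IR state: a stack plus the bindings formed so far."""
    stack: list = field(default_factory=list)     # literals or binding names
    bindings: dict = field(default_factory=dict)  # name -> (operator, args)

    def bind(self, op, args):
        # Fresh names v1, v2, ... in order of creation.
        name = f"v{len(self.bindings) + 1}"
        self.bindings[name] = (op, args)
        return name


def emit_binary(op):
    """Build a handler that generates IR for a two-argument primitive."""
    def handler(state):
        b = state.stack.pop()  # top of stack is the second operand
        a = state.stack.pop()
        # "a b +" corresponds to "a + b", so pack the arguments as (a, b).
        state.stack.append(state.bind(op, (a, b)))
    return handler


def swap(state):
    """Transparent primitive: reorders the IR stack, generates no IR."""
    state.stack[-1], state.stack[-2] = state.stack[-2], state.stack[-1]


PRIMITIVES = {
    "+": emit_binary("prim-add"),
    "-": emit_binary("prim-sub"),
    "swap": swap,
}


def compile_ir(source):
    state = IRState()
    for token in source.split():
        if token in PRIMITIVES:
            PRIMITIVES[token](state)
        else:
            state.stack.append(int(token))  # assumption: integer literals only
    return state


state = compile_ir("34 35 + 70 swap -")
print(state.stack)     # ['v2']
print(state.bindings)  # {'v1': ('prim-add', (34, 35)), 'v2': ('prim-sub', (70, 'v1'))}
```

Keeping the handlers in one table makes the macro-like split visible: a handler either calls ~bind~ or only touches the stack.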
|
**** Hello world! example
|
||||||
|
For our hello world:
|
||||||
|
#+begin_src text
|
||||||
|
"Hello, world!\n" putstr
|
||||||
|
#+end_src
|
||||||
|
- "Hello, world!\n" (string) pushed
|
||||||
|
- "putstr" primitive:
|
||||||
|
- Type check the top of the stack (should be a string)
|
||||||
|
- Generate IR ~prim-putstr("Hello, world!\n")~
|
||||||
|
- Associate with name ~v1~ and push it onto the stack
|
||||||
|
|
||||||
|
Much simpler than our
|
||||||
|
*** TODO IR level type checking
|
||||||
|
During IR compilation, the following should be type checked:
|
||||||
|
- use of callables (primitives, user defined when implemented)
|
||||||
|
- variable assignment (when implemented)
|
||||||
|
- variable use (when implemented)
|
||||||
|
- definition of callables (when implemented)
|
||||||
|
|
||||||
|
We want to ensure no statement is unsound.
|
||||||
|
**** TODO Primitive types
|
||||||
|
Define the primitive types of the IR. Remember, simplicity is key,
|
||||||
|
but we need to mirror what we're getting on the ARL side.
|
||||||
|
**** TODO Type contracts for callables
|
||||||
|
Define how we can type check arguments on the stack against the types
|
||||||
|
a callable expects for its inputs. In the same vein, we also need to
|
||||||
|
figure out the type of whatever is pushed onto the stack by the
|
||||||
|
callable.
|
||||||
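One possible shape for these contracts, sketched in Python. Everything here is an assumption made for illustration: the ~CONTRACTS~ table, the type names ~int~, ~string~ and ~unit~ (the notes do not say what ~putstr~ pushes, so ~unit~ is a placeholder), and the ~Ref~ wrapper used to tell a binding name apart from a string literal. Each contract is a pair of input types and an output type; checking a call verifies the packed arguments and returns the type to record for the new binding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical reference to an earlier binding such as v1."""
    name: str


# Assumed contracts: operator -> (input types, output type).
CONTRACTS = {
    "prim-add": (("int", "int"), "int"),
    "prim-sub": (("int", "int"), "int"),
    "prim-putstr": (("string",), "unit"),  # output type is a placeholder
}


def type_of(value, env):
    """Type of a packed argument: a literal or a reference to a binding."""
    if isinstance(value, Ref):
        return env[value.name]
    if isinstance(value, int):
        return "int"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


def check_call(op, args, env):
    """Check args against op's contract and return the output type."""
    inputs, output = CONTRACTS[op]
    if len(args) != len(inputs):
        raise TypeError(f"{op}: expected {len(inputs)} arguments, got {len(args)}")
    for position, (arg, expected) in enumerate(zip(args, inputs)):
        actual = type_of(arg, env)
        if actual != expected:
            raise TypeError(
                f"{op}: argument {position} is {actual}, expected {expected}"
            )
    return output


# Type environment for the bindings of "34 35 + 70 swap -".
env = {}
env["v1"] = check_call("prim-add", (34, 35), env)
env["v2"] = check_call("prim-sub", (70, Ref("v1")), env)
print(env)  # {'v1': 'int', 'v2': 'int'}
```

Recording the output type under the binding name is what lets a later lookup of ~v1~ return the output type of the "+" operator, as the basic example requires.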
** TODO Code generator
|
** TODO Code generator
|
||||||
[[file:src/arl/target-c/]]
|
[[file:src/arl/target-c/]]
|
||||||
|
|
||||||
|
|||||||