ARL - Issue tracker
TODO Write a minimum working transpiler
We need to be able to compile the following file: examples/hello-world.arl. All it does is print "Hello, world!". Should be relatively straightforward.
DONE Read file
DONE Parser
TODO Intermediate representation (Virtual Machine)
Before we get into generating C code and then compiling it, it might be worth translating the parsed ARL code into a generic IR.
The IR should be primitive in its semantics but should still encapsulate the intention behind the original ARL code. This should allow us to find a set of minimum requirements for target compilation:
- what can we reasonably use from the target platform to support the primitive IR?
- what do we need to hand-roll on the target in order to make this work?
Essentially, we want to write a virtual machine, and translate ARL code into bytecode for that VM. Goals:
- Type checking
- Optimiser (stretch)
We need the following items to be clearly represented in our IR:
- Static type values
- Static type variables (possibly De Bruijn numbering or some other mechanism to abstract naming away and leave name generation to the target)
- Strongly typed primitive operators (numeric, strings, I/O) with packed arguments
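A minimal sketch of those three items as Python dataclasses, assuming nothing beyond what the list above says. The names Lit, Var, Prim, Binding and show are hypothetical, and a plain integer counter stands in for De Bruijn numbering: it keeps source names out of the IR, but it is not De Bruijn indexing proper.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:
    """Static value: a constant whose type is known at compile time."""
    value: Union[int, str]
    type: str  # e.g. "int" or "str"


@dataclass(frozen=True)
class Var:
    """Static variable: identified by a number rather than a source name,
    so the target is free to pick its own identifiers."""
    index: int


@dataclass(frozen=True)
class Prim:
    """Strongly typed primitive operator with its arguments packed in order."""
    op: str
    args: tuple[Union[Lit, Var], ...]


@dataclass(frozen=True)
class Binding:
    var: Var
    expr: Prim


def show_operand(x: Union[Lit, Var]) -> str:
    if isinstance(x, Var):
        return f"v{x.index}"
    # json.dumps renders strings with double quotes and \n escapes.
    return json.dumps(x.value) if x.type == "str" else str(x.value)


def show(b: Binding) -> str:
    args = ",".join(show_operand(a) for a in b.expr.args)
    return f"{show_operand(b.var)}=prim-{b.expr.op}({args})"


# v1 = prim-add(34,35)
b = Binding(Var(1), Prim("add", (Lit(34, "int"), Lit(35, "int"))))
print(show(b))  # v1=prim-add(34,35)
```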
We should have a rough correspondence between AST objects and this IR. As ARL is Forth-like, we can use the stack semantics to generate this IR as we walk the AST linearly. In practice this should look almost like emulating a really small subset of the ARL language itself and executing the program in that subset.
Looking at how three-address code (TAC) works, I think it may be a good idea to do something similar for our IR. Essentially, we should translate our AST into a sequence of really simple bindings, with the final expression being a reference to some binding.
This also simplifies type checking to just verifying each little binding and operation.
Examples
Basic example
Consider the following ARL code:
34 35 +
When we walk through the above code:
- 34 (an integer) is pushed onto the stack
- 35 (an integer) is pushed onto the stack
- The + primitive is encountered:
  - Type check the top two values of the stack; they should be integral.
  - a b + should correspond to a + b, so the IR expression should pack the arguments in that order: prim-add(34,35).
  - Bind the generated IR expression to some unique name, say v1.
    - Ensure this works with type checking; looking up v1's type should give you the output type of the "+" operator (integer).
  - Push v1 onto the stack.
The final state of the stack should be something like [v1] where
v1=prim-add(34,35). The final state of the stack, along with the
bindings we form, is the IR, to pass over to the later stages of the
compiler.
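The walk above can be sketched as follows, assuming the source is already split into whitespace-separated tokens and only integer literals and + are supported. Operands on the compile-time stack are either int literals or binding names; the function name lower and the (name, op, args) binding layout are hypothetical.

```python
import itertools


def lower(tokens):
    """Lower ARL tokens to IR bindings by simulating the stack at compile time."""
    fresh = itertools.count(1)  # source of unique binding names v1, v2, ...
    stack = []                  # IR operands: int literals or binding names
    types = {}                  # binding name -> type
    bindings = []               # (name, op, args) in emission order

    def type_of(operand):
        return "int" if isinstance(operand, int) else types[operand]

    for tok in tokens:
        if tok.isdigit():
            stack.append(int(tok))
        elif tok == "+":
            if len(stack) < 2:
                raise TypeError("+ needs two values on the stack")
            rhs = stack.pop()  # b in `a b +`
            lhs = stack.pop()  # a in `a b +`
            if type_of(lhs) != "int" or type_of(rhs) != "int":
                raise TypeError("+ expects two integers")
            name = f"v{next(fresh)}"
            bindings.append((name, "prim-add", (lhs, rhs)))
            types[name] = "int"  # output type of +
            stack.append(name)
        else:
            raise ValueError(f"unsupported token {tok!r}")
    return stack, bindings, types


stack, bindings, types = lower("34 35 +".split())
print(stack)     # ['v1']
print(bindings)  # [('v1', 'prim-add', (34, 35))]
```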
Slightly more complex example
Let's look at a slightly more complex program:
34 35 + 70 swap -
- 34 (integer) pushed
- 35 (integer) pushed
- The + primitive:
  - As stated previously, the final state of this primitive gives us the name v1 on the stack with the association v1=prim-add(34,35).
- 70 (integer) pushed
- The swap primitive:
  - Requires two values on the stack, but we care little about their types. It just swaps their order on the stack.
  - We could introduce generics here to make the input/output relationship explicit (forall T, U swap :- (-> (T U) (U T))), but at the same time we can just as easily get away with a type hole (essentially some kind of any). Up for debate.
  - We do not generate IR for this primitive as it simply isn't necessary. Instead, we perform the swap on our IR stack and continue. The swap primitive is "transparent" in the final IR.
  - In this situation, the stack goes from [v1, 70] to [70, v1].
- The - primitive:
  - Type check the top two values of the stack (which are both integers).
  - a b - should correspond to a - b, thus the corresponding IR expression should be prim-sub(70,v1).
  - Associate the IR expression with the name v2.
  - Push v2 onto the stack.
The final state of the IR should be:
- Stack: [v2]
- Bindings:
  - v1=prim-add(34,35)
  - v2=prim-sub(70,v1)
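Extending the same kind of sketch with swap and - shows why swap is transparent: it only reorders the compile-time stack and emits no binding. Here swap takes the type-hole route and checks nothing about its operands. As before, lower and the ARITH table are hypothetical names, and only integer literals are handled.

```python
import itertools

# Hypothetical table: ARL arithmetic primitive -> IR operator (both int -> int).
ARITH = {"+": "prim-add", "-": "prim-sub"}


def lower(tokens):
    fresh = itertools.count(1)
    stack, bindings, types = [], [], {}

    def type_of(operand):
        return "int" if isinstance(operand, int) else types[operand]

    def pop2(tok):
        if len(stack) < 2:
            raise TypeError(f"{tok} needs two values on the stack")
        b = stack.pop()
        a = stack.pop()
        return a, b

    for tok in tokens:
        if tok.isdigit():
            stack.append(int(tok))
        elif tok == "swap":
            # Transparent: reorder the compile-time stack, emit no IR.
            a, b = pop2(tok)
            stack.extend([b, a])
        elif tok in ARITH:
            a, b = pop2(tok)  # `a b op` corresponds to `a op b`
            if type_of(a) != "int" or type_of(b) != "int":
                raise TypeError(f"{tok} expects two integers")
            name = f"v{next(fresh)}"
            bindings.append((name, ARITH[tok], (a, b)))
            types[name] = "int"
            stack.append(name)
        else:
            raise ValueError(f"unsupported token {tok!r}")
    return stack, bindings


stack, bindings = lower("34 35 + 70 swap -".split())
print(stack)  # ['v2']
for name, op, args in bindings:
    print(f"{name}={op}({','.join(map(str, args))})")
# v1=prim-add(34,35)
# v2=prim-sub(70,v1)
```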
Notice how some primitives generate IR, while others manipulate IR themselves? They almost seem like macros!
Another thing of note is how the final state of the stack is a single item in this case: an IR expression representing the entire program. When we introduce code-level bindings we won't have such nice outputs, but it is certainly something to consider.
Hello world! example
For our hello world:
"Hello, world!\n" putstr
- "Hello, world!\n" (string) pushed
- The putstr primitive:
  - Type check the top of the stack (should be a string)
  - Generate IR prim-putstr("Hello, world!\n")
  - Associate with name v1 and push it onto the stack
Much simpler than our previous example!
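A sketch of the putstr case, this time tokenising double-quoted string literals too. String literals sit on the compile-time stack as ("str", value) pairs. The result type of putstr is not specified above, so this sketch assumes a "unit" type to give v1 something to refer to; lower, TOKEN and unescape are hypothetical names.

```python
import itertools
import re

# A double-quoted string (backslash escapes allowed) or any other word.
TOKEN = re.compile(r'(?P<string>"(?:[^"\\]|\\.)*")|(?P<word>\S+)')
ESCAPES = {"n": "\n", "t": "\t"}


def unescape(body):
    """Decode backslash escapes; unknown escapes map to the escaped char."""
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(1)), body)


def lower(src):
    fresh = itertools.count(1)
    stack, bindings, types = [], [], {}
    for m in TOKEN.finditer(src):
        if m.group("string") is not None:
            stack.append(("str", unescape(m.group("string")[1:-1])))
        elif m.group("word") == "putstr":
            if not stack:
                raise TypeError("putstr needs one value on the stack")
            arg = stack.pop()
            arg_type = arg[0] if isinstance(arg, tuple) else types[arg]
            if arg_type != "str":
                raise TypeError(f"putstr expects a string, found {arg_type}")
            name = f"v{next(fresh)}"
            bindings.append((name, "prim-putstr", (arg,)))
            types[name] = "unit"  # assumption: putstr produces a unit value
            stack.append(name)
        else:
            # Includes unterminated string literals, which fall through to `word`.
            raise ValueError(f"unsupported token {m.group('word')!r}")
    return stack, bindings


stack, bindings = lower(r'"Hello, world!\n" putstr')
print(stack)  # ['v1']
```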
TODO IR level type checking
During IR compilation, the following should be type checked:
- use of callables (primitives, and user-defined ones when implemented)
- variable assignment (when implemented)
- variable use (when implemented)
- definition of callables (when implemented)
We want to ensure no statement is unsound.
TODO Primitive types
Define the primitive types of the IR. Remember, simplicity is key, but we need to mirror what we're getting on the ARL side.
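One possible starting point, assuming the IR only needs the types that appear in the examples so far: integers and strings, plus a unit type for effect-only primitives such as putstr (that last one is an assumption; nothing above defines it). PrimType and literal_type are hypothetical names.

```python
from enum import Enum


class PrimType(Enum):
    INT = "int"
    STR = "str"
    UNIT = "unit"  # assumption: result of effect-only primitives like putstr


def literal_type(value):
    """Map a literal from the parser to its IR primitive type."""
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(value, bool):
        raise TypeError("booleans are not an IR primitive type in this sketch")
    if isinstance(value, int):
        return PrimType.INT
    if isinstance(value, str):
        return PrimType.STR
    raise TypeError(f"no IR primitive type for {value!r}")
```

Keeping the set this small makes the target requirements obvious: every PrimType needs exactly one representation on the target.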
TODO Type contracts for callables
Define how we can type check arguments on the stack against the types a callable expects for its inputs. In the same vein, we also need to figure out the type of whatever is pushed onto the stack by the callable.
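One way to express contracts, sketched under the assumption that each callable declares its input types in push order and its output types as a list. Upper-case names act as type variables, which is the generics option discussed for swap; a type hole would instead accept anything and lose the link between swap's inputs and outputs. CONTRACTS and apply_contract are hypothetical names.

```python
# Hypothetical contracts: callable -> (input types in push order, output types).
# Upper-case names such as "T" are type variables.
CONTRACTS = {
    "+": (("int", "int"), ("int",)),
    "-": (("int", "int"), ("int",)),
    "swap": (("T", "U"), ("U", "T")),
    "putstr": (("str",), ("unit",)),  # assumption: putstr yields a unit value
}


def apply_contract(callable_name, type_stack):
    """Check the top of type_stack against the callable's inputs and return the
    type stack after its outputs are pushed. Raises TypeError on a mismatch."""
    inputs, outputs = CONTRACTS[callable_name]
    n = len(inputs)
    if len(type_stack) < n:
        raise TypeError(
            f"{callable_name}: needs {n} value(s), stack has {len(type_stack)}")
    split = len(type_stack) - n
    subst = {}  # type variable -> concrete type
    for expected, actual in zip(inputs, type_stack[split:]):
        if expected.isupper():
            # Bind the variable on first use; later uses must agree.
            if subst.setdefault(expected, actual) != actual:
                raise TypeError(f"{callable_name}: {expected} bound inconsistently")
        elif expected != actual:
            raise TypeError(
                f"{callable_name}: expected {expected}, found {actual}")
    return type_stack[:split] + [subst.get(t, t) for t in outputs]
```

For example, apply_contract("swap", ["int", "str"]) returns ["str", "int"], while apply_contract("putstr", ["int"]) raises TypeError.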
TODO Use SSA for user level bindings
Static single-assignment (SSA) form is something we should use when we introduce user-level bindings.
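A minimal sketch of SSA renaming for straight-line code, assuming hypothetical user-level assignments given as (target, op, args) triples in which string args are variable reads and everything else is a literal. Each assignment gets a fresh versioned name, so the output keeps the one-binding-per-name shape the IR already has. Branches would additionally need phi nodes, which this sketch does not handle; to_ssa is a hypothetical name.

```python
def to_ssa(assignments):
    """Rename straight-line assignments so each name is assigned exactly once."""
    version = {}  # source name -> current version number
    out = []

    def read(arg):
        if isinstance(arg, str):
            if arg not in version:
                raise NameError(f"{arg} used before assignment")
            return f"{arg}.{version[arg]}"
        return arg

    for target, op, args in assignments:
        new_args = tuple(read(a) for a in args)  # read operands before bumping
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}.{version[target]}", op, new_args))
    return out


# x = 1 + 2; x = x + 3; y = x - 1
print(to_ssa([("x", "prim-add", (1, 2)),
              ("x", "prim-add", ("x", 3)),
              ("y", "prim-sub", ("x", 1))]))
```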
TODO Code generator
This should take the IR translated from the AST generated by the parser, and write equivalent C code.
After we've generated the C code, we need to call a C compiler on it to generate a binary. GCC and Clang can read source code from stdin (with -x c -), so we don't even need to write the C file to disk first, which is nice.
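A sketch of both steps, assuming bindings in the (name, op, args) shape used by the lowering sketches, with string literals as ("str", value) pairs. Mapping ARL integers to long long and emitting fputs for putstr (puts would append an extra newline) are assumptions of this sketch, as are the names generate_c and compile_c.

```python
import subprocess

# Hypothetical mapping of IR arithmetic operators to C operators.
C_BINOPS = {"prim-add": "+", "prim-sub": "-"}


def c_string(s):
    """Render a Python string as a C string literal (ASCII assumed)."""
    escaped = (s.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\t", "\\t"))
    return f'"{escaped}"'


def operand(x):
    if isinstance(x, tuple):  # ("str", value) literal
        return c_string(x[1])
    return str(x)             # int literal or binding name


def generate_c(bindings):
    lines = ["#include <stdio.h>", "", "int main(void) {"]
    for name, op, args in bindings:
        if op in C_BINOPS:
            lhs, rhs = map(operand, args)
            lines.append(f"    long long {name} = {lhs} {C_BINOPS[op]} {rhs};")
        elif op == "prim-putstr":
            # fputs, unlike puts, does not append a newline.
            lines.append(f"    fputs({operand(args[0])}, stdout);")
        else:
            raise ValueError(f"no C lowering for {op}")
    lines += ["    return 0;", "}", ""]
    return "\n".join(lines)


def compile_c(source, output, cc="cc"):
    """Pipe C source to the compiler on stdin; `-x c -` makes GCC/Clang read it."""
    subprocess.run([cc, "-x", "c", "-", "-o", output],
                   input=source, text=True, check=True)


print(generate_c([("v1", "prim-putstr", (("str", "Hello, world!\n"),))]))
```

For example, compile_c(generate_c(bindings), "hello") would build the hello world binary, provided a cc that accepts -x c (GCC or Clang) is on the PATH.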