#+title: ARL - Issue tracker
#+date: 2026-01-23

* TODO Write a minimum working transpiler
We need to be able to compile the following file:
[[file:examples/hello-world.arl]].

All it does is print "Hello, world!".  Should be relatively
straightforward.
** DONE Read file
** DONE Parser
** TODO Intermediate representation (Virtual Machine)
[[file:src/arl/vm/]]

Before we get into generating C code and then compiling it, it might be
worth translating the parsed ARL code into a generic IR.  The IR should
be primitive in its semantics but should still encapsulate the intention
behind the original ARL code.

This should allow us to find a set of minimum requirements for target
compilation:
- what can we reasonably use from the target platform to support the
  primitive IR?
- what do we need to hand-roll on the target in order to make this work?

Essentially, we want to write a virtual machine, and translate ARL code
into bytecode for that VM.

Goals:
- Type checking
- Optimiser (stretch)

We need the following clear items in our IR:
- Statically typed values
- Statically typed variables (possibly De Bruijn numbering or some other
  mechanism to abstract naming away and leave it to the target to
  generate names effectively)
- Strongly typed primitive operators (numeric, strings, I/O) with packed
  arguments

We should have a rough grouping between AST objects and this IR.  As ARL
is Forth-like, we can use the stack semantics to generate this IR as we
walk the AST in a linear manner.  In practice this should almost look
like emulating a really small subset of the ARL language itself and
executing the program in that small subset.

Looking at how [[https://en.wikipedia.org/wiki/Three-address_code][TAC]]
works, I think it may be a good idea to do something like that for our
IR.  Essentially we should convert our AST into a sequence of really
simple bindings, with the final expression being a reference to some
binding.  This also simplifies type checking to just verifying each
little binding and operation.
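
The binding-based IR described above can be sketched roughly as follows.
This is a minimal Python sketch under stated assumptions, not the
project's implementation: ~Binding~, ~IRState~, ~lower~ and the
~prim-add~ / ~prim-sub~ labels are hypothetical names, and a pre-split
token list stands in for the real AST.  Only integer literals and two
binary primitives are handled, and no type checking is done.

#+begin_src python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Binding:
    """One TAC-style binding (hypothetical shape): name = op(args)."""
    name: str    # unique binding name, e.g. "v1"
    op: str      # primitive operation label, e.g. "prim-add"
    args: tuple  # operands: literals or names of earlier bindings


@dataclass
class IRState:
    stack: list = field(default_factory=list)     # literals and binding names
    bindings: list = field(default_factory=list)  # kept in program order

    def bind(self, op, args):
        """Create a fresh binding for op(args) and push its name."""
        name = f"v{len(self.bindings) + 1}"
        self.bindings.append(Binding(name, op, tuple(args)))
        self.stack.append(name)


# Assumed mapping from ARL words to IR operation labels.
BINARY_PRIMS = {"+": "prim-add", "-": "prim-sub"}


def lower(tokens):
    """Walk a flat token list left to right and build the IR."""
    state = IRState()
    for tok in tokens:
        if tok in BINARY_PRIMS:
            b = state.stack.pop()  # top of the stack is the right operand
            a = state.stack.pop()
            state.bind(BINARY_PRIMS[tok], (a, b))
        else:
            state.stack.append(int(tok))  # integer literals only in this sketch
    return state


ir = lower("1 2 + 3 -".split())
print(ir.stack)  # ['v2']
for binding in ir.bindings:
    print(binding)
#+end_src

The final stack holds a single reference, ~v2~, and the binding list
records ~v1~ before ~v2~, so a later stage can verify or emit each
binding in order without revisiting the stack walk.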
*** Examples
**** Basic example
Consider the following ARL code:
#+begin_src text
34 35 +
#+end_src

When we walk through the above code:
- 34 (an integer) is pushed onto the stack
- 35 (an integer) is pushed onto the stack
- ~+~ primitive is encountered
  - Type check the top two values of the stack; they should be
    integral.
  - ~a b +~ should correspond to ~a + b~ so the IR expression should
    pack the arguments in that order: ~prim-add(34,35)~.
  - Bind the generated IR expression to some unique name, say ~v1~.
  - Ensure this works with type checking; looking up ~v1~'s type should
    give you the output type of the ~+~ operator (integer).
  - Push ~v1~ onto the stack.

The final state of the stack should be something like ~[v1]~ where
~v1=prim-add(34,35)~.

The final state of the stack, along with the bindings we form, is the
IR that we pass over to the later stages of the compiler.
**** Slightly more complex example
Let's look at a slightly more complex program:
#+begin_src text
34 35 + 70 swap -
#+end_src

- 34 (integer) pushed
- 35 (integer) pushed
- ~+~ primitive:
  - As stated previously, the final state of this primitive gives us the
    name ~v1~ on the stack with the association ~v1=prim-add(34,35)~.
- 70 (integer) pushed
- ~swap~ primitive:
  - Requires two values on the stack, but we care little about their
    types.  Just swaps their order on the stack.
  - We /could/ introduce generics here to make the input/output
    relationship explicit (~forall T, U swap:-(-> (T U) (U T))~), but at
    the same time we can just as easily get away with a type hole
    (essentially some kind of ~any~).  Up for debate.
  - We do not generate IR for this primitive as it simply isn't
    necessary.  Instead we perform the swap on our IR stack and
    continue.  The ~swap~ primitive is "transparent" in the final IR.
  - In this situation, the stack goes from ~[v1, 70]~ to ~[70, v1]~
- ~-~ primitive:
  - Type checks the top two values of the stack (which are both
    integers)
  - ~a b -~ should correspond to ~a - b~, thus the corresponding IR
    expression should be ~prim-sub(70,v1)~
  - Associate IR expression with name ~v2~.
  - Push ~v2~ onto the stack.

The final state of the IR should be:
- Stack: ~[v2]~
- Bindings:
  - ~v1=prim-add(34,35)~
  - ~v2=prim-sub(70,v1)~

Notice how some primitives generate IR, while others manipulate the IR
stack themselves?  They almost seem like macros!

Another thing of note is how the final state of the stack is a single
item in this case: an IR expression representing the entire program.
When we introduce code level bindings we won't have such nice outputs,
but it is certainly something to consider.
**** Hello world! example
For our hello world:
#+begin_src text
"Hello, world!\n" putstr
#+end_src

- "Hello, world!\n" (string) pushed
- ~putstr~ primitive:
  - Type check the top of the stack (should be a string)
  - Generate IR ~prim-putstr("Hello, world!\n")~
  - Associate with name ~v1~ and push it onto the stack

Much simpler than our previous examples.
*** TODO IR level type checking
During IR compilation, the following should be type checked:
- use of callables (primitives, user defined when implemented)
- variable assignment (when implemented)
- variable use (when implemented)
- definition of callables (when implemented)

We want to ensure no statement is unsound.
**** TODO Primitive types
Define the primitive types of the IR.  Remember, simplicity is key, but
we need to mirror what we're getting on the ARL side.
**** TODO Type contracts for callables
Define how we can type check arguments on the stack against the types a
callable expects for its inputs.  In the same vein, we also need to
figure out the type of whatever is pushed onto the stack by the
callable.
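
One possible shape for such contracts is a table from callable name to
input and output types, checked against a parallel stack of types.  The
sketch below is Python and purely illustrative: the type names ~int~,
~str~ and ~unit~, the ~CONTRACTS~ table, ~ContractError~ and
~apply_contract~ are all assumptions, since the primitive types are
still to be defined.  In particular, the result type of ~putstr~ is a
guess.

#+begin_src python
# Hypothetical primitive type names; the real set is not yet decided.
INT, STR, UNIT = "int", "str", "unit"

# Each contract is (input types, output types).  Inputs are listed
# bottom of stack first, so ("int", "int") means the top two values.
CONTRACTS = {
    "+": ((INT, INT), (INT,)),
    "-": ((INT, INT), (INT,)),
    "putstr": ((STR,), (UNIT,)),  # assumed result type
}


class ContractError(TypeError):
    """Raised when the stack does not satisfy a callable's contract."""


def apply_contract(stack_types, name):
    """Check the top of stack_types against the contract for name.

    Returns the new list of stack types with the inputs consumed and
    the outputs pushed; raises ContractError on a mismatch.
    """
    inputs, outputs = CONTRACTS[name]
    if len(stack_types) < len(inputs):
        raise ContractError(
            f"{name}: expected {len(inputs)} value(s), found {len(stack_types)}"
        )
    split = len(stack_types) - len(inputs)
    actual = tuple(stack_types[split:])
    if actual != inputs:
        raise ContractError(f"{name}: expected {inputs}, found {actual}")
    return list(stack_types[:split]) + list(outputs)


print(apply_contract([INT, INT], "+"))  # ['int']
#+end_src

A stack-only word like ~swap~ does not fit this exact-match table, which
is where the generics versus type hole question comes in.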
** TODO Code generator
[[file:src/arl/target-c/]]

This should take the IR translated from the AST generated by the parser,
and write equivalent C code.

After we've generated the C code, we need to call a C compiler on it to
generate a binary.  GCC and Clang allow passing source code through
stdin, so we don't even need to write to disk first, which is nice.
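
Passing the source through stdin could look roughly like this.  The
sketch is in Python for brevity and makes some assumptions:
~compiler_command~ and ~compile_c_source~ are hypothetical helper names,
and ~cc~ is assumed to resolve to GCC or Clang on the build machine.
The ~-x c~ flag is needed because stdin has no file extension to infer
the language from, and the trailing ~-~ tells the compiler to read from
stdin.

#+begin_src python
import subprocess


def compiler_command(output_path, cc="cc"):
    """Build the argument list for compiling C source read from stdin."""
    # "-x c" states the input language explicitly; "-" means "read stdin".
    return [cc, "-x", "c", "-o", output_path, "-"]


def compile_c_source(source, output_path, cc="cc"):
    """Pipe C source text to the compiler without touching the disk.

    Raises subprocess.CalledProcessError if the compiler exits non-zero.
    """
    subprocess.run(
        compiler_command(output_path, cc),
        input=source,  # sent to the compiler's stdin
        text=True,     # treat input as str rather than bytes
        check=True,    # turn a failed compile into an exception
    )


print(compiler_command("hello"))
#+end_src

Calling ~compile_c_source(generated_c, "hello")~ with the generated C
text would then leave a ~hello~ binary in the working directory,
assuming the compile succeeds.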