ARL - Issue tracker

TODO Write a minimum working transpiler

We need to be able to compile the following file: examples/hello-world.arl. All it does is print "Hello, world!". Should be relatively straightforward.

DONE Read file

DONE Parser

TODO Intermediate representation (Virtual Machine)

src/arl/vm/

Before we get into generating C code and then compiling it, it might be worth translating the parsed ARL code into a generic IR.

The IR should be primitive in its semantics but should still encapsulate the intention behind the original ARL code. This should allow us to find a set of minimum requirements for target compilation:

  • what can we reasonably use from the target platform to support the primitive IR?
  • what do we need to hand-roll on the target in order to make this work?

Essentially, we want to write a virtual machine, and translate ARL code into bytecode for that VM. Goals:

  • Type checking
  • Optimiser (stretch)

We need the following items clearly defined in our IR:

  • Static type values
  • Static type variables (possibly De Bruijn numbering or some other mechanism to abstract naming away and leave it to the target to generate names effectively)
  • Strongly typed primitive operators (numeric, strings, I/O) with packed arguments

We should have a rough grouping between AST objects and this IR. As ARL is Forth-like, we can use the stack semantics to generate this IR as we walk the AST in a linear manner. In practice this should almost look like emulating a really small subset of the ARL language itself and executing the program in that small subset.

Looking at how three-address code (TAC) works, I think it may be a good idea to do something like that for our IR. Essentially we should translate our AST into a sequence of really simple bindings, with the final expression being a reference to some binding.

This also simplifies type checking to just verifying each little binding and operation.

Examples

Basic example

Consider the following ARL code:

34 35 +

When we walk through the above code:

  • 34 (an integer) is pushed onto the stack
  • 35 (an integer) is pushed onto the stack
  • + primitive is encountered

    • Type check the top two values of the stack; they should be integral.
    • a b + should correspond to a + b so the IR expression should pack the arguments in that order: prim-add(34,35).
    • Bind the generated IR expression to some unique name, say v1.

      • Ensure this works with type checking; looking up v1's type should give you the output type of the "+" operator (integer).
    • Push v1 onto the stack.

The final state of the stack should be something like [v1] where v1=prim-add(34,35). The final state of the stack, along with the bindings we form, is the IR, to pass over to the later stages of the compiler.
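
The walk above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual implementation: the names IRBuilder, push_int and prim_add, the "int" type tag, and the choice to store bindings as (name, expression, type) triples are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class IRBuilder:
    """Hypothetical IR builder: an operand stack plus a list of bindings."""
    stack: list = field(default_factory=list)     # (operand, type) pairs
    bindings: list = field(default_factory=list)  # (name, expression, type) triples

    def push_int(self, value: int) -> None:
        # Literals go straight onto the stack, tagged with their type.
        self.stack.append((str(value), "int"))

    def prim_add(self) -> None:
        if len(self.stack) < 2:
            raise TypeError("+ expects two values on the stack")
        # Inspect the top two values before consuming them.
        (a, a_type), (b, b_type) = self.stack[-2:]
        if (a_type, b_type) != ("int", "int"):
            raise TypeError(f"+ expects two integers, got {a_type} and {b_type}")
        del self.stack[-2:]
        # `a b +` means a + b, so the arguments are packed in that order.
        name = f"v{len(self.bindings) + 1}"
        self.bindings.append((name, f"prim-add({a},{b})", "int"))
        # The name carries the output type of the operator.
        self.stack.append((name, "int"))


builder = IRBuilder()
builder.push_int(34)
builder.push_int(35)
builder.prim_add()
print(builder.stack)     # [('v1', 'int')]
print(builder.bindings)  # [('v1', 'prim-add(34,35)', 'int')]
```

Keeping the type next to each stack entry means looking up v1's type is just a matter of reading the pair, which is one way to satisfy the type checking requirement in the walk above.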

Slightly more complex example

Let's look at a slightly more complex program:

34 35 + 70 swap -
  • 34 (integer) pushed
  • 35 (integer) pushed
  • + primitive:

    • As stated previously, the final state of this primitive gives us the name v1 on the stack with the association v1=prim-add(34,35).
  • 70 (integer) pushed
  • swap primitive:

    • Requires two values on the stack, but we care little about their types. Just swaps their order on the stack.
    • We could introduce generics here to make the input/output relationship explicit (forall T, U swap:-(-> (T U) (U T))), but at the same time we can just as easily get away with a type hole (essentially some kind of any). Up for debate.
    • We do not generate IR for this primitive as it simply isn't necessary. Instead we perform the swap on our IR stack and continue. The swap primitive is "transparent" in the final IR.
    • In this situation, the stack goes from [v1, 70] to [70, v1]
  • - primitive:

    • Type checks the top two values of the stack (which are both integers)
    • a b - should correspond to a - b, thus the corresponding IR expression should be prim-sub(70,v1)
    • Associate IR expression with name v2.
    • Push v2 onto the stack.

The final state of the IR should be:

  • Stack: [v2]
  • Bindings:

    • v1=prim-add(34,35)
    • v2=prim-sub(70,v1)

Notice how some primitives generate IR, while others manipulate IR themselves? They almost seem like macros!

Another thing of note is how the final state of the stack is a single item in this case; an IR expression representing the entire program. When we introduce code level bindings we won't have such nice outputs, but it is certainly something to consider.
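
A minimal Python sketch of this second walk, assuming the program has already been tokenised into a flat list. The function name lower, the token representation, and the "int" type tag are hypothetical. Note how swap only rearranges the IR stack and emits no binding, while + and - each emit one.

```python
def lower(tokens):
    """Lower a flat list of ARL tokens into (stack, bindings).

    Hypothetical helper: integers are Python ints, primitives are strings.
    """
    stack = []     # (operand, type) pairs
    bindings = {}  # name -> IR expression, in insertion order
    arithmetic = {"+": "prim-add", "-": "prim-sub"}

    for token in tokens:
        if isinstance(token, int):
            stack.append((str(token), "int"))
        elif token == "swap":
            # "Transparent" primitive: needs two values, ignores their types,
            # and generates no IR of its own.
            if len(stack) < 2:
                raise TypeError("swap expects two values on the stack")
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif token in arithmetic:
            if len(stack) < 2:
                raise TypeError(f"{token} expects two values on the stack")
            (a, a_type), (b, b_type) = stack[-2:]
            if (a_type, b_type) != ("int", "int"):
                raise TypeError(f"{token} expects two integers")
            del stack[-2:]
            # `a b op` corresponds to `a op b`.
            name = f"v{len(bindings) + 1}"
            bindings[name] = f"{arithmetic[token]}({a},{b})"
            stack.append((name, "int"))
        else:
            raise ValueError(f"unknown token: {token!r}")
    return stack, bindings


stack, bindings = lower([34, 35, "+", 70, "swap", "-"])
print(stack)     # [('v2', 'int')]
print(bindings)  # {'v1': 'prim-add(34,35)', 'v2': 'prim-sub(70,v1)'}
```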

Hello world! example

For our hello world:

"Hello, world!\n" putstr
  • "Hello, world!\n" (string) pushed
  • "putstr" primitive:

    • Type check the top of the stack (should be a string)
    • Generate IR prim-putstr("Hello, world!\n")
    • Associate with name v1 and push it onto the stack

Much simpler than our previous examples.

TODO IR level type checking

During IR compilation, the following should be type checked:

  • use of callables (primitives, user defined when implemented)
  • variable assignment (when implemented)
  • variable use (when implemented)
  • definition of callables (when implemented)

We want to ensure no statement is unsound.

TODO Primitive types

Define the primitive types of the IR. Remember, simplicity is key, but we need to mirror what we're getting on the ARL side.

TODO Type contracts for callables

Define how we can type check arguments on the stack against the types a callable expects for its inputs. In the same vein, we also need to figure out the type of whatever is pushed onto the stack by the callable.
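
One possible shape for this, sketched in Python: each callable gets a contract listing the input types it consumes and the type it pushes. Everything here is an assumption for illustration, including the names Contract, CONTRACTS and check_call, the type names, and in particular the "unit" output type for putstr, which the notes above do not specify.

```python
from typing import NamedTuple


class Contract(NamedTuple):
    inputs: tuple  # expected types, deepest stack slot first
    output: str    # type of the value pushed after the call


# Hypothetical contract table; the type names are placeholders.
CONTRACTS = {
    "+": Contract(("int", "int"), "int"),
    "-": Contract(("int", "int"), "int"),
    "putstr": Contract(("string",), "unit"),  # "unit" is an assumption
}


def check_call(name, stack_types):
    """Check the top of `stack_types` against the contract for `name`.

    Returns the stack of types after the call, or raises TypeError.
    """
    contract = CONTRACTS[name]
    arity = len(contract.inputs)
    if len(stack_types) < arity:
        raise TypeError(
            f"{name} expects {arity} value(s), stack has {len(stack_types)}"
        )
    split = len(stack_types) - arity
    actual = tuple(stack_types[split:])
    if actual != contract.inputs:
        raise TypeError(f"{name} expects {contract.inputs}, got {actual}")
    # Inputs are consumed and replaced by the single output type.
    return list(stack_types[:split]) + [contract.output]


print(check_call("+", ["int", "int"]))          # ['int']
print(check_call("putstr", ["int", "string"]))  # ['int', 'unit']
```

A stack-agnostic primitive such as swap does not fit this table as written, since its output types depend on its inputs; that is where the generics versus type hole question comes back in.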

TODO Code generator

src/arl/target-c/

This should take the IR translated from the AST generated by the parser, and write equivalent C code.

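After we've generated the C code, we need to call a C compiler on it to generate a binary. GCC and Clang allow passing source code through stdin (the language has to be given explicitly, e.g. -x c -, as there is no file extension to infer it from), so we don't even need to write to disk first, which is nice.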
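
As a rough sketch of the stdin approach, here is how the compiler invocation could look when driven from Python. The helper names cc_command and compile_c are hypothetical, and the default compiler name "cc" is an assumption about the host system. The flags themselves (-x c to name the language, a trailing - to read from stdin, -o for the output path) are accepted by both GCC and Clang.

```python
import subprocess


def cc_command(compiler: str, output_path: str) -> list:
    """Build the argument list for compiling C source read from stdin."""
    # "-x c" names the language explicitly; the trailing "-" means stdin.
    return [compiler, "-x", "c", "-o", output_path, "-"]


def compile_c(source: str, output_path: str, compiler: str = "cc") -> None:
    """Pipe generated C source into the compiler without touching disk."""
    subprocess.run(
        cc_command(compiler, output_path),
        input=source,
        text=True,
        check=True,  # raise CalledProcessError if compilation fails
    )


print(cc_command("gcc", "hello"))  # ['gcc', '-x', 'c', '-o', 'hello', '-']
```

Calling compile_c(generated_source, "hello") would then produce the binary, where generated_source stands for whatever string the code generator emits.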