Compare commits

11 Commits

Author SHA1 Message Date
Aryadev Chavali
00786d9cb9 Simple arl-mode plugin for Emacs 2026-02-01 19:25:13 +00:00
Aryadev Chavali
b391fe9a74 lexer/token: update tokeniser to recognise puts 2026-01-29 15:49:20 +00:00
Aryadev Chavali
12e82e64d0 examples/hello-world: putstr -> puts 2026-01-29 15:40:30 +00:00
Aryadev Chavali
9b55b9ec32 arl.org: rewrite parser bit 2026-01-29 05:42:16 +00:00
Aryadev Chavali
6545bd1302 dir-locals: make examples by default 2026-01-29 05:19:51 +00:00
Aryadev Chavali
c11b69092d arl.org: massive updates 2026-01-29 05:19:40 +00:00
Aryadev Chavali
82b96e23d5 Adjusted README 2026-01-29 04:18:07 +00:00
Aryadev Chavali
0fdfbef1de Makefile: added recipe to run examples 2026-01-29 04:17:53 +00:00
Aryadev Chavali
6f6b747540 lib/vec | lexer: Clean up comments 2026-01-29 04:13:41 +00:00
Aryadev Chavali
321b06ca0d Makefile: modules is now dynamically found 2026-01-29 04:12:46 +00:00
Aryadev Chavali
d46ee32775 main: split off reading and usage function to its own unit 2026-01-29 04:12:33 +00:00
13 changed files with 228 additions and 219 deletions

View File

@@ -1,6 +1,6 @@
;;; Directory Local Variables -*- no-byte-compile: t -*-
;;; For more information see (info "(emacs) Directory Variables")
((nil . ((compile-command . "make MODE=debug -k")
((nil . ((compile-command . "make -k MODE=debug examples")
(+license/license-choice . "MIT License")))
(c-mode . ((mode . clang-format))))

View File

@@ -3,8 +3,8 @@ CC=cc
DIST=build
OUT=$(DIST)/arl.out
MODULES=. lib lexer
UNITS=main lib/vec lib/sv lexer/token lexer/lexer
MODULES=$(shell cd include/arl; find . -type 'd' -printf "%f\n")
UNITS=main cli lib/vec lib/sv lexer/token lexer/lexer
OBJECTS:=$(patsubst %,$(DIST)/%.o, $(UNITS))
LDFLAGS=
@@ -39,7 +39,7 @@ clangd: compile_commands.json
compile_commands.json: Makefile
bear -- $(MAKE) -B MODE=debug
.PHONY: run clean
.PHONY: run clean examples
ARGS=
run: $(OUT)
./$^ $(ARGS)
@@ -47,5 +47,9 @@ run: $(OUT)
clean:
rm -rf $(DIST)
examples: $(OUT)
@echo "Example: Hello World"
./$^ examples/hello-world.arl
DEPS:=$(patsubst %,$(DEPDIR)/%.d, $(UNITS))
include $(wildcard $(DEPS))

14
README

@@ -6,13 +6,12 @@
│ /_/ \_\_| \_\_____| │
└───────────────────────┘
Similar to Forth. Compiles to C.
Native speed with simple semantics.
Similar to Forth.
-----
Goals
-----
- Complete operational transpiler to C
- Complete operational transpiler, with C as a provisional working target
- Ability to reuse compiled code (as object code) in top level ARL code.
- Static type system with informative errors
@@ -44,3 +43,12 @@ $ make DIST=<folder>
Similarly, the general flags used in the C compiler may be set via the CFLAGS
variable, with linking arguments set via the LDFLAGS variable.
------------------
Usage instructions
------------------
Once built, simply use the built binary like so:
$ ./build/arl.out <filename>
Alternatively, you can run the examples automatically via the Makefile:
$ make examples

195
arl.org

@@ -1,161 +1,64 @@
#+title: ARL - Issue tracker
#+date: 2026-01-23
#+filetags: arl
* TODO Write a minimum working transpiler
We need to be able to compile the following file:
[[file:examples/hello-world.arl]]. All it does is print "Hello,
world!". Should be relatively straightforward.
** Stages
We need the following stages in our MVP transpiler:
- Source code reading (read bytes from a file)
- Parse raw bytes into tokens (Lexer)
- Interpret tokens into a classical AST (Parser)
- Stack effect and type analysis of the AST for soundness
- Translate AST into C code (Codegen)
- Compile C code into native executable (Target)
It's an Eulerian path from the source code to the native executable.
** DONE Read file
** DONE Parser
** TODO Intermediate representation (Virtual Machine)
[[file:src/arl/vm/]]
** DONE Lexer
[[file:src/lexer/]]
[[file:include/arl/lexer/]]
** WIP Parser
[[file:src/parser/]]
[[file:include/arl/parser/]]
Before we get into generating C code and then compiling it, it might
be worth translating the parsed ARL code into a generic IR.
We need to generate some form of AST from the token stream. This
should be a little more advanced than our initial stream,
distinguishing between
- Literal values
- Primitive calls
- References to otherwise undefined words (may be defined through
import or later on)
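
Review note: the three-way distinction above can be sketched as a small classifier over lexed words. This is only an illustration under stated assumptions: `node_kind_t`, `classify_word` and the `KNOWN_PRIMITIVES` table are hypothetical names that do not appear in the repository, and `puts` is listed only because it is the one known symbol the tokeniser recognises in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node kinds mirroring the list above. */
typedef enum
{
  NODE_LITERAL,   /* string or integer literal */
  NODE_PRIMITIVE, /* call to a built-in word */
  NODE_REFERENCE, /* word with no known definition (yet) */
} node_kind_t;

/* Assumption: the only primitive is the one the lexer knows in this diff. */
static const char *const KNOWN_PRIMITIVES[] = {"puts"};

/* True if the whole word parses as a base-10 integer. */
static int is_integer(const char *word)
{
  char *end = NULL;
  (void)strtol(word, &end, 10);
  return end != word && *end == '\0';
}

static node_kind_t classify_word(const char *word)
{
  assert(word != NULL);
  size_t length = strlen(word);
  /* A word wrapped in double quotes is treated as a string literal. */
  if (length >= 2 && word[0] == '"' && word[length - 1] == '"')
    return NODE_LITERAL;
  if (is_integer(word))
    return NODE_LITERAL;
  size_t count = sizeof(KNOWN_PRIMITIVES) / sizeof(KNOWN_PRIMITIVES[0]);
  for (size_t i = 0; i < count; ++i)
    if (strcmp(word, KNOWN_PRIMITIVES[i]) == 0)
      return NODE_PRIMITIVE;
  /* Anything else may be defined through an import or later on. */
  return NODE_REFERENCE;
}
```

With this sketch, `classify_word("puts")` yields `NODE_PRIMITIVE` while an unknown word such as `foo` yields `NODE_REFERENCE`, leaving resolution to a later stage.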
** TODO Stack effect/type analysis
[[file:src/analysis/]]
[[file:include/arl/analysis/]]
The IR should be primitive in its semantics but should still
encapsulate the intention behind the original ARL code. This should
allow us to find a set of minimum requirements for target compilation:
- what can we reasonably use from the target platform to satisfy
supporting the primitive IR?
- what do we need to hand-roll on the target in order to make this
work?
Given the AST, we need to verify its soundness with regard to
types and the stack. We have this idea of "stack effects" attached to
every node in the AST; literals push values onto the stack and pop
nothing, while operations may pop some operands and push some values.
Essentially, we want to write a virtual machine, and translate ARL
code into bytecode for that VM. Goals:
- Type checking
- Optimiser (stretch)
We need a way to:
- Codify the stack effects of each type of AST node
- Infer the total stack effect from a sequence of nodes
We need the following clear items in our IR:
- Static type values
- Static type variables (possibly De Bruijn numbering or other such
mechanism to abstract naming away and leave it to the target to
generate effectively)
- Strongly typed primitive operators (numeric, strings, I/O) with
packed arguments
We should have a rough grouping between AST objects and this IR. As
ARL is Forth-like, we can use the stack semantics to generate this IR
as we walk the AST in a linear manner. In practice this should almost
look like emulating a really small subset of the ARL language itself
and executing the program in that small subset.
Looking at how
[[https://en.wikipedia.org/wiki/Three-address_code][TAC]] works, I
think it may be a good idea to do something like that for our IR.
Essentially we should translate our AST into a sequence of really simple
bindings, with the final expression being a reference to some binding.
This also simplifies type checking to just verifying each little
binding and operation.
*** Examples
**** Basic example
Consider the following ARL code:
#+begin_src text
34 35 +
#+end_src
When we walk through the above code:
- 34 (an integer) is pushed onto the stack
- 35 (an integer) is pushed onto the stack
- ~+~ primitive is encountered
- Type check the top two values of the stack; they should be
integral.
- ~a b +~ should correspond to ~a + b~ so the IR expression should
pack the arguments in that order: ~prim-add(34,35)~.
- Bind the generated IR expression to some unique name, say ~v1~.
- Ensure this works with type checking; looking up ~v1~'s type
should give you the output type of the "+" operator (integer).
- Push ~v1~ onto the stack.
The final state of the stack should be something like ~[v1]~ where
~v1=prim-add(34,35)~. The final state of the stack, along with the
bindings we form, is the IR, to pass over to the later stages of the
compiler.
**** Slightly more complex example
Let's look at a slightly more complex program:
#+begin_src text
34 35 + 70 swap -
#+end_src
- 34 (integer) pushed
- 35 (integer) pushed
- ~+~ primitive:
- As stated previously, the final state of this primitive gives us
the name ~v1~ on the stack with the association
~v1=prim-add(34,35)~.
- 70 (integer) pushed
- ~swap~ primitive:
- Requires two values on the stack, but we care little about their
types. Just swaps their order on the stack.
- We /could/ introduce generics here to make the input/output
relationship explicit (forall T, U swap:-(-> (T U) (U T))), but
at the same time we can just as easily get away with a type hole
(essentially some kind of ~any~). Up to debate.
- We do not generate IR for this primitive as it simply isn't
necessary. Instead we perform the swap on our IR stack and
continue. The ~swap~ primitive is "transparent" in the final IR.
- In this situation, the stack goes from ~[v1, 70]~ to
~[70, v1]~
- ~-~ primitive:
- Type checks the top two values of the stack (which are both
integers)
- ~a b -~ should correspond to ~a - b~, thus the corresponding IR
expression should be ~prim-sub(70,v1)~
- Associate IR expression with name ~v2~.
- Push ~v2~ onto the stack.
The final state of the IR should be:
- Stack: ~[v2]~
- Bindings:
- ~v1=prim-add(34,35)~
- ~v2=prim-sub(70,v1)~
Notice how some primitives generate IR, while others manipulate IR
themselves? They almost seem like macros!
Another thing of note is how the final state of the stack is a single
item in this case; an IR expression representing the entire program.
When we introduce code level bindings we won't have such nice outputs,
but it is certainly something to consider.
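
Review note: the walkthrough of `34 35 + 70 swap -` can be sketched in C roughly as follows. Everything here is an assumption made for illustration: `ir_t`, `operand_t`, `ir_binary` and friends are hypothetical names, the stacks are fixed-size arrays instead of the repository's `vec_t`, and type checking is omitted because every value in this example is an integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* An operand is an integer literal or a reference to binding vN (1-based). */
typedef enum { OPERAND_INT, OPERAND_REF } operand_kind_t;
typedef struct { operand_kind_t kind; long value; } operand_t;

typedef enum { PRIM_ADD, PRIM_SUB } prim_t;
typedef struct { prim_t prim; operand_t lhs, rhs; } binding_t;

#define IR_MAX 64
typedef struct
{
  operand_t stack[IR_MAX];
  size_t stack_size;
  binding_t bindings[IR_MAX];
  size_t binding_count;
} ir_t;

/* Literals push a value and generate no binding. */
static int ir_push_int(ir_t *ir, long n)
{
  assert(ir != NULL);
  if (ir->stack_size >= IR_MAX)
    return 1;
  ir->stack[ir->stack_size++] = (operand_t){OPERAND_INT, n};
  return 0;
}

/* swap is "transparent": it reorders the IR stack and emits nothing. */
static int ir_swap(ir_t *ir)
{
  assert(ir != NULL);
  if (ir->stack_size < 2)
    return 1;
  operand_t top = ir->stack[ir->stack_size - 1];
  ir->stack[ir->stack_size - 1] = ir->stack[ir->stack_size - 2];
  ir->stack[ir->stack_size - 2] = top;
  return 0;
}

/* `a b OP` packs its arguments as OP(a, b), binds the result to the next
   name, and pushes a reference to that name. */
static int ir_binary(ir_t *ir, prim_t prim)
{
  assert(ir != NULL);
  if (ir->stack_size < 2 || ir->binding_count >= IR_MAX)
    return 1;
  operand_t rhs = ir->stack[--ir->stack_size];
  operand_t lhs = ir->stack[--ir->stack_size];
  ir->bindings[ir->binding_count++] = (binding_t){prim, lhs, rhs};
  ir->stack[ir->stack_size++] =
      (operand_t){OPERAND_REF, (long)ir->binding_count};
  return 0;
}

static void operand_format(operand_t op, char *out, size_t n)
{
  snprintf(out, n, op.kind == OPERAND_REF ? "v%ld" : "%ld", op.value);
}

/* Writes binding `index` (0-based) in the notation used in these notes. */
static void binding_format(const ir_t *ir, size_t index, char *out, size_t n)
{
  assert(ir != NULL && index < ir->binding_count);
  char lhs[32], rhs[32];
  const binding_t *b = &ir->bindings[index];
  operand_format(b->lhs, lhs, sizeof(lhs));
  operand_format(b->rhs, rhs, sizeof(rhs));
  snprintf(out, n, "v%zu=prim-%s(%s,%s)", index + 1,
           b->prim == PRIM_ADD ? "add" : "sub", lhs, rhs);
}
```

Feeding the program through `ir_push_int`, `ir_binary` and `ir_swap` in source order leaves a single reference on the stack and two bindings, matching the final state listed above.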
**** Hello world! example
For our hello world:
#+begin_src text
"Hello, world!\n" putstr
#+end_src
- "Hello, world!\n" (string) pushed
- "putstr" primitive:
- Type check the top of the stack (should be a string)
- Generate IR ~prim-putstr("Hello, world!\n")~
- Associate with name ~v1~ and push it onto the stack
Much simpler than our
*** TODO IR level type checking
During IR compilation, the following should be type checked:
- use of callables (primitives, user defined when implemented)
- variable assignment (when implemented)
- variable use (when implemented)
- definition of callables (when implemented)
We want to ensure no statement is unsound.
**** TODO Primitive types
Define the primitive types of the IR. Remember, simplicity is key,
but we need to mirror what we're getting on the ARL side.
**** TODO Type contracts for callables
Define how we can type check arguments on the stack against the types
a callable expects for its inputs. In the same vein, we also need to
figure out the type of whatever is pushed onto the stack by the
callable.
*** TODO Use SSA for user level bindings
[[https://en.wikipedia.org/wiki/Static_single-assignment_form][Static
single-assignment form]] is something we should use when we introduce
user level bindings.
These stack effects work in tandem with our type analysis. Stack
shape analysis tells us what operands are being fed into primitives,
while the type analysis will tell us if the operands are well formed
for the primitives.
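
Review note: one possible way to codify stack effects and infer the total effect of a sequence is to track only how many values each node pops and pushes. The sketch below makes that assumption; `stack_effect_t` and both functions are hypothetical names, and types are deliberately left out, so this covers stack shape only.

```c
#include <assert.h>
#include <stddef.h>

/* How many values a node takes from the stack and how many it leaves. */
typedef struct
{
  size_t pops;
  size_t pushes;
} stack_effect_t;

/* Effect of running `first` and then `second`. */
static stack_effect_t stack_effect_compose(stack_effect_t first,
                                           stack_effect_t second)
{
  stack_effect_t total;
  if (second.pops > first.pushes)
  {
    /* `second` consumes everything `first` pushed and reaches further
       down, so the sequence as a whole needs those extra inputs. */
    total.pops   = first.pops + (second.pops - first.pushes);
    total.pushes = second.pushes;
  }
  else
  {
    total.pops   = first.pops;
    total.pushes = first.pushes - second.pops + second.pushes;
  }
  return total;
}

/* Fold a sequence of node effects, left to right, into one effect. */
static stack_effect_t stack_effect_of_sequence(const stack_effect_t *effects,
                                               size_t count)
{
  assert(effects != NULL || count == 0);
  stack_effect_t total = {0, 0};
  for (size_t i = 0; i < count; ++i)
    total = stack_effect_compose(total, effects[i]);
  return total;
}
```

Under this model a literal is `{0, 1}` and a binary operator is `{2, 1}`, so two literals followed by an operator compose to `{0, 1}`: the sequence needs nothing from the stack and leaves one value. A non-zero `pops` for a whole program would indicate a stack underflow.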
** TODO Code generator
[[file:src/arl/target-c/]]
[[file:src/codegen/]]
[[file:include/arl/codegen/]]
This should take the IR translated from the AST generated by the
parser, and write equivalent C code.
This should take the AST generated by the parser (which should already
have been analysed), and write equivalent C code.
** TODO Target compilation
[[file:src/target/]]
[[file:include/arl/target/]]
After we've generated the C code, we need to call a C compiler on it
to generate a binary. GCC and Clang allow passing source code through
stdin, so we don't even need to write to disk first which is nice.
=gcc= and =clang= take C code via /stdin/, so we don't need to write
the C code to disk - we can just leave it as a buffer of bytes. So
we'll call the compilers and feed the generated code from the previous
stage into it via stdin.

View File

@@ -1 +1 @@
"Hello, world!\n" putstr
"Hello, world!\n" puts

37
extensions/arl-mode.el Normal file

@@ -0,0 +1,37 @@
;;; arl-mode.el --- ARL mode for Emacs -*- lexical-binding: t; -*-
;; Copyright (C) 2026 Aryadev Chavali
;; Author: Aryadev Chavali <aryadev@aryadevchavali.com>
;; Keywords:
;; Copyright (C) 2026 Aryadev Chavali
;; This program is distributed in the hope that it will be useful, but WITHOUT
;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
;; FOR A PARTICULAR PURPOSE. See the MIT License for details.
;; You may distribute and modify this code under the terms of the MIT License,
;; which you should have received a copy of along with this program. If not,
;; please go to <https://opensource.org/license/MIT>.
;;; Commentary:
;;
;;; Code:
(defvar arl-mode-comments '(?\; ";;" ("#|" . "|#")))
(defvar arl-mode-keywords '("if" "then" "else"))
(defvar arl-mode-expressions '(("\".*\"" . font-lock-string-face)))
(defvar arl-mode-automode-list '("\\.arl"))
(define-generic-mode arl-mode
arl-mode-comments
arl-mode-keywords
arl-mode-expressions
arl-mode-automode-list
nil)
(provide 'arl-mode)
;;; arl-mode.el ends here

31
include/arl/cli.h Normal file

@@ -0,0 +1,31 @@
/* cli.h: CLI helpers
* Created: 2026-01-29
* Author: Aryadev Chavali
* License: See end of file
* Commentary:
*/
#ifndef CLI_H
#define CLI_H
#include <stdio.h>
#include <arl/lib/sv.h>
int read_file(const char *filename, sv_t *ret);
int read_pipe(FILE *pipe, sv_t *ret);
void usage(FILE *fp);
#endif
/* Copyright (C) 2026 Aryadev Chavali
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
* FOR A PARTICULAR PURPOSE. See the MIT License for details.
* You may distribute and modify this code under the terms of the MIT License,
* which you should have received a copy of along with this program. If not,
* please go to <https://opensource.org/license/MIT>.
*/

View File

@@ -25,7 +25,7 @@ typedef enum
/// Known symbols which later stages would benefit from.
typedef enum
{
TOKEN_KNOWN_PUTSTR,
TOKEN_KNOWN_PUTS,
NUM_TOKEN_KNOWNS,
} token_known_t;

83
src/cli.c Normal file

@@ -0,0 +1,83 @@
/* cli.c:
* Created: 2026-01-29
* Author: Aryadev Chavali
* License: See end of file
* Commentary: See /include/arl/cli.h
*/
#include <stdlib.h>
#include <string.h>
#include <arl/cli.h>
#include <arl/lib/vec.h>
int read_file(const char *filename, sv_t *ret)
{
// NOTE: Stupidly simple. Presumes the file is NOT three pipes in a trench
// coat.
FILE *fp = fopen(filename, "rb");
if (!fp)
return 1;
fseek(fp, 0, SEEK_END);
ret->size = ftell(fp);
fseek(fp, 0, SEEK_SET);
ret->data = calloc(1, ret->size + 1);
fread(ret->data, ret->size, 1, fp);
fclose(fp);
ret->data[ret->size] = '\0';
return 0;
}
int read_pipe(FILE *pipe, sv_t *ret)
{
// NOTE: We can't read an entire pipe at once like we did for read_file. So
// let's read in buffered chunks, with a vector to keep them contiguous.
vec_t contents = {0};
char buffer[1024];
while (!feof(pipe))
{
size_t bytes_read = fread(buffer, 1, sizeof(buffer), pipe);
vec_append(&contents, buffer, bytes_read);
}
ret->size = contents.size;
// Get that null terminator in, but only after we've recorded the actual size
// of what's been read.
vec_append_byte(&contents, '\0');
if (contents.not_inlined)
{
// Take the heap pointer from us.
ret->data = vec_data(&contents);
}
else
{
// vec_data(&contents) is stack allocated; can't carry that out of this
// function!
ret->data = calloc(1, contents.size);
memcpy(ret->data, vec_data(&contents), contents.size);
}
return 0;
}
void usage(FILE *fp)
{
fprintf(fp, "Usage: arl [FILE]\n"
"Compiles [FILE] as ARL source code.\n"
" [FILE]: File to compile.\n"
"If FILE is \"--\", then read from stdin.\n");
}
/* Copyright (C) 2026 Aryadev Chavali
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
* FOR A PARTICULAR PURPOSE. See the MIT License for details.
* You may distribute and modify this code under the terms of the MIT License,
* which you should have received a copy of along with this program. If not,
* please go to <https://opensource.org/license/MIT>.
*/
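
Review note: `read_file` ignores the results of `ftell`, `calloc` and `fread`, and the `while (!feof(pipe))` loop in `read_pipe` would never terminate if the stream hit a read error, because `feof` stays false in that case. A more defensive single reader, usable for both regular files and pipes, might look like the sketch below. It is an illustration only: `read_stream` is a hypothetical name, and it uses plain `realloc` growth rather than the repository's `vec_t` and `sv_t`, whose exact APIs are not visible in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read all of `fp` into a freshly allocated, NUL-terminated buffer.
   On success returns 0 and stores the buffer in *data and the number of
   bytes read (excluding the terminator) in *size. The caller frees *data.
   On failure returns non-zero and leaves the outputs untouched. */
static int read_stream(FILE *fp, char **data, size_t *size)
{
  assert(fp != NULL && data != NULL && size != NULL);
  size_t capacity = 1024, used = 0;
  char *buffer = malloc(capacity);
  if (!buffer)
    return 1;

  for (;;)
  {
    /* Always keep one spare byte for the terminator. */
    if (used + 1 >= capacity)
    {
      size_t new_capacity = capacity * 2;
      char *grown = realloc(buffer, new_capacity);
      if (!grown)
      {
        free(buffer);
        return 1;
      }
      buffer   = grown;
      capacity = new_capacity;
    }
    size_t bytes_read = fread(buffer + used, 1, capacity - used - 1, fp);
    used += bytes_read;
    /* Zero bytes means end of file or an error; both end the loop. */
    if (bytes_read == 0)
      break;
  }

  if (ferror(fp))
  {
    free(buffer);
    return 1;
  }
  buffer[used] = '\0';
  *data = buffer;
  *size = used;
  return 0;
}
```

Because it never seeks, the same function works for `stdin` and for files opened with `fopen`, at the cost of a few reallocations on large inputs.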

View File

@@ -2,7 +2,7 @@
* Created: 2026-01-22
* Author: Aryadev Chavali
* License: See end of file
* Commentary: See /include/arl/lexr/lexr.h
* Commentary: See /include/arl/lexer/lexer.h
*/
#include <ctype.h>

View File

@@ -13,8 +13,8 @@ const char *token_known_to_cstr(token_known_t known)
{
switch (known)
{
case TOKEN_KNOWN_PUTSTR:
return "putstr";
case TOKEN_KNOWN_PUTS:
return "puts";
default:
FAIL("Unexpected TOKEN_KNOWN value: %d\n", known);
}

View File

@@ -2,7 +2,7 @@
* Created: 2026-01-22
* Author: Aryadev Chavali
* License: See end of file
* Commentary:
* Commentary: See /include/arl/lib/vec.h
Taken from prick_vec.h: see https://github.com/oreodave/prick.
*/

View File

@@ -18,64 +18,7 @@
#include <arl/lib/sv.h>
#include <arl/lib/vec.h>
int read_file(const char *filename, sv_t *ret)
{
// NOTE: Stupidly simple. Presumes the file is NOT three pipes in a trench
// coat.
FILE *fp = fopen(filename, "rb");
if (!fp)
return 1;
fseek(fp, 0, SEEK_END);
ret->size = ftell(fp);
fseek(fp, 0, SEEK_SET);
ret->data = calloc(1, ret->size + 1);
fread(ret->data, ret->size, 1, fp);
fclose(fp);
ret->data[ret->size] = '\0';
return 0;
}
int read_pipe(FILE *pipe, sv_t *ret)
{
// NOTE: We can't read an entire pipe at once like we did for read_file. So
// let's read in buffered chunks, with a vector to keep them contiguous.
vec_t contents = {0};
char buffer[1024];
while (!feof(pipe))
{
size_t bytes_read = fread(buffer, 1, sizeof(buffer), pipe);
vec_append(&contents, buffer, bytes_read);
}
ret->size = contents.size;
// Get that null terminator in, but only after we've recorded the actual size
// of what's been read.
vec_append_byte(&contents, '\0');
if (contents.not_inlined)
{
// Take the heap pointer from us.
ret->data = vec_data(&contents);
}
else
{
// vec_data(&contents) is stack allocated; can't carry that out of this
// function!
ret->data = calloc(1, contents.size);
memcpy(ret->data, vec_data(&contents), contents.size);
}
return 0;
}
void usage(FILE *fp)
{
fprintf(fp, "Usage: arl [FILE]\n"
"Compiles [FILE] as ARL source code.\n"
" [FILE]: File to compile.\n"
"If FILE is \"--\", then read from stdin.\n");
}
#include <arl/cli.h>
int main(int argc, char *argv[])
{
@@ -127,11 +70,11 @@ int main(int argc, char *argv[])
goto end;
}
LOG("Lexed %lu tokens\n", tokens.vec.size / sizeof(token_t));
#if VERBOSE_LOGS
LOG("Lexed %lu tokens ", tokens.vec.size / sizeof(token_t));
token_stream_print(stdout, &tokens);
#endif
printf("\n");
#endif
end:
if (contents.data)