Simple arl-mode plugin for Emacs

lexer/token: update tokeniser to recognise puts
examples/hello-world: putstr -> puts
2026-02-01 19:25:13 +00:00 · 2026-01-29 15:49:20 +00:00 · 2026-01-29 15:40:30 +00:00 · 2026-01-29 05:42:16 +00:00 · 2026-01-29 05:19:51 +00:00 · 2026-01-29 05:19:40 +00:00
13 changed files with 228 additions and 219 deletions
--- a/.dir-locals.el
+++ b/.dir-locals.el
@@ -1,6 +1,6 @@
 ;;; Directory Local Variables            -*- no-byte-compile: t -*-
 ;;; For more information see (info "(emacs) Directory Variables")

-((nil      . ((compile-command . "make MODE=debug -k")
+((nil      . ((compile-command . "make -k MODE=debug examples")
              (+license/license-choice . "MIT License")))
 (c-mode   . ((mode . clang-format))))
--- a/10
+++ b/10
@@ -3,8 +3,8 @@ CC=cc
 DIST=build
 OUT=$(DIST)/arl.out

-MODULES=. lib lexer
-UNITS=main lib/vec lib/sv lexer/token lexer/lexer
+MODULES=$(shell cd include/arl; find . -type 'd' -printf "%f\n")
+UNITS=main cli lib/vec lib/sv lexer/token lexer/lexer
 OBJECTS:=$(patsubst %,$(DIST)/%.o, $(UNITS))

 LDFLAGS=
@@ -39,7 +39,7 @@ clangd: compile_commands.json
 compile_commands.json: Makefile
 	bear -- $(MAKE) -B MODE=debug

-.PHONY: run clean
+.PHONY: run clean examples
 ARGS=
 run: $(OUT)
 	./$^ $(ARGS)
@@ -47,5 +47,9 @@ run: $(OUT)
 clean:
 	rm -rf $(DIST)

+examples: $(OUT)
+	@echo "Example: Hello World"
+	./$^ examples/hello-world.arl
+
 DEPS:=$(patsubst %,$(DEPDIR)/%.d, $(UNITS))
 include $(wildcard $(DEPS))
--- a/14
+++ b/14
@@ -6,13 +6,12 @@
 │ /_/   \_\_| \_\_____| │
 └───────────────────────┘

-Similar to Forth.  Compiles to C.
-Native speed with simple semantics.
+Similar to Forth.

 -----
 Goals
 -----
- Complete operational transpiler to C
+- Complete operational transpiler, with C as a provisional working target
 - Ability to reuse compiled code (as object code) in top level ARL code.
 - Static type system with informative errors

@@ -44,3 +43,12 @@ $ make DIST=<folder>

 Similarly, the general flags used in the C compiler may be set via the CFLAGS
 variable, with linking arguments set via the LDFLAGS variable.
+
+------------------
+Usage instructions
+------------------
+Once built, run the binary on an ARL source file:
+$ ./build/arl.out <filename>
+
+Alternatively, you can run the examples automatically via the Makefile:
+$ make examples
--- a/arl.org
+++ b/arl.org
@@ -1,161 +1,64 @@
 #+title: ARL - Issue tracker
 #+date: 2026-01-23
+#+filetags: arl

 * TODO Write a minimum working transpiler
 We need to be able to compile the following file:
 [[file:examples/hello-world.arl]].  All it does is print "Hello,
 world!".  Should be relatively straightforward.
+** Stages
+We need the following stages in our MVP transpiler:
+- Source code reading (read bytes from a file)
+- Lex raw bytes into tokens (Lexer)
+- Parse tokens into a classical AST (Parser)
+- Stack effect and type analysis of the AST for soundness
+- Translate AST into C code (Codegen)
+- Compile C code into a native executable (Target)

+It's a straight line: an Eulerian path from the source code to the
+native executable, where each stage is an edge traversed exactly once.
 ** DONE Read file
-** DONE Parser
-** TODO Intermediate representation (Virtual Machine)
-[[file:src/arl/vm/]]
+** DONE Lexer
+[[file:src/lexer/]]
+[[file:include/arl/lexer/]]
+** WIP Parser
+[[file:src/parser/]]
+[[file:include/arl/parser/]]

-Before we get into generating C code and then compiling it, it might
-be worth translating the parsed ARL code into a generic IR.
+We need to generate some form of AST from the token stream.  This
+should carry a little more structure than the raw token stream,
+distinguishing between
+- Literal values
+- Primitive calls
+- References to otherwise undefined words (may be defined through
+  import or later on)
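A minimal sketch of an AST node that keeps the three categories above apart, assuming a tagged union in C. Every name here (`ast_kind_t`, `ast_node_t`, `ast_classify_word`, `PRIM_ADD`) is hypothetical and not taken from the ARL parser; the point is that unknown words are kept by name as "unresolved" rather than rejected, so a later stage can resolve them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds mirroring the categories listed above. */
typedef enum
{
  AST_LITERAL_INT,    // e.g. 34
  AST_LITERAL_STRING, // e.g. "Hello, world!\n"
  AST_PRIMITIVE,      // a word the compiler knows, e.g. puts
  AST_UNRESOLVED,     // any other word: may be imported or defined later
} ast_kind_t;

/* Hypothetical primitive identifiers. */
typedef enum
{
  PRIM_PUTS,
  PRIM_ADD,
} ast_prim_t;

typedef struct
{
  ast_kind_t kind;
  union
  {
    long integer;                                     // AST_LITERAL_INT
    struct { const char *data; size_t size; } string; // literal text or word name
    ast_prim_t primitive;                             // AST_PRIMITIVE
  } as;
} ast_node_t;

/* Classify a bare word: known primitives become AST_PRIMITIVE, everything
 * else is kept by name as AST_UNRESOLVED for a later stage to resolve. */
ast_node_t ast_classify_word(const char *word, size_t size)
{
  assert(word != NULL);
  ast_node_t node = {0};
  if (size == 4 && memcmp(word, "puts", 4) == 0)
  {
    node.kind = AST_PRIMITIVE;
    node.as.primitive = PRIM_PUTS;
  }
  else if (size == 1 && word[0] == '+')
  {
    node.kind = AST_PRIMITIVE;
    node.as.primitive = PRIM_ADD;
  }
  else
  {
    node.kind = AST_UNRESOLVED;
    node.as.string.data = word;
    node.as.string.size = size;
  }
  return node;
}
```

Keeping the word's name (pointer plus length, like an `sv_t`) on unresolved nodes means nothing is lost if the definition only shows up after an import.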
+** TODO Stack effect/type analysis
+[[file:src/analysis/]]
+[[file:include/arl/analysis/]]

-The IR should be primitive in its semantics but should still
-encapsulate the intention behind the original ARL code.  This should
-allow us to find a set of minimum requirements for target compilation:
- what can we reasonably use from the target platform to satisfy
-  supporting the primitive IR?
- what do we need to hand-roll on the target in order to make this
-  work?
+Given the AST, we need to verify its soundness with regard to types
+and the stack.  We have this idea of "stack effects" attached to
+every node in the AST; literals push values onto the stack and pop
+nothing, while operations may pop some operands and push some values.

-Essentially, we want to write a virtual machine, and translate ARL
-code into bytecode for that VM.  Goals:
- Type checking
- Optimiser (stretch)
+We need a way to:
+- Codify the stack effects of each type of AST node
+- Infer the total stack effect from a sequence of nodes
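One way to codify and combine stack effects, sketched under the assumption that an effect is just a pair of counts, written Forth-style as `( in -- out )`: the node pops `in` values, then pushes `out`. The type `stack_effect_t` and both functions are hypothetical. Running `a` then `b` works out as follows: whatever `b` pops beyond what `a` pushed must already have been on the stack before `a` ran.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stack effect ( in -- out ): pop `in` values, push `out`. */
typedef struct
{
  size_t in, out;
} stack_effect_t;

/* Effect of running `a` and then `b`. */
stack_effect_t stack_effect_compose(stack_effect_t a, stack_effect_t b)
{
  stack_effect_t total;
  if (b.in <= a.out)
  {
    // `b` only consumes values that `a` produced.
    total.in = a.in;
    total.out = a.out - b.in + b.out;
  }
  else
  {
    // `b` reaches below `a`'s outputs, so the whole sequence needs more input.
    total.in = a.in + (b.in - a.out);
    total.out = b.out;
  }
  return total;
}

/* Fold the composition over a sequence; the empty sequence is ( -- ). */
stack_effect_t stack_effect_of_sequence(const stack_effect_t *effects, size_t n)
{
  assert(effects != NULL || n == 0);
  stack_effect_t total = {0, 0};
  for (size_t i = 0; i < n; ++i)
    total = stack_effect_compose(total, effects[i]);
  return total;
}
```

With literals as `( -- x )` and `+` as `( x x -- x )`, the program `34 35 +` folds to `( -- x )`, and `"Hello, world!\n" puts` (with `puts` as `( x -- )`) folds to `( -- )`: a whole program that is net-neutral on the stack.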

-We need the following clear items in our IR:
- Static type values
- Static type variables (possible DeBrujin numbering or other such
-  mechanism to abstract naming away and leave it to the target to
-  generate effectively)
- Strongly typed primitive operators (numeric, strings, I/O) with
-  packed arguments
-
-We should have a rough grouping between AST objects and this IR.  As
-ARL is Forth-like, we can use the stack semantics to generate this IR
-as we walk the AST in a linear manner.  In practice this should almost
-look like emulating a really small subset of the ARL language itself
-and executing the program in that small subset.
-
-Looking at how
-[[https://en.wikipedia.org/wiki/Three-address_code][TAC]] works, I
-think it may be a good idea to do something like that for our IR.
-Essentially we should turn our AST into a sequence of really simple
-bindings, with the final expression being a reference to some binding.
-
-This also simplifies type checking to just verifying each little
-binding and operation.
-
-*** Examples
-**** Basic example
-Consider the following ARL code:
-#+begin_src text
-34 35 +
-#+end_src
-
-When we walk through the above code:
- 34 (an integer) is pushed onto the stack
- 35 (an integer) is pushed onto the stack
- ~+~ primitive is encountered
-  - Type check the top two values of the stack; they should be
-    integral.
-  - ~a b +~ should correspond to ~a + b~ so the IR expression should
-    pack the arguments in that order: ~prim-add(34,35)~.
-  - Bind the generated IR expression to some unique name, say ~v1~.
-    - Ensure this works with type checking; looking up ~v1~'s type
-      should give you the output type of the "+" operator (integer).
-  - Push ~v1~ onto the stack.
-
-The final state of the stack should be something like ~[v1]~ where
-~v1=prim-add(34,35)~.  The final state of the stack, along with the
-bindings we form, is the IR, to pass over to the later stages of the
-compiler.
-**** Slightly more complex example
-Let's look at a slightly more complex program:
-#+begin_src text
-34 35 + 70 swap -
-#+end_src
- 34 (integer) pushed
- 35 (integer) pushed
- ~+~ primitive:
-  - As stated previously, the final state of this primitive gives us
-    the name ~v1~ on the stack with the association
-    ~v1=prim-add(34,35)~.
- 70 (integer) pushed
- ~swap~ primitive:
-  - Requires two values on the stack, but we care little about their
-    types.  Just swaps their order on the stack.
-  - We /could/ introduce generics here to make the input/output
-    relationship explicit (forall T, U swap:-(-> (T U) (U T))), but
-    at the same time we can just as easily get away with a type hole
-    (essentially some kind of ~any~).  Up to debate.
-  - We do not generate IR for this primitive as it simply isn't
-    necessary.  Instead we perform the swap on our IR stack and
-    continue.  The ~swap~ primitive is "transparent" in the final IR.
-  - In this situation, the stack goes from ~[v1, 70]~ to
-    ~[70, v1]~
- ~-~ primitive:
-  - Type checks the top two values of the stack (which are both
-    integers)
-  - ~a b -~ should correspond to ~a - b~, thus the corresponding IR
-    expression should be ~prim-sub(70,v1)~
-  - Associate IR expression with name ~v2~,
-  - Push ~v2~ onto the stack.
-
-The final state of the IR should be:
- Stack: ~[v2]~
- Bindings:
-  - ~v1=prim-add(34,35)~
-  - ~v2=prim-sub(70,v1)~
-
-Notice how some primitives generate IR, while others manipulate IR
-themselves?  They almost seem like macros!
-
-Another thing of note is how the final state of the stack is a single
-item in this case; an IR expression representing the entire program.
-When we introduce code level bindings we won't have such nice outputs,
-but it is certainly something to consider.
-**** Hello world! example
-For our hello world:
-#+begin_src text
-"Hello, world!\n" putstr
-#+end_src
- "Hello, world!\n" (string) pushed
- "putstr" primitive:
-  - Type check the top of the stack (should be a string)
-  - Generate IR ~prim-putstr("Hello, world!\n")~
-  - Associate with name ~v1~ and push it onto the stack
-
-Much simpler than our
-*** TODO IR level type checking
-During IR compilation, the following should be type checked:
- use of callables (primitives, user defined when implemented)
- variable assignment (when implemented)
- variable use (when implemented)
- definition of callables (when implemented)
-
-We want to ensure no statement is unsound.
-**** TODO Primitive types
-Define the primitive types of the IR.  Remember, simplicity is key,
-but we need to mirror what we're getting on the ARL side.
-**** TODO Type contracts for callables
-Define how we can type check arguments on the stack against the types
-a callable expects for its inputs.  In the same vein, we also need to
-figure out the type of whatever is pushed onto the stack by the
-callable.
-*** TODO Use SSA for user level bindings
-[[https://en.wikipedia.org/wiki/Static_single-assignment_form][Static
-single-assignment form]] is something we should use when we introduce
-user level bindings.
+These stack effects work in tandem with our type analysis.  Stack
+shape analysis tells us which operands are fed into each primitive,
+while type analysis tells us whether those operands have the types
+the primitive expects.
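The two analyses can share a single pass: walk the nodes while tracking the *types* on the stack instead of the values. A minimal sketch, assuming a toy node set with just integer and string literals, `+` and `puts`. The names `type_t`, `node_kind_t`, `type_check` and the fixed stack size are hypothetical and not taken from the ARL sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical value types and node kinds, just enough for "34 35 +" and
 * "\"Hello, world!\\n\" puts". */
typedef enum { TYPE_INT, TYPE_STRING } type_t;
typedef enum { NODE_INT, NODE_STRING, NODE_ADD, NODE_PUTS } node_kind_t;

#define TYPE_STACK_MAX 64

/* Each primitive checks the stack depth (shape) and the operand types, then
 * leaves its result type on the stack.  Returns false at the first unsound
 * node; on success *depth_out receives the final stack depth. */
bool type_check(const node_kind_t *nodes, size_t n, size_t *depth_out)
{
  assert(depth_out != NULL);
  type_t stack[TYPE_STACK_MAX];
  size_t depth = 0;
  for (size_t i = 0; i < n; ++i)
  {
    switch (nodes[i])
    {
    case NODE_INT:
    case NODE_STRING:
      if (depth == TYPE_STACK_MAX)
        return false; // out of room in this fixed-size sketch
      stack[depth++] = nodes[i] == NODE_INT ? TYPE_INT : TYPE_STRING;
      break;
    case NODE_ADD: // ( int int -- int )
      if (depth < 2 || stack[depth - 1] != TYPE_INT ||
          stack[depth - 2] != TYPE_INT)
        return false;
      depth -= 1; // two ints popped, one int (already in the slot) pushed
      break;
    case NODE_PUTS: // ( string -- )
      if (depth < 1 || stack[depth - 1] != TYPE_STRING)
        return false;
      depth -= 1;
      break;
    }
  }
  *depth_out = depth;
  return true;
}
```

Here `34 35 +` checks out with one integer left on the stack, while `34 puts` fails on type and a lone `34 +` fails on shape (stack underflow).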
 ** TODO Code generator
-[[file:src/arl/target-c/]]
+[[file:src/codegen/]]
+[[file:include/arl/codegen/]]

-This should take the IR translated from the AST generated by the
-parser, and write equivalent C code.
+This should take the AST generated by the parser (which should already
+have been analysed), and write equivalent C code.
+** TODO Target compilation
+[[file:src/target/]]
+[[file:include/arl/target/]]

-After we've generated the C code, we need to call a C compiler on it
-to generate a binary.  GCC and Clang allow passing source code through
-stdin, so we don't even need to write to disk first which is nice.
+=gcc= and =clang= can read C code from /stdin/ (pass =-x c -= so the
+driver knows the input language), so we don't need to write the C
+code to disk: it can stay a buffer of bytes.  We'll call the compiler
+and feed it the code generated by the previous stage via stdin.
--- a/examples/hello-world.arl
+++ b/examples/hello-world.arl
@@ -1 +1 @@
-"Hello, world!\n" putstr
+"Hello, world!\n" puts
--- a/extensions/arl-mode.el
+++ b/extensions/arl-mode.el
@@ -0,0 +1,37 @@
+;;; arl-mode.el --- ARL mode for Emacs               -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2026  Aryadev Chavali
+
+;; Author: Aryadev Chavali <aryadev@aryadevchavali.com>
+;; Keywords:
+
+;; This program is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+;; FOR A PARTICULAR PURPOSE.  See the MIT License for details.
+
+;; You may distribute and modify this code under the terms of the MIT License,
+;; which you should have received a copy of along with this program.  If not,
+;; please go to <https://opensource.org/license/MIT>.
+
+;;; Commentary:
+
+;;
+
+;;; Code:
+
+(require 'generic)
+
+(defvar arl-mode-comments '(?\; ";;" ("#|" . "|#")))
+(defvar arl-mode-keywords '("if" "then" "else"))
+(defvar arl-mode-expressions '(("\".*\"" . font-lock-string-face)))
+(defvar arl-mode-automode-list '("\\.arl\\'"))
+
+;; A generic mode: comment syntax, keywords, extra font-lock rules, the file
+;; names it applies to, and no extra setup functions.
+(define-generic-mode arl-mode
+  arl-mode-comments
+  arl-mode-keywords
+  arl-mode-expressions
+  arl-mode-automode-list
+  nil)
+
+(provide 'arl-mode)
+;;; arl-mode.el ends here
--- a/include/arl/cli.h
+++ b/include/arl/cli.h
@@ -0,0 +1,31 @@
+/* cli.h: CLI helpers
+ * Created: 2026-01-29
+ * Author: Aryadev Chavali
+ * License: See end of file
+ * Commentary:
+ */
+
+#ifndef CLI_H
+#define CLI_H
+
+#include <stdio.h>
+
+#include <arl/lib/sv.h>
+
+int read_file(const char *filename, sv_t *ret);
+int read_pipe(FILE *pipe, sv_t *ret);
+void usage(FILE *fp);
+
+#endif
+
+/* Copyright (C) 2026 Aryadev Chavali
+
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ * FOR A PARTICULAR PURPOSE.  See the MIT License for details.
+
+ * You may distribute and modify this code under the terms of the MIT License,
+ * which you should have received a copy of along with this program.  If not,
+ * please go to <https://opensource.org/license/MIT>.
+
+ */
--- a/include/arl/lexer/token.h
+++ b/include/arl/lexer/token.h
@@ -25,7 +25,7 @@ typedef enum
 /// Known symbols which later stages would benefit from.
 typedef enum
 {
-  TOKEN_KNOWN_PUTSTR,
+  TOKEN_KNOWN_PUTS,
  NUM_TOKEN_KNOWNS,
 } token_known_t;

--- a/src/cli.c
+++ b/src/cli.c
@@ -0,0 +1,83 @@
+/* cli.c: CLI helpers
+ * Created: 2026-01-29
+ * Author: Aryadev Chavali
+ * License: See end of file
+ * Commentary: See /include/arl/cli.h
+ */
+
+#include <stdlib.h>
+#include <string.h>
+
+#include <arl/cli.h>
+#include <arl/lib/vec.h>
+
+int read_file(const char *filename, sv_t *ret)
+{
+  // NOTE: Stupidly simple.  Presumes the file is NOT three pipes in a trench
+  // coat.
+  FILE *fp = fopen(filename, "rb");
+  if (!fp)
+    return 1;
+
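+  // ftell fails (returns -1) on streams that cannot seek, such as pipes.
+  long size;
+  if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 0 ||
+      fseek(fp, 0, SEEK_SET) != 0)
+  {
+    fclose(fp);
+    return 1;
+  }
+  ret->size = size;
+  ret->data = calloc(1, ret->size + 1);
+  if (!ret->data || fread(ret->data, 1, ret->size, fp) != ret->size)
+  {
+    free(ret->data);
+    ret->data = NULL;
+    fclose(fp);
+    return 1;
+  }
+  fclose(fp);
+
+  ret->data[ret->size] = '\0';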
+  return 0;
+}
+
+int read_pipe(FILE *pipe, sv_t *ret)
+{
+  // NOTE: We can't read an entire pipe at once like we did for read_file.  So
+  // let's read in buffered chunks, with a vector to keep them contiguous.
+  vec_t contents = {0};
+  char buffer[1024];
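+  // Loop on fread's return value rather than feof(): after a read error
+  // feof() never becomes true, so `while (!feof(pipe))` would spin forever.
+  size_t bytes_read;
+  while ((bytes_read = fread(buffer, 1, sizeof(buffer), pipe)) > 0)
+    vec_append(&contents, buffer, bytes_read);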
+
+  ret->size = contents.size;
+  // Get that null terminator in, but only after we've recorded the actual size
+  // of what's been read.
+  vec_append_byte(&contents, '\0');
+
+  if (contents.not_inlined)
+  {
+    // Take the heap pointer from us.
+    ret->data = vec_data(&contents);
+  }
+  else
+  {
+    // vec_data(&contents) is stack allocated; can't carry that out of this
+    // function!
+    ret->data = calloc(1, contents.size);
+    memcpy(ret->data, vec_data(&contents), contents.size);
+  }
+  return 0;
+}
+
+void usage(FILE *fp)
+{
+  fprintf(fp, "Usage: arl [FILE]\n"
+              "Compiles [FILE] as ARL source code.\n"
+              "  [FILE]: File to compile.\n"
+              "If FILE is \"--\", then read from stdin.\n");
+}
+
+/* Copyright (C) 2026 Aryadev Chavali
+
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ * FOR A PARTICULAR PURPOSE.  See the MIT License for details.
+
+ * You may distribute and modify this code under the terms of the MIT License,
+ * which you should have received a copy of along with this program.  If not,
+ * please go to <https://opensource.org/license/MIT>.
+
+ */
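`usage()` promises that a FILE of `"--"` means stdin. A self-contained sketch of that dispatch, assuming plain `malloc`/`realloc` in place of ARL's `vec_t`; `read_source` and `slurp_stream` are hypothetical names, and the real `main` is not shown in this diff. Because `slurp_stream` never seeks, the same loop serves both pipes and regular files.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read all of `fp` into a NUL-terminated heap buffer, doubling it as needed.
 * Returns NULL on allocation failure or read error. */
static char *slurp_stream(FILE *fp, size_t *size_out)
{
  size_t cap = 1024, size = 0, got;
  char *data = malloc(cap);
  if (!data)
    return NULL;
  // Always leave one byte spare for the terminator.
  while ((got = fread(data + size, 1, cap - size - 1, fp)) > 0)
  {
    size += got;
    if (cap - size - 1 == 0)
    {
      char *grown = realloc(data, cap * 2);
      if (!grown)
      {
        free(data);
        return NULL;
      }
      data = grown;
      cap *= 2;
    }
  }
  if (ferror(fp))
  {
    free(data);
    return NULL;
  }
  data[size] = '\0';
  *size_out = size;
  return data;
}

/* "--" means stdin; anything else is treated as a path. */
char *read_source(const char *arg, size_t *size_out)
{
  assert(arg != NULL && size_out != NULL);
  if (strcmp(arg, "--") == 0)
    return slurp_stream(stdin, size_out);
  FILE *fp = fopen(arg, "rb");
  if (!fp)
    return NULL;
  char *data = slurp_stream(fp, size_out);
  fclose(fp);
  return data;
}
```

The caller owns the returned buffer and releases it with `free`.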
--- a/src/lexer/lexer.c
+++ b/src/lexer/lexer.c
@@ -2,7 +2,7 @@
 * Created: 2026-01-22
 * Author: Aryadev Chavali
 * License: See end of file
- * Commentary: See /include/arl/lexr/lexr.h
+ * Commentary: See /include/arl/lexer/lexer.h
 */

 #include <ctype.h>
--- a/src/lexer/token.c
+++ b/src/lexer/token.c
@@ -13,8 +13,8 @@ const char *token_known_to_cstr(token_known_t known)
 {
  switch (known)
  {
-  case TOKEN_KNOWN_PUTSTR:
-    return "putstr";
+  case TOKEN_KNOWN_PUTS:
+    return "puts";
  default:
    FAIL("Unexpected TOKEN_KNOWN value: %d\n", known);
  }
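With `token_known_to_cstr` as the single source of truth for spellings, a lexer can recognise known words with a reverse lookup over the enum, so renaming `putstr` to `puts` only touches the enum and the string. A sketch assuming symbols arrive as pointer plus length (not NUL-terminated); `token_known_from_symbol` is a hypothetical helper, and whether ARL's lexer matches this way is not shown in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the token_known_t enum from the diff. */
typedef enum
{
  TOKEN_KNOWN_PUTS,
  NUM_TOKEN_KNOWNS,
} token_known_t;

const char *token_known_to_cstr(token_known_t known)
{
  switch (known)
  {
  case TOKEN_KNOWN_PUTS:
    return "puts";
  default:
    return NULL; // the real code FAILs here instead
  }
}

/* Match a lexed symbol against every known word.  Returns NUM_TOKEN_KNOWNS
 * when the symbol is not a known word. */
token_known_t token_known_from_symbol(const char *sym, size_t size)
{
  assert(sym != NULL || size == 0);
  for (int k = 0; k < NUM_TOKEN_KNOWNS; ++k)
  {
    const char *name = token_known_to_cstr((token_known_t)k);
    if (strlen(name) == size && memcmp(name, sym, size) == 0)
      return (token_known_t)k;
  }
  return NUM_TOKEN_KNOWNS;
}
```

For example, `token_known_from_symbol("puts", 4)` yields `TOKEN_KNOWN_PUTS`, while the old spelling `putstr` is no longer recognised.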
--- a/src/lib/vec.c
+++ b/src/lib/vec.c
@@ -2,7 +2,7 @@
 * Created: 2026-01-22
 * Author: Aryadev Chavali
 * License: See end of file
- * Commentary:
+ * Commentary: See /include/arl/lib/vec.h

 Taken from prick_vec.h: see https://github.com/oreodave/prick.
 */
--- a/src/main.c
+++ b/src/main.c
@@ -18,64 +18,7 @@
 #include <arl/lib/sv.h>
 #include <arl/lib/vec.h>

-int read_file(const char *filename, sv_t *ret)
-{
-  // NOTE: Stupidly simple.  Presumes the file is NOT three pipes in a trench
-  // coat.
-  FILE *fp = fopen(filename, "rb");
-  if (!fp)
-    return 1;
-
-  fseek(fp, 0, SEEK_END);
-  ret->size = ftell(fp);
-  fseek(fp, 0, SEEK_SET);
-  ret->data = calloc(1, ret->size + 1);
-  fread(ret->data, ret->size, 1, fp);
-  fclose(fp);
-
-  ret->data[ret->size] = '\0';
-  return 0;
-}
-
-int read_pipe(FILE *pipe, sv_t *ret)
-{
-  // NOTE: We can't read an entire pipe at once like we did for read_file.  So
-  // let's read in buffered chunks, with a vector to keep them contiguous.
-  vec_t contents = {0};
-  char buffer[1024];
-  while (!feof(pipe))
-  {
-    size_t bytes_read = fread(buffer, 1, sizeof(buffer), pipe);
-    vec_append(&contents, buffer, bytes_read);
-  }
-
-  ret->size = contents.size;
-  // Get that null terminator in, but only after we've recorded the actual size
-  // of what's been read.
-  vec_append_byte(&contents, '\0');
-
-  if (contents.not_inlined)
-  {
-    // Take the heap pointer from us.
-    ret->data = vec_data(&contents);
-  }
-  else
-  {
-    // vec_data(&contents) is stack allocated; can't carry that out of this
-    // function!
-    ret->data = calloc(1, contents.size);
-    memcpy(ret->data, vec_data(&contents), contents.size);
-  }
-  return 0;
-}
-
-void usage(FILE *fp)
-{
-  fprintf(fp, "Usage: arl [FILE]\n"
-              "Compiles [FILE] as ARL source code.\n"
-              "  [FILE]: File to compile.\n"
-              "If FILE is \"--\", then read from stdin.\n");
-}
+#include <arl/cli.h>

 int main(int argc, char *argv[])
 {
@@ -127,11 +70,11 @@ int main(int argc, char *argv[])
    goto end;
  }

-  LOG("Lexed %lu tokens\n", tokens.vec.size / sizeof(token_t));
 #if VERBOSE_LOGS
+  LOG("Lexed %zu tokens ", (size_t)(tokens.vec.size / sizeof(token_t)));
  token_stream_print(stdout, &tokens);
-#endif
  printf("\n");
+#endif

 end:
  if (contents.data)
Author	SHA1	Message	Date
Aryadev Chavali	00786d9cb9	Simple arl-mode plugin for Emacs	2026-02-01 19:25:13 +00:00
Aryadev Chavali	b391fe9a74	lexer/token: update tokeniser to recognise puts	2026-01-29 15:49:20 +00:00
Aryadev Chavali	12e82e64d0	examples/hello-world: putstr -> puts	2026-01-29 15:40:30 +00:00
Aryadev Chavali	9b55b9ec32	arl.org: rewrite parser bit	2026-01-29 05:42:16 +00:00
Aryadev Chavali	6545bd1302	dir-locals: make examples by default	2026-01-29 05:19:51 +00:00
Aryadev Chavali	c11b69092d	arl.org: massive updates	2026-01-29 05:19:40 +00:00
Aryadev Chavali	82b96e23d5	Adjusted README	2026-01-29 04:18:07 +00:00
Aryadev Chavali	0fdfbef1de	Makefile: added recipe to run examples	2026-01-29 04:17:53 +00:00
Aryadev Chavali	6f6b747540	lib/vec | lexer: Clean up comments	2026-01-29 04:13:41 +00:00
Aryadev Chavali	321b06ca0d	Makefile: modules is now dynamically found	2026-01-29 04:12:46 +00:00
Aryadev Chavali	d46ee32775	main: split off reading and usage function to its own unit	2026-01-29 04:12:33 +00:00