Alisp

Tasks

WIP Reader system

We need to design a reader system. The big idea: given a "stream" of data, we can break out expressions from it. An expression could be either an atomic value or a container.

The natural method is doing this one at a time (the runtime provides a read function to do this), but we can also convert an entire stream into expressions by consuming it fully. So the principal function here is read: stream -> expr.

DONE Design streams

A stream needs to be able to provide characters for us to interpret in our parsing. Lisp's grammar is LL(1), so we only really need one character of lookahead, but seeking is very useful.

A stream could represent a file (using a FILE pointer), an IO stream (again using FILE pointer but something that could yield interminable data), or just a string. We need to be able to encode all of these as streams.

If it's a string, we can just keep the entire thing in memory and work from there. If it's a seekable FILE pointer (i.e. we can technically do random access), just use mmap to map the file into memory. If it's a non-seekable FILE pointer, we'll need to read a chunk at a time. We could keep a vector that caches the data as we read it, allowing random access over what has been read so far while only reading chunks as and when required.

Since they're all differing interfaces, we'll need an abstraction so parsing isn't as concerned with the specifics of the underlying data stream. We can use a tagged union of data structures representing the different underlying stream types, then generate abstract functions that provide common functionality.
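
A minimal sketch of what such a tagged union could look like, assuming just two stream kinds and a fixed-size cache for the FILE case. Every name and signature here (stream_t, stream_init_string, stream_init_file, stream_fill, and this particular shape of stream_peek/stream_next) is an assumption for illustration, not the actual alisp API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; the real alisp stream types may differ. */
typedef enum { STREAM_STRING, STREAM_FILE } stream_type_t;

typedef struct {
  stream_type_t type; /* tag: selects the active union member */
  size_t position;    /* index of the next character to hand out */
  union {
    struct { const char *data; size_t size; } string;
    struct { FILE *fp; char cache[4096]; size_t cached; } file;
  } as;
} stream_t;

void stream_init_string(stream_t *s, const char *str) {
  s->type = STREAM_STRING;
  s->position = 0;
  s->as.string.data = str;
  s->as.string.size = strlen(str);
}

void stream_init_file(stream_t *s, FILE *fp) {
  s->type = STREAM_FILE;
  s->position = 0;
  s->as.file.fp = fp; /* NULL models a "bad" stream */
  s->as.file.cached = 0;
}

/* Make sure the character at s->position is in memory.
   Returns 1 if it is, 0 at end of content or on a bad stream. */
int stream_fill(stream_t *s) {
  switch (s->type) {
  case STREAM_STRING:
    return s->position < s->as.string.size;
  case STREAM_FILE:
    while (s->as.file.cached <= s->position) {
      /* Sketch only: a real cache would grow instead of giving up. */
      if (s->as.file.cached == sizeof s->as.file.cache) return 0;
      int c = s->as.file.fp ? fgetc(s->as.file.fp) : EOF;
      if (c == EOF) return 0;
      s->as.file.cache[s->as.file.cached++] = (char)c;
    }
    return 1;
  }
  return 0;
}

/* Look at the current character without moving; '\0' if there is none. */
char stream_peek(stream_t *s) {
  if (!stream_fill(s)) return '\0';
  return s->type == STREAM_STRING ? s->as.string.data[s->position]
                                  : s->as.file.cache[s->position];
}

/* Return the current character and advance; '\0' (and no move) if none. */
char stream_next(stream_t *s) {
  if (!stream_fill(s)) return '\0';
  char c = stream_peek(s);
  s->position++;
  return c;
}
```

The parser only ever calls stream_peek and stream_next, so it never needs to know which union member is active. The FILE variant reads lazily, which fits the "don't read anything on creation" behaviour tested below.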

2025-08-29: A really basic interface that makes the parse stage a bit easier. We're not going to do anything more advanced than the API i.e. no parsing.

DONE Design the tagged union
DONE Design the API

DONE Design what a "parser function" would look like

The general function is something like stream -> T | Err. What other state do we need to encode?
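
As a rough illustration of the stream -> T | Err shape, here is a sketch of an integer parser that returns a result struct carrying either a value or an error plus the position it relates to. The cursor_t type stands in for a real stream, and all names (cursor_t, parse_err_t, parse_int_result_t, parse_int) are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a stream: a buffer plus a read position. */
typedef struct { const char *data; size_t size; size_t position; } cursor_t;

typedef enum { PARSE_OK, PARSE_ERR_EOC, PARSE_ERR_EXPECTED_DIGIT } parse_err_t;

/* "T | Err": value is only meaningful when err == PARSE_OK. */
typedef struct {
  parse_err_t err;
  size_t position; /* where parsing stopped, or where the error was found */
  int64_t value;
} parse_int_result_t;

parse_int_result_t parse_int(cursor_t *c) {
  parse_int_result_t res = {PARSE_OK, c->position, 0};
  size_t i = c->position;
  if (i >= c->size) {
    res.err = PARSE_ERR_EOC;
    return res;
  }

  int negative = 0;
  if (c->data[i] == '-' || c->data[i] == '+') {
    negative = (c->data[i] == '-');
    i++;
  }

  size_t first_digit = i;
  int64_t value = 0;
  for (; i < c->size && isdigit((unsigned char)c->data[i]); i++)
    value = value * 10 + (c->data[i] - '0'); /* overflow ignored in this sketch */

  if (i == first_digit) {
    /* No digits: report the error and leave the cursor untouched
       so the caller can try a different parser (e.g. symbols). */
    res.err = PARSE_ERR_EXPECTED_DIGIT;
    res.position = i;
    return res;
  }

  c->position = i; /* commit only on success */
  res.position = i;
  res.value = negative ? -value : value;
  return res;
}
```

One answer to "what other state do we need" that falls out of this sketch: the error position, so failures can later be reported as a line and column.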

DONE Write a parser for integers

DONE Write a parser for symbols

DONE Write a parser for lists

TODO Write a parser for strings

Requires "Design Strings for the Lisp" to be completed first.

TODO Write a parser for vectors

WIP Write the general parser

Unit tests   tests

TODO Test streams

DONE Test file init

test/test_stream.c::void stream_test_file(void)

DONE Test successful init from real files

Ensure stream_size is 0 i.e. we don't read anything on creation. Also ensure stream_eoc is false.

DONE Test failed init from fake files
DONE Test peeking and next

test/test_stream.c::void stream_test_peek_next(void)

  • Peeking with bad streams ('\0' return)
  • Peeking with good streams (no effect on position)
  • Next with bad streams ('\0' return, no effect on position)
  • Next with good streams (affects position)
  • Peeking after next (should just work)
DONE Test seeking

test/test_stream.c::void stream_test_seek(void)

  • Seeking forward/backward on a bad stream (should stop at 0)
  • Seeking forward/backward too far (should clamp)
  • Seeking forward/backward zero sum via relative index (stream_seek)
DONE Test substring

test/test_stream.c::void stream_test_substr(void)

  • Substr on bad stream (NULL sv)
  • Substr on bad position/size (NULL sv)
  • Substr relative/absolute (good SV)
TODO Test till

test/test_stream.c::void stream_test_till(void)

  • till on a bad stream (NULL SV)
  • till on an ended stream (NULL SV)
  • till on a stream with no items in search string (eoc)
  • till on a stream with all items in search string (no effect)
  • till on a stream with prefix being all search string (no effect)
  • till on a stream with suffix being all search string (stops at suffix)
TODO Test while

test/test_stream.c::void stream_test_while(void)

  • while on a bad stream (NULL SV)
  • while on an ended stream (NULL SV)
  • while on a stream with no items in search string (no effect)
  • while on a stream with all items in search string (eoc)
  • while on a stream with prefix being all search string (effect)
  • while on a stream with suffix being all search string (no effect)
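
The till and while cases above read like strcspn and strspn over a stream. A small sketch under that assumption, using a plain cursor instead of a real stream. The names cursor_t, cursor_till and cursor_while are hypothetical, the sv_t fields are guessed, and returning an empty but non-NULL view for the "no effect" cases is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *data; size_t size; } sv_t; /* fields assumed */
typedef struct { const char *data; size_t size; size_t position; } cursor_t;

/* 1 if ch is one of the characters in set, 0 otherwise.
   The '\0' guard matters: strchr would otherwise match the terminator. */
static int in_set(char ch, const char *set) {
  return ch != '\0' && strchr(set, ch) != NULL;
}

/* Advance while membership in set equals want_member; return what was skipped. */
static sv_t scan(cursor_t *c, const char *set, int want_member) {
  sv_t out = {NULL, 0};
  /* Bad or ended cursor: NULL view, position untouched. */
  if (c == NULL || c->data == NULL || c->position >= c->size) return out;

  size_t start = c->position;
  while (c->position < c->size &&
         in_set(c->data[c->position], set) == want_member)
    c->position++;

  out.data = c->data + start;
  out.size = c->position - start; /* 0 means "no effect" */
  return out;
}

/* Consume characters while they ARE in set. */
sv_t cursor_while(cursor_t *c, const char *set) { return scan(c, set, 1); }

/* Consume characters until one that IS in set. */
sv_t cursor_till(cursor_t *c, const char *set) { return scan(c, set, 0); }
```

On "foo bar" with the set " ", cursor_till yields "foo", a second cursor_till has no effect because the prefix is already in the set, and cursor_while then consumes the space.
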
TODO Test line_col

test/test_stream.c::void stream_test_line_col(void)

  • line_col on bad stream (no effect on args)
  • line_col on eoc stream (should go right to the end)
  • line_col on random points in a stream
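
For reference while writing these tests, one way line_col could be computed is by counting newlines up to the position. This is only a sketch: the signature below is hypothetical, and the 1-based line / 0-based column convention is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature: works on a raw buffer rather than a stream. */
void line_col(const char *data, size_t size, size_t position,
              size_t *line, size_t *col) {
  /* Bad input: leave the output arguments untouched. */
  if (data == NULL || line == NULL || col == NULL) return;

  /* Positions past the end are clamped to the end of content. */
  if (position > size) position = size;

  size_t l = 1, c = 0; /* assumption: lines start at 1, columns at 0 */
  for (size_t i = 0; i < position; i++) {
    if (data[i] == '\n') {
      l++;
      c = 0;
    } else {
      c++;
    }
  }
  *line = l;
  *col = c;
}
```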

DONE Test system registration of allocated units

In particular, does cleanup work as we expect? Do we have situations where we may double free or not clean up something we should've?

String views   sv_t

include/alisp/sv.h::/// String Views

DONE sv_substr

Takes an index and a size, returns a string view to that substring.
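
A sketch of what this could look like, assuming sv_t is a pointer plus a size and that out-of-range requests give back a NULL view (as the substring tests above expect). The field names and exact signature are guesses, not copied from include/alisp/sv.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: a non-owning pointer and a length. */
typedef struct { const char *data; size_t size; } sv_t;

sv_t sv_substr(sv_t sv, size_t index, size_t size) {
  sv_t out = {NULL, 0};
  /* Written as "size > sv.size - index" so the check cannot overflow. */
  if (sv.data == NULL || index > sv.size || size > sv.size - index)
    return out;
  out.data = sv.data + index;
  out.size = size;
  return out;
}
```

No bytes are copied: the result points into the same memory as the original view, so it is only valid for as long as that memory is.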

DONE sv_chop_left and sv_chop_right

Super obvious.

TODO Design Strings for the Lisp   api

We have sv_t so our basic C API is done. We just need pluggable functions to construct and deconstruct strings as Lisp values.

Design   design

TODO Design Big Integers   api

We currently have 62-bit integers implemented via immediate values embedded in a pointer. We need to be able to support even bigger integers. How do we do this?
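
To make the limit concrete, here is a sketch of how a 62-bit immediate can live inside a 64-bit pointer. The tag layout (2 low bits, tag value 1) and the int_fits helper are assumptions for illustration; the names make_int and as_int echo the ones mentioned under Completed, but the bodies are not taken from the real code.

```c
#include <assert.h>
#include <stdint.h>

typedef struct lisp_t lisp_t; /* opaque; only ever used as a tagged pointer */

_Static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");

#define TAG_BITS 2
#define TAG_MASK ((uint64_t)3)
#define TAG_INT ((uint64_t)1) /* assumed tag value */
#define INT_MAX62 (((int64_t)1 << 61) - 1)
#define INT_MIN62 (-((int64_t)1 << 61))

/* Anything outside this range is where a big integer would be needed. */
int int_fits(int64_t i) { return i >= INT_MIN62 && i <= INT_MAX62; }

lisp_t *make_int(int64_t i) {
  assert(int_fits(i));
  /* Shift as unsigned so negative values are well defined. */
  return (lisp_t *)(uintptr_t)(((uint64_t)i << TAG_BITS) | TAG_INT);
}

int is_int(const lisp_t *p) {
  return ((uint64_t)(uintptr_t)p & TAG_MASK) == TAG_INT;
}

int64_t as_int(const lisp_t *p) {
  uint64_t bits = (uint64_t)(uintptr_t)p >> TAG_BITS; /* 62 payload bits */
  uint64_t sign = (uint64_t)1 << 61;                  /* payload sign bit */
  /* Portable sign extension from 62 to 64 bits. */
  return (int64_t)(bits ^ sign) - (int64_t)sign;
}
```

With a check like int_fits in place, arithmetic that leaves the 62-bit range has an obvious point at which to switch to a heap-allocated representation.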

TODO Design garbage collection scheme   gc

Really, regardless of what I do, we need to have some kind of garbage collection header on whatever managed objects we allocate.

Firstly, the distinction between managed and unmanaged objects:

  • Managed objects are allocations that are generated as part of evaluating user code i.e. strings, vectors, conses that are all made as part of evaluating code.
  • Unmanaged objects are allocations we do as part of the runtime. These are things that we expect to have near infinite lifetimes (such as the symbol table, vector of allocated objects, etc).

We need to perform garbage collection against the managed objects, and leave the unmanaged objects to the runtime.

TODO Mark stage

We need to mark all objects that are currently accessible from the environment. This means we need to have a root environment which we mark all our accessible objects from. Any objects that aren't marked by this obviously are inaccessible, so we can then sweep them.

How do we store this mark on our managed objects? I think the simplest approach would be to allocate an extra 8 bytes just before any managed object we allocate i.e. [8 byte buffer] <object>. Then, during the mark phase, we can walk back those 8 bytes and inspect/mutate the mark.
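
A minimal sketch of the [8 byte buffer] <object> layout, assuming the header holds nothing but the mark. The names gc_header_t, gc_alloc, gc_header, gc_mark, gc_is_marked and gc_free are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* The 8 bytes that sit immediately before every managed object. */
typedef struct { uint64_t marked; } gc_header_t;

_Static_assert(sizeof(gc_header_t) == 8, "header must be exactly 8 bytes");

/* Allocate header + object in one block; return a pointer to the object. */
void *gc_alloc(size_t size) {
  gc_header_t *h = calloc(1, sizeof *h + size); /* zeroed, so unmarked */
  if (h == NULL) return NULL;
  return h + 1;
}

/* Walk back 8 bytes from the object to find its header. */
gc_header_t *gc_header(void *obj) { return (gc_header_t *)obj - 1; }

void gc_mark(void *obj) { gc_header(obj)->marked = 1; }

int gc_is_marked(void *obj) { return gc_header(obj)->marked != 0; }

/* The block must be freed through the header, not the object pointer. */
void gc_free(void *obj) { free(gc_header(obj)); }
```

Since calloc returns suitably aligned memory and the header is 8 bytes, the object stays 8-byte aligned, which also keeps the low pointer bits free for tagging.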

TODO Sweep

Once we've marked all the accessible objects, we need to investigate all the objects that aren't. We already have a global map of everything we've allocated so far (sys_register, declared in alisp.h as void sys_register(sys_t *, lisp_t *), adds to it, and every managed object is expected to register).

We can iterate through the map and collect all the unmarked objects. What do we do with these?

  1. They are technically freestanding objects allocated through calloc, so we could just free them.
  2. Manage some collection of previous allocations to reuse in our next allocation.

Option (1) is obvious and relatively clean to set up in our current idea:

  • Say at index I we have an object that is unmarked
  • Free the associated object at index I
  • Swap the end of the array with the cell at index I, then decrement the size of the container

Removing each unmarked object this way is an O(1) time operation.
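
The steps for option (1) can be sketched as a swap-remove loop. The object_t and registry_t types below are stand-ins (the mark is stored inline to keep the example short), and sweep is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { int marked; } object_t; /* stand-in for a managed object */

/* Stand-in for the global container of everything allocated so far. */
typedef struct { object_t **items; size_t size; } registry_t;

/* Free every unmarked object, clear the mark on survivors,
   and return how many objects were freed. */
size_t sweep(registry_t *r) {
  size_t freed = 0;
  size_t i = 0;
  while (i < r->size) {
    object_t *obj = r->items[i];
    if (obj->marked) {
      obj->marked = 0; /* reset for the next mark phase */
      i++;
      continue;
    }
    free(obj);
    /* Swap the last cell into slot i and shrink the container.
       i is NOT advanced: the swapped-in cell has not been inspected yet. */
    r->items[i] = r->items[--r->size];
    freed++;
  }
  return freed;
}
```

The one subtle point is not advancing the index after a swap; otherwise the cell moved in from the end would be skipped. The trade-off is that the container no longer keeps allocation order.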

Option (2) is also relatively straightforward, but we need another counter in order to make it work:

  • Say at index I we have an object that is unmarked
  • Swap the end of the array with the cell at index I, then decrement the size of the container
TODO Use previous allocations if they're free to use

This way, instead of deleting the memory or forgetting about it, we can reuse it. We need to be really careful to make sure our ref(X) is actually precise; we don't want to trample on the user's hard work.

If we implement our "free cells" as a linked list, we'll essentially need to take items out of it when we decide to set it back up in the system. Similarly, if we classify something as unused during the sweep, we can add it to the free linked list.

Question: should there be a separate linked list for each container type (i.e. one for conses, one for vectors) or just one big one? The main task for these free lists is just "can I get a cell or nah?". We'll analyse the time complexity of this task for each approach.

Former approach (one list per container type) time complexity:

  • O(1) time to get a free cell since we just need to check the first item of the relevant free list (or if it's NIL, we know already)
  • O(1) worst case time if there isn't a free cell

Latter approach (one big list) time complexity:

  • Since we have get_tag it's O(1) time to check the type of the container.
  • Therefore, it would be worst case O(n) if the cell type we need is only at the end of the list, or if there isn't any cell of the type we need.

The former approach is better time-complexity wise, but the latter is way better in terms of simplicity of code. Must deliberate.
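
To help the deliberation, here is a sketch of both approaches side by side. The cell_t layout, the tag enum and all function names are hypothetical; the tag field plays the role of what get_tag would report.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TAG_CONS, TAG_VEC, NUM_TAGS } cell_tag_t;

/* A free cell: its container type plus a link to the next free cell. */
typedef struct cell {
  cell_tag_t tag;
  struct cell *next_free;
} cell_t;

/* Former approach: one list head per container type. */
typedef struct { cell_t *heads[NUM_TAGS]; } free_lists_t;

/* O(1): push onto the list that matches the cell's tag. */
void free_list_push(free_lists_t *fl, cell_t *c) {
  c->next_free = fl->heads[c->tag];
  fl->heads[c->tag] = c;
}

/* O(1): only the head of the relevant list is ever inspected. */
cell_t *free_list_pop(free_lists_t *fl, cell_tag_t tag) {
  cell_t *c = fl->heads[tag];
  if (c != NULL) {
    fl->heads[tag] = c->next_free;
    c->next_free = NULL;
  }
  return c;
}

/* Latter approach: one big list, searched for the first cell of the
   right type. Worst case O(n) when the match is last or missing. */
cell_t *single_list_pop(cell_t **head, cell_tag_t tag) {
  for (cell_t **link = head; *link != NULL; link = &(*link)->next_free) {
    if ((*link)->tag == tag) {
      cell_t *c = *link;
      *link = c->next_free; /* unlink from the middle of the list */
      c->next_free = NULL;
      return c;
    }
  }
  return NULL;
}
```

In this sketch the per-type version is barely more code than the single list, since the tag can index straight into an array of heads. Whether that holds in the real runtime depends on how many container types there end up being.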

TODO Capitalise symbols (TBD)   optimisation design

Should we capitalise symbols? This way, we limit the symbol table's possible options a bit (potentially we could design a better hashing algorithm?) and it would be kinda like an actual Lisp.

TODO Consider reader macros

Common Lisp has so-called "reader macros", which allow users to write Lisp code that affects how further Lisp code is read. They're quite powerful.

Standard Scheme doesn't have them. Should we implement this?

Completed

DONE Test value constructors and destructors   test

Test if make_int works with as_int, intern with as_sym. Latter will require a symbol table.

DONE Test containers constructors and destructors   test

Test if make_vec works with as_vec, cons with as_cons AND CAR, CDR.

We may need to think of effective ways to deal with NILs in car and cdr. Maybe make functions as well as the macros so I can choose between them?

DONE Write more tests