#+title: Alisp
#+author: Aryadev Chavali
#+date: 2025-08-20
#+filetags: :alisp:

* Tasks
** String views :strings:
[[file:include/alisp/sv.h::/// String Views]]
*** DONE sv_substr
Takes an index and a size, returns a string view to that substring.
*** DONE sv_chop_left and sv_chop_right
Super obvious.
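
A minimal sketch of what these three functions could look like.  The
~sv_t~ layout (pointer plus size), the NULL view returned on bad
bounds, and the clamping in the chop functions are all assumptions
here; the real definitions in ~sv.h~ may differ.

#+begin_src c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout; the real sv_t in include/alisp/sv.h may differ. */
typedef struct
{
  const char *data;
  size_t size;
} sv_t;

/* View of `size` bytes starting at `index`, or a NULL view when the
   requested range does not fit inside `sv`. */
static sv_t sv_substr(sv_t sv, size_t index, size_t size)
{
  if (!sv.data || index > sv.size || size > sv.size - index)
    return (sv_t){NULL, 0};
  return (sv_t){sv.data + index, size};
}

/* Drop the first `n` bytes (clamped to the size of the view). */
static sv_t sv_chop_left(sv_t sv, size_t n)
{
  if (!sv.data)
    return sv;
  if (n > sv.size)
    n = sv.size;
  return (sv_t){sv.data + n, sv.size - n};
}

/* Drop the last `n` bytes (clamped to the size of the view). */
static sv_t sv_chop_right(sv_t sv, size_t n)
{
  if (n > sv.size)
    n = sv.size;
  return (sv_t){sv.data, sv.size - n};
}
#+end_src

None of these copy anything: every result points into the original
buffer, so the buffer must outlive the views.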
*** TODO Design Strings for the Lisp :api:
We have ~sv_t~ so our basic C API is done. We just need pluggable
functions to construct and deconstruct strings as Lisp objects.
** Reader system :reader:
We need to design a reader system. The big idea: given a "stream" of
data, we can break out expressions from it. An expression could be
either an atomic value or a container.

The natural method is to do this one expression at a time (the runtime
provides a ~read~ function for this), but we can also convert an
entire stream into expressions by consuming it fully. So the principal
function here is ~read: stream -> expr~.
*** DONE Design streams
A stream needs to be able to provide characters for us to interpret in
our parsing. Lisp has an LL(1) grammar so we only really need one
character of lookahead, but seeking is very useful.

A stream could represent a file (using a FILE pointer), an IO stream
(again using a FILE pointer, but one that could yield unbounded
data), or just a string. We need to be able to encode all of these as
streams.

If it's a string, we can just treat the entire thing as memory and
work from there. If it's a seekable FILE pointer (i.e. we can
technically do random access), just use ~mmap~ to map the thing into
memory. If it's a non-seekable FILE pointer, we'll need to read a
chunk at a time. We could have a vector that caches the data as we
read it, allowing us to do random access, but only read chunks as and
when required.

Since these are all differing interfaces, we'll need an abstraction so
parsing isn't concerned with the specifics of the underlying data
stream. We can use a tagged union of data structures representing the
different underlying stream types, then write abstract functions
that provide common functionality.

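A rough sketch of the tagged union idea, not the actual stream
implementation.  Every name below (~stream_type_t~, the ~as~ union,
~stream_from_string~) is hypothetical, and for brevity the two FILE
variants share one member and lean on ~fgetc~ / ~ungetc~ rather than
~mmap~ or a chunk cache.

#+begin_src c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tags for the three kinds of stream described above. */
typedef enum { STREAM_STRING, STREAM_FILE, STREAM_PIPE } stream_type_t;

typedef struct
{
  stream_type_t type;
  union
  {
    struct { const char *data; size_t size, position; } string;
    FILE *fp; /* shared by STREAM_FILE and STREAM_PIPE in this sketch */
  } as;
} stream_t;

static stream_t stream_from_string(const char *text)
{
  stream_t s = {.type = STREAM_STRING};
  s.as.string.data = text;
  s.as.string.size = strlen(text);
  s.as.string.position = 0;
  return s;
}

/* Look at the next character without consuming it; '\0' at the end. */
static char stream_peek(stream_t *s)
{
  switch (s->type)
  {
  case STREAM_STRING:
    return s->as.string.position < s->as.string.size
               ? s->as.string.data[s->as.string.position]
               : '\0';
  case STREAM_FILE:
  case STREAM_PIPE:
  {
    int c = fgetc(s->as.fp);
    if (c == EOF)
      return '\0';
    ungetc(c, s->as.fp); /* one character of pushback is guaranteed */
    return (char)c;
  }
  }
  return '\0';
}

/* Consume and return the next character; '\0' at the end. */
static char stream_next(stream_t *s)
{
  char c = stream_peek(s);
  if (c == '\0')
    return c;
  if (s->type == STREAM_STRING)
    s->as.string.position++;
  else
    fgetc(s->as.fp);
  return c;
}
#+end_src

The parser only ever calls ~stream_peek~ and ~stream_next~, so it
never needs to know which variant it has been handed.
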
2025-08-29: A really basic interface that makes the parse stage a bit
easier. We're not going to do anything more advanced than the API
i.e. no parsing.
**** DONE Design the tagged union
**** DONE Design the API
*** DONE Design what a "parser function" would look like
The general function is something like ~stream -> T | Err~. What
other state do we need to encode?
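
One possible shape for ~stream -> T | Err~ in C is a result struct
carrying an error code alongside the value.  The sketch below is an
assumption throughout: the error codes, ~parse_result_t~ and this
~parse_int~ are made up for illustration, it works over a plain string
rather than a stream, and it ignores overflow.  Returning the stop
position is one candidate answer to the "other state" question.

#+begin_src c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical error codes for the Err side of the result. */
typedef enum { PARSE_OK, PARSE_ERR_EOF, PARSE_ERR_UNEXPECTED } parse_err_t;

typedef struct
{
  parse_err_t err; /* PARSE_OK means `value` is valid */
  long value;      /* the T; a real reader would produce a Lisp object */
  size_t position; /* where parsing stopped, handy for error reports */
} parse_result_t;

/* Parse an optionally negative decimal integer from text[position..]. */
static parse_result_t parse_int(const char *text, size_t position)
{
  parse_result_t res = {PARSE_OK, 0, position};
  int negative = 0;

  if (text[res.position] == '\0')
  {
    res.err = PARSE_ERR_EOF;
    return res;
  }
  if (text[res.position] == '-')
  {
    negative = 1;
    res.position++;
  }
  if (!isdigit((unsigned char)text[res.position]))
  {
    res.err = PARSE_ERR_UNEXPECTED;
    return res;
  }
  while (isdigit((unsigned char)text[res.position]))
    res.value = res.value * 10 + (text[res.position++] - '0');
  if (negative)
    res.value = -res.value;
  return res;
}
#+end_src

Callers check ~err~ first and only then look at ~value~; on failure
~position~ says where things went wrong.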
*** DONE Write a parser for integers
*** DONE Write a parser for symbols
*** DONE Write a parser for lists
*** DONE Write a parser for vectors
*** TODO Write a parser for strings
Requires [[*Design Strings for the Lisp]] to be complete first.
*** TODO Write the general parser
** Design :design:
*** TODO Design Big Integers :api:
We currently have 62-bit integers implemented via immediate values
embedded in a pointer. We need to be able to support even _bigger_
integers. How do we do this?
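
For reference, a sketch of how a 62-bit immediate can sit inside a
64-bit word.  The layout is assumed (2 low tag bits, 62 payload bits)
and the tag value and the ~int62_*~ names are hypothetical, not the
existing ~make_int~ / ~as_int~.  The ~int62_fits~ check marks the
point where a big integer representation would have to take over.

#+begin_src c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: low 2 bits hold the tag, upper 62 bits the payload. */
#define TAG_BITS 2
#define TAG_MASK UINT64_C(3)
#define TAG_INT UINT64_C(1) /* hypothetical tag value */

#define INT62_MAX ((INT64_C(1) << 61) - 1)
#define INT62_MIN (-(INT64_C(1) << 61))

/* Does `n` fit in an immediate, or would it need a big integer? */
static int int62_fits(int64_t n)
{
  return n >= INT62_MIN && n <= INT62_MAX;
}

static uint64_t int62_encode(int64_t n)
{
  assert(int62_fits(n));
  return ((uint64_t)n << TAG_BITS) | TAG_INT;
}

static int64_t int62_decode(uint64_t word)
{
  assert((word & TAG_MASK) == TAG_INT);
  int64_t value = (int64_t)(word >> TAG_BITS); /* 62-bit unsigned payload */
  if (value & (INT64_C(1) << 61))              /* sign bit of the payload */
    value -= INT64_C(1) << 62;                 /* sign extend by hand */
  return value;
}
#+end_src

Decoding sign extends by hand rather than right shifting a negative
value, which C leaves implementation-defined.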
*** TODO Capitalise symbols (TBD) :optimisation:
Should we capitalise symbols? This way, we limit the symbol table's
possible options a bit (potentially we could design a better hashing
algorithm?) and it would be kinda like an actual Lisp.
*** TODO Consider reader macros :reader:
Common Lisp has so-called "reader macros" which allow users to write
Lisp code that affects how further Lisp code is read. It's quite
powerful.

Scheme doesn't have them. Should we implement this?
** Allocator :allocator:
*** Some definitions
- Managed objects are allocations that are generated as part of
  evaluating user code, i.e. strings, vectors and conses.
- Unmanaged objects are allocations we do as part of the runtime.
  These are things that we expect to have near infinite lifetimes
  (such as the symbol table, vector of allocated objects, etc).
*** DONE Design an allocator
We'll need an allocator for all our managed objects. Requirements:
- Stable pointers (memory that has already been allocated should be
  free to utilise via the same pointer for the lifetime of the
  allocator)
- Able to tag allocations as unused (i.e. "free") and able to reuse
  these allocations
  - This will link into the garbage collector, which should yield a
    sequence of objects that were previously tagged as in use and
    should be "freed".
- Able to allocate all the managed types we have
**** DONE Design allocation data structures
**** DONE Design allocation methods for different lisp types
- Strings (when implemented)
***** DONE Conses
***** DONE Vectors
**** DONE Design allocation freeing method
*** TODO Design garbage collection scheme :gc:
Really, regardless of what I do, we need to have some kind of garbage
collection header on whatever managed objects we allocate. We need to
perform garbage collection against the managed objects, and leave the
unmanaged objects to the runtime.
**** TODO Mark stage
We need to mark all objects that are currently accessible from the
environment. This means we need to have a root environment from which
we mark all our accessible objects. Any objects that aren't marked
by this are inaccessible, so we can then sweep them.

How do we store this mark on our managed objects? I think the
simplest approach would be to allocate an extra 8 bytes just before
any managed object we allocate i.e. [8 byte buffer] <object>. Then,
during the mark phase, we can walk back those 8 bytes and
inspect/mutate the mark.
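
A minimal sketch of the [8 byte buffer] <object> layout.  The names
(~gc_header_t~, ~managed_alloc~, ~gc_mark~ and friends) are invented
for illustration, and it assumes objects need at most 8 byte alignment
and that we are handed the untagged object pointer.

#+begin_src c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* The 8 bytes that sit directly before every managed object. */
typedef struct
{
  uint64_t marked;
} gc_header_t;

/* Allocate header + object in one zeroed block; return the object. */
static void *managed_alloc(size_t size)
{
  gc_header_t *header = calloc(1, sizeof(*header) + size);
  if (!header)
    return NULL;
  return header + 1; /* the object starts right after the header */
}

/* Walk back 8 bytes from the object to find its header. */
static gc_header_t *gc_header(void *object)
{
  return (gc_header_t *)object - 1;
}

static void gc_mark(void *object)
{
  gc_header(object)->marked = 1;
}

static int gc_is_marked(void *object)
{
  return gc_header(object)->marked != 0;
}

/* The block must be freed through the header, not the object pointer. */
static void managed_free(void *object)
{
  free(gc_header(object));
}
#+end_src

A full 8 bytes for one mark bit is wasteful, but it keeps the object
aligned and leaves room for other bookkeeping later.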
**** TODO Sweep
Once we've marked all objects that are accessible, we need to
investigate all the objects that aren't. We do have
[[file:alisp.h::vec_t memory;][this]] which provides a global map of
all the stuff we've allocated so far ([[file:alisp.h::void
sys_register(sys_t *, lisp_t *);][sys_register]] is used to add to
this, and any managed object is expected to register).

We can iterate through the map and collect all the unmarked objects.
What do we do with these?

1) They are technically freestanding objects allocated through
   ~calloc~, so we could just free them.
2) Manage some collection of previous allocations to reuse in our next
   allocation.

Option (1) is obvious and relatively clean to set up in our current
idea:
- Say at index I we have an object that is unmarked
- Free the associated object at index I
- Swap the end of the array with the cell at index I, then decrement
  the size of the container

This is an O(1) time operation per unmarked object.

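The steps for option (1) can be sketched as below.  ~object_t~ and
~sweep~ are stand-ins rather than the real ~lisp_t~ / ~vec_t~ code,
and clearing the mark on survivors is an extra assumption so the next
collection starts clean.

#+begin_src c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a managed object with its mark. */
typedef struct
{
  bool marked;
  int payload;
} object_t;

/* Free every unmarked object in objects[0..count) and return the new
   count.  Order is not preserved. */
static size_t sweep(object_t **objects, size_t count)
{
  size_t i = 0;
  while (i < count)
  {
    if (objects[i]->marked)
    {
      objects[i]->marked = false; /* reset for the next collection */
      i++;
    }
    else
    {
      free(objects[i]);
      count--;
      if (i != count)
        objects[i] = objects[count]; /* move the last cell into slot i */
      /* i stays put: the cell we just moved in is still unchecked */
    }
  }
  return count;
}
#+end_src

Each removal is O(1), so a whole sweep is linear in the number of
registered objects.  Note the index is not advanced after a swap,
otherwise the cell moved in from the end would be skipped.
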
Option (2) is also relatively straightforward, but we need another
counter in order to make it work:
- Say at index I we have an object that is unmarked
- Swap the end of the array with the cell at index I, then decrement
  the size of the container
**** TODO Use previous allocations if they're free to use
This way, instead of deleting the memory or forgetting about it, we
can reuse it. We need to be really careful to make sure our ref(X) is
actually precise; we don't want to trample on the user's hard work.

If we implement our "free cells" as a linked list, we'll essentially
need to take items out of it when we decide to put them back into use
in the system. Similarly, if we classify something as unused during
the sweep, we can add it to the free linked list.

Question: should this be separate linked lists for each container type
(i.e. one for conses, one for vectors) or just one big one? The main
task for these free lists is just "can I get a cell or nah?". We'll
analyse the time complexity of this task.

Former approach time complexity:
- O(1) time to get a free cell since we just need to check the first
  item of the relevant free list (or if it's NIL, we know already)
- O(1) worst case time if there isn't a free cell

Latter approach time complexity:
- Since we have ~get_tag~ it's O(1) time to check the type of the
  container.
- Therefore, it would be worst case O(n) if the cell type we need is
  only at the end of the list, or if there isn't any cell of the type
  we need.

Former approach is better time complexity wise, but latter is way
better in terms of simplicity of code. Must deliberate.
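
To make the comparison concrete, here is a sketch of both approaches
side by side.  Everything is hypothetical: ~cell_t~ stores its tag in
a field (standing in for ~get_tag~) and threads the free list through
a ~next_free~ pointer.

#+begin_src c
#include <assert.h>
#include <stddef.h>

typedef enum { TAG_CONS, TAG_VEC, NUM_TAGS } tag_t;

typedef struct cell
{
  tag_t tag;              /* stand-in for get_tag */
  struct cell *next_free; /* link used only while the cell is free */
} cell_t;

/* Former approach: one list head per container type. */
typedef struct
{
  cell_t *heads[NUM_TAGS];
} free_lists_t;

static void free_lists_push(free_lists_t *fl, cell_t *cell)
{
  cell->next_free = fl->heads[cell->tag];
  fl->heads[cell->tag] = cell;
}

/* O(1): only the head of the relevant list is inspected. */
static cell_t *free_lists_pop(free_lists_t *fl, tag_t tag)
{
  cell_t *cell = fl->heads[tag];
  if (cell)
  {
    fl->heads[tag] = cell->next_free;
    cell->next_free = NULL;
  }
  return cell;
}

/* Latter approach: one big list, scanned for the first matching tag.
   Worst case O(n) when the match is last or missing. */
static cell_t *single_list_pop(cell_t **head, tag_t tag)
{
  for (cell_t **link = head; *link; link = &(*link)->next_free)
  {
    if ((*link)->tag == tag)
    {
      cell_t *cell = *link;
      *link = cell->next_free; /* unlink from the middle of the list */
      cell->next_free = NULL;
      return cell;
    }
  }
  return NULL;
}
#+end_src

Written out like this, the per-type version is barely more code than
the single list, which might weaken the simplicity argument for the
latter.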
** Unit tests :tests:
*** TODO Test streams :streams:
**** DONE Test file init
[[file:test/test_stream.c::void stream_test_file(void)]]
***** DONE Test successful init from real files
Ensure ~stream_size~ is 0 i.e. we don't read anything on creation.
Also ensure ~stream_eoc~ is false.
***** DONE Test failed init from fake files
**** DONE Test peeking and next
[[file:test/test_stream.c::void stream_test_peek_next(void)]]
- Peeking with bad streams ('\0' return)
- Peeking with good streams (no effect on position)
- Next with bad streams ('\0' return, no effect on position)
- Next with good streams (affects position)
- Peeking after next (should just work)
**** DONE Test seeking
[[file:test/test_stream.c::void stream_test_seek(void)]]
- Seeking forward/backward on a bad stream (should stop at 0)
- Seeking forward/backward too far (should clamp)
- Seeking forward/backward zero sum via relative index (~stream_seek~)
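
As a reference for what these seek cases expect, a toy string-backed
stream with clamping seeks.  It is not the real stream API: the
~str_stream_t~ type and the ~ss_*~ names are made up, a NULL ~data~
pointer models a "bad" stream, and returning whether the full offset
was applied is an assumption.

#+begin_src c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stream over a string; data == NULL models a bad stream. */
typedef struct
{
  const char *data;
  size_t size, position;
} str_stream_t;

/* Move forward by n, clamping at the end of the data. */
static bool ss_seek_forward(str_stream_t *s, size_t n)
{
  if (!s->data)
    return false; /* bad stream: position stays where it is (0) */
  if (n > s->size - s->position)
  {
    s->position = s->size;
    return false;
  }
  s->position += n;
  return true;
}

/* Move backward by n, clamping at the start of the data. */
static bool ss_seek_backward(str_stream_t *s, size_t n)
{
  if (!s->data)
    return false;
  if (n > s->position)
  {
    s->position = 0;
    return false;
  }
  s->position -= n;
  return true;
}

/* Relative seek: negative offsets go backward, others forward. */
static bool ss_seek(str_stream_t *s, long offset)
{
  if (offset < 0)
    return ss_seek_backward(s, (size_t)(-(offset + 1)) + 1);
  return ss_seek_forward(s, (size_t)offset);
}
#+end_src

The ~-(offset + 1)~ dance avoids negating the most negative ~long~
directly, which would overflow.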
**** DONE Test substring
[[file:test/test_stream.c::void stream_test_substr(void)]]
- Substr on bad stream (NULL SV)
- Substr on bad position/size (NULL SV)
- Substr relative/absolute (good SV)
**** TODO Test till
[[file:test/test_stream.c::void stream_test_till(void)]]
- till on a bad stream (NULL SV)
- till on an ended stream (NULL SV)
- till on a stream with no items in search string (eoc)
- till on a stream with all items in search string (no effect)
- till on a stream with prefix being all search string (no effect)
- till on a stream with suffix being all search string (stops at
  suffix)
**** TODO Test while
[[file:test/test_stream.c::void stream_test_while(void)]]
- while on a bad stream (NULL SV)
- while on an ended stream (NULL SV)
- while on a stream with no items in search string (no effect)
- while on a stream with all items in search string (eoc)
- while on a stream with prefix being all search string (effect)
- while on a stream with suffix being all search string (no effect)
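
My reading of the till/while cases above, as a sketch over a toy
string stream.  The types and the ~sketch_till~ / ~sketch_while~
names are hypothetical, and returning an empty (but non-NULL) view in
the "no effect" cases is a guess at the intended behaviour.

#+begin_src c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *data; size_t size; } sv_t;
typedef struct { const char *data; size_t size, position; } str_stream_t;

/* Is `c` one of the characters in the search string `set`? */
static int in_set(char c, const char *set)
{
  return c != '\0' && strchr(set, c) != NULL;
}

/* Consume characters until one from `set` is next, or the data ends. */
static sv_t sketch_till(str_stream_t *s, const char *set)
{
  if (!s->data || s->position >= s->size)
    return (sv_t){NULL, 0}; /* bad or ended stream */
  size_t start = s->position;
  while (s->position < s->size && !in_set(s->data[s->position], set))
    s->position++;
  return (sv_t){s->data + start, s->position - start};
}

/* Consume characters for as long as they come from `set`. */
static sv_t sketch_while(str_stream_t *s, const char *set)
{
  if (!s->data || s->position >= s->size)
    return (sv_t){NULL, 0}; /* bad or ended stream */
  size_t start = s->position;
  while (s->position < s->size && in_set(s->data[s->position], set))
    s->position++;
  return (sv_t){s->data + start, s->position - start};
}
#+end_src

The two functions are mirror images: till stops at the first member
of the search string, while stops at the first non-member.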
**** TODO Test line_col
[[file:test/test_stream.c::void stream_test_line_col(void)]]
- line_col on bad stream (no effect on args)
- line_col on eoc stream (should go right to the end)
- line_col on random points in a stream
*** TODO Test reader :reader:
*** DONE Test system registration of allocated units
In particular, does cleanup work as we expect? Do we have situations
where we may double free or not clean up something we should've?
** Completed
*** DONE Test value constructors and destructors :test:
Test if ~make_int~ works with ~as_int~, ~intern~ with ~as_sym~.
Latter will require a symbol table.
*** DONE Test containers constructors and destructors :test:
Test if ~make_vec~ works with ~as_vec~, ~cons~ with ~as_cons~ AND
~CAR~, ~CDR~.

We may need to think of effective ways to deal with NILs in ~car~ and
~cdr~. Maybe make functions as well as the macros so I can choose
between them?
*** DONE Write more tests