* tests: split of symtable testing into its own suite
  makes sense to be there, not in the lisp API
* tests: Added string view suite
  sv_copy is the only function, but we may have others later.
* tests: Meaningful and pretty logging for tests
* tests: slight cleanliness
* tests: c23 allows you to inline stack allocated arrays in struct decls
* test: Added definition to make default testing less verbose
  TEST_VERBOSE is a preprocessor directive which TEST is dependent on.
  By default it is 0, in which case TEST simply fails if the condition
  is not true. Otherwise, a full log (as done previously) is made.
* Makefile: added mode flag for full logs
  MODE=full will initialise a debug build with all logs, including test
  logs. Otherwise, MODE=debug just sets up a standard debug build with
  main logs but no testing logs. MODE=release optimises and strips all
  logs.
* tests: fix size of LISP_API_SUITE tests
* test_lisp_api: int_test -> smi_test, added smi_oob_test
* test_lisp_api: sym_test -> sym_fresh_test
* test_lisp_api: added sym_unique_test
* alisp.org: Added some tasks
* symtable: sym_table_cleanup -> sym_table_free
* lisp: split off lisp_free as its own function
  lisp_free will do a shallow clean of any object, freeing its
  associated memory. It won't recur through any containers, nor will it
  freak out if you give it something that is constant (symbols, small
  integers, NIL, etc).
* test_lisp_api: added sys_test
* test_stream: basic skeleton
* test_stream: implement stream_test_string
* test_stream: Enable only stream_test_string
* tests: enable STREAM_SUITE
* sv: fix possible runtime issue with NULL SVs in sv_copy
* alisp.org: add TODOs for all the tests required for streams
* tests: Better suite creation
  While the previous method of in-lining a stack allocated array of
  tests into the suite struct declaration was nice, we had to update
  size manually. This macro will allow us to just append new tests to
  the suite without having to care for that. It generates a uniquely
  named variable for the test array, then uses that test array in the
  suite declaration. Nice and easy.
* test: TEST_INIT macro as a prologue for any unit test
* main: Put all variable declarations at start of main to ensure decl
  There is a chance that /end/ is jumped to without the FILE pointer or
  stream actually being declared. This deals with that.
* stream: Make stream name a constant cstr
  We don't deal with the memory for it anyway.
* stream: do not initialise file streams with a non-empty vector
  Because of the not_inlined trick, a 0 initialised SBO vector is
  completely valid to use if required. Future /vec_ensure/'s will deal
  with it appropriately. So there's no need to initialise the vector
  ahead of time like this.
* test_stream: implement stream_test_file
  We might need to set up a prelude for initialising a file in the
  filesystem for testing here - not only does stream_test_file need it,
  but I see later tests requiring an equivalence check for files and
  strings (variants of a stream).
* test_stream: setup prologue and epilogue as fake tests in the suite
  Standard old test functions, but they don't call TEST_INIT or
  TEST_PASSED. They're placed at the start and at the end of the test
  array. Those macros just do printing anyway, so they're not
  necessary.
* tests: TEST_INIT -> TEST_START, TEST_PASSED -> TEST_END
* tests: TEST_START only logs if TEST_VERBOSE is enabled.
* test_lisp_api: "cons'" -> "conses"
* alisp.org: Mark off completed stream_test_file
* test_stream: implement stream_test_peek_next
* test_stream: randomise filename
  Just to make sure it's not hardcoded or anything.
* test_stream: make filename bigger, and increase the random alphabet
* test: seed random number generator
* test_stream: don't write null terminator to mock file
* stream: stream_seek will do clamped movement if offset is invalid
  If a forward/backward offset is too big, we'll clamp to the edges of
  the file rather than failing completely. We return the number of
  bytes moved so callers can still validate, but the stream API can now
  deal with these situations a bit more effectively.
* test_stream: implement stream_test_seek
* stream: ensure stream_stop resets the FILE pointer if STREAM_TYPE_FILE
* stream: stream_substr's call to stream_seek_forward refactored
  Following stream_seek_forward's own refactor, where we get offsets
  back instead of just a boolean, we should verify that offset.
* main: put stream_stop before FILE pointer close
  As stream_stop requires a valid FILE pointer (fseek), we need to do
  it before we close the pipe.
* test_vec: vec_test_substr -> vec_test_gen_substr
* test_stream: implement stream_test_substr
* alisp: add TODO for sv_t
217 lines
9.5 KiB
Org Mode
#+title: Alisp
#+author: Aryadev Chavali
#+date: 2025-08-20
#+filetags: :alisp:

* Tasks
** WIP Reader system
We need to design a reader system. The big idea: given a "stream" of
data, we can break out expressions from it. An expression could be
either an atomic value or a container.

The natural method is doing this one at a time (the runtime provides a
~read~ function to do this), but we can also convert an entire stream
into expressions by consuming it fully. So the principal function
here is ~read: stream -> expr~.
*** DONE Design streams
A stream needs to be able to provide characters for us to interpret in
our parsing. Lisp is an LL(1) grammar so we only really need one
character of lookahead, but seeking is very useful.

A stream could represent a file (using a FILE pointer), an IO stream
(again using a FILE pointer, but something that could yield unbounded
data), or just a string. We need to be able to encode all of these as
streams.

If it's a string, we can just read the entire thing as memory and work
from there. If it's a seekable FILE pointer (i.e. we can technically
do random access), just use ~mmap~ to read the thing into memory. If
it's a non-seekable FILE pointer, we'll need to read a chunk at a
time. We'll probably have a vector that caches the data as we read it,
allowing us to do random access while only reading chunks as and when
required.

Since they all have differing interfaces, we'll need an abstraction so
parsing isn't as concerned with the specifics of the underlying data
stream. We can use a tagged union of data structures representing the
different underlying stream types, then generate abstract functions
that provide common functionality.

2025-08-29: A really basic interface that makes the parse stage a bit
easier. We're not going to do anything more advanced than the API
i.e. no parsing.
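
A minimal sketch of the tagged-union idea, assuming only two stream
kinds (string and FILE pointer). Every name here (~stream_sketch_t~,
~stream_sketch_peek~, ~stream_sketch_next~) is hypothetical and not the
actual alisp API; the only behaviour borrowed from these notes is that a
bad or exhausted stream yields ~'\0'~.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names throughout; the real alisp definitions may differ. */
typedef enum { STREAM_STRING, STREAM_FILE } stream_kind_t;

typedef struct {
  stream_kind_t kind; /* the tag: which union member is live */
  size_t position;    /* index of the next character to hand out */
  union {
    struct { const char *data; size_t size; } string;
    struct { FILE *fp; } file;
  } as;
} stream_sketch_t;

stream_sketch_t stream_sketch_from_string(const char *data, size_t size)
{
  stream_sketch_t s = {0};
  s.kind = STREAM_STRING;
  s.as.string.data = data;
  s.as.string.size = size;
  return s;
}

/* Look at the next character without consuming it.
   Returns '\0' for a bad or exhausted stream. */
char stream_sketch_peek(stream_sketch_t *s)
{
  if (!s) return '\0';
  switch (s->kind) {
  case STREAM_STRING:
    if (!s->as.string.data || s->position >= s->as.string.size) return '\0';
    return s->as.string.data[s->position];
  case STREAM_FILE: {
    if (!s->as.file.fp) return '\0';
    int c = fgetc(s->as.file.fp);
    if (c == EOF) return '\0';
    ungetc(c, s->as.file.fp); /* put it back so the position is unchanged */
    return (char)c;
  }
  }
  return '\0';
}

/* Consume and return the next character; '\0' leaves the position alone. */
char stream_sketch_next(stream_sketch_t *s)
{
  char c = stream_sketch_peek(s);
  if (c == '\0') return '\0';
  if (s->kind == STREAM_FILE) fgetc(s->as.file.fp);
  s->position++;
  return c;
}
```

The caller only ever touches ~stream_sketch_peek~ and
~stream_sketch_next~, so adding a third stream kind means a new union
member and a new ~case~, not a change to the parser.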
**** DONE Design the tagged union
**** DONE Design the API
*** DONE Design what a "parser function" would look like
The general function is something like ~stream -> T | Err~. What
other state do we need to encode?
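
One possible shape for ~stream -> T | Err~ in C, as a sketch only: a
result struct carrying a status, the parsed value, and how far we got.
The names (~parse_status_t~, ~parse_int_result_t~, ~parse_digits~) are
hypothetical, the input is a plain buffer rather than a real stream, and
overflow is deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error set; a real parser would likely need more cases. */
typedef enum { PARSE_OK, PARSE_ERR_EOF, PARSE_ERR_UNEXPECTED } parse_status_t;

typedef struct {
  parse_status_t status;
  long value;      /* the T; only meaningful when status == PARSE_OK */
  size_t consumed; /* number of bytes consumed from the input */
} parse_int_result_t;

/* Parse a run of decimal digits from the start of the buffer.
   Assumption: no sign handling and no overflow checking. */
parse_int_result_t parse_digits(const char *input, size_t size)
{
  parse_int_result_t res = {PARSE_ERR_EOF, 0, 0};
  if (!input || size == 0) return res;
  size_t i = 0;
  while (i < size && input[i] >= '0' && input[i] <= '9') {
    res.value = res.value * 10 + (input[i] - '0');
    i++;
  }
  res.consumed = i;
  res.status = (i == 0) ? PARSE_ERR_UNEXPECTED : PARSE_OK;
  return res;
}
```

Carrying ~consumed~ in the result is one candidate answer to the "other
state" question: it lets the caller advance the stream on success and
report a position on failure.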
*** TODO Write a parser for integers
*** TODO Write a parser for symbols
*** TODO Write a parser for lists
*** TODO Write a parser for vectors
*** TODO Write the general parser
** WIP Unit tests :tests:
*** TODO Test streams
**** DONE Test file init
[[file:test/test_stream.c::void stream_test_file(void)]]
***** DONE Test successful init from real files
Ensure stream_size is 0 i.e. we don't read anything on creation.
Also ensure stream_eoc is false.
***** DONE Test failed init from fake files
**** DONE Test peeking and next
[[file:test/test_stream.c::void stream_test_peek_next(void)]]
- Peeking with bad streams ('\0' return)
- Peeking with good streams (no effect on position)
- Next with bad streams ('\0' return, no effect on position)
- Next with good streams (affects position)
- Peeking after next (should just work)
**** DONE Test seeking
[[file:test/test_stream.c::void stream_test_seek(void)]]
- Seeking forward/backward on a bad stream (should stop at 0)
- Seeking forward/backward too far (should clamp)
- Seeking forward/backward zero sum via relative index (stream_seek)
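
The clamping behaviour these cases exercise can be sketched on a bare
position/size pair. ~seek_clamped~ is a hypothetical helper, not the
real ~stream_seek~; it assumes a signed relative offset and returns the
number of bytes actually moved, so a caller can tell a clamped seek from
a full one.

```c
#include <assert.h>
#include <stddef.h>

/* Move *position by a relative offset, clamped to [0, size].
   Returns how many bytes were actually moved; a NULL position
   (standing in for a bad stream) moves 0 bytes. */
size_t seek_clamped(size_t *position, size_t size, long offset)
{
  if (!position) return 0;
  size_t before = (*position > size) ? size : *position;
  size_t step;
  if (offset >= 0) {
    size_t room = size - before; /* bytes left before the end */
    step = ((size_t)offset > room) ? room : (size_t)offset;
    *position = before + step;
  } else {
    /* magnitude of offset, computed without negating LONG_MIN */
    size_t back = (size_t)(-(offset + 1)) + 1;
    step = (back > before) ? before : back;
    *position = before - step;
  }
  return step;
}
```

A zero-sum check then reads naturally: seek forward by n, seek backward
by n, and the position is back where it started as long as neither move
was clamped.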
**** DONE Test substring
[[file:test/test_stream.c::void stream_test_substr(void)]]
- Substr on bad stream (NULL sv)
- Substr on bad position/size (NULL sv)
- Substr relative/absolute (good SV)
**** TODO Test till
[[file:test/test_stream.c::void stream_test_till(void)]]
- till on a bad stream (NULL SV)
- till on an ended stream (NULL SV)
- till on a stream with no items in search string (eoc)
- till on a stream with all items in search string (no effect)
- till on a stream with prefix being all search string (no effect)
- till on a stream with suffix being all search string (stops at
  suffix)
**** TODO Test while
[[file:test/test_stream.c::void stream_test_while(void)]]
- while on a bad stream (NULL SV)
- while on an ended stream (NULL SV)
- while on a stream with no items in search string (no effect)
- while on a stream with all items in search string (eoc)
- while on a stream with prefix being all search string (effect)
- while on a stream with suffix being all search string (no effect)
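
To pin down what "till" and "while" mean in the cases above, here is a
sketch over a plain buffer. ~view_t~, ~take_till~ and ~take_while~ are
hypothetical stand-ins for ~sv_t~ and the real stream functions. One
open assumption: when nothing is consumed on a live stream, this sketch
returns an empty (size 0) view rather than a NULL one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *data; size_t size; } view_t;

/* strchr also matches the terminator, so '\0' is excluded explicitly. */
static bool in_set(char c, const char *set)
{
  return c != '\0' && strchr(set, c) != NULL;
}

/* Shared loop: consume characters whose set membership equals wanted. */
static view_t take(const char *input, size_t size, size_t *position,
                   const char *set, bool wanted)
{
  view_t out = {NULL, 0};
  /* bad or ended stream: NULL view, position untouched */
  if (!input || !position || !set || *position >= size) return out;
  size_t start = *position;
  while (*position < size && in_set(input[*position], set) == wanted)
    (*position)++;
  out.data = input + start;
  out.size = *position - start;
  return out;
}

/* Consume while characters ARE in the search string. */
view_t take_while(const char *input, size_t size, size_t *position,
                  const char *set)
{
  return take(input, size, position, set, true);
}

/* Consume until a character in the search string is reached. */
view_t take_till(const char *input, size_t size, size_t *position,
                 const char *set)
{
  return take(input, size, position, set, false);
}
```

The two functions are mirror images, which is why the "no effect" and
"eoc" cases swap places between the two test lists.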
**** TODO Test line_col
[[file:test/test_stream.c::void stream_test_line_col(void)]]
- line_col on bad stream (no effect on args)
- line_col on eoc stream (should go right to the end)
- line_col on random points in a stream
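
A sketch of the computation these cases describe, over a plain buffer.
~line_col_at~ is a hypothetical helper; the 1-based line and 0-based
column convention is an assumption, not something these notes specify.

```c
#include <assert.h>
#include <stddef.h>

/* Compute the line and column of a position by counting newlines.
   Bad input leaves *line and *col untouched; a position past the end
   is treated as the end of the data. */
void line_col_at(const char *data, size_t size, size_t position,
                 size_t *line, size_t *col)
{
  if (!data || !line || !col) return;
  if (position > size) position = size;
  size_t l = 1, c = 0;
  for (size_t i = 0; i < position; ++i) {
    if (data[i] == '\n') {
      l++;
      c = 0;
    } else {
      c++;
    }
  }
  *line = l;
  *col = c;
}
```

This is a linear scan from the start on every call; if error reporting
ever becomes hot, caching newline offsets would be the obvious next
step.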
*** DONE Test system registration of allocated units
In particular, does cleanup work as we expect? Do we have situations
where we may double free or not clean up something we should've?
** Backlog
*** TODO Design Big Integers
We currently have 62-bit integers implemented via immediate values
embedded in a pointer. We need to be able to support even _bigger_
integers. How do we do this?
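
For reference, a sketch of how a 62-bit immediate can live inside a
64-bit word. The 2-bit tag width follows from "62 bit", but the tag
value and every name here (~smi_box~, ~smi_unbox~, ~smi_fits~) are
assumptions rather than the actual alisp encoding. ~smi_fits~ marks the
boundary where a big integer representation would have to take over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SMI_TAG_BITS 2
#define SMI_TAG UINT64_C(1) /* hypothetical tag value */
#define SMI_MAX ((INT64_C(1) << 61) - 1)
#define SMI_MIN (-(INT64_C(1) << 61))

/* Does n fit in a 62-bit two's complement payload? */
bool smi_fits(int64_t n) { return n >= SMI_MIN && n <= SMI_MAX; }

/* Shift the payload up and put the tag in the low bits. */
uint64_t smi_box(int64_t n)
{
  return ((uint64_t)n << SMI_TAG_BITS) | SMI_TAG;
}

/* Drop the tag, then sign extend from bit 61 by hand so we do not
   depend on how the compiler right-shifts negative values.  The final
   cast assumes the usual two's complement conversion. */
int64_t smi_unbox(uint64_t word)
{
  uint64_t payload = word >> SMI_TAG_BITS;
  if (payload & (UINT64_C(1) << 61)) payload |= UINT64_C(3) << 62;
  return (int64_t)payload;
}
```

Anything for which ~smi_fits~ is false needs some other representation,
most likely a heap allocated object with its own tag.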
*** TODO Design garbage collection scheme :design:gc:
Really, regardless of what I do, we need to have some kind of garbage
collection header on whatever managed objects we allocate.

Firstly, the distinction between managed and unmanaged objects:
- Managed objects are allocations that are generated as part of
  evaluating user code i.e. strings, vectors, conses that are all made
  as part of evaluating code.
- Unmanaged objects are allocations we do as part of the runtime.
  These are things that we expect to have near infinite lifetimes
  (such as the symbol table, vector of allocated objects, etc).

We need to perform garbage collection against the managed objects, and
leave the unmanaged objects to the runtime.
**** TODO Mark stage
We need to mark all objects that are currently accessible from the
environment. This means we need to have a root environment which we
mark all our accessible objects from. Any objects that aren't marked
by this obviously are inaccessible, so we can then sweep them.

How do we store this mark on our managed objects? I think the
simplest approach would be to allocate an extra 8 bytes just before
any managed object we allocate i.e. [8 byte buffer] <object>. Then,
during the mark phase, we can walk back those 8 bytes and
inspect/mutate the mark.
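
A sketch of the [8 byte buffer] <object> layout. The names
(~gc_header_t~, ~managed_alloc~, ~gc_mark~) are hypothetical. One caveat
worth flagging: an 8-byte header keeps the object 8-byte aligned, but
not 16-byte aligned, which matters if a managed object ever needs
stricter alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* The 8 bytes that sit immediately before every managed object. */
typedef struct { uint64_t marked; } gc_header_t;

/* Allocate header + object in one block; hand back the object part. */
void *managed_alloc(size_t size)
{
  gc_header_t *header = calloc(1, sizeof(*header) + size);
  if (!header) return NULL;
  return header + 1; /* the object starts right after the header */
}

/* Walk back 8 bytes from the object to find its header. */
static gc_header_t *header_of(void *object)
{
  return (gc_header_t *)object - 1;
}

void gc_mark(void *object) { header_of(object)->marked = 1; }
void gc_unmark(void *object) { header_of(object)->marked = 0; }
bool gc_is_marked(void *object) { return header_of(object)->marked != 0; }

/* Free must be given the start of the block, i.e. the header. */
void managed_free(void *object)
{
  if (object) free(header_of(object));
}
```

The rest of the runtime keeps passing around plain object pointers; only
the allocator and the collector know the header exists.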
**** TODO Sweep
Once we've marked all objects that are accessible, we need to
investigate all the objects that aren't. We do have
[[file:alisp.h::vec_t memory;][this]] which provides a global map of
all the stuff we've allocated so far
([[file:alisp.h::void sys_register(sys_t *, lisp_t *);][sys_register]]
is used to add to this, and any managed object is expected to
register).

We can iterate through the map and collect all the unmarked objects.
What do we do with these?

1) They are technically freestanding objects allocated through
   ~calloc~, so we could just free them.
2) Manage some collection of previous allocations to reuse in our next
   allocation.

Option (1) is obvious and relatively clean to set up in our current
idea:
- Say at index I we have an object that is unmarked
- Free the associated object at index I
- Swap the end of the array with the cell at index I, then decrement
  the size of the container

Each such removal is an O(1) time operation.

Option (2) is also relatively straightforward, but we need another
counter in order to make it work:
- Say at index I we have an object that is unmarked
- Swap the end of the array with the cell at index I, then decrement
  the size of the container
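
The steps for option (1) can be sketched as a single pass over an array
of object pointers. ~object_t~ and ~sweep~ are hypothetical stand-ins
for managed objects and the real memory vector; the mark is stored
inline here purely to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a managed object with its mark. */
typedef struct { bool marked; int payload; } object_t;

/* Free every unmarked object, clear the mark on survivors, and return
   the new number of live entries at the front of the array. */
size_t sweep(object_t **objects, size_t count)
{
  size_t i = 0;
  while (i < count) {
    if (objects[i]->marked) {
      objects[i]->marked = false; /* reset for the next collection */
      i++;
    } else {
      free(objects[i]);
      objects[i] = objects[count - 1]; /* move the last cell into the hole */
      count--;
      /* do not advance i: the moved cell has not been inspected yet */
    }
  }
  return count;
}
```

Each removal is O(1) because nothing is shifted, at the cost of not
preserving the order of the array. The whole sweep is still one linear
pass.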
**** TODO Use previous allocations if they're free to use
This way, instead of deleting the memory or forgetting about it, we
can reuse it. We need to be really careful to make sure our ref(X) is
actually precise, we don't want to trample on the user's hard work.

If we implement our "free cells" as a linked list, we'll essentially
need to take items out of it when we decide to set it back up in the
system. Similarly, if we classify something as unused during the
sweep, we can add it to the free linked list.

Question: should this be separate linked lists for each container type
(i.e. one for conses, one for vectors) or just one big one? The main
task for these free lists is just "can I get a cell or nah?". We'll
analyse the time complexity of this task.

Former approach time complexity:
- O(1) time to get a free cell since we just need to check the first
  item of the relevant free list (or if it's NIL, we know already)
- O(1) worst case time if there isn't a free cell

Latter approach time complexity:
- Since we have ~get_tag~ it's O(1) time to check the type of the
  container.
- Therefore, it would be worst case O(n) if the cell type we need is
  only at the end of the list, or if there isn't any cell of the type
  we need.

Former approach is better time complexity wise, but latter is way
better in terms of simplicity of code. Must deliberate.
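
A sketch of the former approach (one free list per container type), to
get a feel for how much code it actually costs. All names are
hypothetical. The list is intrusive: the link is written into the dead
cell's own memory, which assumes every cell is at least pointer sized.

```c
#include <assert.h>
#include <stddef.h>

/* One list head per container type. */
typedef enum { CELL_CONS, CELL_VEC, CELL_KINDS } cell_kind_t;

/* A dead cell's memory is reused to hold the link to the next one. */
typedef struct free_cell { struct free_cell *next; } free_cell_t;

typedef struct { free_cell_t *heads[CELL_KINDS]; } free_lists_t;

/* Called from the sweep when a cell of the given kind is unused. */
void free_lists_push(free_lists_t *lists, cell_kind_t kind, void *cell)
{
  free_cell_t *node = cell;
  node->next = lists->heads[kind];
  lists->heads[kind] = node;
}

/* Called from the allocator: O(1) whether or not a cell is available.
   NULL means "nah", so fall back to a fresh allocation. */
void *free_lists_pop(free_lists_t *lists, cell_kind_t kind)
{
  free_cell_t *node = lists->heads[kind];
  if (!node) return NULL;
  lists->heads[kind] = node->next;
  return node;
}
```

Both operations only ever touch the head of one list, which is where
the O(1) bounds above come from. The single-list variant would replace
the array of heads with one head and a loop that checks the tag of each
cell.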
*** TODO Design Strings
We have ~sv_t~ so our basic C API is done. We just need pluggable
functions to construct and deconstruct strings as lisps.
*** TODO Capitalise symbols (TBD) :optimisation:design:
Should we capitalise symbols? This way, we limit the symbol table's
possible options a bit (potentially we could design a better hashing
algorithm?) and it would be kinda like an actual Lisp.
*** TODO sv_t
[[file:include/alisp/sv.h::/// String Views]]
**** TODO sv_substr
Takes an index and a size, returns a string view to that substring.
**** TODO sv_chop_left and sv_chop_right
Super obvious.
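
A sketch of what these could look like. The real ~sv_t~ layout isn't
shown in these notes, so this uses a hypothetical ~sv_sketch_t~ with a
pointer and a size. Returning a NULL view for an out-of-range request,
and clamping in the chop functions, are both assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for sv_t: a borrowed pointer plus a length. */
typedef struct { const char *data; size_t size; } sv_sketch_t;

/* View of size bytes starting at index; NULL view if out of range. */
sv_sketch_t sv_sketch_substr(sv_sketch_t sv, size_t index, size_t size)
{
  sv_sketch_t out = {NULL, 0};
  if (!sv.data || index > sv.size || size > sv.size - index) return out;
  out.data = sv.data + index;
  out.size = size;
  return out;
}

/* Drop n bytes from the front (clamped to the view's size). */
sv_sketch_t sv_sketch_chop_left(sv_sketch_t sv, size_t n)
{
  if (n > sv.size) n = sv.size;
  return sv_sketch_substr(sv, n, sv.size - n);
}

/* Drop n bytes from the back (clamped to the view's size). */
sv_sketch_t sv_sketch_chop_right(sv_sketch_t sv, size_t n)
{
  if (n > sv.size) n = sv.size;
  return sv_sketch_substr(sv, 0, sv.size - n);
}
```

None of these copy or allocate; the result borrows the original
buffer, so it is only valid for as long as that buffer is.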
** Completed
*** DONE Test value constructors and destructors :test:
Test if ~make_int~ works with ~as_int~, ~intern~ with ~as_sym~.
Latter will require a symbol table.
*** DONE Test containers constructors and destructors :test:
Test if ~make_vec~ works with ~as_vec~, ~cons~ with ~as_cons~ AND
~CAR~, ~CDR~.

We may need to think of effective ways to deal with NILs in ~car~ and
~cdr~. Maybe make functions as well as the macros so I can choose
between them?
*** DONE Write more tests