#+title: Alisp
#+author: Aryadev Chavali
#+date: 2025-08-20
#+filetags: :alisp:
* Notes
** Overview
~alisp.h~ is a single header for the entire runtime. We'll also have a
compiled shared library ~alisp.so~ which one may link against to get
the implementation. That's all one needs to write C code that targets
our Lisp machine.
The compiler gets a separate header + library, since transpiled C code
doesn't strictly need to consume it. The compiler will transpile Lisp
code into C, which uses the aforementioned ~alisp~ header and library
to compile into a native executable.
** WIP How does transpiled code operate?
My current idea is: we transpile the actual Lisp code into C.
User-made functions can be transpiled into C functions with mangled
names. Macros... I don't know, maybe we could have two function
pointer tables so we know how to execute them?
Then, we'll have an associated "descriptor" file which describes the
functions we've transpiled. Bare minimum, this file has to have a
"symbol name" to C mangled function name dictionary. We can also add
other metadata as we need.
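As a rough sketch of that bare minimum, assuming the descriptor ends
up loaded into memory as a flat table: ~descriptor_entry_t~,
~descriptor_lookup~ and the mangled names below are all hypothetical,
and the real on-disk format is still undecided.
#+begin_src c
#include <stddef.h>
#include <string.h>

// Hypothetical descriptor entry: Lisp symbol name -> mangled C name.
typedef struct {
  const char *symbol;
  const char *mangled;
} descriptor_entry_t;

// Made-up contents; a real descriptor would be generated by the compiler.
static const descriptor_entry_t descriptor[] = {
  {"add-two", "alisp_fn_add_2d_two"},
  {"square", "alisp_fn_square"},
};

// Linear search over the table.  Returns NULL if the symbol was never
// transpiled.
const char *descriptor_lookup(const char *symbol)
{
  size_t n = sizeof(descriptor) / sizeof(descriptor[0]);
  for (size_t i = 0; i < n; ++i)
    if (strcmp(descriptor[i].symbol, symbol) == 0)
      return descriptor[i].mangled;
  return NULL;
}
#+end_src
A linear search is fine for a sketch; if descriptors get large, the
same table could be sorted or hashed without changing the interface.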
*** TODO Deliberate on whether we compile into a shared library or not
If we compile these C code objects into shared libraries, the
descriptor needs to concern itself with code locations. This might be
easier in a sense, since the code will already be compiled.
** WIP How do we call native code?
When we're calling a natively compiled function, we can use this
metadata mapping to find the C function. This native code will use
our Lisp runtime, same as any other code, so it should integrate
without much friction. But we'll need to fix a calling convention so
that the runtime can call into native code uniformly.
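One possible shape for such a convention, purely as a sketch: every
native function takes the runtime handle plus an argument array and
returns one Lisp value. ~native_fn_t~, ~native_entry_t~,
~call_native~ and ~native_first~ are hypothetical names, and ~sys_t~
and ~lisp_t~ are opaque stand-ins here rather than the real
definitions from ~alisp.h~.
#+begin_src c
#include <stddef.h>
#include <string.h>

// Opaque stand-ins for the runtime types (assumed, not the real ones).
typedef struct sys sys_t;
typedef struct lisp lisp_t;

// Hypothetical convention: runtime handle, arguments, argument count in;
// one Lisp value out.
typedef lisp_t *(*native_fn_t)(sys_t *sys, lisp_t **args, size_t nargs);

typedef struct {
  const char *symbol;
  native_fn_t fn;
} native_entry_t;

// Example native function: returns its first argument, or NULL if none.
lisp_t *native_first(sys_t *sys, lisp_t **args, size_t nargs)
{
  (void)sys;
  return nargs > 0 ? args[0] : NULL;
}

static const native_entry_t natives[] = {
  {"first-arg", native_first},
};

// Find the function registered under `symbol` and call it.  Returns NULL
// for an unknown symbol; a real runtime would want a distinct error here.
lisp_t *call_native(sys_t *sys, const char *symbol, lisp_t **args,
                    size_t nargs)
{
  size_t n = sizeof(natives) / sizeof(natives[0]);
  for (size_t i = 0; i < n; ++i)
    if (strcmp(natives[i].symbol, symbol) == 0)
      return natives[i].fn(sys, args, nargs);
  return NULL;
}
#+end_src
Because every function shares one signature, the runtime only needs
the symbol name to dispatch; arity checks would live inside each
function or in extra descriptor metadata.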
* Tasks
** TODO Capitalise symbols (TBD) :optimisation:design:
Should we capitalise symbols? This way, we limit the symbol table's
possible options a bit (potentially we could design a better hashing
algorithm?) and it would be kinda like an actual Lisp.
** TODO Design Strings
We have ~sv_t~ so our basic C API is done. We just need pluggable
functions to construct and deconstruct strings as Lisp values.
** WIP Reader system
We need to design a reader system. The big idea: given a "stream" of
data, we can break out expressions from it. An expression could be
either an atomic value or a container.
The natural method is doing this one at a time (the runtime provides a
~read~ function to do this), but we can also convert an entire stream
into expressions by consuming it fully. So the principal function
here is ~read: stream -> expr~.
*** DONE Design streams
A stream needs to be able to provide characters for us to interpret in
our parsing. Lisp's grammar is LL(1), so we only really need one
character of lookahead, but seeking is very useful.
A stream could represent a file (using a FILE pointer), an IO stream
(again using FILE pointer but something that could yield interminable
data), or just a string. We need to be able to encode all of these as
streams.
If it's a string, we can just treat the entire thing as memory and
work from there. If it's a seekable FILE pointer (i.e. we can
technically do random access), just use ~mmap~ to map the thing into
memory. If it's a non-seekable FILE pointer, we'll need to read a
chunk at a time. We could have a vector that caches the data as we
read it, allowing random access over what we've seen so far while only
reading chunks as and when required.
Since they're all differing interfaces, we'll need an abstraction so
parsing isn't as concerned with the specifics of the underlying data
stream. We can use a tagged union of data structures representing the
different underlying stream types, then generate abstract functions
that provide common functionality.
2025-08-29: A really basic interface that makes the parse stage a bit
easier. We're not going to do anything more advanced than the API
i.e. no parsing.
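A minimal sketch of the tagged-union idea, not the actual API: it
covers only a string stream and a plain FILE stream, and leaves out
~mmap~ and chunk caching entirely. All names here (~stream_t~,
~stream_peek~, ~stream_next~ and the constructors) are assumptions.
#+begin_src c
#include <stddef.h>
#include <stdio.h>

typedef enum { STREAM_STRING, STREAM_FILE } stream_type_t;

// The tag says which member of the union is live.
typedef struct {
  stream_type_t type;
  union {
    struct {
      const char *data;
      size_t size;
      size_t pos;
    } string;
    struct {
      FILE *fp;
    } file;
  } as;
} stream_t;

stream_t stream_from_string(const char *data, size_t size)
{
  stream_t s = {.type = STREAM_STRING};
  s.as.string.data = data;
  s.as.string.size = size;
  s.as.string.pos = 0;
  return s;
}

stream_t stream_from_file(FILE *fp)
{
  stream_t s = {.type = STREAM_FILE};
  s.as.file.fp = fp;
  return s;
}

// Look at the next character without consuming it; EOF at the end.
int stream_peek(stream_t *s)
{
  switch (s->type) {
  case STREAM_STRING:
    return s->as.string.pos < s->as.string.size
               ? (unsigned char)s->as.string.data[s->as.string.pos]
               : EOF;
  case STREAM_FILE: {
    int c = fgetc(s->as.file.fp);
    if (c != EOF)
      ungetc(c, s->as.file.fp); // one character of pushback is guaranteed
    return c;
  }
  }
  return EOF;
}

// Consume and return the next character; EOF at the end.
int stream_next(stream_t *s)
{
  switch (s->type) {
  case STREAM_STRING:
    return s->as.string.pos < s->as.string.size
               ? (unsigned char)s->as.string.data[s->as.string.pos++]
               : EOF;
  case STREAM_FILE:
    return fgetc(s->as.file.fp);
  }
  return EOF;
}
#+end_src
The parser only ever sees ~stream_peek~ and ~stream_next~, so adding a
third stream type means adding one enum value and one case per
function.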
**** DONE Design the tagged union
**** DONE Design the API
*** DONE Design what a "parser function" would look like
The general function is something like ~stream -> T | Err~. What
other state do we need to encode?
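One way to spell ~stream -> T | Err~ in C is a result struct carrying
an error code next to the value. The sketch below does this for
integers over a string-only stand-in stream; ~parse_err_t~,
~parse_int_result_t~ and ~parse_int~ are hypothetical names, and it
parses into ~int64_t~ rather than the runtime's own integer type.
#+begin_src c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

// Minimal string-only stand-in for the real stream type.
typedef struct {
  const char *data;
  size_t size;
  size_t pos;
} stream_t;

typedef enum {
  PARSE_OK,
  PARSE_ERR_EOF,
  PARSE_ERR_NOT_INT,
  PARSE_ERR_OVERFLOW,
} parse_err_t;

// "T | Err" for T = int64_t: value is only meaningful if err == PARSE_OK.
typedef struct {
  parse_err_t err;
  int64_t value;
} parse_int_result_t;

// Parse an optionally signed decimal integer.  On failure the stream
// position is restored, so another parser can try the same input.
parse_int_result_t parse_int(stream_t *s)
{
  parse_int_result_t res = {PARSE_OK, 0};
  size_t start = s->pos;
  if (s->pos >= s->size) {
    res.err = PARSE_ERR_EOF;
    return res;
  }
  int negative = 0;
  if (s->data[s->pos] == '-' || s->data[s->pos] == '+') {
    negative = s->data[s->pos] == '-';
    ++s->pos;
  }
  size_t digits = 0;
  int64_t acc = 0;
  while (s->pos < s->size && isdigit((unsigned char)s->data[s->pos])) {
    int d = s->data[s->pos] - '0';
    if (acc > (INT64_MAX - d) / 10) { // acc * 10 + d would overflow
      s->pos = start;
      res.err = PARSE_ERR_OVERFLOW;
      return res;
    }
    acc = acc * 10 + d;
    ++s->pos;
    ++digits;
  }
  if (digits == 0) {
    s->pos = start;
    res.err = PARSE_ERR_NOT_INT;
    return res;
  }
  res.value = negative ? -acc : acc;
  return res;
}
#+end_src
Restoring the position on failure is one answer to the "other state"
question: the stream position is the only state the parser mutates,
so a failed parse leaves no trace.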
*** WIP Write a parser for integers
*** TODO Write a parser for symbols
*** TODO Write a parser for lists
*** TODO Write a parser for vectors
*** TODO Write the general parser
** Backlog
*** TODO Design Big Integers
We currently have 62-bit integers implemented via immediate values
embedded in a pointer. We need to be able to support even _bigger_
integers. How do we do this?
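For reference, a sketch of how a 62-bit immediate can sit in a 64-bit
word, which also gives the range check that decides when a bigger
representation is needed. The 2-bit tag, its value, and the names
~tag_int~, ~untag_int~ and ~fits_int62~ are assumptions; the real
layout in ~alisp.h~ may differ.
#+begin_src c
#include <stdbool.h>
#include <stdint.h>

#define TAG_BITS 2
#define TAG_INT UINT64_C(1) // assumed tag value for immediate integers

#define INT62_MAX ((INT64_C(1) << 61) - 1)
#define INT62_MIN (-(INT64_C(1) << 61))

// Anything outside this range cannot be an immediate and would need a
// heap-allocated big integer.
bool fits_int62(int64_t n)
{
  return n >= INT62_MIN && n <= INT62_MAX;
}

// Shift the payload past the tag bits and stamp the tag on.
uint64_t tag_int(int64_t n)
{
  return ((uint64_t)n << TAG_BITS) | TAG_INT;
}

// Drop the tag and sign-extend the 62-bit payload back to 64 bits.
int64_t untag_int(uint64_t word)
{
  uint64_t payload = word >> TAG_BITS;
  if (payload & (UINT64_C(1) << 61)) {
    payload |= UINT64_C(3) << 62;    // fill the top two bits
    return -(int64_t)(~payload) - 1; // two's complement negate, no overflow
  }
  return (int64_t)payload;
}
#+end_src
With a check like ~fits_int62~ in place, arithmetic could stay on the
immediate fast path and only promote to a big integer when a result
falls outside the range.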
*** TODO Design garbage collection scheme :design:gc:
Really, regardless of what I do, we need to have some kind of garbage
collection header on whatever managed objects we allocate.
Firstly, the distinction between managed and unmanaged objects:
- Managed objects are allocations that are generated as part of
  evaluating user code i.e. strings, vectors, conses that are all made
  as part of evaluating code.
- Unmanaged objects are allocations we do as part of the runtime.
  These are things that we expect to have near infinite lifetimes
  (such as the symbol table, vector of allocated objects, etc).
We need to perform garbage collection against the managed objects, and
leave the unmanaged objects to the runtime.
**** TODO Mark stage
We need to mark all objects that are currently accessible from the
environment. This means we need to have a root environment which we
mark all our accessible objects from. Any objects that aren't marked
by this obviously are inaccessible, so we can then sweep them.
How do we store this mark on our managed objects? I think the
simplest approach would be to allocate an extra 8 bytes just before
any managed object we allocate i.e. [8 byte buffer] <object>. Then,
during the mark phase, we can walk back those 8 bytes and
inspect/mutate the mark.
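A sketch of the [8 byte buffer] <object> layout, assuming allocation
goes through ~calloc~ and that any pointer tag has already been
stripped before walking back. ~gc_header_t~, ~managed_alloc~ and the
other names are hypothetical.
#+begin_src c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

// The 8 bytes that sit immediately before every managed object.
typedef struct {
  uint64_t marked;
} gc_header_t;

// Allocate header + object in one block and hand back the object part.
// Note: the object is 8-byte aligned, not necessarily 16-byte aligned.
void *managed_alloc(size_t size)
{
  gc_header_t *header = calloc(1, sizeof(*header) + size);
  if (!header)
    return NULL;
  return header + 1;
}

// Walk back 8 bytes from the object to reach its header.
gc_header_t *gc_header(void *object)
{
  return (gc_header_t *)object - 1;
}

void gc_mark(void *object)
{
  gc_header(object)->marked = 1;
}

bool gc_is_marked(void *object)
{
  return gc_header(object)->marked != 0;
}

// Free the whole block, which starts at the header, not the object.
void managed_free(void *object)
{
  free(gc_header(object));
}
#+end_src
A full 8 bytes for one mark bit is wasteful, but it leaves room for
other per-object metadata later (size, type, generation).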
**** TODO Sweep
Once we've marked all objects that are accessible, we need to
investigate all the objects that aren't. We do have
[[file:alisp.h::vec_t memory;][this]] which provides a global map of
all the stuff we've allocated so far ([[file:alisp.h::void
sys_register(sys_t *, lisp_t *);][sys_register]] is used to add to
this, and any managed object is expected to register).
We can iterate through the map and collect all the unmarked objects.
What do we do with these?
1) They are technically freestanding objects allocated through
   ~calloc~, so we could just free them.
2) Manage some collection of previous allocations to reuse in our next
   allocation.
Option (1) is obvious and relatively clean to set up in our current
idea:
- Say at index I we have an object that is unmarked
- Free the associated object at index I
- Swap the end of the array with the cell at index I, then decrement
  the size of the container
Each removal is an O(1) time operation, so the whole sweep is linear
in the number of registered objects.
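The steps for option (1) can be sketched as a single pass over the
array. ~object_t~ is a stand-in with the mark stored inline, and
~sweep~ is a hypothetical name; the real registry is the ~vec_t~ in
~alisp.h~, not a bare pointer array.
#+begin_src c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

// Stand-in for a managed object with its mark stored inline.
typedef struct {
  bool marked;
  int payload;
} object_t;

// Free every unmarked object and compact the array by swap-removal.
// Returns the new number of live objects.  Order is not preserved.
size_t sweep(object_t **objects, size_t size)
{
  size_t i = 0;
  while (i < size) {
    if (objects[i]->marked) {
      objects[i]->marked = false; // reset for the next collection
      ++i;
    } else {
      object_t *dead = objects[i];
      objects[i] = objects[--size]; // move the last cell into the hole
      free(dead);
      // i is not advanced: the cell swapped in has not been checked yet
    }
  }
  return size;
}
#+end_src
The one subtlety is not advancing the index after a swap, since the
cell moved in from the end may itself be unmarked.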
Option (2) is also relatively straightforward, but we need another
counter in order to make it work:
- Say at index I we have an object that is unmarked
- Swap the end of the array with the cell at index I, then decrement
  the size of the container
**** TODO Use previous allocations if they're free to use
This way, instead of deleting the memory or forgetting about it, we
can reuse it. We need to be really careful to make sure our ref(X) is
actually precise, we don't want to trample on the user's hard work.
If we implement our "free cells" as a linked list, we'll essentially
need to take a cell out of it whenever we put that cell back into use
in the system. Similarly, if we classify something as unused during
the sweep, we can add it to the free linked list.
Question: should this be separate linked lists for each container type
(i.e. one for conses, one for vectors) or just one big one? The main
task for these free lists is just "can I get a cell or nah?". We'll
analyse the time complexity of this task.
Former approach (separate lists) time complexity:
- O(1) time to get a free cell since we just need to check the first
  item of the relevant free list (or if it's NIL, we know already)
- O(1) worst case time if there isn't a free cell
Latter approach (one big list) time complexity:
- Since we have ~get_tag~ it's O(1) time to check the type of the
  container.
- Therefore, it would be worst case O(n) if the cell type we need is
  only at the end of the list, or if there isn't any cell of the type
  we need.
The former approach has better time complexity, but the latter is way
better in terms of simplicity of code. Must deliberate.
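For comparison, the former approach needs little more than an array
of list heads. This is only a sketch: ~cell_type_t~, ~free_lists_t~
and the push/pop names are hypothetical, and it assumes every dead
cell is at least pointer-sized so the link can live inside the cell
itself.
#+begin_src c
#include <stddef.h>

typedef enum { CELL_CONS, CELL_VEC, NUM_CELL_TYPES } cell_type_t;

// A dead cell's own memory is reused to hold the link to the next one.
typedef struct free_cell {
  struct free_cell *next;
} free_cell_t;

// One list head per container type.
typedef struct {
  free_cell_t *heads[NUM_CELL_TYPES];
} free_lists_t;

// Called from the sweep: O(1) push onto the list for this type.
void free_list_push(free_lists_t *lists, cell_type_t type, void *cell)
{
  free_cell_t *node = cell;
  node->next = lists->heads[type];
  lists->heads[type] = node;
}

// Called from the allocator: O(1) pop, or NULL if no cell of this type
// is free (the caller then falls back to a fresh allocation).
void *free_list_pop(free_lists_t *lists, cell_type_t type)
{
  free_cell_t *node = lists->heads[type];
  if (node)
    lists->heads[type] = node->next;
  return node;
}
#+end_src
The extra code over a single list is the ~heads~ array and the type
index, which suggests the simplicity gap between the two approaches
may be smaller than it first looks.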
*** TODO Test system registration of allocated units :test:
In particular, does clean up work as we expect? Do we have situations
where we may double free or not clean up something we should've?
** Completed
*** DONE Test value constructors and destructors :test:
Test if ~make_int~ works with ~as_int~, ~intern~ with ~as_sym~.
Latter will require a symbol table.
*** DONE Test containers constructors and destructors :test:
Test if ~make_vec~ works with ~as_vec~, ~cons~ with ~as_cons~ AND
~CAR~, ~CDR~.
We may need to think of effective ways to deal with NILs in ~car~ and
~cdr~. Maybe make functions as well as the macros so I can choose
between them?
**** DONE Write more tests