169 lines
7.7 KiB
Org Mode
169 lines
7.7 KiB
Org Mode
#+title: Alisp
|
|
#+author: Aryadev Chavali
|
|
#+date: 2025-08-20
|
|
#+filetags: :alisp:
|
|
|
|
* Notes
|
|
** Overview
|
|
~alisp.h~ is a single header for the entire runtime. We'll also have a
|
|
compiled shared library ~alisp.so~ which one may link against to get
|
|
implementation. That's all that's necessary for one to write C code
|
|
that targets our Lisp machine.
|
|
|
|
We'll have a separate header + library for the compiler since that's
|
|
not strictly necessary for transpiled C code to consume. This will
|
|
transpile Lisp code into C, which uses the aforementioned ~alisp~
|
|
header and library to compile into a native executable.
|
|
|
|
** WIP How does transpiled code operate?
|
|
My current idea is: we're transpiling into C for the actual Lisp code.
|
|
User made functions can be transpiled into C functions, which we can
|
|
mangle names for. Macros... I don't know, maybe we could have two
|
|
function pointer tables so we know how to execute them?
|
|
|
|
Then, we'll have an associated "descriptor" file which describes the
|
|
functions we've transpiled. Bare minimum, this file has to have a
|
|
"symbol name" to C mangled function name dictionary. We can also add
|
|
other metadata as we need.
|
|
|
|
*** TODO Deliberate on whether we compile into a shared library or not
|
|
If we compile these C code objects into shared libraries, the
|
|
descriptor needs to concern itself with code locations. This might be
|
|
easier in a sense, since the code will already be compiled.
|
|
** WIP How do we call native code?
|
|
When we're calling a natively compiled function, we can use this
|
|
metadata mapping to call the C function. This native code will use
|
|
our Lisp runtime, same as any other code, so it should be pretty
|
|
seamless in that regard. But we'll need to set a calling convention
|
|
in order to make calling into this seamless from a runtime
|
|
perspective.
|
|
* Tasks
|
|
** TODO Capitalise symbols (TBD) :optimisation:design:
|
|
Should we capitalise symbols? This way, we limit the symbol table's
|
|
possible options a bit (potentially we could design a better hashing
|
|
algorithm?) and it would be kinda like an actual Lisp.
|
|
** TODO Design Strings
|
|
We have ~sv_t~ so our basic C API is done. We just need pluggable
|
|
functions to construct and deconstruct strings as lisps.
|
|
** WIP Reader system
|
|
We need to design a reader system. The big idea: given a "stream" of
|
|
data, we can break out expressions from it. An expression could be
|
|
either an atomic value or a container.
|
|
|
|
The natural method is doing this one at a time (the runtime provides a
|
|
~read~ function to do this), but we can also convert an entire stream
|
|
into expressions by consuming it fully. So the principle function
|
|
here is ~read: stream -> expr~.
|
|
*** DONE Design streams
|
|
A stream needs to be able to provide characters for us to interpret in
|
|
our parsing. Lisp is an LL(1) grammar so we only really need one
|
|
character lookup, but seeking is very useful.
|
|
|
|
A stream could represent a file (using a FILE pointer), an IO stream
|
|
(again using FILE pointer but something that could yield interminable
|
|
data), or just a string. We need to be able to encode all of these as
|
|
streams.
|
|
|
|
If it's a string, we can just read the entire thing as memory and work
|
|
from there. If it's a seekable FILE pointer (i.e. we can technically
|
|
do random access), just use MMAP to read the thing into memory. If
|
|
it's a non-seekable FILE pointer, we'll need to read a chunk at a
|
|
time. We'll have a vector that caches the data as we read it maybe,
|
|
allowing us to do random access, but only read chunks as and when
|
|
required.
|
|
|
|
Since they're all differing interfaces, we'll need an abstraction so
|
|
parsing isn't as concerned with the specifics of the underlying data
|
|
stream. We can use a tagged union of data structures representing the
|
|
different underlying stream types, then generate abstract functions
|
|
that provide common functionality.
|
|
|
|
2025-08-29: A really basic interface that makes the parse stage a bit
|
|
easier. We're not going to do anything more advanced than the API
|
|
i.e. no parsing.
|
|
**** DONE Design the tagged union
|
|
**** DONE Design the API
|
|
*** WIP Figure out the possible parse errors
|
|
*** DONE Design what a "parser function" would look like
|
|
The general function is something like ~stream -> T | Err~. What
|
|
other state do we need to encode?
|
|
*** WIP Write a parser for integers
|
|
*** TODO Write a parser for symbols
|
|
*** TODO Write a parser for lists
|
|
*** TODO Write a parser for vectors
|
|
*** TODO Write a generic parser that returns a generic expression
|
|
** TODO Test system registration of allocated units :test:
|
|
In particular, does clean up work as we expect? Do we have situations
|
|
where we may double free or not clean up something we should've?
|
|
** TODO Design garbage collection scheme :design:gc:
|
|
Really, regardless of what I do, we need to have some kind of garbage
|
|
collection header on whatever we allocate e.g. references if we
|
|
reference count for GC.
|
|
*** TODO Mark stage
|
|
When some item is being used by another, we need a way to adjust the
|
|
metadata such that the system is aware of it being used.
|
|
|
|
For example, say I have X, Y as random allocated objects. Then I
|
|
construct CONS(X, Y). Then, ref(X) and ref(Y) need to be incremented
|
|
to say I'm using them.
|
|
*** TODO Sweep
|
|
Say I have an object that I construct, C. If ref(C) = 0, then C is no
|
|
longer needed, and is free.
|
|
|
|
There are two components to this:
|
|
- we need a way of decrementing references if an object is no longer needed.
|
|
- we need a way of running through everything we've allocated so far
|
|
to figure out what's free to take away.
|
|
|
|
Once we've filtered out what we don't need anymore, what should we do
|
|
with them? Naive approach would be to just actually ~free~ the cells
|
|
in question. But I think the next item may be a better idea.
|
|
*** TODO Use previous allocations if they're free to use
|
|
If we have no references to a cell, this cell is free to use. In
|
|
other words, if I later allocate something of the same type, instead
|
|
of allocating a new object, why not just use the one I've already got?
|
|
|
|
This way, instead of deleting the memory or forgetting about it, we
|
|
can reuse it. We need to be really careful to make sure our ref(X) is
|
|
actually precise, we don't want to trample on the user's hard work.
|
|
|
|
If we implement our "free cells" as a linked list, we'll essentially
|
|
need to take items out of it when we decide to set it back up in the
|
|
system. Similarly, if we classify something as unused during the
|
|
sweep, we can add it to the free linked list.
|
|
|
|
Question: should this be separate linked lists for each container type
|
|
(i.e. one for conses, one for vectors) or just one big one? The main
|
|
task for these free lists is just "can I get a cell or nah?". We'll
|
|
analyse the time complexity of this task
|
|
|
|
Former approach time complexity:
|
|
- O(1) time to get a free cell since we just need to check the first
|
|
item of the relevant free list (or if it's NIL, we know already)
|
|
- O(1) worst case time if there isn't a free cell
|
|
|
|
Latter approach time complexity:
|
|
- Since we have ~get_tag~ it's O(1) time to check the type of the
|
|
container.
|
|
- Therefore, it would be worst case O(n) if the cell type we need is
|
|
only at the end of the list, or if there isn't any cell of the type
|
|
we need.
|
|
|
|
Former approach is better time complexity wise, but latter is way
|
|
better in terms of simplicity of code. Must deliberate.
|
|
** TODO Design Big Integers
|
|
We currently have 62 bit integers implemented via immediate values
|
|
embedded in a pointer. We need to be able to support even _bigger_
|
|
integers. How do we do this?
|
|
** DONE Test value constructors and destructors :test:
|
|
Test if ~make_int~ works with ~as_int,~ ~intern~ with ~as_sym~.
|
|
Latter will require a symbol table.
|
|
** DONE Test containers constructors and destructors :test:
|
|
Test if ~make_vec~ works with ~as_vec~, ~cons~ with ~as_cons~ AND
|
|
~CAR~, ~CDR~.
|
|
|
|
We may need to think of effective ways to deal with NILs in ~car~ and
|
|
~cdr~. Maybe make functions as well as the macros so I can choose
|
|
between them?
|
|
*** DONE Write more tests
|