Alisp
- Notes
- Tasks
- Capitalise symbols (TBD)
- Design Strings
- WIP Reader system
- Test system registration of allocated units
- Design garbage collection scheme
- Design Big Integers
- Test value constructors and destructors
- Test containers constructors and destructors
Notes
Overview
alisp.h is a single header for the entire runtime. We'll also have a
compiled shared library alisp.so which one may link against to get the
implementation. That's all that's necessary for one to write C code
that targets our Lisp machine.
We'll have a separate header + library for the compiler, since that's
not strictly necessary for transpiled C code to consume. The compiler
will transpile Lisp code into C, which uses the aforementioned alisp
header and library to compile into a native executable.
WIP How does transpiled code operate?
My current idea is: we're transpiling the actual Lisp code into C. User-made functions can be transpiled into C functions, whose names we can mangle. Macros… I don't know; maybe we could have two function pointer tables so we know how to execute each kind?
Then, we'll have an associated "descriptor" file which describes the functions we've transpiled. Bare minimum, this file has to have a "symbol name" to C mangled function name dictionary. We can also add other metadata as we need.
TODO Deliberate on whether we compile into a shared library or not
If we compile these C code objects into shared libraries, the descriptor needs to concern itself with code locations. This might be easier in a sense, since the code will already be compiled.
WIP How do we call native code?
When we're calling a natively compiled function, we can use this metadata mapping to call the C function. This native code will use our Lisp runtime, same as any other code, so it should be pretty seamless in that regard. But we'll need to set a calling convention in order to make calling into this seamless from a runtime perspective.
Tasks
TODO Capitalise symbols (TBD)
Should we capitalise symbols? This way, we shrink the symbol table's key space a bit (potentially we could design a better hashing algorithm?) and it would be kinda like an actual Lisp.
TODO Design Strings
We have sv_t so our basic C API is done. We just need pluggable
functions to construct and destruct strings as Lisp values.
WIP Reader system
We need to design a reader system. The big idea: given a "stream" of data, we can break out expressions from it. An expression could be either an atomic value or a container.
The natural method is doing this one at a time (the runtime provides a
read function to do this), but we can also convert an entire stream
into expressions by consuming it fully. So the principal function
here is read: stream -> expr.
DONE Design streams
A stream needs to be able to provide characters for us to interpret in our parsing. Lisp is an LL(1) grammar so we only really need one character of lookahead, but seeking is very useful.
A stream could represent a file (using a FILE pointer), an IO stream (again using a FILE pointer, but one that could yield unbounded data), or just a string. We need to be able to encode all of these as streams.
If it's a string, we can just read the entire thing as memory and work from there. If it's a seekable FILE pointer (i.e. we can technically do random access), just use mmap to read the thing into memory. If it's a non-seekable FILE pointer, we'll need to read a chunk at a time. Maybe we'll have a vector that caches the data as we read it, allowing us to do random access, but only read chunks as and when required.
Since they're all differing interfaces, we'll need an abstraction so parsing isn't as concerned with the specifics of the underlying data stream. We can use a tagged union of data structures representing the different underlying stream types, then generate abstract functions that provide common functionality.
2025-08-29: Settled on a really basic interface that makes the parse stage a bit easier. We're not going to do anything more advanced than the stream API itself, i.e. no parsing at this layer.
DONE Design the tagged union
DONE Design the API
WIP Figure out the possible parse errors
DONE Design what a "parser function" would look like
The general function is something like stream -> T | Err. What
other state do we need to encode?
WIP Write a parser for integers
TODO Write a parser for symbols
TODO Write a parser for lists
TODO Write a parser for vectors
TODO Write a generic parser that returns a generic expression
TODO Test system registration of allocated units
In particular, does clean up work as we expect? Do we have situations where we may double free or not clean up something we should've?
TODO Design garbage collection scheme
Really, regardless of what I do, we need to have some kind of garbage collection header on whatever we allocate, e.g. a reference count if we use reference counting for GC.
TODO Mark stage
When some item is being used by another, we need a way to adjust the metadata such that the system is aware of it being used.
For example, say I have X, Y as random allocated objects. Then I construct CONS(X, Y). Then, ref(X) and ref(Y) need to be incremented to say I'm using them.
TODO Sweep
Say I have an object that I construct, C. If ref(C) = 0, then C is no longer needed, and is free.
There are two components to this:
- we need a way of decrementing references if an object is no longer needed.
- we need a way of running through everything we've allocated so far to figure out what's free to take away.
Once we've filtered out what we don't need anymore, what should we do
with them? Naive approach would be to just actually free the cells
in question. But I think the next item may be a better idea.
TODO Use previous allocations if they're free to use
If we have no references to a cell, this cell is free to use. In other words, if I later allocate something of the same type, instead of allocating a new object, why not just use the one I've already got?
This way, instead of deleting the memory or forgetting about it, we can reuse it. We need to be really careful to make sure our ref(X) is actually precise; we don't want to trample on the user's hard work.
If we implement our "free cells" as a linked list, we'll essentially need to take items out of it when we decide to set it back up in the system. Similarly, if we classify something as unused during the sweep, we can add it to the free linked list.
Question: should this be separate linked lists for each container type (i.e. one for conses, one for vectors) or just one big one? The main task for these free lists is just "can I get a cell or nah?". We'll analyse the time complexity of this task for each approach.
Former approach time complexity:
- O(1) time to get a free cell since we just need to check the first item of the relevant free list (or if it's NIL, we know already)
- O(1) worst case time if there isn't a free cell
Latter approach time complexity:
- Since we have get_tag, it's O(1) time to check the type of the container.
- Therefore, it would be worst case O(n) if the cell type we need is only at the end of the list, or if there isn't any cell of the type we need.
Former approach is better time complexity wise, but latter is way better in terms of simplicity of code. Must deliberate.
TODO Design Big Integers
We currently have 62 bit integers implemented via immediate values embedded in a pointer. We need to be able to support even bigger integers. How do we do this?
DONE Test value constructors and destructors
Test if make_int works with as_int, and intern with as_sym. The
latter will require a symbol table.
DONE Test containers constructors and destructors
Test if make_vec works with as_vec, cons with as_cons AND
CAR, CDR.
We may need to think of effective ways to deal with NILs in car and
cdr. Maybe make functions as well as the macros so I can choose
between them?