Rename tasks.org to oats.org, restructure

2025-05-28 23:54:34 +01:00
parent 12de1e8db9
commit bfff660d0e


@@ -1,7 +1,20 @@
#+title: Tasks
#+date: 2025-02-18
#+title: Oats tracker
#+author: Aryadev Chavali
#+description: A general tracker for work being done on the project
#+FILETAGS: :oats:
* WIP Implement a reader
* Issues :issues:
** TODO Fix issue with memcpy overlap when concatenating strings
[[file:lisp/lisp.c::// FIXME: Something is going wrong here!]]
Ideas on what's going wrong:
- String sizes seem off
- Maybe something is wrong with the arena allocator; we use
[[file:lib/sv.c::newsv.data = arena_realloc(allocator, sv.data,
sv.size, newsv.size);][arena_realloc]] which seems to be the root of
the memcpy-overlap
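One way to test the arena_realloc hypothesis is to check the two ranges right before the copy. memcpy has undefined behaviour when source and destination overlap, while memmove is defined for that case. The helpers below are a sketch, and `ranges_overlap` and `checked_memcpy` are hypothetical names that are not part of lib/sv.c. `checked_memcpy` could temporarily stand in for the suspect memcpy call:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns true if [a, a + a_len) and [b, b + b_len) share at least one byte.
   Comparing via uintptr_t avoids relational comparison of unrelated pointers. */
static bool ranges_overlap(const void *a, size_t a_len,
                           const void *b, size_t b_len)
{
  uintptr_t a0 = (uintptr_t)a, b0 = (uintptr_t)b;
  return a_len > 0 && b_len > 0 && a0 < b0 + b_len && b0 < a0 + a_len;
}

/* Hypothetical drop-in for a memcpy call site: aborts (in debug builds) if
   the ranges overlap, otherwise copies exactly as memcpy would.
   memmove is the safe replacement if the assertion ever fires. */
static void *checked_memcpy(void *dst, const void *src, size_t n)
{
  assert(!ranges_overlap(dst, n, src, n) && "memcpy called on overlapping ranges");
  return memcpy(dst, src, n);
}
```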
* Features :features:
** WIP Reader :reader:
We want something a bit generic: able to handle reading from some
buffer of memory (a string, or contents of a file where we can read
the entire thing at once) or directly from a file stream (STDIN,
@@ -15,19 +28,7 @@ We also want to be able to admit when reading went wrong for some
reason with proper error messages (i.e. ones that can be read by Emacs) - this
will need to be refactored when we introduce errors within the Lisp
runtime itself.
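A minimal sketch of a reader input that can be backed by either a memory buffer or a FILE stream. It tracks line and column so that errors can be printed in the GNU `file:line:col: message` shape that Emacs' compilation-mode understands. `src_t` and its functions are hypothetical names, and this is not the project's `input_t`:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical input source: either an in-memory buffer or a FILE stream. */
typedef enum { SRC_BUFFER, SRC_STREAM } src_kind_t;

typedef struct {
  src_kind_t kind;
  const char *name;  /* used in error messages */
  const char *buf;   /* SRC_BUFFER only */
  size_t len, pos;   /* SRC_BUFFER only */
  FILE *fp;          /* SRC_STREAM only */
  size_t line, col;  /* position of the next character, 1-based */
} src_t;

static src_t src_from_buffer(const char *name, const char *buf, size_t len)
{
  return (src_t){SRC_BUFFER, name, buf, len, 0, NULL, 1, 1};
}

static src_t src_from_stream(const char *name, FILE *fp)
{
  return (src_t){SRC_STREAM, name, NULL, 0, 0, fp, 1, 1};
}

/* Returns the next character or EOF, updating line/column as it goes. */
static int src_next(src_t *s)
{
  int c;
  if (s->kind == SRC_BUFFER)
    c = s->pos < s->len ? (unsigned char)s->buf[s->pos++] : EOF;
  else
    c = fgetc(s->fp);
  if (c == '\n') { s->line++; s->col = 1; }
  else if (c != EOF) s->col++;
  return c;
}

/* Formats "name:line:col: error: msg" into out (at most n bytes). */
static int src_error(const src_t *s, char *out, size_t n, const char *msg)
{
  return snprintf(out, n, "%s:%zu:%zu: error: %s", s->name, s->line, s->col, msg);
}
```

Reading a whole file into memory and reading from STDIN then share one `src_next` loop. Only the constructor differs.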
** TODO Implement floats and rationals
Rationals are pretty easy - just two integers (numerator and denominator) -
so a tagged cons cell would do the job. Floats are a bit more
difficult since I'd either need to box them or find a creative way of
sticking IEEE-754 floats into < 64 bits.
Also implement a reader macro for #e<scientific form>. Also deal with
[-,+,]inf(.0) and [-,+,]nan(.0).
Need to do some reading.
[[file:r7rs-tests.scm::test #t (real? #e1e10)][trigger]]
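A minimal sketch of the "two integers" idea. It assumes plain `int64_t` fields rather than a tagged cons cell. `rational_t`, `make_rational` and `rational_add` are hypothetical names. Values are normalised to lowest terms with a positive denominator, so two equal rationals always compare field-by-field equal:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rational: numerator/denominator kept in lowest terms. */
typedef struct {
  int64_t num;
  int64_t den; /* always > 0 */
} rational_t;

static int64_t gcd64(int64_t a, int64_t b)
{
  if (a < 0) a = -a;
  if (b < 0) b = -b;
  while (b != 0) {
    int64_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

/* Builds num/den in lowest terms with a positive denominator.
   Assumes den != 0; overflow (including INT64_MIN) is ignored for brevity. */
static rational_t make_rational(int64_t num, int64_t den)
{
  if (den < 0) { num = -num; den = -den; }
  int64_t g = gcd64(num, den);
  if (g > 1) { num /= g; den /= g; }
  return (rational_t){num, den};
}

/* a/b + c/d = (ad + cb) / bd, then renormalised. Overflow is not checked. */
static rational_t rational_add(rational_t a, rational_t b)
{
  return make_rational(a.num * b.den + b.num * a.den, a.den * b.den);
}
```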
** TODO Consider user instantiated reader macros
*** TODO Consider user instantiated reader macros
We don't have an evaluator so we can't really interpret whatever a
user wants for a reader macro currently, but it would be useful to
think about it now. Currently I have a single function which deals
@@ -39,47 +40,23 @@ consider user environments via the context.
[[file:reader.c::perr_t parse_reader_macro(context_t *ctx, input_t
*inp, lisp_t **ret)][function link]]
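One possible direction, sketched under assumptions: replace the single function with a table indexed by the character after `#`. Built-in macros and, later, user-defined ones then just become entries in that table. The signature here is deliberately simplified, and `reader_macro_fn`, `macro_table_t` and the functions are hypothetical. The real `parse_reader_macro` works on `context_t`/`input_t`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler: parses the text after "#<c>" and writes a result. */
typedef int (*reader_macro_fn)(const char *rest, long *out);

/* One slot per ASCII dispatch character following '#'. */
typedef struct {
  reader_macro_fn table[128];
} macro_table_t;

static int register_reader_macro(macro_table_t *t, char c, reader_macro_fn fn)
{
  unsigned char u = (unsigned char)c;
  if (u >= 128) return -1;
  t->table[u] = fn;
  return 0;
}

/* Calls the handler for "#<c>"; returns -1 if none is registered. */
static int dispatch_reader_macro(const macro_table_t *t, char c,
                                 const char *rest, long *out)
{
  unsigned char u = (unsigned char)c;
  if (u >= 128 || t->table[u] == NULL) return -1;
  return t->table[u](rest, out);
}

/* Example built-in: "#t" reads as 1. */
static int read_true(const char *rest, long *out)
{
  (void)rest;
  *out = 1;
  return 0;
}
```

If the table lived in the context rather than in a global, per-environment reader macros would fall out naturally.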
* TODO Consider Lisp runtime errors
* TODO Admit arbitrarily sized integers
Currently we admit fixed size integers of 63 bits. They use two's
complement (as x86 does), so our range is -2^62 to 2^62 - 1.
However, to even try to be a Scheme implementation we need to allow
arbitrarily sized integers. What specific tasks do we need to
complete in our model to achieve this?
+ Allow "reading" of unfixed size integers
+ This will require reading a sequence of base 10 digits without
relying on strtold
+ Implement unfixed size integers into our memory model
+ Certainly, unfixed size integers cannot be carried around like
fixnums wherein we can embed an integer into the pointer.
Thus we have to allocate them in memory.
+ NOTE: There will definitely be an optimisation to be done
here; integers that are within the bound of a fixnum could be
left as a fixnum then "elevated" to an integer when needed
+ I think the big idea is allocating them as a fixed set of bytes
like big symbols. For big integers we have to read the associated
memory, so we need a pointer. Due to two's complement it should be
trivial to increase the size of an integer to fit a new result,
i.e. if I'm adding two integers and that leads to an "overflow"
where the result is of greater width than its inputs, we should
just allocate new memory for it.
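A minimal sketch of reading a run of base-10 digits without strtol/strtold. Each digit is folded in with a "multiply by 10, add digit" pass over an array of limbs. It assumes a hypothetical `bignum_t` made of little-endian base-2^32 limbs with a fixed capacity. The sign is left to the caller, and a real version would allocate limbs in the arena instead:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bignum magnitude: little-endian base-2^32 limbs. */
#define MAX_LIMBS 16

typedef struct {
  uint32_t limb[MAX_LIMBS];
  size_t size; /* limbs in use; 0 represents zero */
} bignum_t;

/* n = n * mul + add. Returns -1 if the result needs more than MAX_LIMBS. */
static int bn_mul_add(bignum_t *n, uint32_t mul, uint32_t add)
{
  uint64_t carry = add;
  for (size_t i = 0; i < n->size; ++i) {
    uint64_t t = (uint64_t)n->limb[i] * mul + carry;
    n->limb[i] = (uint32_t)t;
    carry = t >> 32;
  }
  if (carry != 0) {
    if (n->size == MAX_LIMBS) return -1;
    n->limb[n->size++] = (uint32_t)carry;
  }
  return 0;
}

/* Reads the leading decimal digits of s into n, one digit at a time.
   Returns -1 if s does not start with a digit or the number is too large. */
static int bn_from_decimal(bignum_t *n, const char *s)
{
  n->size = 0;
  if (!isdigit((unsigned char)*s)) return -1;
  for (; isdigit((unsigned char)*s); ++s)
    if (bn_mul_add(n, 10, (uint32_t)(*s - '0')) != 0) return -1;
  return 0;
}
```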
Consequences:
- Greater memory use
- Growing with every operation if we need to allocate a whole new
integer per operation rather than reusing the input memory
- Possible loss of performance from making integers rather than
fixnums when they don't need to be
- Comparison is harder on integers
- Harder to cache for the CPU
but all of this is to be expected when the user is an idiot.
* TODO Think about how to perform operations on different types
** TODO Integers
** TODO Symbols
** TODO Pairs
* DONE More efficient memory model for symbols
*** TODO Parse exponential notation
We're erroring out here due to not having proper reader support for this notation
[[file:examples/r7rs-tests.scm::test #t (real? #e1e10)]]
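For the `#e1e10` case, the exact value is just the mantissa scaled by a power of ten. A minimal sketch follows that handles only `<digits>e<digits>` with a non-negative exponent and reports overflow instead of wrapping. `parse_exact_exponent` and `mul10_add` are hypothetical names. Fractional mantissas, signs and negative exponents, which would produce rationals, are out of scope:

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* *v = *v * 10 + d, failing instead of overflowing. Assumes *v >= 0, 0 <= d <= 9. */
static int mul10_add(int64_t *v, int d)
{
  if (*v > (INT64_MAX - d) / 10) return -1;
  *v = *v * 10 + d;
  return 0;
}

/* Reads "<digits>e<digits>" (e.g. "1e10") as an exact non-negative integer.
   Returns 0 on success and -1 on bad syntax or int64 overflow. */
static int parse_exact_exponent(const char *s, int64_t *out)
{
  int64_t mant = 0, exp = 0;
  if (!isdigit((unsigned char)*s)) return -1;
  for (; isdigit((unsigned char)*s); ++s)
    if (mul10_add(&mant, *s - '0') != 0) return -1;
  if (*s != 'e' && *s != 'E') return -1;
  ++s;
  if (!isdigit((unsigned char)*s)) return -1;
  for (; isdigit((unsigned char)*s); ++s)
    if (mul10_add(&exp, *s - '0') != 0) return -1;
  if (*s != '\0') return -1;
  /* Scale by 10^exp; a zero mantissa stays zero, so skip the loop. */
  if (mant != 0)
    for (int64_t i = 0; i < exp; ++i)
      if (mul10_add(&mant, 0) != 0) return -1;
  *out = mant;
  return 0;
}
```

Where this sketch returns -1 for overflow, a full reader would hand the value to the arbitrary-size integer path instead.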
** TODO Evaluator
** TODO Runtime errors
** TODO Better numerics
We currently admit fixed size integers (63 bits). We _need_ more to
be a scheme.
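The "leave it as a fixnum, elevate when needed" idea hinges on detecting when a result leaves the 63-bit range. A minimal sketch follows, assuming fixnum values are handled as plain `int64_t` once untagged. `FIXNUM_MAX`, `FIXNUM_MIN` and `fixnum_add` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 63-bit two's complement range: [-2^62, 2^62 - 1]. */
#define FIXNUM_MAX ((INT64_C(1) << 62) - 1)
#define FIXNUM_MIN (-(INT64_C(1) << 62))

static bool fits_fixnum(int64_t v)
{
  return v >= FIXNUM_MIN && v <= FIXNUM_MAX;
}

/* Adds two fixnum values. The exact sum of two 63-bit values always fits in
   64 bits, so it can be computed directly. Returns false when the result
   would have to be elevated to a heap-allocated integer. */
static bool fixnum_add(int64_t a, int64_t b, int64_t *out)
{
  assert(fits_fixnum(a) && fits_fixnum(b));
  int64_t sum = a + b;
  if (!fits_fixnum(sum)) return false;
  *out = sum;
  return true;
}
```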
*** Unfixed size integers
*** Rationals
*** Floats
*** Complex numbers
** TODO Primitive operations
** TODO Macros
** TODO Modules
* Completed :completed:
** DONE More efficient memory model for symbols
The primitive model for symbol allocation is an 8 byte number
representing the size of the symbol, followed by a variable number of
characters (as bytes). This is stored somewhere in the memory
@@ -123,7 +100,7 @@ need to allocate memory. But, in the worst case of 8 character
symbols, we're only allocating two 64 bit integers: these are easy to
walk on x86 and we've reached at least parity between the memory
required for administration (the size number) and the actual data.
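As an illustration of why short symbols need not touch memory, here is one possible layout. Up to 7 characters sit in the low 7 bytes of a word, and a tag occupies the top byte. This is a hypothetical layout, not necessarily the one the project settled on, and `TAG_SYMBOL_IMMEDIATE`, `sym_pack` and `sym_unpack` are made-up names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical immediate-symbol layout: characters in bytes 0..6 (least
   significant first), tag in byte 7, unused character bytes zero. */
#define TAG_SYMBOL_IMMEDIATE UINT64_C(0x02)

/* Returns -1 if the name is too long to fit inline. */
static int sym_pack(const char *name, uint64_t *out)
{
  size_t len = strlen(name);
  if (len > 7) return -1; /* needs a heap-allocated symbol instead */
  uint64_t word = TAG_SYMBOL_IMMEDIATE << 56;
  for (size_t i = 0; i < len; ++i)
    word |= (uint64_t)(unsigned char)name[i] << (8 * i);
  *out = word;
  return 0;
}

/* Writes the symbol's characters, NUL-terminated, into buf. */
static void sym_unpack(uint64_t word, char buf[8])
{
  size_t i = 0;
  for (; i < 7; ++i) {
    char c = (char)((word >> (8 * i)) & 0xFF);
    if (c == '\0') break;
    buf[i] = c;
  }
  buf[i] = '\0';
}
```

A side benefit of this kind of layout is that comparing two inline symbols is a single 64-bit comparison.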
** Being more aggressive?
*** Being more aggressive?
Technically, ASCII characters only need 7 bits. For each of the 7 bytes
used for the character data, we can take one bit off, leaving us with
7 bits to use for an additional character. We don't need to adjust
@@ -148,10 +125,10 @@ to do a lot more work. x86-64 CPUs are much better at walking bytes
than they are walking 7 bit offsets. This may be something to
consider if CPU time is cheaper than allocating 8 byte symbols
somewhere.
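A minimal sketch of the 7-bit packing: 8 ASCII characters × 7 bits = 56 bits, i.e. 7 bytes. It also makes the CPU cost visible, because every character needs a shift and mask at a 7-bit offset instead of a plain byte load. `pack7` and `unpack7` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Packs 8 ASCII characters (7 bits each) into the low 56 bits of a word.
   Returns -1 if any character is outside 0..127. */
static int pack7(const char s[8], uint64_t *out)
{
  uint64_t word = 0;
  for (int i = 0; i < 8; ++i) {
    unsigned char c = (unsigned char)s[i];
    if (c > 0x7F) return -1;
    word |= (uint64_t)c << (7 * i);
  }
  *out = word;
  return 0;
}

/* Unpacks 8 characters; each costs a shift and mask at a 7-bit offset. */
static void unpack7(uint64_t word, char s[8])
{
  for (int i = 0; i < 8; ++i)
    s[i] = (char)((word >> (7 * i)) & 0x7F);
}
```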
* DONE Tagging scheme based on arena pages
** DONE Tagging scheme based on arena pages
2025-04-09:21:59:29: We went for option (2) of just taking a byte for
free from the memory address and using it as our management byte.
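Option (2) can be sketched as follows. It assumes 64-bit pointers whose top byte is clear, which holds for typical x86-64 user-space addresses. `tag_ptr`, `get_tag` and `untag_ptr` are hypothetical names. The tag has to be masked off before dereferencing, because x86-64 faults on non-canonical addresses:

```c
#include <assert.h>
#include <stdint.h>

_Static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");

/* Top byte of the word is the management/tag byte. */
#define TAG_SHIFT 56
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

static uint64_t tag_ptr(const void *p, uint8_t tag)
{
  uint64_t raw = (uint64_t)(uintptr_t)p;
  assert((raw & ~ADDR_MASK) == 0 && "top byte already in use");
  return raw | ((uint64_t)tag << TAG_SHIFT);
}

static uint8_t get_tag(uint64_t word)
{
  return (uint8_t)(word >> TAG_SHIFT);
}

/* Strips the tag so the result is a canonical, dereferenceable address. */
static void *untag_ptr(uint64_t word)
{
  return (void *)(uintptr_t)(word & ADDR_MASK);
}
```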
** 1) Page-offset schema
*** 1) Page-offset schema
I've realised arenas are way better than the standard dynamic array I
was going for before. However, we lose the nicer semantics of using
an array index for pointers, where we can implement our own semantics
@@ -213,7 +190,7 @@ will be stable regardless of any further memory management functions
performed on the arena (excluding cleanup) - so once you have a host
pointer, you can use it as much as you want without having to worry
about the pointer becoming invalid in the next second.
** 2) 48-bit addressing exploit
*** 2) 48-bit addressing exploit
Most x86-64 CPUs only use 48 bits (57 with 5-level paging) for
virtual memory addresses - mostly as a result of not needing _nearly_
as many addresses as a full 64 bit word would provide.