Rename tasks.org to oats.org, restructure
@@ -1,7 +1,20 @@
|
|||||||
#+title: Tasks
|
#+title: Oats tracker
|
||||||
#+date: 2025-02-18
|
#+author: Aryadev Chavali
|
||||||
|
#+description: A general tracker for work being done on the project
|
||||||
|
#+FILETAGS: :oats:
|
||||||
|
|
||||||
* WIP Implement a reader
|
* Issues :issues:
|
||||||
|
** TODO Fix issue with memcpy overlap when string concatenating
|
||||||
|
[[file:lisp/lisp.c::// FIXME: Something is going wrong here!]]
|
||||||
|
|
||||||
|
Ideas on what's going wrong:
|
||||||
|
- String sizes seem off
|
||||||
|
- Maybe something is wrong with arena allocator; we use
|
||||||
|
[[file:lib/sv.c::newsv.data = arena_realloc(allocator, sv.data,
|
||||||
|
sv.size, newsv.size);][arena_realloc]] which seems to be the root of
|
||||||
|
the memcpy-overlap
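A minimal sketch of how the overlap hypothesis could be checked, assuming the copy involved is a plain memcpy. The helpers ranges_overlap and copy_checked are hypothetical names, not functions from lib/sv.c or the arena allocator. memcpy is undefined when source and destination overlap; memmove is defined for that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of lib/sv.c): true when the byte ranges
   [dst, dst + n) and [src, src + n) share at least one byte. */
static int ranges_overlap(const void *dst, const void *src, size_t n)
{
  uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
  return n != 0 && d < s + n && s < d + n;
}

/* Hypothetical wrapper: falls back to memmove, which is defined for
   overlapping ranges, whenever memcpy would be undefined behaviour. */
static void *copy_checked(void *dst, const void *src, size_t n)
{
  return ranges_overlap(dst, src, n) ? memmove(dst, src, n)
                                     : memcpy(dst, src, n);
}
```

Asserting !ranges_overlap(...) just before the suspect copy would confirm or rule out this idea without changing behaviour.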
|
||||||
|
* Features :features:
|
||||||
|
** WIP Reader :reader:
|
||||||
We want something a bit generic: able to handle reading from some
|
We want something a bit generic: able to handle reading from some
|
||||||
buffer of memory (a string, or contents of a file where we can read
|
buffer of memory (a string, or contents of a file where we can read
|
||||||
the entire thing at once) or directly from a file stream (STDIN,
|
the entire thing at once) or directly from a file stream (STDIN,
|
||||||
@@ -15,19 +28,7 @@ We also want to be able to admit when reading went wrong for some
|
|||||||
reason with proper error messages (i.e. can be read by Emacs) - this
|
reason with proper error messages (i.e. can be read by Emacs) - this
|
||||||
will need to be refactored when we introduce errors within the Lisp
|
will need to be refactored when we introduce errors within the Lisp
|
||||||
runtime itself.
|
runtime itself.
|
||||||
** TODO Implement floats and rationals
|
*** TODO Consider user instantiated reader macros
|
||||||
Rationals are pretty easy - just two integers (numerator and denominator) -
|
|
||||||
so a tagged cons cell would do the job. Floats are a bit more
|
|
||||||
difficult since I'd either need to box them or find a creative way of
|
|
||||||
sticking IEEE-754 floats into < 64 bits.
|
|
||||||
|
|
||||||
Also implement a reader macro for #e<scientific form>. Also deal with
|
|
||||||
[-,+,]inf(.0) and [-,+,]nan(.0).
|
|
||||||
|
|
||||||
Need to do some reading.
|
|
||||||
|
|
||||||
[[file:r7rs-tests.scm::test #t (real? #e1e10)][trigger]]
|
|
||||||
** TODO Consider user instantiated reader macros
|
|
||||||
We don't have an evaluator so we can't really interpret whatever a
|
We don't have an evaluator so we can't really interpret whatever a
|
||||||
user wants for a reader macro currently, but it would be useful to
|
user wants for a reader macro currently, but it would be useful to
|
||||||
think about it now. Currently I have a single function which deals
|
think about it now. Currently I have a single function which deals
|
||||||
@@ -39,47 +40,23 @@ consider user environments via the context.
|
|||||||
|
|
||||||
[[file:reader.c::perr_t parse_reader_macro(context_t *ctx, input_t
|
[[file:reader.c::perr_t parse_reader_macro(context_t *ctx, input_t
|
||||||
*inp, lisp_t **ret)][function link]]
|
*inp, lisp_t **ret)][function link]]
|
||||||
* TODO Consider Lisp runtime errors
|
*** TODO Parse exponential notation
|
||||||
* TODO Admit arbitrarily sized integers
|
We're erroring out here due to not having proper reader notation
|
||||||
Currently we admit fixed size integers of 63 bits. They use 2s
|
[[file:examples/r7rs-tests.scm::test #t (real? #e1e10)]]
|
||||||
complement due to x86 which means our max and min are 62 bit based.
|
** TODO Evaluator
|
||||||
|
** TODO Runtime errors
|
||||||
However, to even try to be a scheme implementation we need to allow
|
** TODO Better numerics
|
||||||
arbitrarily sized integers. What are the specific tasks we need to
|
We currently admit fixed size integers (63 bits). We _need_ more to
|
||||||
complete in our model to achieve this?:
|
be a scheme.
|
||||||
+ Allow "reading" of unfixed size integers
|
*** Unfixed size integers
|
||||||
+ This will require reading a sequence of base 10 digits without
|
*** Rationals
|
||||||
relying on strtold
|
*** Floats
|
||||||
+ Implement unfixed size integers into our memory model
|
*** Complex numbers
|
||||||
+ Certainly, unfixed size integers cannot be carried around like
|
** TODO Primitive operations
|
||||||
fixnums wherein we can embed an integer into the pointer.
|
** TODO Macros
|
||||||
Thus we have to allocate them in memory.
|
** TODO Modules
|
||||||
+ NOTE: There will definitely be an optimisation to be done
|
* Completed :completed:
|
||||||
here; integers that are within the bound of a fixnum could be
|
** DONE More efficient memory model for symbols
|
||||||
left as a fixnum then "elevated" to an integer when needed
|
|
||||||
+ I think the big idea is allocating them as a fixed set of bytes
|
|
||||||
like big symbols. For big integers we have to read the memory
|
|
||||||
associated thus we need a pointer. Due to 2s complement it should
|
|
||||||
be trivial to increase the size of an integer to fit a new result
|
|
||||||
i.e. if I'm adding two integers and that leads to an "overflow"
|
|
||||||
where the result is of greater width than its inputs, we should
|
|
||||||
just allocate new memory for it.
|
|
||||||
|
|
||||||
Consequences:
|
|
||||||
- Greater memory use
|
|
||||||
- In fact exponential if we need to allocate a whole new integer per
|
|
||||||
operation rather than utilising the input memory
|
|
||||||
- Possible loss of performance due to making integers over fixnums
|
|
||||||
when they don't need to be
|
|
||||||
- Comparison is harder on integers
|
|
||||||
- Harder to cache for the CPU
|
|
||||||
|
|
||||||
but all of this is to be expected when the user is an idiot.
|
|
||||||
* TODO Think about how to perform operations on different types
|
|
||||||
** TODO Integers
|
|
||||||
** TODO Symbols
|
|
||||||
** TODO Pairs
|
|
||||||
* DONE More efficient memory model for symbols
|
|
||||||
The primitive model for symbol allocation is an 8 byte number
|
The primitive model for symbol allocation is an 8 byte number
|
||||||
representing the size of the symbol, followed by a variable number of
|
representing the size of the symbol, followed by a variable number of
|
||||||
characters (as bytes). This is stored somewhere in the memory
|
characters (as bytes). This is stored somewhere in the memory
|
||||||
@@ -123,7 +100,7 @@ need to allocate memory. But, in the worst case of 8 character
|
|||||||
symbols, we're only allocating two 64 bit integers: these are easy to
|
symbols, we're only allocating two 64 bit integers: these are easy to
|
||||||
walk on x86 and we've reached at least parity between the memory
|
walk on x86 and we've reached at least parity between the memory
|
||||||
required for administration (the size number) and the actual data.
|
required for administration (the size number) and the actual data.
|
||||||
** Being more aggressive?
|
*** Being more aggressive?
|
||||||
Technically, ASCII characters only need 7 bits. For each of the 7 bytes
|
Technically, ASCII characters only need 7 bits. For each of the 7 bytes
|
||||||
used for the character data, we can take one bit off, leaving us with
|
used for the character data, we can take one bit off, leaving us with
|
||||||
7 bits to use for an additional character. We don't need to adjust
|
7 bits to use for an additional character. We don't need to adjust
|
||||||
@@ -148,10 +125,10 @@ to do a lot more work. x86-64 CPUs are much better at walking bytes
|
|||||||
than they are walking 7 bit offsets. This may be something to
|
than they are walking 7 bit offsets. This may be something to
|
||||||
consider if CPU time is cheaper than allocating 8 byte symbols
|
consider if CPU time is cheaper than allocating 8 byte symbols
|
||||||
somewhere.
|
somewhere.
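A rough sketch of the 7 bit packing idea, under assumptions that are mine rather than the tracker's: character i sits at bits [7i, 7i + 7) of a 64 bit word, shorter symbols are padded with zero bits, and the names pack7 and unpack7 are hypothetical. Eight characters then fit in 56 bits, leaving the top byte of the word untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack up to 8 non-NUL 7-bit characters into the low 56 bits of *out.
   Returns 1 on success, 0 if the input is too long or not 7-bit clean. */
static int pack7(const char *str, size_t len, uint64_t *out)
{
  if (len > 8)
    return 0;
  uint64_t word = 0;
  for (size_t i = 0; i < len; ++i)
  {
    unsigned char c = (unsigned char)str[i];
    if (c == 0 || c > 0x7f)
      return 0;
    word |= (uint64_t)c << (7 * i);
  }
  *out = word;
  return 1;
}

/* Unpack into out (at least 9 bytes), NUL terminate, return the length.
   A zero 7-bit group marks the end of a symbol shorter than 8 characters. */
static size_t unpack7(uint64_t word, char out[9])
{
  size_t len = 0;
  for (; len < 8; ++len)
  {
    char c = (char)((word >> (7 * len)) & 0x7f);
    if (c == 0)
      break;
    out[len] = c;
  }
  out[len] = '\0';
  return len;
}
```

Every read goes through a shift and a mask per character, which is the extra CPU work weighed against the saved allocation.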
|
||||||
* DONE Tagging scheme based on arena pages
|
** DONE Tagging scheme based on arena pages
|
||||||
2025-04-09:21:59:29: We went for option (2) of just taking a byte for
|
2025-04-09:21:59:29: We went for option (2) of just taking a byte for
|
||||||
free from the memory address and using it as our management byte.
|
free from the memory address and using it as our management byte.
|
||||||
** 1) Page-offset schema
|
*** 1) Page-offset schema
|
||||||
I've realised arenas are way better than the standard dynamic array I
|
I've realised arenas are way better than the standard dynamic array I
|
||||||
was going for before. However, we lose the nicer semantics of using
|
was going for before. However, we lose the nicer semantics of using
|
||||||
an array index for pointers, where we can implement our own semantics
|
an array index for pointers, where we can implement our own semantics
|
||||||
@@ -213,7 +190,7 @@ will be stable regardless of any further memory management functions
|
|||||||
performed on the arena (excluding cleanup) - so once you have a host
|
performed on the arena (excluding cleanup) - so once you have a host
|
||||||
pointer, you can use it as much as you want without having to worry
|
pointer, you can use it as much as you want without having to worry
|
||||||
about the pointer becoming invalid in the next second.
|
about the pointer becoming invalid in the next second.
|
||||||
** 2) 48-bit addressing exploit
|
*** 2) 48-bit addressing exploit
|
||||||
Most x86-64 CPUs only use 48 (at most 57) bits for virtual addresses -
|
Most x86-64 CPUs only use 48 (at most 57) bits for virtual addresses -
|
||||||
mostly as a result of not needing _nearly_ as many addresses as a full
|
mostly as a result of not needing _nearly_ as many addresses as a full
|
||||||
64 bit word would provide.
|
64 bit word would provide.
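A minimal sketch of using the spare top byte as a management byte, assuming every pointer handed in has its top 8 bits clear (which is the case for typical user-space addresses on x86-64). The names tag_ptr, get_tag and get_ptr, and the 56 bit shift, are illustrative and not taken from the project.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_SHIFT 56
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

/* Store an 8-bit tag in the top byte of a pointer-sized word.
   Assumes the pointer itself does not use that byte. */
static uint64_t tag_ptr(void *ptr, uint8_t tag)
{
  uint64_t bits = (uint64_t)(uintptr_t)ptr;
  assert((bits & ~ADDR_MASK) == 0); /* top byte must be free */
  return bits | ((uint64_t)tag << TAG_SHIFT);
}

/* Recover the management byte. */
static uint8_t get_tag(uint64_t tagged)
{
  return (uint8_t)(tagged >> TAG_SHIFT);
}

/* Strip the tag to get a host pointer that is safe to dereference. */
static void *get_ptr(uint64_t tagged)
{
  return (void *)(uintptr_t)(tagged & ADDR_MASK);
}
```

The tag must be masked off before every dereference, since the CPU faults on non-canonical addresses.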
|
||||||