author | Aryadev Chavali <aryadev@aryadevchavali.com> | 2025-05-28 23:54:34 +0100 |
---|---|---|
committer | Aryadev Chavali <aryadev@aryadevchavali.com> | 2025-05-28 23:54:34 +0100 |
commit | bfff660d0e1a03063b889f84e8cfd046565c6046 (patch) | |
tree | 925ef120d7a20bafaed34fcf87dc5bb150ea3e3d | |
parent | 12de1e8db90bccd5a0eefd21075f07c7b7e3dfaa (diff) | |
download | oats-bfff660d0e1a03063b889f84e8cfd046565c6046.tar.gz oats-bfff660d0e1a03063b889f84e8cfd046565c6046.tar.bz2 oats-bfff660d0e1a03063b889f84e8cfd046565c6046.zip |
Rename tasks.org to oats.org, restructure
-rw-r--r-- | oats.org (renamed from tasks.org) | 101 |
1 files changed, 39 insertions, 62 deletions
@@ -1,7 +1,20 @@
-#+title: Tasks
-#+date: 2025-02-18
-
-* WIP Implement a reader
+#+title: Oats tracker
+#+author: Aryadev Chavali
+#+description: A general tracker for work being done on the project
+#+FILETAGS: :oats:
+
+* Issues :issues:
+** TODO Fix issue with memcpy overlap when string concatenating
+[[file:lisp/lisp.c::// FIXME: Something is going wrong here!]]
+
+Ideas on what's going wrong:
+- String sizes seem off
+- Maybe something is wrong with arena allocator; we use
+  [[file:lib/sv.c::newsv.data = arena_realloc(allocator, sv.data,
+  sv.size, newsv.size);][arena_realloc]] which seems to be the root of
+  the memcpy-overlap
+* Features :features:
+** WIP Reader :reader:
 We want something a bit generic: able to handle reading from some
 buffer of memory (a string, or contents of a file where we can read
 the entire thing at once) or directly from a file stream (STDIN,
@@ -15,19 +28,7 @@ We also want to be able to admit when reading went wrong for some
 reason with proper errors messages (i.e. can be read by Emacs) - this
 will need to be refactored when we introduce errors within the Lisp
 runtime itself.
-** TODO Implement floats and rationals
-Rationals are pretty easy - just two integers (quotient and divisor) -
-so a tagged cons cell would do the job. Floats are a bit more
-difficult since I'd either need to box them or find a creative way of
-sticking IEEE-754 floats into < 64 bits.
-
-Also implement a reader macro for #e<scientific form>. Also deal with
-[-,+,]inf(.0) and [-,+,]nan(.0).
-
-Need to do some reading.
-
-[[file:r7rs-tests.scm::test #t (real? #e1e10)][trigger]]
-** TODO Consider user instantiated reader macros
+*** TODO Consider user instantiated reader macros
 We don't have an evaluator so we can't really interpret whatever a
 user wants for a reader macro currently, but it would be useful to
 think about it now. Currently I have a single function which deals
@@ -39,47 +40,23 @@ consider user environments via the context.
 [[file:reader.c::perr_t parse_reader_macro(context_t *ctx, input_t *inp, lisp_t **ret)][function link]]
-* TODO Consider Lisp runtime errors
-* TODO Admit arbitrarily sized integers
-Currently we admit fixed size integers of 63 bits. They use 2s
-complement due to x86 which means our max and min are 62 bit based.
-
-However, to even try to be a scheme implementation we need to allow
-arbitrarily sized integers. What are the specific tasks we need to
-complete in our model to achieve this?:
-+ Allow "reading" of unfixed size integers
-  + This will require reading a sequence of base 10 digits without
-    relying on strtold
-+ Implement unfixed size integers into our memory model
-  + Certainly, unfixed size integers cannot be carried around like
-    fixnums wherein we can embed an integer into the pointer.
-    Thus we have to allocate them in memory.
-  + NOTE: There will be definitely be an optimisation to be done
-    here; integers that are within the bound of a fixnum could be
-    left as a fixnum then "elevated" to an integer when needed
-  + I think the big idea is allocating them as a fixed set of bytes
-    like big symbols. For big integers we have to read the memory
-    associated thus we need a pointer. Due to 2s complement it should
-    be trivial to increase the size of an integer to fit a new result
-    i.e. if I'm adding two integers and that leads to an "overflow"
-    where the result is of greater width than its inputs, we should
-    just allocate new memory for it.
-
-Consequences:
-- Greater memory use
-  - In fact exponential if we need to allocate a whole new integer per
-    operation rather than utilising the input memory
-- Possible loss of performance due to making integers over fixnums
-  when they don't need to be
-- Comparison is harder on integers
-- Harder to cache for the CPU
-
-but all of this is to be expected when the user is an idiot.
-* TODO Think about how to perform operations on different types
-** TODO Integers
-** TODO Symbols
-** TODO Pairs
-* DONE More efficient memory model for symbols
+*** TODO Parse exponential notation
+We're erroring out here due to not having proper reader notation
+[[file:examples/r7rs-tests.scm::test #t (real? #e1e10)]]
+** TODO Evaluator
+** TODO Runtime errors
+** TODO Better numerics
+We currently admit fixed size integers (63 bits). We _need_ more to
+be a scheme.
+*** Unfixed size integers
+*** Rationals
+*** Floats
+*** Complex numbers
+** TODO Primitive operations
+** TODO Macros
+** TODO Modules
+* Completed :completed:
+** DONE More efficient memory model for symbols
 The primitive model for symbol allocation is an 8 byte number
 representing the size of the symbol, followed by a variable number of
 characters (as bytes). This is stored somewhere in the memory
@@ -123,7 +100,7 @@ need to allocate memory. But, in the worst case of 8 character
 symbols, we're only allocating two 64 bit integers: these are easy to
 walk on x86 and we've reached at least parity between the memory
 required for administration (the size number) and the actual data.
-** Being more aggressive?
+*** Being more aggressive?
 Technically, ANSI bytes only need 7 bits. For each of the 7 bytes
 used for the character data, we can take one bit off, leaving us with
 7 bits to use for an additional character. We don't need to adjust
@@ -148,10 +125,10 @@ to do a lot more work. x86-64 CPUs are much better at walking bytes
 than they are walking 7 bit offsets. This may be something to
 consider if CPU time is cheaper than allocating 8 byte symbols
 somewhere.
-* DONE Tagging scheme based on arena pages
+** DONE Tagging scheme based on arena pages
 2025-04-09:21:59:29: We went for option (2) of just taking a byte for
 free from the memory address and using it as our management byte.
-** 1) Page-offset schema
+*** 1) Page-offset schema
 I've realised arenas are way better than the standard array dynamic I
 was going for before. However, we lose the nicer semantics of using
 an array index for pointers, where we can implement our own semantics
@@ -213,7 +190,7 @@ will be stable regardless of any further memory management functions
 performed on the arena (excluding cleanup) - so once you have a host
 pointer, you can use it as much as you want without having to worry
 about the pointer becoming invalid in the next second.
-** 2) 48-bit addressing exploit
+*** 2) 48-bit addressing exploit
 Most x86 CPUs only use around 48-56 bits for actual memory addresses -
 mostly as a result of not needing _nearly_ as many addresses as a full
 64 bit word would provide.
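
Reader's note on option (2): the idea of taking the top byte of an address as a management byte can be sketched in C roughly as below. This is only an illustration, not code from the oats sources: `tagged_t`, `tag_ptr`, `get_tag`, `untag_ptr`, `TAG_SHIFT` and `ADDR_MASK` are all hypothetical names, and the sketch assumes user-space pointers whose top 8 bits are zero, which is typical on x86-64 but not guaranteed everywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these come from the oats sources. */
typedef uint64_t tagged_t;

#define TAG_SHIFT 56                               /* tag lives in bits 56..63 */
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1) /* low 56 bits hold the address */

/* Pack an 8-bit tag into the top byte of a pointer-sized word. */
static inline tagged_t tag_ptr(const void *ptr, uint8_t tag)
{
  uint64_t addr = (uint64_t)(uintptr_t)ptr;
  /* Assumption: the top byte of the address is unused (all zero). */
  assert((addr & ~ADDR_MASK) == 0);
  return ((uint64_t)tag << TAG_SHIFT) | addr;
}

/* Read the management byte back out. */
static inline uint8_t get_tag(tagged_t t)
{
  return (uint8_t)(t >> TAG_SHIFT);
}

/* Strip the tag to recover a pointer that is safe to dereference. */
static inline void *untag_ptr(tagged_t t)
{
  return (void *)(uintptr_t)(t & ADDR_MASK);
}
```

A caller would tag a pointer once with `tag_ptr`, branch on `get_tag`, and call `untag_ptr` before every dereference. Masking instead of sign-extending is the simplest choice here and only holds for addresses in the lower half of the address space.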