oats.org


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207

#+title: Oats tracker
#+author: Aryadev Chavali
#+description: A general tracker for work being done on the project
#+FILETAGS: :oats:

* Issues :issues:
** TODO Fix issue with memcpy overlap when string concatenating
[[file:lisp/lisp.c::// FIXME: Something is going wrong here!]]

Ideas on what's going wrong:
- String sizes seem off
- Maybe something is wrong with arena allocator; we use
  [[file:lib/sv.c::newsv.data = arena_realloc(allocator, sv.data,
  sv.size, newsv.size);][arena_realloc]] which seems to be the root of
  the memcpy-overlap
* Features :features:
** WIP Reader :reader:
We want something a bit generic: able to handle reading from some
buffer of memory (a string, or contents of a file where we can read
the entire thing at once) or directly from a file stream (STDIN,
network streams, etc).

We don't need a tokeniser - the basic grammar of a Lisp is really easy
to narrow down, so we can skip tokenisation and go straight for
parsing.

We also want to be able to admit when reading went wrong for some
reason with proper errors messages (i.e. can be read by Emacs) - this
will need to be refactored when we introduce errors within the Lisp
runtime itself.
*** TODO Consider user instantiated reader macros
We don't have an evaluator so we can't really interpret whatever a
user wants for a reader macro currently, but it would be useful to
think about it now.  Currently I have a single function which deals
with parsing reader macros but that's just static.

Thing is, it does have the context as a parameter to pass to delegate
functions (such as ~parse_vec~) - wouldn't be a massive jump to also
consider user environments via the context.

[[file:reader.c::perr_t parse_reader_macro(context_t *ctx, input_t
*inp, lisp_t **ret)][function link]]
*** TODO Parse exponential notation
We're erroring out here due to not having proper reader notation
[[file:examples/r7rs-tests.scm::test #t (real? #e1e10)]]
** TODO Evaluator
** TODO Runtime errors
** TODO Better numerics
We currently admit fixed size integers (63 bits).  We _need_ more to
be a scheme.
*** Unfixed size integers
*** Rationals
*** Floats
*** Complex numbers
** TODO Primitive operations
** TODO Macros
** TODO Modules
* Completed :completed:
** DONE More efficient memory model for symbols
The primitive model for symbol allocation is an 8 byte number
representing the size of the symbol, followed by a variable number of
characters (as bytes).  This is stored somewhere in the memory
context, which will likely be in the heap.

We're actually wasting a ridiculous amount of memory with this model.
We'll almost never be using the full 64 bits of the size to represent
a symbol (who's ever going to go close to 1.8 quintillion bytes for a
single symbol?) - and even more annoying, there are tons of small
sized symbols where we actually need _more space_ for the 8 byte size
than the underlying symbol data.

I think there's definitely a better model available, at least for
smaller symbols.  We already have inlined integers where the pointer
itself is the integer, why can't we do the same for symbols?
A pointer has 8 bytes of data to use - one character is one byte -
thus we can represent 8 character symbols in one pointer.

If we want this to still be within the remit of our pointer tagging
scheme, we'll need a bit of space for our administrative purposes
(admin slop...).  So let's take _one_ byte out of that 8 for that.  So
we can represent any symbols 7 bytes long in a single pointer.  We'll
need to keep in mind we want to represent symbols that may be less
than 7 characters, so that one admin byte is going to be doing some
heavy lifting.

Let's examine that one admin byte:
+ At least 3 bits are necessary for the actual pointer tag: "at least"
  because we might increase the size of the tag based on demand
+ Thus, 5 bits left for our use - let's fit the size in there.
  pow(2,6) - 1 = 63, so we have _way_ more than we need

What are the benefits of doing this?:
+ Symbol equality for small symbols is a cinch: just compare the two
  tagged "pointers"
+ 7 or less character symbols require no memory allocation, just
  working off the stack

One thing to note is that for more than 7 character symbols, we'll
need to allocate memory.  But, in the worst case of 8 character
symbols, we're only allocating two 64 bit integers: these are easy to
walk on x86 and we've reached at least parity between the memory
required for administration (the size number) and the actual data.
*** Being more aggressive?
Technically, ANSI bytes only need 7 bits.  For each of the 7 bytes
used for the character data, we can take one bit off, leaving us with
7 bits to use for an additional character.  We don't need to adjust
anything else in the schema.

So, hypothetically we could represent up to 8 character symbols!  This
would require packing the characters more aggressively into a single
pointer.  Let's look at the layout of our pointers.  This table is
indexed from most significant to least i.e. 0 is the MSB and 63 is the
LSB:

|-------+------------|
| Index | Usage      |
|-------+------------|
|  0-55 | Characters |
| 56-60 | Size       |
| 61-63 | Tag        |
|-------+------------|

Honestly though, for an extra byte of information we'll probably have
to do a lot more work.  x86-64 CPUs are much better at walking bytes
than they are walking 7 bit offsets.  This may be something to
consider if CPU time is cheaper than allocating 8 byte symbols
somewhere.
** DONE Tagging scheme based on arena pages
2025-04-09:21:59:29: We went for option (2) of just taking a byte for
free from the memory address and using it as our management byte.
*** 1) Page-offset schema
I've realised arenas are way better than the standard array dynamic I
was going for before.  However, we lose the nicer semantics of using
an array index for pointers, where we can implement our own semantics
regarding what bits in that pointer are free to use, when using a
normal stable pointer into the arena; the host operating system has
its own semantics regarding how pointers are arranged and this _will_
change between operating systems.  In particular, because of the way
I've arranged pages, we can't use the classic "div by 8" methodology
where new allocations on the heap generally must be aligned by 8 bytes
(u64), so we can use those 3 bits at the bottom for our tagging;
offsets into pages are where our pointers will lie and they won't
necessarily be divisible by 8.

So we can't use the pointers directly into the pages - we'll call
these types of pointers `host pointers`, because once we have them
it's trivial to access the underlying data.  We'll call the pointers
we want to make `managed pointers` because we're managing the memory
system associated with them.  We want to be able to translate from
managed pointers to host pointers.

Managed pointers are really just encodings for direct access into the
arena memory.  So in 8 bytes, we need to encode both the page and the
specific offset in that page where the pointer is pointing to.  We
also want to leave space for tagging and any metadata we might want to
store in the pointer to that data.  A schema I could think of was:

|------------------+--------------------|
| Index (in bytes) | Representation     |
|------------------+--------------------|
|                0 | Metadata (tagging) |
|              1-4 | Offset in page     |
|              4-7 | Page choice        |
|------------------+--------------------|

This gives us pow(2, 24) - 1 = 16777215 possible pages and
pow(2, 32) - 1 = 4294967295 offsets in each page.  Thus our total
addressable memory would be pow(2, 56) - 1 = 72057594037927935 bytes.

Certainly no machine would ever have this much memory and so we're
quite safe for most machines.  That reserved management byte for our
purposes (tagging, currently) will make the math to translate it a bit
easier.

Let's reason about how we'd encode and decode these addresses.  The
arena itself should provide addresses with the management byte set to
0 for the user to encode what they wish.  The top bytes should be
encoded as per the above i.e. top 3 bytes as the page index, next 4
bytes as the offset in that page.  This shouldn't be super difficult
when we're doing it within the management functions of the arena
itself as this data should be handy when performing the allocation.

When decoding these addresses i.e. retrieving data i.e. translating
from a managed pointer to a host pointer, all it will need to do is
convert the pointer into a byte buffer and copy the top 3 bytes as a
page index and the next 4 bytes as the offset in the page.  Once these
are verified to be valid, we can just access the underlying pages and
get the host pointer.  Because of how arenas work, those host pointers
will be stable regardless of any further memory management functions
performed on the arena (excluding cleanup) - so once you have a host
pointer, you can use it as much as you want without having to worry
about the pointer becoming invalid in the next second.
*** 2) 48-bit addressing exploit
Most x86 CPUs only use around 48-56 bits for actual memory addresses -
mostly as a result of not needing _nearly_ as many addresses as a full
64 bit word would provide.

So we /could/ get away with using one of those bytes for our
administrative tasks.  Since the smallest remit we have is one byte,
we'll stick to that (but maybe we could go for two bytes - need to
investigate further).

This byte should be the MSB, but using that for tagging will require
more work than the lowest byte (to look at it we'll need to push that
byte all the way down).  So we'll be going for a low byte strategy by
shifting the pointer up by one byte.  This leaves us with the lowest
byte to play with as we choose.