AIM-514, the beginning

In which I take a deliberate swan dive off the deep end.

Returning readers may recall that I am no fan of LispMs. However this semester at UT Austin I'm taking a research reading class on computer architecture. The focus of the class and hopefully of next semester's project is that of paralleism in computer architectures. As my professor has a sense of humor, his first assignment for me was to implement exactly one of the linked list interpreting machines which I previously mocked.

So here we go.

The paper in question is AI Memo 514, Steele and Sussman '79. This post is just an analysis of the paper which I wrote to force myself to understand Steele's memory controller. This post is not a presentation of or introduction to either Steele or my designs. Rather it is a bunch of notes on both which I needed to get out of my head and figured I would publish. Future posts will present and talk about the resulting FPGA implementation.

Memory model

This machine uses a 16 bit address space and 24 bit words. Pointer words are pairs [tag:8, value:16] where tag is one of the types or opcodes, and value is a memory address. Other word interpretations also exist but are special cases.

That structures tend to be small, randomly accessed and short lived is a long established fact in the study of garbage collection and object lifetimes. A naive memory model where literal lists are used to represent all storage while possible sacrifices a lot of space storing link pointers, and a machine using this model sacrifices much time traversing link pointers. Were I to implemnent a strictly linked list based system, a 48 bit word fully encoding a pair (car, cdr) would be more natural. However as hinted above and as studied in the various cdr coding papers (AIM-578 ~ Steele '80 & citations) frequently the structure we desire is not truly a linked list although we may encode it as one, rather it is a buffer or an n-tuple which may have some very compact & efficient vector representation given known size.

This suggests that buffer allocation while not logically part of the memory model must be supported, and leads to the conclusion of a 24 bit word size.


Steel observes on Pg. 13 that a constant requirement of the naive list interpreters presented earlier is to test the head of the list for equality with respect to the set of symbols which the interpreter knows how to handle. It would be far more efficient Steele observes to simply encode in the type field of the pointer to the list what the op to be performed on that list is. This encoding saves at least one memory access per expression evaluated and is a subtlety worth noting.

Note: I shall abuse the → notation here. → b denotes a pointer to whatever b is. From Clojure, ^foo is understood to denote that the following expression has metadata foo. ^{:tag int} x for instance would be a legitimate expression, the shorthand for which is simply ^int x since the common case of metadata is tags. So ^cons → (e₁ e₂) would be understood to mean a pointer to the list (e₁ e₂) having the tag symbolically meaning cons.

Steel proposes for instance that if and cons may be encoded with such tags, showing the following example on Pg. 14:

(if a
  '(x y)
  (if c
    (cons e 69)))

The encoding of which is given to be as follows:

^if → (^var → (0, 3)
       ^list → (^symbol → x
                ^list → (^symbol → y
                         ^symbol → nil))
       ^if → (^var → (1, 2)
              ^list → (^symbol → d)
              ^comb → (^var → (0, 1)
                       ^more → (^number → 69
                                ^cons → nil))))

One such optimization noted on Pg. 13 is that while naively local variables could be implemented by maintaining and walking an association list as Scheme is lexically scoped we can instead statically write a pair [n, m], where n is the number of binding stack frames (0 or greater) which we must traverse up to find the scope of the binding in question and m is the binding number at that level. By way of example, consider the following:

(let ((x 1)
      (y 2))
  (let ((z 3))
    (+ x ; relatively local (1, 0)
       y ;       ...        (1, 1)
       z ;       ...        (0, 0)

Steele makes use of several types, but does not discuss any of them at length here. Number, Symbol, List, Variable and Procedure all occur as tags for values. Furthermore the ops If, Bind, Local, Funcall, Cons, Car, Cdr, Atom, Combination and More-args appear. The implication is that there are several more such types which are not mentioned, however I have yet to discover them.

Memory controller

On Pg. 18, Steele discusses a multiple cycle memory controller (seemingly a simple stack machine) which operates on pointers and has the operations push, pop, drop, setq, cons, car and cdr. Push is a control state on which the value of the inbound bus is pushed onto the memory controller's stack. Pop writes the top of the stack out over the result bus. Drop discards the top of the stack. Setq writes the top of the stack to the 2nd of the stack, consuming both. Cons consumes two elements from the stack and pushes one (a new cell) back. Car and cdr both consume the top of the stack, pushing a replacement result value.

Steele's design does not feature a garbage collector due to chip area restrictions and by way of appology to the effect that it could be done cites much work on the subject of garbage collection. Due to having a large (8k word) memory in my prototype and wanting to get something working I will also for the time being ignore garbage collection.

One could implement a mark and sweep collector by changing the pointer format from having 8 bits of tag to having seven bits of tag and one of gc flag. A single bit is kept to count the number of garbage collection passes which have occured. When allocating resources, the tag bit is set to the value of this bit. When marking garbage, the complement of this bit is used to denote that space is reached and that value is recursively written to all reachable objects starting from the register set. Collection and compaction of freed space into larger regions then occur as one would expect and finally the counter bit is flipped. This gives the invariant that all storage is tagged as unused at the start of each collection pass and saves the need to traverse all of storage to reset collection flags.