Composition and Diamonds

In software, there is an ever-present temptation to declare that something is finished. To look upon an artifact, to pronounce it perfect, and to believe that it will persist unchanged for all time. This is the model of “martian computing” which begat the Urbit project. And it’s wrong.

A specification is a precise description of what an entity is, typically written in terms of decomposition. An abstraction is an attempt to describe an entity, or class of entities, in more general terms. Where a specification will define precisely how something happens, an abstraction will merely state that it will happen.

Abstractions may be judged by their hardness – that is, the strength of the invariants they enforce internally or provide externally, and those which they require but leave to their environment to ensure.

Some abstractions, like the idea of a generator or a stream, are weak in that they require little and provide little. All the notion of a generator exports is a contract or pattern for getting more values, and for how the source will signal when its end has been reached. Yet this is a convenient model for the sequential consumption of any number of chunked, sequential, or eventual value sources, one which presumes nothing about how the values are generated.
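As a minimal sketch of that contract in Python (the countdown generator and its values are purely illustrative), the consumer can only ask for the next value, and the source signals its end via StopIteration:

    # A sketch of the generator contract: ask for more values, and be told
    # when the source is exhausted. How the values are produced is invisible.
    def countdown(n):
        while n > 0:
            yield n
            n -= 1

    gen = countdown(3)
    assert next(gen) == 3
    assert next(gen) == 2
    assert next(gen) == 1
    try:
        next(gen)    # the source signals that its end has been reached
    except StopIteration:
        pass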

We can define the abstraction of

filter :: (λ a → Bool) → [a] → [a]

(Note: in Haskell notation that’s “filter is a function of a function which returns a boolean for any a, and a source of as, to a source of as”) to be (x for x in xs if f(x)). In Python, this exact formulation is an explicitly sequential generator which preserves the order of elements. But what does filter actually have to do? Does the order of elements matter? Should it? When should an element’s membership in the result be determined? Does it matter? Why would it matter?

The type of filter is part of the abstraction, but it is a weak contract compared to the operational formulation above. Consider what other functions could be defined that satisfy the type signature (λ a → Bool) → [a] → [a] as above. You could define a function which repeats the first element for which the provided function is true forever. You could define a function which repeats the 2nd element for which the provided function is true only as many times as there are elements in the input sequence. You could define a function which ignores the function argument and returns the input sequence. You could define a function which ignores the function argument and returns the input sequence reversed. And on and on and on and on.
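To make that weakness concrete, here is a sketch of a few hypothetical Python functions (the names and type hints are mine, standing in for the signature above) which all fit the shape of filter yet behave nothing like it:

    from typing import Callable, List, TypeVar

    A = TypeVar("A")

    def repeat_second_match(f: Callable[[A], bool], xs: List[A]) -> List[A]:
        # Repeats the 2nd matching element as many times as there are elements
        # in the input (assuming at least two matches exist).
        matches = [x for x in xs if f(x)]
        return [matches[1]] * len(xs)

    def ignore_predicate(f: Callable[[A], bool], xs: List[A]) -> List[A]:
        # Ignores the predicate entirely and returns the input unchanged.
        return list(xs)

    def reverse_ignoring_predicate(f: Callable[[A], bool], xs: List[A]) -> List[A]:
        # Ignores the predicate and returns the input reversed.
        return list(reversed(xs))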

A more precise definition of filter would be ∄x∈filter(f, xs) | f(x) is false. (Note: to unpack the notation, that is “there is no x in filter(f, xs) such that f(x) is false”.) This is a far better, more general abstraction. At an operational semantics level, filter could shuffle. It could operate in parallel on subsequences and return a parallel “first to deliver” concatenation. It could be lazy, or do any manner of other things.
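Stated as an executable check, the specification accepts any such implementation; the helper name below is of my own choosing:

    # The ∄ specification as a property: no element of the result may fail
    # the predicate. Ordered, shuffled, parallel, or lazy implementations all
    # qualify so long as this holds.
    def satisfies_filter_spec(filter_impl, f, xs):
        return not any(not f(x) for x in filter_impl(f, list(xs)))

    assert satisfies_filter_spec(lambda f, xs: [x for x in xs if f(x)],
                                 lambda x: x % 2 == 0,
                                 range(10))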

Let’s consider another abstraction – the (first, next) pair, or cons cell.

+-------+------+    +-------+------+
| first | next | -> | first | next | -> null
+-------+------+    +-------+------+
   |                    |
   v                    v
  val                  val

This is, honestly, a really bad abstraction because it’s quite explicit about the details. Heck, the names “cons”, “car”, and “cdr” are all historical baggage. However, this is an abstraction. It provides the notion of the first of a list, the next or rest of the list, and the end of the list being nil. In doing so it provides a model for thought, to be sure, but it hides none of the details of the machine. As processor core speed has outstripped memory access speed, and as caches have become more and more important for circumventing the von Neumann bottleneck, it has become a progressively less relevant abstraction because it is precise about machine details which are less and less appropriate to modern machines.
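For concreteness, here is a minimal Python sketch of the cons-cell model just described; the class and function names are only illustrative, and None stands in for nil:

    # A cons cell holds a first value and a next link; None marks the end.
    class Cons:
        def __init__(self, first, next=None):
            self.first = first
            self.next = next

    def traverse(cell):
        # The first/next/end traversal the abstraction provides.
        while cell is not None:
            yield cell.first
            cell = cell.next

    xs = Cons(1, Cons(2, Cons(3)))    # the list (1, 2, 3)
    assert list(traverse(xs)) == [1, 2, 3]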

For this reason many Lisp family systems choose to provide what are referred to as CDR-coded or chunked lists. These are list-like structures wherein a number of value links are grouped together with a single next link.

 +-------+--------+-------+--------+-------+-------+-//-+------+
 | first | second | third | fourth | fifth | sixth | // | next | -> null
 +-------+--------+-------+--------+-------+-------+-//-+------+
     |       |
     v       v
    val     val

For instance, a list of eight elements could fit entirely within a single chunk, occupying a contiguous block of memory which provides better cache locality for linear traversals or for adding elements to the end. However, this chunked model makes splicing sub-lists, slicing, or explicitly manipulating next links expensive, because the next link doesn’t exist! For instance, if one were to try to slice the sub-list [1...5] out of (0, 1, 2, ..., 10) encoded as a CDR₇ list, one could build a “sub-list” structure which refers to the substructure of the source list. But the instant one tries to alter a link pointer within the extracted sub-list, the entire sub-list must be copied so that there exists a link pointer to be manipulated. However, all these approaches to chunking, slicing, and manipulation still easily provide a common first/next/end sequence traversal abstraction.
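A rough sketch of such a chunked representation follows; the chunk size of seven and all of the names are assumptions made for illustration, but the same first/next/end traversal falls out of it:

    # A sketch of a chunked ("CDR-coded") list: each chunk stores up to
    # CHUNK values contiguously plus a single next link.
    CHUNK = 7

    class Chunk:
        def __init__(self, values, next=None):
            self.values = values    # contiguous values in this chunk
            self.next = next        # link to the next chunk, or None

    def chunked(xs):
        # Build chunks from the back so each chunk links to its successor.
        xs = list(xs)
        head = None
        for i in reversed(range(0, len(xs), CHUNK)):
            head = Chunk(xs[i:i + CHUNK], head)
        return head

    def traverse_chunks(chunk):
        # The same sequential traversal abstraction as plain cons cells.
        while chunk is not None:
            for v in chunk.values:
                yield v
            chunk = chunk.next

    assert list(traverse_chunks(chunked(range(11)))) == list(range(11))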

So what does this mean about abstractions generally? Abstractions are models for computation, and they are relevant in a context. For instance, big-O analysis of an algorithm is an analysis of asymptotic performance with respect to an abstract machine. It is not a precise analysis of the performance of the algorithm in the average or worst case on a physical machine. These details, however, are the things which programmers care about. O(N) could mean T(100*N) or T(N/2). In order to be useful for practicing programmers, abstractions must eventually become more detailed than they need to be as tools for proof. It is not enough to know that f(xs) will be sorted; programmers are, at the least, accustomed to expectations that f(xs) will complete in such and such time and space. Were those expectations to be violated or suddenly change, program architecture decisions which presumed those performance properties would have to be revisited.

Church numerals are an interesting case of this mismatch between tools for thought and tools for implementation. They’re a useful tool for expressing abstract arithmetic and repetition in a proof, divorced from any practicable machine. You can express division, remainders, negatives, and even imaginary numbers this way. Church numerals provide a natural representation for arbitrarily large values in the context of the lambda calculus. But they’re grossly mismatched with the realities of finite binary machines which work on fixed-length bit vectors. Bit vector machines can’t capture the entire unbounded domain of Church numerals, and we can’t build a machine which performs arithmetic on Church numerals with the same performance as a bit vector machine. It’s fundamentally a trade-off between a tool for thought and a tool for implementing and reasoning about a physical machine.
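As a sketch of that mismatch, Church numerals can be written directly as Python lambdas: a numeral n is the function which applies f to its argument n times, so every arithmetic step is a tower of function applications rather than a single machine instruction.

    # Church numerals as lambdas: n applies f to x exactly n times.
    zero = lambda f: lambda x: x
    succ = lambda n: lambda f: lambda x: f(n(f)(x))
    add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

    def to_int(n):
        # Collapse a Church numeral onto a machine integer by counting applications.
        return n(lambda k: k + 1)(0)

    two = succ(succ(zero))
    assert to_int(add(two)(two)) == 4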

This pattern has consequences for the decisions we make when designing software. While it may be hubristically tempting to conceive of the artifacts we develop as generation ships – constructs which will long survive us without significant structural change if we but exercise appropriate art and find the right Martian gem – reality is far less forgiving. Rarely is there a diamond-hard artifact so divorced from business concerns that it can adequately weather the ravages of time unchanged.

Rather than seek such gems – or, failing to produce one, make an excessive number of trade-offs – good software engineering should be characterized by using and producing a number of small abstractions. Small abstractions are advantageous because they provide little and expose little, thus involving a minimum number of externalities and minimizing vulnerability to crosscutting concerns. In order to build a system of any size or complexity, composing several such abstractions is required. If, due to a change in externalities, one or several such small abstractions become inappropriate, replacing a small abstraction involves, in the worst case, no more impact to the system as a whole than replacing a larger – or, worse, monolithic (which is to say, no) – abstraction. Due to the small changing surface area, it is likely that reuse between the initial and successor system states will be maximized, and the cost to transition the system will be lower than if there were a large – or so-large-as-to-be-no – abstraction which had to be replaced almost entirely.

So. Write better software by decoupling. Seek to prevent or at least minimize crosscutting concerns. Take advantage of control flow abstraction and compose abstractions together. Mars doesn’t have a monolithic diamond. It is a field of small gems.

Spoiler warning: this document is largely a product of an evening arguing with @ztellman, being fixated and barely sober enough to write when I got home. @argumatronic egged me on to finish this and subsequently was kind enough to copy edit early versions of this for me. I’ve been told that many of the ideas appearing here will soon appear in his book, and wanted to note that for the most part Zack got here first.

^d