Intermediate Abstraction

This talk was presented at the Bay Area Clojure meetup, hosted by Funding Circle. These notes are presented here with thanks to the organizers and attendees.

1 Preface

Hey folks! For those of you who don't know me, I'm Reid McKenzie. I've been writing Clojure on and off for about four years now. These days I work for Twitter as a Site Reliability Engineer.

Twitter Site Reliability is really small. We're not even 150 engineers, tasked with ensuring that all the infrastructure and services required to support the business stay online. It's clear that as the business continues to grow, we in SRE can't continue to be traditional reactive system operators. We must be software engineers, developing automation and designing systems capable of intervention-less recovery.

Achieving quality in software engineering means developing an understanding of how to exercise our powers of abstraction, and learning to appreciate and manage the complexity with which we are surrounded. This talk presents some formal and philosophical concepts intended to elevate the ways we approach problem solving, which I hope will be useful to you.

2 Context

Programming is a young profession all things told.

If we trace the roots of our profession to Lovelace, the first to write of a machine which could be put to general purpose and perhaps program or modify itself, then our profession such as it is dates to the 1830s. Just shy of the two century mark.

Date      Who                   Event
1928      Hilbert               Entscheidungsproblem
1933      Gödel, Herbrand       μ-recursive functions (simple arithmetic)
1936      Church                λ calculus, models the Entscheidungsproblem with reduction
1936      Turing                Turing machine, models the Entscheidungsproblem with halting
1941      Zuse                  First programmable, general purpose computer
1945-46   Von Neumann           Von Neumann single-memory architecture
1947      Bell Labs             Transistors are invented

1950s     US government, banks  Computers become commonplace for census & data processing
1970      IBM, Commodore        First transistorized PCs come to the market
1976      Wozniak               Apple I

2017      Us, here, today       Floundering variously

Computation as a science is grounded in the concepts of evaluation and analysis. The science is in the theoretical limits of what a machine could do. The question of how to achieve something on a given machine is the province of software engineering.

Software engineering is a younger discipline still. We've only really been writing software since the '50s (excepting Lovelace's programs, which never had a machine to run on), so call it 60 years of recognizable programming, during the course of which huge shifts have occurred in the industry.

In retrospect one can only wonder that those first machines worked at all, at least sometimes. The overwhelming problem was to get and keep the machine in working order.

– Dijkstra "The Humble Programmer" EWD340 (1972)

The problem with software engineering is, well, for the most part we never get to do it. The profession of programming as adopted in industry is rooted in the now seemingly long tradition of just getting the machine to work at all and then "debugging" it, a tradition which has carried over since the first programs of the '50s. The operations tradition from which SRE programs are usually developed is particularly guilty of this.

Rather than have a concrete body of science to reference and with which to inform decisions, every working programmer develops an aesthetic taste for what seems to them like the right way to solve problems; what kept the computer or systems of computers in working order and solved the problem best. The problem with taste is that it is deeply subjective. It reflects our personal experiences in education, what JavaScript frameworks we've played with on the side, and how we did it at $LASTJOB. When tastes conflict, they can't inform each other except with great difficulty, because there's no underlying theory which can be used to relate and moderate them.

The question is how we earn the E in our titles. The answer is not to continue artistically operating the systems we have, or to keep doing what seems to work. We must develop and communicate formal understandings of the tools and tactics we use.

3 Complexity

Simple; Simplex; To have but a single strand, to be one.

Complex; To be woven from several strands.

Complect (archaic); To plait together.

Easy; from the Old French aisie, meaning comfortable and tranquil. Not difficult, requiring no great labor or effort.

– Rich Hickey "Simple Made Easy" (2011)

Often, we misuse simple to mean easy.

Ideally, we want to write code which is both simple and easy. This requires that we be conscious both of complexity and of ergonomics. However, we only rarely have objective criteria for these two critical dimensions.

The fact that one perfectly reasonable engineer may consider a tool sufficient and easy while another may just as reasonably consider the same tool baroque and overbearing is a core motivator for this talk. Sharing an objective metric of simplicity clarifies otherwise subjective questions and enables us to agree upon measurements of solution quality.

Thankfully there is prior art on the subject of attempting to estimate complexity. Karl Popper's The Logic of Scientific Discovery builds a framework of formal logic with which to formalize the scientific process itself.

Popper bases his project upon falsifiability and corroboration through experiment. A hypothesis for Popper is a formal statement which is falsifiable because it implies testable outcomes. An experiment cannot confirm a hypothesis, because to do so would imply that no other experiment (state of the entire universe) could possibly falsify the hypothesis. However, we can fail to falsify the claims of the hypothesis, and we can even show that the experiment produced a state which corroborates the prediction(s) of the hypothesis.

Popper proposes that the complexity of a theory can be measured by reasoning about the set of states of the universe which would falsify the theory. These sets can be related to each other and their relative complexity measured by establishing implication and subset relations, but this falls far short of a workable rule of measurement. Comparing infinities (sets of states of the universe) is hardly intuitive.

The good news is that there are ways we can do this!

Cyclomatic complexity (McCabe) measures the number of linearly independent control paths through a program. This approach works, and even gives an obvious heuristic for recognizing complexity in the simple number of branches, but it is uni-dimensional and doesn't capture the fact that our programs have and manipulate state.
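
For illustration, a small Clojure function with two decision points - counting branches gives a cyclomatic complexity of three, one path per branch plus the fall-through:

;; Two decision points (the neg? and zero? tests) yield three linearly
;; independent paths, so McCabe's metric for this function is 3.
(defn classify [n]
  (cond
    (neg? n)  :negative
    (zero? n) :zero
    :else     :positive))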

Dataflow complexity is a related family of metrics which try to measure the possibility space of the data (state) in a program. Unfortunately, beyond giving an intuition for the idea that complexity grows with the memory footprint of a program there's not an efficient way to quantify or relate this kind of complexity.

We get into this mess of having to count states in order to talk about complexity because both of these complexity estimators use the model of a Von Neumann finite state machine, whose states cannot easily be related to each other.

Going back to Popper: if we have two logical reduction statements P and Q, we can estimate their relative complexity by simply comparing the number of degrees of freedom of each expression. Likewise, using the μ-recursive functions (simple arithmetic and reduction) or the λ calculus (function application and reduction), we instantly regain the property that we can understand the complexity of an expression or program merely by looking at the number of terms it involves and counting them.

This intuition is supported by cyclomatic complexity and dataflow complexity metrics because (for programs which halt under reduction) the complexity of a program written with reduction is, well, one. Each datum and program point is unique and occurs only once.

(λx.xx) (λx.xx)

→ (λx.xx) (λx.xx)

This must be an estimate, not a precise metric, as a divergent combinator such as ω (above) will mess this whole thing right up. But per the halting problem there's not much we can do here, so if we accept (as we must) that this is an incomplete analysis and restrict ourselves to the domain of terminating expressions, we can still get a lot of utility out of this rule of thumb.

Now, we don't write purely functional programs, and when we do write programs which use functional tools and abstractions there's still global state lying around because we compute on Turing machines not graph reduction engines (those are really hard to build). How do we estimate complexity for real programs then?

It has been suggested that there is some kind of law of nature telling us that the amount of intellectual effort needed grows with the square of program length. But, thank goodness, no one has been able to prove this law. And this is because it need not be true.

– EWD, "The Humble Programmer"

There is room for reasonable disagreement here, but I'd propose a very simple heuristic for estimating complexity: the product of the following properties of a program or program segment.

  1. 1 + Number of input parameters
  2. 1 + Number of branches
  3. 1 + Number of back-edges
  4. 1 + Amount of REACHED CAPTURED state which is accessed (maybe 2x cost of a parameter)
  5. 1 + Amount of REACHED CAPTURED state which is modified (maybe more than 2x the cost of a parameter)

(defn map [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s))
            (map f (rest s))))))

That is, a function which accepts more parameters and takes many branches using them is more complex than a function which accepts a function and a few parameters as parameters and merely delegates to the argument function. The real branching behavior of such an applied higher order function may be complex, but the function itself is simple. It captures little state and does little directly.
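
As a rough worked example (the exact counts are a judgment call), applying the heuristic above to the map definition:

;; 1 + inputs (f, coll)                = 3
;; 1 + branches (the when-let)         = 2
;; 1 + back-edges (the recursive call) = 2
;; 1 + captured state read             = 1
;; 1 + captured state modified         = 1
(* 3 2 2 1 1) ;; => 12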

(defn log*
  "Attempts to log a message, either directly or via an agent; does not check if
  the level is enabled.
  For performance reasons, an agent will only be used when invoked within a
  running transaction, and only for logging levels specified by
  *tx-agent-levels*. This allows those entries to only be written once the
  transaction commits, and are discarded if it is retried or aborted.  As
  corollary, other levels (e.g., :debug, :error) will be written even from
  failed transactions though at the cost of repeat messages during retries.
  One can override the above by setting *force* to :direct or :agent; all
  subsequent writes will be direct or via an agent, respectively."
  [logger level throwable message]
  (if (case *force*
        :agent  true
        :direct false
        (and (clojure.lang.LockingTransaction/isRunning)
             (*tx-agent-levels* level)))
    (send-off *logging-agent*
              (fn [_#] (impl/write! logger level throwable message)))
    (impl/write! logger level throwable message))
  nil)

...

(declare ^:dynamic *logger-factory*)

...

(defmacro log
  "Evaluates and logs a message only if the specified level is enabled. See log*
  for more details."
  ...
  ([logger-factory logger-ns level throwable message]
   `(let [logger# (impl/get-logger ~logger-factory ~logger-ns)]
      (if (impl/enabled? logger# ~level)
        (log* logger# ~level ~throwable ~message)))))

An expression which does little, but does so using values which it draws from global state, such as a configuration value, may still be simple, but it is not so simple as a function which accepts that same configuration structure directly as a parameter. This complexity becomes evident when testing: a test for the simple function merely requires calling it with the appropriate configuration value, where testing the globally configured function requires manipulating the state of the entire application so that the shared mutable configuration state conforms to the test's requirements.
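
A minimal sketch of that difference, using a hypothetical greeting function and a dynamic var standing in for the shared configuration:

(def ^:dynamic *config* {:greeting "hello"})

;; Reaches into global state for its configuration.
(defn greet-global [person]
  (str (:greeting *config*) ", " person))

;; Accepts the same configuration directly as a parameter.
(defn greet [config person]
  (str (:greeting config) ", " person))

;; Testing the parameterized version is a plain call...
(assert (= "hi, rdm" (greet {:greeting "hi"} "rdm")))

;; ...while testing the global version means manipulating the shared state first.
(binding [*config* {:greeting "hi"}]
  (assert (= "hi, rdm" (greet-global "rdm"))))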

We can see that the log macro is actually quite complex, because it reaches a significant amount of global state despite the fact that the macro itself doesn't actually do that much. The log macro has to check a global mutable registry mapping namespaces to logger implementations for a logger, check whether that logger is enabled and then do the logging, potentially manufacturing a new logger using the factory from global state if it has to.

(defn mongo!
  "Creates a Mongo object and sets the default database.
      Does not support replica sets, and will be deprecated in future
      releases.  Please use 'make-connection' in combination with
      'with-mongo' or 'set-connection!' instead.
       Keyword arguments include:
       :host -> defaults to localhost
       :port -> defaults to 27017
       :db   -> defaults to nil (you'll have to set it anyway, might as well do it now.)"
  {:arglists '([:db ? :host "localhost" :port 27017])}
  [& {:keys [db host port]
      :or   {db nil host "localhost" port 27017}}]
  (set-connection! (make-connection db :host host :port port))
  true)

A function which changes state may be more complex than a function which produces no effects and simply returns a value, but it introduces far more whole program complexity if global state is modified, especially if it is modified many times.

This supports the intuition that perhaps sorting a list received as an argument in place doesn't add so much whole program complexity, because the effect is restricted in scope to the caller and to wherever the list to be modified was sourced from, whereas reading and manipulating the ERRNO global value in a C program may directly impact any number of other program behaviors. Such modification defeats referential transparency, and forces us to use sequential logics at the level of the whole program to achieve understanding.
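
A small sketch of that difference in scope (the names here are hypothetical):

;; The effect is confined to the collection the caller handed us.
(defn sort-in-place! [^java.util.List xs]
  (java.util.Collections/sort xs)
  xs)

;; An ERRNO-like global: every reader of last-error is now implicated in
;; understanding what this function does.
(def last-error (atom nil))

(defn checked-sqrt [x]
  (if (neg? x)
    (do (reset! last-error :negative-input) nil)
    (Math/sqrt x)))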

We use multiplication rather than addition to reflect that really what we're trying to approximate is the VOLUME of a multi-dimensional state space, which is measured by the product of the length of the sides, not their sum.

With apologies to Dijkstra, I note that it is quite possible for a program's complexity by this metric alone to grow at least in proportion to the square of the program's length. However there's also still plenty of room at the bottom for simple programs to achieve near-linear complexity growth.

4 Abstraction

We all know that the only mental tool by means of which a very finite piece of reasoning can cover a myriad cases is called “abstraction”; as a result the effective exploitation of his powers of abstraction must be regarded as one of the most vital activities of a competent programmer. In this connection it might be worth-while to point out that the purpose of abstracting is not to be vague, but to create a new semantic level in which one can be absolutely precise.

– EWD, "The Humble Programmer"

What is an abstraction?

Abstractions can be considered to consist of a model, interface and an environment.

The model of an abstraction is the set of operational semantics which it exposes to a user. For instance, a queue or channel as an abstraction provides the model that a producer writes to it and a consumer reads from it. Reads may be in some order such as FIFO or LIFO, or un-ordered with respect to writes. A bounded queue may provide the additional semantics that, if the writer outpaces the reader, it can only get so far ahead before the queue's put operation blocks the writer. A Communicating Sequential Processes (CSP) queue provides the operational semantics that when a write occurs the reader is awakened, so only the reader or the writer is ever operating at once.

The interface is the set of operations which the abstraction provides to the user. For instance the channel can have elements enqueued and dequeued. Perhaps the queue can also be closed at one end, denoting that either the source or the destination is finished whereupon the other end will eventually (or immediately) also close.
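
A minimal sketch of such an interface in Clojure; the protocol and operation names are illustrative, not drawn from core.async or any particular library:

(defprotocol Queue
  (put!   [q x] "Enqueue x, possibly blocking if the queue is bounded and full.")
  (take!  [q]   "Dequeue the next element according to the queue's ordering model.")
  (close! [q]   "Signal that one end is finished; the other will eventually observe it."))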

Abstractions don't exist in a vacuum. They have expectations which they may demand of the outside world (externalities) such as restrictions on how they are used or in what context they are employed. Everything that the model omits is ultimately an assumption about the environment.

It may be helpful to note that the environment is the dual of the model. Anything that is not part of the model must be part of the environment, and by growing the model we can shrink dependency on the environment.

This is deeply significant, in that mismatch between environments and reality results in friction for the user, and tends to be resolved not by abandoning the abstraction but by humans adapting to the restrictions imposed upon them by inadequate tools.

Abstractions may be compared - one said to be simpler than the other - on the basis of the size of the model, the interface and the expectations imposed on the environment. The model after all is a space of states, the interface a set of functions over that space, and externalities may be either a set of propositions or a space of states, as the two are equivalent.

The ergonomics of abstractions may also be compared - one said to be more consistent than the other - on the basis of the consistency of the names and structure of the abstraction's interface with each other, and the predictability of the relationship between the interface, the model and its externalities.

4.1 Examples of Abstraction

Per the Dijkstra quote above, abstraction is all about changing or choosing semantics. I'll give a treatment of linguistic abstraction shortly, but by naming things we can build semantic levels - languages - which convey what we want.

We can abstract over control flow by using higher order functions which provide, for instance, conditional execution, giving us a tool with which we can talk about conditionally applying a transformation.
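
A small sketch of such a tool; update-when is an illustrative helper, not a core function:

;; Conditionally apply a transformation: callers say what should happen,
;; not how the branch is taken.
(defn update-when [x pred f & args]
  (if (pred x)
    (apply f x args)
    x))

(update-when 41 odd? inc) ;; => 42
(update-when 42 odd? inc) ;; => 42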

We can abstract over values by choosing types or decomposing values into their components, using aggregate logical values and references which enable us to refer to parts of our data.
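
For instance (with a hypothetical data shape), an aggregate value plus destructuring lets us refer to parts of our data by name:

(def body {:name "callisto" :orbits "jupiter" :radius-km 2410})

(let [{:keys [name orbits]} body]
  (str name " orbits " orbits))
;; => "callisto orbits jupiter"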

However it is not abstractive to perform computation. When we "compute" (reduce) an expression, we discard information irrecoverably. For instance an average is not meaningfully an abstraction over a time series. It is a datom, the product of some other irrecoverable data under a function. By computing this new value, we've given ourselves an aggregate view which no longer carries the same semantics and cannot meaningfully be considered to be equivalent to its source.

4.2 Limits of Abstractions

Unfortunately, abstractions do have limits.

Traditional types which you may be familiar with, your int and char, are really just raw presentations of what the physical machine can do with a given value. Rarely however do these presentations of machine capabilities map to the senses which are meaningful to us. They are the poorest possible of abstractions - none at all.

#include <stdio.h>
#include <stdint.h>

#define EVAL(...) printf("%s: %d\n", #__VA_ARGS__, __VA_ARGS__)

int main(int argc, char** argv) {
  EVAL((int32_t)(1<<31)-1);
  EVAL((int32_t)(~(1<<31)));
  EVAL((int32_t)(1<<31));
  return 0;
}

// Output:
// (int32_t)(1<<31)-1: 2147483647
// (int32_t)(~(1<<31)): 2147483647
// (int32_t)(1<<31): -2147483648

How many times in Java have you actually wanted precisely a value with 2^32 states, of which 2^31-1 states are said to represent positive numbers, 2^31 states represent negative numbers and the zeroed bit string represents … zero. And arithmetic wraps around, flipping the sign should we exceed the bounds of [-2^31, …, +2^31-1]. We also know that while we have a vector of bits … somewhere and one of them is a sign, and we can count on the 32nd bit being the sign, we can't actually make endianness assumptions about the bytes in our integer(s). So much for bitmasking tricks.

This abstraction of an int type provides addition, subtraction, multiplication, division, absolute value and of course comparison. All of which we associate with the concept of an integer. Furthermore this abstraction supplies, by specifying the model of the machine representation, all the operations we customarily associate with a vector of bits - and, or, xor, shifting, inversion.

If all we want is addition and comparison, the rest of this behavior is not just semantic baggage, it adds complexity. What if I want to count to 2^31? I have to be aware that this abstraction's model says I'll get -2^31, not the value I expect. Or perhaps an out of band signal that an IntegerOverflowException occurred, which constitutes another cross-cutting concern because now the abstraction implies a system of exceptions with which I must be familiar.

For a large domain of problems, this abstraction does well enough. However we must be constantly aware of its failure modes. We as professionals bear a constant cognitive cost in conforming ourselves to the limits which this abstraction presents to us.

It should be recognized here that Haskell, Lisp(s) and other languages which provide arbitrary precision integer types out of the box capture a simpler programming model by default. A model which has more implementation complexity to be sure, and consequently expects more of the environment (for instance the presence of a memory manager), but the interface is smaller and less thorny.
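
Clojure, for instance, refuses to wrap silently: the standard operators fail loudly on overflow, the primed operators promote to arbitrary precision, and the hardware behavior is still available but must be asked for explicitly:

(+ Long/MAX_VALUE 1)             ;; => ArithmeticException: integer overflow
(+' Long/MAX_VALUE 1)            ;; => 9223372036854775808N
(unchecked-add Long/MAX_VALUE 1) ;; => -9223372036854775808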

4.3 Inheritance, Composition & Re-use

Software quality is defined first by solving the business needs of the here and now. If software or strategy or any other tool doesn't do that, it's worthless. Software is a form of physical capital in the same sense as a steam engine or a factory. A factory which can be re-tooled to produce articles of a different design is more valuable than a factory which will require full replacement should the product change. Likewise an engine or assembly line which must be significantly reworked in order to add capacity or change task is less valuable than a design which naturally features a pattern along which it can be changed without significant alteration.

Object Oriented Programming (that is, programming via encapsulation and inheritance) promised that it would improve, among other things, software reusability when it burst onto the scene.

The problem with inheritance as a strategy in practice is that any tool which wants to participate in some abstraction is forced to conform to all the expectations of the interface established by the parent in which the child wishes to participate. The interface must accumulate.

Furthermore, reuse through inheritance has the problem of capturing mutable state. An extension or wrapper class must be intimately familiar with the state of the class it wraps and the expectations it may make. The model accumulates.

In "Simple Made Easy" Rich gave a good treatment of this, pointing out that code partitioning doesn't imply decoupling and pointing to Interface Oriented Programming is the response to this problem, because interfaces only bring with them their direct set of expectations.

To be precise about what's happening here - the model of inheritance oriented programming forces us to accrete interfaces and externalities. Thankfully this is not an essential property of abstraction, merely a property of this particular technique.

(defn mapcat [f & colls]
  [f & colls]
  (apply concat (apply map f colls)))

When we take two abstractions and put them together, we say that we've composed them. For instance two functions compose together to make a third. The map function composes with the concatenate function to give us mapcat. map lets us apply a function over many values, and concat allows us to join sequences together. We can thus define mapcat to be a function which lets us join the results of applying a function which produces sequences to many values. The composite model of mapcat requires only the additional constraint that f: a → [b], that is the function to be mapped produces results which can be concatenated.

(def evens (partial filter even?))

We can build abstractions which have smaller interfaces. For instance if we take a function of a configuration and a datom and we partially apply it with a configuration, we now have a fixed function of a datom. Its model is smaller - we know how it will behave because we've specified whatever configuration(s) the larger base model depends on and the interface is smaller - it consists only of the datom to manipulate.
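
A sketch of the configuration case; fetch-page and its option map are hypothetical, not from any particular HTTP library:

;; A function of a configuration map and a datom (here, a URL).
(defn fetch-page [http-opts url]
  {:url url :opts http-opts}) ;; stand-in for actually performing the request

;; Partially applying the configuration yields the smaller abstraction:
;; a fixed function of a single datom.
(def fetch (partial fetch-page {:timeout-ms 1000 :retries 3}))

(fetch "https://example.com")
;; => {:url "https://example.com", :opts {:timeout-ms 1000, :retries 3}}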

Building a small wrapper around a large Java API would be a good example of such a composite abstraction with a smaller interface.

We can also build composite abstractions which traverse the model and externality trade-off. For instance we can build abstractions which simplify the set of externalities by providing input or state validation where previously we had unchecked expectations. This expands the model, since now the expectations are explicitly modeled and checked. We can also reduce the complexity of the model by choosing to make assumptions.
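
A sketch of that trade-off using preconditions; connect! and its configuration shape are hypothetical:

;; Previously an unchecked expectation of the environment: the config is
;; simply assumed to be well formed.
(defn connect!* [config]
  (str (:host config) ":" (:port config)))

;; The expectation is now part of the model, and checked.
(defn connect! [config]
  {:pre [(string? (:host config))
         (pos-int? (:port config))]}
  (connect!* config))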

4.4 Naming

Credit: Zachary Tellman, Elements of Clojure (unreleased).

Linguistic abstraction in the form of naming is the most fundamental tool available to us as both artisans and engineers.

Names being the tools we use to understand our tools, choosing good names (whatever that means) is of primary importance. But what characterizes a "good" name?

Frege suggests that a name consists of three properties: the sign, being the textual representation of the name; the referent, being the entity referred to by the sign; and finally the sense, being the properties ascribed to the referent.

The traditional example is that Phosphorus (the morning star) and Hesperus (the evening star) are both Greek names for what we now understand to be the planet Venus. The sign and referent are, according to Frege, insufficient to understand these two terms, which are prima facie synonyms but don't actually interchange.

A good name must satisfy two properties - it must be narrow and consistent.

A narrow name excludes things it cannot represent; its sense is only as broad as the context of its use requires. For instance if the only expectation to be made of a reference is that it will be a mapping, then map or its contraction m is a perfectly acceptable name if that is the only sense in which the referent is used.

A consistent name shares the same sense for a referent as is used for that referent in other related contexts. To continue the example of a mapping, to subsequently sign the same referent as dict elsewhere would be less than ideal, both because it is inconsistent and because dict implies a specific kind of mapping and thus no longer shares the same sense. A simple mapping m may only imply clojure.lang.ILookup, whereas hashmap implies java.util.HashMap and side-effectful update. Naming and naming conventions are tremendously important in the context of dynamically checked systems, wherein names largely depend on their sense to communicate the type, or at least the intent, of the use of the referent.

Let's consider some examples (adapted from Zach's fine book).

;; This is a Very Bad expression.
;;
;; The left referent is very broad, when we clearly use it in a narrow sense. It
;; should be more narrowly named.
;;
;; The value we get out of our broad referent is named very narrowly. The single
;; string contains far too much sense.
(get m "sol-jupiter-callisto")

;; This is a better equivalent expression, because we communicate a more
;; appropriate sense for our previously over-broad sign, and we unpacked the
;; previously overly precise key into a set of more general structures each of
;; which communicates less sense.
(get-in starsystem ["jupiter" "callisto"])

;; Better yet capture the logical operation, the fact that Callisto is a moon of
;; Jupiter is an assumption in all of the above expressions.
(get-body-by-name system "callisto")

;; It would also be better to be completely precise about the sense of the name
;; Callisto by doing something like
(-> (get-system "sol")     ; This communicates that we want to get a "system" named sol
    (get-body "callisto")) ; And we care about the component "body" named callisto

;; This expression further communicates the sense that Callisto is a moon of Jupiter.
(-> (get-system "sol")
    (get-body "jupiter")
    (get-moon "callisto"))

;; Both of these two latter expressions are better because more of the relevant
;; sense of the name callisto is captured explicitly, rather than being implied
;; upon or expected of the seemingly broad mapping referent.

4.5 Achieving Abstraction

As suggested in the discussion of the consistency of signs, part of the problem here is that we're forced by the tools we use to project our sense directly upon the sign.

Types: A heresy

                    static sense         dynamic sense
static dispatch     Haskell, C, Java     Java (class casting)
dynamic dispatch    C (dyn linking)      Smalltalk, Python, Ruby

Static type systems provide a model with which we can talk about the sense we associate with a name, by capturing at least some of the sense in which we intend to use the referent as a type. If we can determine a type (and the type must conform to a type of which we knew ahead of time), then we can begin to understand the behavior of our systems through the lens of the types we've assigned to our data.

Some systems allow us to use this information statically to check that at least some part of the sense or type is consistently respected in the use of a referent.

Some systems enable you to capture more sense than others, by encoding the sense statically in what we usually call a type.

Type systems can be compared in terms of how much sense they enable you to capture, and how difficult it is to do so.

Let's say that we wanted to count up from zero, to continue the example of positive integers from above. We could choose a sign with which to reference our value which indicates that it's an index, such as the traditional i, or idx, or even index, depending on individual taste. But sometimes we want to play clever tricks with the fact that negative indexes are sometimes defined, or we want to reach back to some location "before" where we're currently indexed in memory, and so forth. The choice of sign implies only sense; it cannot constrain the referent.

However, we can choose abstractions which let us do so. For instance, we could build an abstraction which only allows us to capture ℕ, the natural or counting numbers.

from collections import namedtuple
from functools import total_ordering

@total_ordering
class N(namedtuple("N", ["value"])):
  """A numeric value constricted to N (the counting numbers)."""

  def __new__(cls, value):
    if value < 0 or isinstance(value, float):
      raise ValueError("N is constrained to [0, 1, ...)")
    return super(N, cls).__new__(cls, value)

  def __int__(self):
    return self.value

  def __add__(self, other):
    return N(int(self) + int(other))

  def __sub__(self, other):
    return N(int(self) - int(other))

  def __eq__(self, other):
    return int(self) == int(other)

  def __lt__(self, other):
    return int(self) < int(other)

This isn't so different from the full Haskell implementation of Numeric.Positive.

There are techniques which can be used to try and capture more sense; the one above is best known as "smart constructors". Rather than restricting the sense to the sign, we've defined a type which can only take on values conforming to our desired sense.

The additional semantic restrictions we can impose on ourselves through the type free us from having to check the sense ourselves over and over again (the most verbose and fragile path), or from settling for the integer sense we get "for free" and hoping that it's enough.

This tactic can be applied to any domain where we want to give a type to a subset of some other space of values. For instance strings of a known format. It is just window dressing, but frequently it's enough that we can approximate the semantics we actually want.

To take another example, what if we were to try and make a type for a Start of Authority (SOA) version? We could use a simple string, after all that's what the SOA becomes when we lay it out in a zone file, and the only requirement we have of an SOA is that it increase to define an ordering among zone file versions.

Generally however, strings (even constrained strings!) should be avoided. Text is the least structured form of data. Parsing is, even with the aid of parser combinators and regular expressions, difficult and error prone. It's far better to use fully realized data representations which can be rendered to strings than to keep data as a string and decode it.

Which carries more semantic meaning - the string "20170706001", or the tuple

from collections import namedtuple
from scanf import scanf  # 3rdparty

class SOA(namedtuple("SOA", ["year", "month", "day", "version"])):
  """A tuple capturing a DNS SOA format."""

  PATTERN = "%04d%02d%02d%03d"

  def __new__(cls, text):
    """Decode an SOA string, returning an SOA tuple."""
    return super(SOA, cls).__new__(cls, *scanf(cls.PATTERN, text))

  def __str__(self):
    """Encode this SOA as a string"""
    return self.PATTERN % self  # self is an appropriate 4-tuple already

print SOA("20170706001")
# SOA(year=2017, month=7, day=6, version=1)

The tuple can be rendered to and parsed from text as required, and above all it captures the sense in which we use SOAs which leaving the SOA as a plain string does not. If strings are just passed through and not manipulated, then perhaps it's acceptable to keep them represented as strings and skip decoding but structures like this which capture more of the sense of our data should be preferred.

Unfortunately there's often lots of other sense we may care about - for instance that we have a number within the index bounds of some structure, or that there exists a record in the database with an id primary key of this value for any possible value. Checking these senses statically may well be impossible. There could be a concurrent change to the database, a once valid identifier becomes dangling, and so forth.

Static types in the Haskell sense of things help; they allow you to capture some of the relevant sense - enough, in my experience, that software development becomes easier because the type system helps you, especially when you need to change abstractions - but thanks to the incompleteness theorem we know all too well that they can never check every property we may care about.

4.6 Utility Is Contextual

[Figure: Plato's cave]

It's tempting to think that we can, through great skill or sheer luck, discover perfect abstractions; Platonic ideals or Martian crystalline forms of a concept which satisfy all our needs for all time. The problem with this view is that different contexts have different needs. Many objects may generate the same shadow on the wall of Plato's cave.

For instance above I attacked the "stock" numeric abstractions which include hardware implementation details, suggesting that arbitrary precision math is a better default. I stand by that argument, but it must be recognized that there are occasions when we must make different tradeoffs in choosing our models.

Sometimes we need to interface with the machine. Sometimes we want to use (or author) bit vectors. Sometimes we don't have a memory manager and can't afford truly arbitrary precision arithmetic. Sometimes the performance degradation of operating on a double precision floating point number as opposed to a single or half width floating point number is meaningful, and we must make a choice to increase complexity and sacrifice ergonomics in the name of utility. That some machine type is the shared interface of some other abstraction, and that the externality of the machine's particular wall time performance relates to our context, changes what is appropriate.

Just as we should exercise caution in choosing our names so that we don't choose overly narrow or broad names, so we must exercise caution in choosing our abstractions. The simplest possible abstraction with the fewest possible externalities may well not be the most appropriate to the domain, and worse still a simple abstraction is not likely to lend itself to change. Like other crystals, abstractions shatter or shear rather than bend from their natural shape.

Coupling of abstractions - allowing the tools you choose to share externalities and to expect one another - is dangerous, because it means that even a small deviation from the original requirements and context may prove cause for significant change, with rippling consequences for your entire system.

5 Performance

The greatest performance improvement of all is when a system goes from not-working to working.

– Ousterhout "My Favorite Sayings"

Performance and efficiency generally are factors in the measurement of how well a tool solves or responds to the needs of the "user", whether they be a person, business or other entity.

A code breaking machine which can decode a daily message in 32 hours isn't sufficient. It can't decode messages fast enough for them to be relevant. Such a machine is not useless, but it doesn't solve the problem set for it adequately. Three such machines together will provide, at a 32 hour delay each, the previous day's message, and perhaps a fourth or even pairs of machines may be supplied for reliability in order to provide a more or less continuous feed of intelligence, but it would still be yesterday's data today, delivered at great cost.

In this example we have an objective criterion of how much performance is "enough". The cracking machine must have a cracking rate of at least 1 msg per 24 hrs, and we have an objective measurement that we're only able to crack at the rate of 1 msg per 32 hrs.

There are two primary approaches to achieving performance where it is required. The first approach is to simply do less work - either by reducing the problem using algorithmic optimization, or by reducing the size of inputs by firing customers or negotiating system scope. When optimization must occur, algorithmic or organizational optimization should be by far preferred in the interest of overall program simplicity with all its other virtues.

The second approach is to tune what work must be done to better suit the machinery we use to do it. This is what we usually mean when optimization comes up, although in general we should prefer to find ways to simplify and reduce the problem.

Two opinions about programming date from those days. … The one opinion was that a really competent programmer should be puzzle-minded and very fond of clever tricks; the other opinion was that programming was nothing more than optimizing the efficiency of the computational process, in one direction or the other.

– EWD, "The Humble Programmer"

To be clever means to be "sharp", "quick" and "skillful". The old English roots suggest a sense of "cutting" and "keenness". It would be cleverness to design a tool which does the difficult or seemingly impossible within resource constraints by leveraging intimate knowledge of the system.

This formulation of a "clever trick" implies that a really clever trick must also have a high complexity. This is the root of the trouble with optimizations of the second kind. Clever tricks, while perhaps relevant in the moment, are in the long term likely to prove troublesome.

Worse, clever tricks of this kind require specializing some part or all of your program so that it reflects performance considerations. This has the particular negative consequence of meaning that your code cannot be so simple and algorithmic anymore - it must be reflective of some state(s) or trick(s) used to achieve acceptable performance because performance is now a crosscutting concern which must be accounted for by your abstractions.

Frequently, reasoning from first principles is not enough to produce fully optimized code. Modern high performance sorting algorithms are absolutely fascinating, because while asymptotic analysis may suggest that algorithms such as mergesort or quicksort should be universally adopted, it turns out there are frequently ways to cheat and do better. Modern collections libraries use adaptive sorting strategies which benefit from existing sortedness in the input collection, and frequently contain multiple different sorting implementations which are applied to different sizes of collection.

Even apparently simple statements like "doing two string concatenations will be slower than doing one" are impossible to evaluate without a real effort at profiling. The JVM, among other platforms, recognizes that string manipulation is "expensive" due to the need to resize and copy buffers. However, the Java compilers shipped with both the Oracle JDK and OpenJDK detect code doing trivial string concatenations and silently replace it with equivalent code using the StringBuilder class, which delays buffer resizing as long as possible.

If you want fast, start with comprehensible. You can't make it fast if you can't change it.

– Paul Phillips, "We're Doing It All Wrong" (2013)

Unless there are known constraints, such as the hard throughput requirement above of a message a day, the question "is it fast enough?" and the statements "this isn't performant" or "we need to do this for performance" are totally meaningless. Worse, they detract from the goal, which should be to ship a tool which serves a purpose, and introduce artificial pressure to conform the choice of abstractions to arbitrary false goals.

6 Fin: Purism

Niklaus Wirth's work is notable superficially because Wirth is himself a prodigious hacker. He's responsible for, among other projects, no fewer than five separate programming languages and two operating systems. What's the trick? How does he do it?

In the implementation of the Modula-2 compiler, Wirth and his students used two metrics to estimate and track its quality over the course of its evolution. One was the simple line count of the program - a poor proxy for complexity, but a data point nonetheless. Another was the time which it took the compiler to compile itself.

One of Wirth's students implemented an elegant hash table backed by binary trees, for use in mapping strings to symbols - to look up local variables and implement lexical scope. Doing so introduced a new, complex datastructure, increased the line count of the compiler, and added at best no performance benefit, for it turned out most scopes in real programs are small, and for small scopes a simple association list could be faster. Wirth ripped the hash table out.

Wirth was also a great user of tools, such as Backus-Naur Form and parser generators for expressing precisely the meaning which he wished to ascribe to a given set of symbols and the notation which his languages should represent. Even in these, Wirth's ascetic attachment to simplicity is clear - all of his languages are simple and can be efficiently implemented without any clever tricks whatsoever.

Simplicity and quality aren't properties which code or systems get by osmosis or by grant of the gods. They are the product of deliberate effort by programmers to resist the entropic pressure applied by reality and customer requirements on systems and, in spite of that pressure, to carve out cohesive, appropriate abstractions which leave room for growth.

It would be equally absurd, then, to expect that in unjust, cowardly, and voluptuous action there should be a mean, an excess, and a deficiency; for at that rate there would be a mean of excess and of deficiency, an excess of excess, and a deficiency of deficiency. But as there is no excess and deficiency of temperance and courage because what is intermediate is in a sense an extreme, so too of the actions we have mentioned there is no mean nor any excess and deficiency, but however they are done they are wrong; for in general there is neither a mean of excess and deficiency, nor excess and deficiency of a mean.

– Aristotle, "Nichomachean Ethics", Book II, Ch. 6

In the "Nichomachean Ethics", Aristotle (or rather his student Plato writing in his master's name) develops a system of ethics on the basis that one must always choose the mean between excess and deficiency.

There is a fashion in the software community to be "pragmatic". That is, to be tempered or practical with respect to an attitude or policy. Unfortunately, if you temper temperance with an added dose of reality, you no longer have "pragmatism", you have a regression. Software as an industry is built on an incrementalism of "progress" which is informed by experience but largely unaware and often dismissive of the possibility space.

Before we can claim to be pragmatic, we must have a vision of the perfection from which we turn aside in practical interests. It is my opinion that the overwhelming majority of working programmers are not conscious of what could be; of how higher order tools could help them; of how the pains they feel are the fault of specific failures of their tools and are not essential.

My goal with this talk was to share with you some of the philosophy and purism which I carry with me and which I hope helps me to strive for more perfect solutions. I hope that it helps you too.

Thanks.