The Thirty Million Line Problem

This blog post is a review and musing on a talk of the same title “The Thirty Million Line Problem” (2015) (~1h lecture + 40m q&a).

To summarize the talk, which badly needs it, Casey Muratori argues the following:

  1. That software used to be simple, not general purpose. The example of interest is game operating systems, such as those for the Amiga. It was possible for developers to be in total command of a machine and its resources.
  2. That software “complexity” (measured in lines of code) has exploded in the last decades.
  3. That much of the “complexity” of modern software stacks is in the operating system, not in applications.
  4. That “complexity” in operating systems is driven by attempting to support an enormous variety of hardware, along with the hardware abstraction layers which paper over differences between devices.
  5. The example of USB is given as an instance where hardware designers gave device designers leeway to do whatever they want over a standardized bus, which has created more dependencies on vendor-provided drivers.

Casey posits that general purpose kernel+driver complexity will grow unchecked unless standardized device interfaces are set for device manufacturers, and hardware is standardized into an “ISA” which specifies primitive driverless device interfaces. He posits that this is not an unreasonable proposal; historical game consoles had known and documented hardware interfaces, and modern SoCs are arguably “standardized” computers. Both of these enabled the kind of simplification of drivers and of the kernel that Casey is seeking.

There are a few themes in Casey’s talk I want to poke at.

The first is what software complexity is. I’ve taken a crack at this in the past and want to again. Casey doesn’t make it clear if his position is that complexity is strictly line count of software, or if it’s something different. I think the question of how we measure complexity is really interesting and not one we’ve spent enough ink on. Casey’s intuitive argument that complexity is at least sketched by code line count seems to be a common one in the industry, and one worth exploring.

Second is whether hardware’s really the problem here. To give the bit away, I agree that it is, but the why is interesting, and there’s good reason to wonder whether the complexity is incidental or essential.

Casey’s definitely on to something when he says that much of the complexity driving the reuse of general purpose operating systems lies in the device driver library that comes with each OS. Implementing even one device driver, let alone support for entire product families, is an enormous engineering burden, and it’s no surprise that driver support is often provided by the manufacturers. For instance, Intel and Nvidia both have dedicated Linux development resources.

Hardware abstraction layers are another enormous source of complexity, not just because they’re hard to implement but because they’re fundamentally faulty abstractions. The purpose of a HAL is to provide “predictable” behavior across a variety of hardware implementations. This means using hardware features where they’re available and trying to provide efficient software bridges where they aren’t. There’s a tension here between providing “transparent” access to the underlying hardware with an inconsistent interface and predictable (hardware determined) performance, and providing a “consistent” interface which masks the underlying hardware and may provide very different performance across different devices due to needing software implementations of what are hardware features elsewhere.
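To make that tension concrete, here is a minimal sketch of a “consistent” HAL entry point. Everything in it is hypothetical and invented for illustration: `FancyDevice`, `BasicDevice`, `hal_dot` and the `ops_issued` counter are not from the talk or from any real driver stack. The counter is only a crude stand-in for cost.

```python
class FancyDevice:
    """Hypothetical device with a hardware dot-product unit."""

    supports_dot = True

    def __init__(self):
        self.ops_issued = 0

    def dot(self, a, b):
        # Stand-in for a single hardware operation.
        self.ops_issued += 1
        return sum(x * y for x, y in zip(a, b))


class BasicDevice:
    """Hypothetical device that can only multiply scalars."""

    supports_dot = False

    def __init__(self):
        self.ops_issued = 0

    def mul(self, x, y):
        self.ops_issued += 1
        return x * y


def hal_dot(device, a, b):
    """Consistent interface: same answer everywhere, different cost."""
    if device.supports_dot:
        return device.dot(a, b)
    # Software bridge for hardware that lacks the feature.
    total = 0
    for x, y in zip(a, b):
        total += device.mul(x, y)
    return total


fancy, basic = FancyDevice(), BasicDevice()
a, b = [1, 2, 3], [4, 5, 6]
print(hal_dot(fancy, a, b), fancy.ops_issued)
print(hal_dot(basic, a, b), basic.ops_issued)
```

Both calls return the same value, but the caller cannot tell from the interface that one device did the work in a single operation and the other in one operation per element. A caller who cares about that difference has to look underneath `hal_dot`.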

A “transparent” abstraction is really no abstraction whatsoever. It’s just an extra step in taking on a hard dependency. An abstraction which can’t provide consistent enough performance won’t be useful, because in order to get acceptable performance it must be bypassed or otherwise “seen through”.

There is of course a Dan tweet for this, but it remains an incredibly important point in software engineering that it feels like we routinely skate over.

Third and finally is whether Casey’s project of standardizing the hardware <-> software interfaces to eliminate the complexity of drivers and HALs would solve the problem it sets out to.

In some spaces where the technology is stable, I think it could, and in those spaces standardization has already been achieved. Disk drives and other storage technologies already have good, established driver interfaces. Keyboards, mice and other human input devices likewise have standard interfaces.

I think the problem with Casey’s proposal lies in the technology he cares most about - accelerators. The purpose of an accelerator is to provide maximum performance. This means that - at least to some level - a hardware dependent abstraction is presented. An at least translucent abstraction.

As hardware performance shifts, eventually the abstraction will too. For instance, it’s one thing for a new board to expose a faster multiply operation; it’s another entirely to expose matrix multiplication or vector operations. The operational semantics of “multiply” stretch, if you will, to “fast multiply”. They don’t stretch to a fundamentally different interface. If a user desires maximum performance, they have to adopt the different operational model somehow, somewhere. This seems to preclude the idea of stability, and we’re back to challenging the notion of whether durable abstractions are even available.
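A rough sketch of that distinction, under loud assumptions: `slow_mul`, `fast_mul`, `matmul_scalar` and `device_matmul` are all hypothetical names made up here, and the “device” operations are plain Python standing in for hardware.

```python
def slow_mul(x, y):
    """Stand-in for an older board's scalar multiply."""
    return x * y


def fast_mul(x, y):
    """Stand-in for a faster scalar multiply with the same signature."""
    return x * y


def matmul_scalar(mul, a, b):
    """Matrix product written purely against a scalar multiply primitive."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(mul(a[i][p], b[p][j]) for p in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def device_matmul(a, b):
    """Stand-in for a hypothetical accelerator's whole-matrix operation."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# Swapping in the faster multiply needs no change to the calling code.
print(matmul_scalar(slow_mul, a, b))
print(matmul_scalar(fast_mul, a, b))

# Using the matrix unit means handing over whole matrices instead of scalars.
print(device_matmul(a, b))
```

All three calls compute the same product. The first two differ only in which primitive is passed in, so code written against `mul` keeps working. The third requires the caller to be restructured around a different unit of work, which is the interface shift in question.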

In Casey’s world, that new interface would have to follow a new industry standard for what it would look like, so at least vendor churn of interfaces on the open market would be constrained and software would be able to target that standard interface. I think this is the best we could do, and in some regards it echoes The Cuneiform Tablets of 2015 in trying to present a stable, if not preservationist minded, programming target. Stable in much the same sense as the early computers Casey calls back to repeatedly - a fully defined architecture which never gets to change.

It’s interesting to muse on Casey’s point that a Raspberry Pi, no longer nearly the toy it was when this talk was new, arguably presents such a platform, and on whether Apple enjoys a comparable advantage with their relatively small hardware support matrix, especially on the new M1 hardware.