Oxcart going forwards

11 Dec 2014

When I last wrote about Oxcart work pretty much went on hiatus due to my return to school. As there has been some recent interest in the status of Lean Clojure overall I thought I’d take the opportunity to review the state of Oxcart and the plan for Oxcart going forwards.

Oxcart and Clojure

The static linking approach and lack of run time dynamism found in Oxcart is explicitly at odds with the philosophy of core Clojure. Where Clojure was designed to enable live development and makes performance sacrifices to enable such development as discussed here, Oxcart attempts to offer the complement set of trade offs. Oxcart is intended as a pre-deployment static compiler designed to take a working application and to the greatest extent possible wring more performance out of unchanged (but restricted) Clojure as PyPy does for Python. As Oxcart explicitly avoids the dynamic bindings which Clojure embraces, Alex Miller, the Clojure Community Manager, has repeatedly stated that he expects to see little cross pollination from Oxcart and related work to Clojure itself.

This would be all well and good, were it not for the existing behavior of Clojure’s clojure.lang.RT class. As currently implemented in Clojure 1.6 and 1.7, RT uses its <initc> method to compile the following resources with clojure.lang.Compiler.

“clojure/core”
“clojure/core_proxy”
“clojure/core_print”
“clojure/genclass”
“clojure/core_deftype”
“clojure/core/protocols”
“clojure/gvec”
“clojure/instant”
“clojure/uuid”

These represent about 10799 lines of code, all of which could easily be statically compiled and most importantly tree shaken ahead of time by Oxcart or another tool rather than being loaded at boot time. This also means that the unavoidable booting of Clojure itself from source can easily dominate loading user programs especially after static compilation to raw classes. A quick benchmark on my machine shows that booting a Clojure 1.6 instance, loading a ns containing only a -main that only prints “hello world” takes ~2.8 seconds from source compared to ~2.5 seconds booting the same program compiled with Oxcart suggesting that the cost of booting Clojure is the shared ~2.5 second boot time. This is the test.hello benchmark in Oxcart’s demos.

$ git clone git@github.com:oxlang/oxcart.git &&\
  cd oxcart &&\
  git checkout 0.1.2 &&\
  bash bench.sh test.hello
Running Clojure 1.6.0 compiled test.hello....
Hello, World!

real    0m1.369s
user    0m3.117s
sys     0m0.083s
Oxcart compiling test.hello....
Running Oxcart compiled test.hello....
Hello, World!

real    0m1.212s
user    0m2.487s
sys     0m0.073s

Then there’s the test.load benchmark. This benchmark as-is pushes credulity because it compiles 502 functions of which only the -main which uses none of the other 501 will be invoked. This reflects more on program loading time than on the loading time of “clojure/core”, but I still think instructive in the costs of boot time compilation, showing a ~7s boot time for Clojure compared to a ~2.5s boot time for Oxcart. As arbitrary slowdowns from macroexpansions which Thread/sleep would be entirely possible I consider this program within the bounds of “fairness”.

A Fork in the Road

There are two solutions to this limitation, and both of them involve changing the behavior of Clojure itself. The first is my proposed lib-clojure refactor. Partitioning Clojure is a bit extreme, and in toying with the proposed RT → Util changes over here I’ve found that they work quite nicely even with a monolithic Clojure artifact. Unfortunately there seems to be little interest from Clojure’s Core team (as judged via Alex’s communications over the last few months) in these specific changes or in the static compilation approach to reducing the deployment overhead of Clojure programs. The second is to fork Clojure and then make lib-clojure changes which solves the problem of convincing Core that lib-clojure is a good idea but brings its own suite of problems.

Oxcart was intended to be my undergraduate thesis work. While the 16-25% speedup previously reported is impressive, Oxcart does nothing novel or even interesting under the hood. It only performs four real program transformations: lambda lifting, two kinds of static call site linking and tree shaking. While I suppose impressive for an undergrad, this project also leaves a lot on the table in terms of potential utility due to its inability to alter RT’s unfortunate loading behavior. I also think there is low hanging fruit in doing unreachable form elimination and effect analysis, probably enough that Oxcart as-is would not be “complete” even were its emitter more stable.

I’m reluctant to simply fork Clojure, mainly because I don’t think that the changes I’ve been kicking about for lib-clojure actually add anything to Clojure as a language. If I were to fork Clojure, it’d be for Oxlang which actually seeks to make major changes to Clojure not just tweak some plumbing. But writing a language so I can write a compiler is frankly silly so that’s not high on the options list. The worst part of this is that forking Clojure makes everything about using Oxcart harder. Now you have dependencies at build time (all of “stock” Clojure) that don’t exist at deployment time (my “hacked” Clojure). Whatever hack that requires either winds up complicating everyone’s project.clj or in an otherwise uncalled for leiningen plugin just like lein-skummet. Tooling needs to be able to get around this too when every library you’d want to use explicitly requires [org.clojure/clojure ...] which totally goes away once Oxcart emits the bits you need and throws the rest out. Most of all I don’t want to maintain a fork for feature parity as time goes on. However I also don’t see any other a way to get around RT’s existing behavior since the RT → Util refactor touches almost every java file in Clojure.

Flaws in the Stone

Oxcart itself also needs a bunch of work. While I think that Nicola has done an awesome job with tools.analyzer and tools.emitter.jvm I’m presently convinced that while it’s fine for a naive emitter (what TEJVM is), it’s a sub-optimal substrate for a whole program representation and for whole program transforms.

Consider renaming a local symbol. In the LLVM compiler infrastructure, “locals” and other program entities are represented as mutable nodes to which references are held by clients (say call sites or use sites). A rename is then simply an update in place on the node to be changed. All clients see the change with no change in state. This makes replacements, renames and so forth constant time updates. Unfortunately due to the program model used by tools.analyzer and tools.emitter.jvm, such efficient updates are not possible. Instead most rewrites degenerate into worst case traversals of the entire program AST when they could be much more limited in scope. Cutaway is one experiment in this direction, but it at best approximates what clojure.core.logic.pldb is capable of. I hope that over Christmas I’ll have time to play with using pldb to store, search and rewrite a “flattened” form of tools.analyzer ASTs.

Oxcart is out of date with tools.emitter.jvm and tools.analyzer. This shouldn’t be hard to fix, but I just haven’t kept up with Nicola’s ongoing work over the course of the last semester. This will probably get done over Christmas as well.

Oxcart doesn’t support a bunch of stuff. As of right now, defmulti, defmethod, deftype, defprotocol, proxy, extend-type and extend-protocol aren’t supported. I’m pretty sure all of these actually work, or could easily work, they just didn’t get done in the GSoC time frame.

Finally and I think this is the thing that’s really blocking me from working on Oxcart: it can’t compile clojure.core anyway. This is a huge failing on my part in terms of emitter completeness, but it’s a moot point because even if I can compile clojure.core with Oxcart RT is gonna load it anyway at boot time. I also suspect that this is an incompleteness in the project as a whole which probably makes it an unacceptable thesis submission although I haven’t spoken with my adviser about it yet.

The Endgame

As of right now I think it’s fair to call Oxcart abandoned. I don’t think it’s a worthwhile investment of my time to build and maintain a language fork that doesn’t have to be a fork. I talked with Alexander, one of the clojure-android developers and a fellow GSoC Lean Clojure student/researcher about this stuff and the agreement we reached was that until 1.7 is released there’s no way that the lib-clojure changes will even get considered and that the most productive thing we can do as community members is probably to wait for 1.8 planning and then try to sell lib-clojure and related cleanup work on the basis of enabling clojure-android and lean clojure/Oxcart. Practically speaking in terms of my time however, if it’s going to be a few months until 1.7 and then a year until 1.8, that only gives leaves me my last semester of college to work on Oxcart against an official version of Clojure that can really support it. If that’s what it takes to do Oxcart I’ll likely just find a different thesis project or plan on graduating without a thesis.

(with-plug
  That said, serious interest in Oxcart as a deployment tool or another
  contributor would probably be enough to push me over the futility hump
  of dealing with a Clojure fork and get Oxcart rolling.)