Immutable Env Things

As with some of my other posts, this was originally an email which turned into enough of a screed I thought it would be of general interest and worth posting. Some op/ed material has been removed, but the technical details are unaltered if not clarified.

So here's the rundown I've got so far on a lisp with immutable environments. I'm merely gonna sketch at some stuff, because giving a full treatment would require dusting off and finishing a bunch of stuff I came up with for Ox and put down again.

I also threw out the Haskell sketch I was dinking with because it was just distracting from writing this :P

So lets start from the top.

Clojure is a form-at-a-time language which happens to support reading files of forms (or rather Readers which produce textual representations of forms, where they come from is magical).

Binding is achieved through a global, thread shared, transactional, mutable mapping from Symbols to Namespaces, which are themselves transactional mappings from Symbols to Vars, which are themselves a triple (QualifiedSymbol, Metadata, Binding). Vars happen to also support dynamic binding (pushing and popping), but this is less used. From here on out I'll treat them simply as a global mostly static binding mechanism, which is their primary use case anyway.

Control and local bindings are achieved entirely using lambda/fn forms compiled to JVM methods, produced by the opaque compiler as opaque IFn/AFn objects. Top level forms are compiled and executed for effect by being wrapped in an anonymous (fn [] <form>) which can be blindly invoked by the compiler to realize whatever effects the form may have.

Circular dependencies are achieved via Var indirection. A Var is first created with no binding, so that it can be referenced. With that environment binding, code which will depend on (call) the Var's final bound value can be compiled since it just needs the target Var to exist in order to compile. The depended on Var may then be redefined, having both itself and its dependent visible in the compilation context and so the two Var bindings can be mutually recursive.

While traditional and lispy, this approach has a number of problems which I'm sure I don't need to reiterate here. The goals of an immutable namespace system then should be to

  1. Make Namespaces the compilation unit rather than Forms
  2. Make Namespaces reloadable (change imports/exports/aliases w/o system reboot)
  3. Enable Namespace compilation to fail sanely
  4. Enable refactoring/analysis

The sketch of this I've been playing with is as follows:

Lets assume an interpreter. If you can interpret you can compile as an implementation detail of eval and interpretation is easier to implement initially. Assume a reader. So we read a form, then we have to evaluate it. Evaluation is first macroexpansion, then sequential evaluation of each resulting form in the environment. Evaluation at the top level is eval Env Form -> Env (we discard the result) which is fine. Because we aren't really doing compilation, only interpretation, Clojure's tactic of having Vars and analyzing against Var bindings works 1:1. Create an unbound symbol, analyze/bind against it, then add a binding later.

The (global) environment structure I had in mind is pretty simple: one unqualified symbol names the current namespace (*ns*), a mapping exists from unqualified symbols to Namespaces.

Namespaces are essentially compilation contexts, and really just serve to immutably store the alias mapping, import mapping, the mapping of the Vars in the namespace to bindings, and a list of the public Vars/exports from the namespace.

Vars are a fully qualified name, metadata and a value. That's it.

Compilation occurs at the namespace level. All the forms in a namespace are sequentially read and evaluated in interpretation mode. All macros expand normally and type hints are processed they just don't do anything.

Once the whole namespace has been successfully loaded in interpretation mode, a second pass may be made in which static linking is performed. Because everything in the namespace has been fully realized already it's possible to statically link even mutually recursive calls within the same namespace. Full type inference within the namespace is also possible here, etc.

Aside: Interestingly because it is single pass, the existing Clojure compiler doesn't (can't) handle hinted mutually recursive function calls at all. It relies on Var metadata to store arglists (and hints for primitive invocations), so in order to emit a hinted recursive call the forward declaration of the var has to carry the same ^:arglists metadata (hints and all!) which will be added by defn whenever it is finally defined.

Once a namespace has been fully loaded and compiled (including the second pass) the global environment with the resulting namespace and var mappings is the result of whatever caused loading. If an error occurs during compilation, we just abort and return the environment state before anything else happened. I think there are tricks to be played with generated garbage class names and classloader isolation here so that this works even during the 2nd static compile pass, but it should be possible to fully clean up a failed compile.

So this works really great for module loading, and even for macros which expand into multiple def forms. This model of eval :: env -> form -> (env, result) starts to break down when you want to talk about macros that do evaluation during code loading, and likewise the REPL. Basically your interpreter winds up holding an additional reference, *env* or something, which is intrinsically mutable, but which references an immutable environment and holds shall we say the state continuation of the entire REPL. Consider user code which actually calls eval during execution. Whatever state changes that evaluation creates need to be preserved in the "global" continuation.

In this model when you compile a form at the REPL, the runtime could obviously interpret (this may be appropriate for macroexpansion) and can also compile. This is only tricky when the user recompiles a symbol which was already bound/compiled. In this case, the runtime could either eagerly recompile all the code which links against this function using the same interpret then statically link tactic or could just invalidate any compiled bytecodes and lazily recompile later. The former is probably more predictable.

Once a namespace has been reloaded/altered, all namespaces importing it must also be recompiled in topsort order using the same tactics. That we already have per-symbol dependency information and per-module dependency information helps with building refactoring tools which otherwise have to derive all this themselves. Ideally top level expressions/definitions would also expose local variable tables tables and local variable dataflow graphs so that analysis there is also possible.

Aside: It may be possible to allow cyclic dependencies between namespaces (see submodules in racket for some study on this problem). In the common case it may well be that macros are well behaved and that cyclic dependencies between modules work out just fine. However because macros perform arbitrary computation it's also pretty easy to cook up pathological cases where macro generated code never reaches a fixed point in repeated interpretive analysis which can be statically compiled. For this reason I'm inclined towards formalizing a ns or module special form and throwing out require, refer, import and friends as legal top level forms altogether. Namespaces should be declarative as in Java/Haskell.

There's probably a way which I just haven't seen yet to treat updates to the "global" namespace mapping and updates within a namespace the same way via the same mechanism since they're both dependent on the same dependency tracking and recompile machinery.

Writing this has made me think about having an ImmutableClassLoader which is derived entirely from a immutable environment and loads classes by lazily (statically) compiling forms.

That's kinda all I got. ztellman gets credit for the mutable *env* in the repl bit which I spent literally weeks trying to do without last year. Maybe something with monad transformers can get you there but I couldn't figure it out. mikera has some sketches of all this with types in his KISS project.

I've already prototyped an environment system that looks a lot like this in the Ox repo if you want to go dig. Originally there was this ox/lang/environment.clj then I started dinking with a Java implementation of the environment structure ox/lang/environment which also has a couple different stack based contextual binding types for the interpreter.

As written about in the Jaunt essays, I've kinda concluded that whatever this immutable ns language winds up looking like is so divorced from what Clojure or at least the way that most people write Clojure that you'll loose so much value in the changeover it won't pay off for a very long time. The mount library is a perfect example of this in that it's deeply imperative and takes advantage of code loading order to achieve start and stop. It's not the only bit of code I've seen which does effects at load time either. Hence Jaunt which tries to be less flawed around namespace reloading/refactoring without going whole hog on ns immutability.

The more I kick this stuff around the more I find that the whole endeavor verges on a Haskell clone in sexprs and the more I decide I'd rather take the time to learn Haskell properly than mess with an immutable lisp or Jaunt. Besides then you get nice stuff like typeclasses and fully typed instance dispatch and equality that actually works and theorems and native bytecode and yeah.