Strictly Tagged Clojure
03 Aug 2015This summer I’ve been an intern at Factual, and this is an experience report from the semiannual internal hackathon where Alan ‘amalloy’ Malloy and I experimented with using Alexander Yakushev’s Skummet fork of Clojure to emit lean(er) bytecode.
Some motivation
One of Clojure’s primary use cases is as a more palatable tool with which to interact with the rich Java ecosystem and existing Java libraries. Because of its facilities for such inter-operation, Clojure is even sometimes used to write performance sensitive code which would otherwise be written in Java. However there are limitations to the success with which this may be done.
While JVM bytecode is statically typed, Clojure is an aggressively dynamically checked language which makes pervasive use of the Object type to delay typechecking. To this end, Clojure will use JVM Object reflection to resolve instance fields and methods when performing interoperation. While correct for unknown types, because reflective access is slow compared to direct access for known types it has long been possible to write type hints which advise Clojure about the runtime JVM type of a value and enable Clojure to use direct access and direct method invocation rather than reflective access.
However these hints are not types in the sense of a static type being a contract on the domain of values, they are merely hints for reflection elimination and place no contract on the domain of a hinted value.
This hinting behavior for reflection elimination comes at the cost of emitting
checkcast
instructions. As the JVM is statically typed, one cannot simply swear that a
value is of a type, a checking cast must be used. Clojure, when emitting
non-reflective method calls and field accesses, does not statically know (and
makes no attempt to prove) that the value or expression in play is in fact of
the type which you may have tagged it. All local variables and function
parameters which are not JVM primitives are stored as Objects and so must be
checkcast
ed to the desired type on every use.
So are we stuck trading slow reflective access for checkcast
instructions
(which are faster to be sure but cause method bloat when doing lots of interop
on previously checked values)?
Of course not! While Clojure does not currently have support for real contractual types when using tags, we can sure add such behavior!
Now. Before you burn me at the stake for being a heretic, clearly since Clojure
does not currently have strict local types, we can’t just make tags strict.
TEMJVM actually
makes that mistake, and as a
result cannot compile clojure/core
because among others,
clojure.core/ns-publics
makes use of a type hint which while safe for nonstrict type tags is not correct
in the context of strict tags. This has to be an additive, opt-in change.
So, what Alan and I did was create new special fn
metadata flag ^:strict
. If
a fn
body being compiled has the ^:strict
tag, then and only then are are
type tags treated as a contract rather than being advisory. This is a strictly
additive change because stock Clojure will ignore the metadata and emit less
efficient but still correct code.
So as an example, let’s consider the following fn:
1
2
3
4
5
6
(defn sum ^long [^Iterable xs]
(let [iter (.iterator xs)]
(loop [tot (long 0)]
(if (.hasNext iter)
(recur (+ tot (long (.next iter))))
tot))))
Eliding a bunch of implementation details for brevity, this fn compiles on stock Clojure 1.7 to the following JVM bytecodes:
public final long invokePrim(java.lang.Object xs);
0 aload_1 [xs]
1 aconst_null
2 astore_1 [xs]
3 checkcast java.lang.Iterable [47]
6 invokeinterface java.lang.Iterable.iterator() : java.util.Iterator [51] [nargs: 1]
11 astore_2 [iter]
12 lconst_0
13 nop
14 lstore_3 [tot]
15 aload_2 [iter]
16 checkcast java.util.Iterator [53]
19 invokeinterface java.util.Iterator.hasNext() : boolean [57] [nargs: 1]
24 ifeq 51
27 lload_3 [tot]
28 aload_2 [iter]
29 checkcast java.util.Iterator [53]
32 invokeinterface java.util.Iterator.next() : java.lang.Object [61] [nargs: 1]
37 invokestatic clojure.lang.RT.longCast(java.lang.Object) : long [64]
40 invokestatic clojure.lang.Numbers.add(long, long) : long [70]
43 lstore_3 [tot]
44 goto 15
47 goto 52
50 pop
51 lload_3
52 lreturn
Local variable table:
[pc: 15, pc: 52] local: tot index: 3 type: long
[pc: 12, pc: 52] local: iter index: 2 type: java.lang.Object
[pc: 0, pc: 52] local: this index: 0 type: java.lang.Object
[pc: 0, pc: 52] local: xs index: 1 type: java.lang.Object
// Method descriptor #77 (Ljava/lang/Object;)Ljava/lang/Object;
// Stack: 5, Locals: 2
public java.lang.Object invoke(java.lang.Object arg0);
0 aload_0 [this]
1 aload_1 [arg0]
2 invokeinterface clojure.lang.IFn$OL.invokePrim(java.lang.Object) : long [79] [nargs: 2]
7 new java.lang.Long [30]
10 dup_x2
11 dup_x2
12 pop
13 invokespecial java.lang.Long(long) [82]
16 areturn
So here we have two methods. The first one, the invokePrim
, takes an Object
and returns a primitive long
since we long hinted our function. The invoke
method is a wrapper around the invokePrim
method which provides for “boxing”
(wrapping in an object) the primitive result of calling invokePrim
. This
allows our fn
to be used by code which wants and can use a long
, and code
which doesn’t know/care and just wants an Object
back like a normal fn
would
return.
So lets dig into the invokePrim
method.
- Load
xs
off the arguments stack. It’s just typedObject
because that’s the parameter type. - Load the constant
nil
. - Store the
nil
to the local namedxs
, thus clearing it. Note that in the locals table,xs
has the typeObject
. This means that when we getxs
from the local, we have tocheckcast
it again because we’ve lost type information by storing and loading it. checkcast
thexs
we loaded toIterable
since we don’t really know what it is.invokeinterface
of the.iterator
method to get anIterator
from our now guaranteedIterable
.- Store our
Iterable
into theiter
local (discarding type information as above) - Load the constant
0
from the class constant pool. - Store the
tot
local (primitive typed) - Load our
iter
checkcast
because in storing it we forgot that it’s anIterator
invokeinterface
to see if there are elements left, producing a primitiveboolean
- Branch on the
boolean
going to 21 (in this list) if false - Load
tot
from the local - Load
iter
from the local checkcast
thatiter
is stillIterator
invokeinterface
to get the next value from the iterator, producing anObject
invokestatic
to call the staticclojure.lang.RT
method for convertingObject
s to primitivelong
s.invokestatic
to add the two primitivelong
s on the stack- Store the new value of
tot
- Loop back to 10.
- Clear the stack
- Load
tot
return
So with the exception of the first checkcast
to make sure that the Object
we
got as an argument that should be Iterable
is in fact an instance of
Iterable
, the checkcast
s after load
are all provably uncalled for. The
static types of these values is known because their Java signatures are known,
and the only reason that we have to emit all these checks is that the Compiler
throws that information away by storing these values in untyped (Object
)
locals.
The Hack
Every
Expr
in clojure.lang.Compiler
already knows (or can state) its type either as
tagged or inferred, and whether it has such a tag. However, these stated Java
classes are lies! A function invocation (IFn.invoke
call site) is statically
typed to return Object
(unless it’s a primitive call site but we know that as
well) no matter what the tag on the IFn
being invoked may say. For example
clojure.core/str
is tagged ^String
and does indeed return a String
,
however after invoking the appropriate IFn
the JVM doesn’t know that there’s a
String
on the stack because the IFn
interface discards that type
information. It just knows it has an Object
. The fix is that we add a
Expr.needsCast
method and implement it for every instance of Expr
in
Compiler.java. So now when in strict mode, we know that unless Expr.needsCast
returns true
, the value on the stack after Expr.emit
absolutely is of type
Expr.getJavaClass
. Otherwise we cannot avoid the checkcast
.
We also have to change the behavior of locals so that we can emit locals with
types other than Object
. By typing locals we preserve their type information
as tagged or inferred across loads and stores. This allows the Expr
representing a local use to report that it only needs a cast when the usage of
the local doesn’t have the same tag as the type of the binding and we cannot
statically show no cast is required.
With these changes, our modified Compiler.java can indeed produce and use strictly typed locals. So lets add our annotation…
1
2
3
4
5
6
(defn ^:strict sum ^long [^Iterable xs]
(let [iter (.iterator xs)]
(loop [tot (long 0)]
(if (.hasNext iter)
(recur (+ tot (long (.next iter))))
tot))))
And generate bytecode on our modified version of Skummet 1.7-RC1-r4 (again abbreviated).
public final long invokePrim(java.lang.Object);
Code:
0: aload_1
1: aconst_null
2: astore_1
3: checkcast #30 // class java/lang/Iterable
6: invokeinterface #34, 1 // InterfaceMethod java/lang/Iterable.iterator:()Ljava/util/Iterator;
11: astore_2
12: lconst_0
13: nop
14: lstore_3
15: aload_2
16: invokeinterface #40, 1 // InterfaceMethod java/util/Iterator.hasNext:()Z
21: ifeq 43
24: lload_3
25: aload_2
26: invokeinterface #44, 1 // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
31: invokestatic #49 // Method clojure/lang/RT.longCast:(Ljava/lang/Object;)J
34: ladd
35: lstore_3
36: goto 15
39: goto 44
42: pop
43: lload_3
44: lreturn
LocalVariableTable:
Start Length Slot Name Signature
15 29 3 tot J
12 32 2 iter Ljava/util/Iterator;
0 44 0 this Ljava/lang/Object;
0 44 1 xs Ljava/lang/Object;
public java.lang.Object invoke(java.lang.Object);
Code:
0: aload_0
1: aload_1
2: invokeinterface #59, 2 // InterfaceMethod clojure/lang/IFn$OL.invokePrim:(Ljava/lang/Object;)J
7: new #13 // class java/lang/Long
10: dup_x2
11: dup_x2
12: pop
13: invokespecial #62 // Method java/lang/Long."<init>":(J)V
16: areturn
The win compared to the original bytecode should be obvious. Sure enough in the
invokeStatic
method we only emit the one checkcast we absolutely have to have
because the xs
argument could really be anything. The tot
and iter
locals
are both statically typed, and so we can just load them and invoke the
appropriate interfaces directly.
In some simple benchmarks, this optimization on this fn translates to a 5%-10%
performance improvement which isn’t too impressive. However other fns like
clojure.core/str
in our testing were able to get up to 20% performance
improvements from strict locals.
Disclaimer
This is the product of a two day hack. While Alan and I have been able to get it
to work and emit working code, honestly this isn’t something we’re comfortable
taking to production yet. Some clear wins such as being able to emit typed fn
arguments by popping arguments, checking them and then putting them in typed
local bindings for use and being able to take advantage of types on closed over
locals remain on the table.
What Didn’t (seem) To Work
While Alan debugged Compiler.java and picked off some types related wins we
hadn’t gotten yet I worked on adding more inlining behavior to
clojure.core
. Much of core
, especially the lexically early fn
s are just
thin wrappers around interop on clojure.lang.RT
, which does have reasonable
interface types on most of its methods.
The hope was that what with the typed locals work, preserving more type information across calls to the Clojure standard library and inlining the Clojure standard library where possible to interop calls with clearly inferable types we would be able to produce demonstrably faster code.
While in theory this should be at least break even and probably a win, we
haven’t managed to benchmark it well and show a clear win from the aggressive
inlining work. In fact, the really interesting case of possibly aggressive
inlining being an into
which is able to use a typed, static Transient
loop
is impossible because into
is implemented in terms of reduce
, which takes
the reducing fn
as a value and then dispatches via clojure.lang.IReduce
in
order to get fast iteration over chunked seq. However we can’t statically inline
a call site through being taken as a value so that’s the end of the line for
that idea.
Inline Unrolling
We were however able to fully inline some interesting cases of
clojure.core/assoc
and clojure.core/conj
. A common pattern in Clojure is to
write a function f
which has a zero arguments case returning a constant, a one
argument case returning the argument unmodified and a two or more arguments case
in which the operation f
is reduced over the arguments provided. Rather than
directly implement IFn
, functions emitted by Clojure instead extend
clojure.lang.AFn
(source),
which provides some out of the box support for functions of variable arity and
function application.
Next Steps
These changes were motivated by internal performance requirements and will likely get polished until they are ready to be upstreamed back to Skummet. While we expect that the patch we have developed will never be included into Clojure as-is if only due to high impact, we hope to see this behavior or something like it enter the mainline in a future version of Clojure.
Edit 1: Skummet pull request submitted
^d