Typing the processor
13 Nov 2013Previously I implemented a pipelined version of a simple processor for my batbridge assembly architecture. This time, I’m going to take that architecture a step further by using the Clojure.core.typed library by @ambrosebs to get some level of type safety in preparation for a binary assembler.
Introducing typed Clojure
Clojure is at its core a typed language because the runtime that Clojure sits upon (the Java JVM) is a type based system. However by a large the programming model offered to Clojure programmers is essentially type agnostic. For the most part, Clojure programmers are free to deal in terms of abstract sequence and map types without real regard for the true Java class used to represent their data at runtime.
This is a Good Thing. It reduces the intellectual load on programmers in so much as programmers are free to leave the implementation details of their algorithm to the Clojure compiler. Unfortunately, types and their implications for data operations are also more or less central to the way that programmers solve problems. In short without some notion of types we are bound hand and foot.
Typed Clojure as a library seeks to implement an optional annotative type system for Clojure. Rather than enforce strict typing, Typed Clojure seeks to make types a tool by which to ease evolving Clojure programs and an aid to program comprehension rather than a set of implicit or comment noted assumptions.
A personal example: This time last year I was tasked with writing an out of
order processor simulator for class (gee does that sound familiar). Everything
was going great until I discovered that I needed to add metadata to the
instructions in my simulator. Lots of metadata. Now at the time, I didn’t know
about Clojure.core/meta
and Clojure.core/with-meta
which make it possible to
attach metadata to existing objects. Rather I began the painful process of
reworking my simulator days before it was due from using a vector based
representation of an instruction to using a map. The results were unfortunate,
in so much as Maps it turns out are ISeq. This means that the sequence operators
which expected a Vector almost worked on maps making it a serious pain for me to
debug these type mismatches. As I said in my first post, the simulator never
quite worked.
Typed Clojure helps resolve this class of problem by offering inference backed annotative typing. This means that you describe through annotations what the structure of your intended types are, and then a type inference engine attempts to verify that your program is correct with respect to the described data structures. So in the case of my map vs. vector instruction representation problem, Typed Clojure checking would have made it obvious what code had not yet been rewritten to operate on the instruction map rather than the previous instruction vector structure.
Diving in
As of this writing, Typed Clojure is at version
[org.Clojure/core.typed "0.2.17"]
and it is against this version that I will
be typing my current BatBridge simulator.
Typed Clojure offers a number of built in types: Seq
s, Set
s, Maps
,
Integers
, the list goes on and on. So just for grins, lets do a simple demo.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
(ns demo
(:require [Clojure.core.typed :as t :refer [defn>]]))
(t/ann three t/AnyInteger)
(def three 3)
(t/ann ^:no-check Clojure.core/zero? [t/AnyInteger -> boolean])
(t/ann myzero? [t/AnyInteger -> boolean])
(defn myzero? [n]
(zero? n))
(defn> add-two :- t/AnyInteger
([x :- t/AnyInteger]
(+ x 2))
([x :- t/Anyinteger
& more :- (t/Seq t/AnyInteger)]
(+ (apply + x more) 2)))
Type Clojure uses the ann
form to annotate a symbol with type(s). Here, I use
the ann
form to indicate that the symbol three
is of type AnyInteger
. I
use AnyInteger
rather than integer
or long
because Clojure typecasts
between long and integer under the covers, and AnyInteger
wraps both of them.
I then defined a function, myzero?
, which simply wraps Clojure.core/zero?
annotating both my function and Clojure.core/zero?
with the appropriate
types. For functions, the ann
notation is [param-type + -> ret-type]
. Here,
I define both zero functions to be [t/AnyInteger -> boolean]
. This works as
expected, but when we run the typechecker on this namespace we will see that the
annotation of Clojure.core/zero?
warns that it isn’t being checked. This is
because I used the ^:no-check
reader metadata when I annotated the symbol
Clojure.core/zero?
. When the Typed Clojure type checker encounters
^:no-check
flagged symbols no effort is made to process the definition of the
annotated symbol and ensure that the annotative type is correct with respect to
the runtime type.
Finally I use the new defn>
syntax, to achieve the same sort of effect as my
previous defn
/ann
pair. In fact, if you inspect the defn>
macro, you will
discover that it simply generates the defn
/ann
pair. However it provides
nice syntactic sugar so I elected to demonstrate it.
Checking types
There are three main entry points I use to typed Clojure:
Clojure.core.typed/cf
, Clojure.core.typed/check-ns
and
lein-typed.
Clojure.core.typed/cf
(short for Check Form) is a function of two
arguments, the first being an expression for which some type is
claimed and the second being a type. cf
engages the Typed Clojure
inference engine and attempts to verify that the typed form is indeed
of the type claimed. For example, if we attempt to typecheck one of
the example forms above…
Introducing Seqs, Maps and Unions
Typed Clojure allows for considerably more variety in annotation than simple
types like integers and booleans. Sets and Maps are fully supported, through the
use of the forms Clojure.core.typed/Set
and Clojure.core.typed/Map
.
As one would imagine, Clojure.core.typed/Set
is a function of one argument,
which creates a type representing a set of elements which all share the argument
type.
Clojure.core.typed/Map
is similar, in that it is a function of two arguments,
one for the key type and one for the value type.
Clojure.core.typed/Seq
describes an arbitrary sequence of elements sharing a
type, and Clojure.core.type/Vec
get the same place but is specific that the
sequence involved is a Vector.
However we’re still in the realm of relatively simple types as far as Clojure is concerned. Much of Clojure is about mixing and matching different types in and out to create variable type structures and other kinds of weirdness. For this we need three more forms:
Introducing U, HMap and HVec
U
is the Typed Clojure function for expressing the union of several
types. That’s all it’s good for, but that makes it immensely powerful. Need to
represent a value which could be either Long
or Integer
? (U Long
Integer)
. Yes, it’s that easy.
While still experimental, HMap
is for me where the rubber meets the road in
terms of Typed Clojure’s usefulness. Heterogeneous maps, maps where different
special keys map to differently typed values, are a core design pattern of the
Clojure community. As described over
on the core.typed wiki,
HMaps
can represent required keys, the types thereof, optional keys and the
types thereof, and even the absence of certain keys! HVec
is the same idea as
HMap
, except that it’s used to represent Vectors rather than Maps.
Applying this to BatBridge
Okay. Well let’s start defining some of the types that are integral to the BatBridge simulator then…
okay, cool… this lets us represent all the legal values which the simulator could see as storage locations within the VM, and values that we can interpret to put in those storage locations. Now let’s try and tackle the VM state itself… we’re gonna need an HMap with a bunch of mandatory keys…
But what about the instructions themselves? We need three real instruction encodings: the vector instructions which I use for my assembly notation, the map instructions which execute operates on and which decode builds, and the commit commands that writeback uses.
Now, armed with these type annotations, it’s more or less trivial to chase
through the simulator I originally presented adding the appropriate
annotations. However, all this comes to a head when you try and type the
processor’s memory because there is a fundamental type conflict in both of my
processors to date: both InstructionVecs and AnyIntegers are legal values within
the :memory
of a Processor. This means that the type system cannot be sure
(and nor can we) that the value fetched from some address is in fact an
instruction not a value.
There are two ways to resolve this conflict. The approach which I’ve
taken thus far if you’ve read the Typed Clojure code
over on github
is to sweep this conflict under the rug by creating a pair of
^:no-check
functions one of which reads memory returning
instructions, the other of which reads memory returning
values. However the real fix to this is to introduce a binary
(numeric) encoding for instructions. I hope that this constitutes a
helpful introduction to Typed Clojure, next time I will at long last
define an encoding for BatBridge and present the full architecture
complete with a bytecode assembler toolkit.
^d