Typing the processor

Previously I implemented a pipelined version of a simple processor for my batbridge assembly architecture. This time, I’m going to take that architecture a step further by using the Clojure.core.typed library by @ambrosebs to get some level of type safety in preparation for a binary assembler.

Introducing typed Clojure

Clojure is at its core a typed language because the runtime that Clojure sits upon (the Java JVM) is a type based system. However by a large the programming model offered to Clojure programmers is essentially type agnostic. For the most part, Clojure programmers are free to deal in terms of abstract sequence and map types without real regard for the true Java class used to represent their data at runtime.

This is a Good Thing. It reduces the intellectual load on programmers in so much as programmers are free to leave the implementation details of their algorithm to the Clojure compiler. Unfortunately, types and their implications for data operations are also more or less central to the way that programmers solve problems. In short without some notion of types we are bound hand and foot.

Typed Clojure as a library seeks to implement an optional annotative type system for Clojure. Rather than enforce strict typing, Typed Clojure seeks to make types a tool by which to ease evolving Clojure programs and an aid to program comprehension rather than a set of implicit or comment noted assumptions.

A personal example: This time last year I was tasked with writing an out of order processor simulator for class (gee does that sound familiar). Everything was going great until I discovered that I needed to add metadata to the instructions in my simulator. Lots of metadata. Now at the time, I didn’t know about Clojure.core/meta and Clojure.core/with-meta which make it possible to attach metadata to existing objects. Rather I began the painful process of reworking my simulator days before it was due from using a vector based representation of an instruction to using a map. The results were unfortunate, in so much as Maps it turns out are ISeq. This means that the sequence operators which expected a Vector almost worked on maps making it a serious pain for me to debug these type mismatches. As I said in my first post, the simulator never quite worked.

Typed Clojure helps resolve this class of problem by offering inference backed annotative typing. This means that you describe through annotations what the structure of your intended types are, and then a type inference engine attempts to verify that your program is correct with respect to the described data structures. So in the case of my map vs. vector instruction representation problem, Typed Clojure checking would have made it obvious what code had not yet been rewritten to operate on the instruction map rather than the previous instruction vector structure.

Diving in

As of this writing, Typed Clojure is at version [org.Clojure/core.typed "0.2.17"] and it is against this version that I will be typing my current BatBridge simulator.

Typed Clojure offers a number of built in types: Seqs, Sets, Maps, Integers, the list goes on and on. So just for grins, lets do a simple demo.

(ns demo
	(:require [Clojure.core.typed :as t :refer [defn>]]))

(t/ann three t/AnyInteger)
(def three 3)

(t/ann ^:no-check Clojure.core/zero? [t/AnyInteger -> boolean])
(t/ann myzero? [t/AnyInteger -> boolean])
(defn myzero? [n]
	(zero? n))

(defn> add-two :- t/AnyInteger
	([x :- t/AnyInteger]
	    (+ x 2))
	([x :- t/Anyinteger
      & more :- (t/Seq t/AnyInteger)]
        (+ (apply + x more) 2)))

Type Clojure uses the ann form to annotate a symbol with type(s). Here, I use the ann form to indicate that the symbol three is of type AnyInteger. I use AnyInteger rather than integer or long because Clojure typecasts between long and integer under the covers, and AnyInteger wraps both of them.

I then defined a function, myzero?, which simply wraps Clojure.core/zero? annotating both my function and Clojure.core/zero? with the appropriate types. For functions, the ann notation is [param-type + -> ret-type]. Here, I define both zero functions to be [t/AnyInteger -> boolean]. This works as expected, but when we run the typechecker on this namespace we will see that the annotation of Clojure.core/zero? warns that it isn’t being checked. This is because I used the ^:no-check reader metadata when I annotated the symbol Clojure.core/zero?. When the Typed Clojure type checker encounters ^:no-check flagged symbols no effort is made to process the definition of the annotated symbol and ensure that the annotative type is correct with respect to the runtime type.

Finally I use the new defn> syntax, to achieve the same sort of effect as my previous defn/ann pair. In fact, if you inspect the defn> macro, you will discover that it simply generates the defn/ann pair. However it provides nice syntactic sugar so I elected to demonstrate it.

Checking types

There are three main entry points I use to typed Clojure: Clojure.core.typed/cf, Clojure.core.typed/check-ns and lein-typed.

Clojure.core.typed/cf (short for Check Form) is a function of two arguments, the first being an expression for which some type is claimed and the second being a type. cf engages the Typed Clojure inference engine and attempts to verify that the typed form is indeed of the type claimed. For example, if we attempt to typecheck one of the example forms above…

Introducing Seqs, Maps and Unions

Typed Clojure allows for considerably more variety in annotation than simple types like integers and booleans. Sets and Maps are fully supported, through the use of the forms Clojure.core.typed/Set and Clojure.core.typed/Map.

As one would imagine, Clojure.core.typed/Set is a function of one argument, which creates a type representing a set of elements which all share the argument type.

Clojure.core.typed/Map is similar, in that it is a function of two arguments, one for the key type and one for the value type.

Clojure.core.typed/Seq describes an arbitrary sequence of elements sharing a type, and Clojure.core.type/Vec get the same place but is specific that the sequence involved is a Vector.

However we’re still in the realm of relatively simple types as far as Clojure is concerned. Much of Clojure is about mixing and matching different types in and out to create variable type structures and other kinds of weirdness. For this we need three more forms:

Introducing U, HMap and HVec

U is the Typed Clojure function for expressing the union of several types. That’s all it’s good for, but that makes it immensely powerful. Need to represent a value which could be either Long or Integer? (U Long Integer). Yes, it’s that easy.

While still experimental, HMap is for me where the rubber meets the road in terms of Typed Clojure’s usefulness. Heterogeneous maps, maps where different special keys map to differently typed values, are a core design pattern of the Clojure community. As described over on the core.typed wiki, HMaps can represent required keys, the types thereof, optional keys and the types thereof, and even the absence of certain keys! HVec is the same idea as HMap, except that it’s used to represent Vectors rather than Maps.

Applying this to BatBridge

Okay. Well let’s start defining some of the types that are integral to the BatBridge simulator then…

(t/def-alias Memory "A storage medium within a processor"
  (t/Map t/AnyInteger t/AnyInteger))

(t/def-alias Storage
  (U (Value :registers) (Value :memory)))

(t/def-alias InstructionParam "An instruction parameter"
  (U t/AnyInteger
     (Value :r_PC)
     (Value :r_IMM)
     (Value :r_ZERO)))

okay, cool… this lets us represent all the legal values which the simulator could see as storage locations within the VM, and values that we can interpret to put in those storage locations. Now let’s try and tackle the VM state itself… we’re gonna need an HMap with a bunch of mandatory keys…

(t/def-alias Processor "A processor state"
  (HMap :mandatory {:registers Memory
                    :memory    Memory
                    :fetch   InstructionVec
                    :decode  InstructionMap
                    :execute CommitCommand
                    :halted  boolean}
        :complete? true)) ; indicates that anything else makes the Processor illegal

But what about the instructions themselves? We need three real instruction encodings: the vector instructions which I use for my assembly notation, the map instructions which execute operates on and which decode builds, and the commit commands that writeback uses.

(t/def-alias InstructionVec "A symbolic vector instruction"
  '[t/Keyword InstructionParam InstructionParam InstructionParam t/AnyInteger])

(t/def-alias InstructionMap "A parsed map instruction"
  (HMap :mandatory {:icode t/Keyword
                    :dst   InstructionParam
                    :srca  InstructionParam
                    :srcb  InstructionParam
                    :lit   t/AnyInteger}
        :complete? true))

(t/def-alias CommitCommand
  (U '[(Value :memory)   t/AnyInteger t/AnyInteger]
     '[(Value :registers) t/AnyInteger t/AnyInteger]
     (Value :halt)))

Now, armed with these type annotations, it’s more or less trivial to chase through the simulator I originally presented adding the appropriate annotations. However, all this comes to a head when you try and type the processor’s memory because there is a fundamental type conflict in both of my processors to date: both InstructionVecs and AnyIntegers are legal values within the :memory of a Processor. This means that the type system cannot be sure (and nor can we) that the value fetched from some address is in fact an instruction not a value.

There are two ways to resolve this conflict. The approach which I’ve taken thus far if you’ve read the Typed Clojure code over on github is to sweep this conflict under the rug by creating a pair of ^:no-check functions one of which reads memory returning instructions, the other of which reads memory returning values. However the real fix to this is to introduce a binary (numeric) encoding for instructions. I hope that this constitutes a helpful introduction to Typed Clojure, next time I will at long last define an encoding for BatBridge and present the full architecture complete with a bytecode assembler toolkit.