XML and Other Horrors

Those of you who follow me on Twitter may be aware of my recent travails with a bogeyman I've named only as "XML". Having banged my head against this abomination of software for several days, it's venting time so buckle in for the ride.

The Hell Am I Building

The summer I was working at Calxeda, my friends @frozenfoxx and @Sweet_Grrl got me hooked on a tabletop game, Warmachine by the Privateer Press aka PP. At a high level since PP's digital presence is pretty last decade, Warmachine is a tabletop miniature army game in the 2.5cm to 12cm base diameter size range. Models and units of models have statistics in the classical D&D style, and players engage in deathmatch and king of the hill style games.

Compared to Warhammer 40k which is probably better known, Warmachine features a much more steampunk theme & universe, simpler rules of play, lower entry price point and a greater focus on balance of play rather than on army construction and model specialization.

In short, Warmachine is a player's game while Warhammer is a modeler's game.

Unfortunately, a few realities of this game genre assert themselves.

  1. Games are slow and take up a lot of space
  2. Games are only possible at the other player's convenience
  3. Exploring game balance questions is difficult b/c play is time-hard
  4. You've got to buy before you try much of the time

As a student with homework, I simply haven't been able to attend Austin's various weekly Warmachine evenings in about a year which means it's a long time since I've gotten to play a game I enjoy a lot. Unlike Starcraft or Dota I can't just drop into a game at my convenience. But what if I could?

I've previously messed with AIs for Magic: the Gathering, so the idea at some point asserted itself that I could (or even should) try to build a full up simulator for Warmachine IP issues be damned.

The What

There are two "moving pieces" to this project. Or maybe prior art is the better term. Sebastien Laforet, here and after referred to as the Frenchman, built an Android app which is capable of rendering data files encoding the rules of play and model statistics for the game. Great! Someone else already typed all that in for me. In English, but typed it in. These files are available on github and are "just" XML. More on this in a bit.

Two other people, "Hobo" and "PG_Corwin" build a tile set for the VASSAL tabletop simulation engine. Unfortunately VASSAL is just a sprite engine with no understanding of the rules or this whole project would be moot. Treating the module as a JAR yields a whole bunch of sprites in the /images/ tree, so there's "free" art to work with as well.

The Devils in the Data

In theory this should be dead easy. The rules files are a full enumeration of every known piece in the game, and the sprites package has an image for most of them, so just wire them together into a data store of some sort and I'd be done from an assets point of view. I'd just need to well build the graphics engine and figure out how to simulate the ruleset.

Unfortunately this isn't so simple for a couple reasons.

First of all, the XML files Sebastien put together are designed to do one thing and one thing only: encode the rule text on the playing cards associated with each model and unit. Cards are generally formatted with an image, some fluff and statistics on the front and rules on the back. So for instance this recently released model, Coleman Stryker v3 has a model and stat card face as such:

So we can slap all this in XML and it works just fine (with serious elisions for length)

<warcaster id="Yz02"
           name="Stryker3"
           full_name="Lord General Coleman Stryker"
           qualification="Cygnar Epic Cavalry Warcaster"
           focus="6"
           warjack_points="5"
           fa="C"
           completed="true">
  <basestats name="Stryker"
             spd="8" str="6" mat="7" rat="6" def="15" arm="16" cmd="10"
             hitpoints="18" immunity_electricity="true" />
  <weapons>
    <melee_weapon name="Quicksilver MKIII"
                  pow="8" p_plus_s="14"
                  magical="true" reach="true">
      <capacity title="DISRUPTION">
        ....
      </capacity>
    </melee_weapon>
    <mount_weapon name="Mount" pow="10">
      <capacity title="THUNDEROUS IMPACT">
        ....
      </capacity>
    </mount_weapon>
    <ranged_weapon name="Quicksilver Blast"
                   rng="8" rof="1" aoe="-" pow="14"
                   magical="true" electricity="true">
      <capacity title="DISRUPTION">
        ....
      </capacity>
    </ranged_weapon>
  </weapons>
  <feat title="LIGHTNING CHARGE">
    ....
  </feat>
  <spell name="ARCANE BOLT"
         cost="2" rng="12" aoe="-" pow="11" up="NO" off="YES">....</spell>
  <spell name="CHAIN BLAST"
         cost="3" rng="10" aoe="3" pow="12" up="NO" off="YES">....</spell>
  <spell name="ESCORT"
         cost="2" rng="SELF" aoe="CTRL" pow="-" up="YES" off="NO">....</spell>
  <spell name="FURY"
         cost="2" rng="6" aoe="-" pow="-" up="YES" off="NO">....</spell>
  <spell name="IRON AGGRESSION"
         cost="3" rng="6" aoe="-" pow="-" up="YES" off="NO">....</spell>
  <capacity title="ELITE CADRE [STORM LANCES]" type="">....</capacity>
  <capacity title="FIELD MARSHAL [ASSAULT]">....</capacity>
  <capacity title="PLASMA NIMBUS">....</capacity>
</warcaster>

So we have some meaning for a <warcaster> which is a title and a specialization of <model>, we have <basestats> which is common to pretty much everything, we have a <weapons> block with a list of <weapon> entries each of which has some attributes (rules) named <capacity> and this is all pretty sane. Writing a set of functions which can walk this tag tree using clojure.xml was really simple, and worked great, and all you have to do is reduce over all this tag soup with some bindings and a state object to build a representation of a given model.

That code has been working for about 12 months, because shortly thereafter batteries of special cases appeared from the woodwork and fouled my initial efforts on this up.

See as in Magic: the Gathering, Warmachine derives a lot of well value from novelty and from mixing things up in the ruleset. This means that it's not uncommon to find things which aren't what they appear to be. The newest faction in the game has this awesome model:

The text at the bottom "Battle engine and solos" is the important bit. The TEP itself is what is called a Battle Engine and has a whole battery of rules associated with that concept. However it comes in a box with three other models which aren't part of this Battle Engine entity, they are independent during play hence the use of the word Solo. So... what the hell is this thing and how do we represent it?

Well this is our Frenchman's answer, again with omissions:

<battleEngine id="cocE01"
              full_name="Transinfinite Emergence Projector & Permutation Servitors"
              cost="9" qualification="Convergence Battle engine" fa="2"
              completed="true">
  <basestats spd="5" str="10" mat="0" rat="4" def="10" arm="19" cmd="10"
             hitpoints="20" construct="true" pathfinder="true"/>
  <weapons>
    <ranged_weapon name="Aperture Pulse"
                   rng="SP10" aoe="-" pow="10" rof="1" location="-">
      <capacity title="AUTO FIRE[2]">....</capacity>
      <capacity title="FIRING FORMULAE">....</capacity>
    </ranged_weapon>
  </weapons>
  <capacity title="GUN PLATFORM">....</capacity>
  <capacity title="SACRIFICIAL PAWN [PERMUTATION SERVITORS]">....</capacity>
  <capacity title="SERVITOR SATELLITES">....</capacity>
  <capacity title="STEADY">....</capacity>
  
  <model id="Permutation Servitors" full_name="Permutation Servitors">
    <basestats spd="6" str="3" mat="5" rat="5" def="12" arm="13" cmd="0"
               construct="true" pathfinder="true"/>
    <capacity title="ORBIT">....</capacity>
    <capacity title="STEADY">....</capacity>
  </model>
</battleEngine>

Did you catch that? <battleEngine> NESTS another <model>! From the WHAC app, which seeks to duplicate the cards that come in the box this makes sense. There's a card which represents the TEM, and there's another card which gives the statistics for each of the spawnable Servitors. Also, <battleEngine> as a tag is implicitly dealt with as <model>, same as <warcaster> was earlier.

Rendering in the app this looks just fine, because you render the "top" model being the big ol' TEM anyone reading the cards cares about and the Servitors are secondary and so don't occur as a primary listing.

Unfortunately this is nonsense from my point of view as a consumer trying to extract structured information. The TEM is a <model> in and of itself, as are the Servitors, and they come together in a package that PP calls PIP 36028.

A similar problem applies to units. It turns out that the Frenchman encodes units by using this implicit <model> behavior. So a unit will be named "Foo, Leader & Grunts", use the implicit <model> tag, just have <basestats>, some weapons and capacities and that'll be all. Which sorta works.

Unfortunately it falls apart in units with models that aren't all the same. I've wasted enough length on images and listings here, but go check out the encoding of the Black 13th if you care. The Black 13th is composed of three characters, being Lynch, Watts and Ryan each of whom is unique. They are encoded using the implicit <model> on their shared <unit> to describe Lynch, the leader, and the other two have their own nested <model>s.

Again from a rendering standpoint this makes sense, but really what I want is something like this:

<unit name="the Black 13th" id="...">
  <model name="Lynch">
    ...
  </model>

  <model name="Ryan">
    ...
  </model>

  <model name="Watts">
    ...
  </model>
</unit>

because that would make sense.

The even more degenerate case of this is unfortunately the common case of an infantry unit: "Foo, Leader & Grunts" where the leader will get the implicit <model> and there'll be a <model name="grunts"> or something in its body. This is really a problem because usually the officer has a different sprite than the grunts, and the grunts need to have an ID in order to have a sprite permanently associated with them. Guess what attribute they may not have.

And don't get me started on the spelling errors and qualification mistakes and soforth I've found in these files as a result of trying to automate parsing them.

The Devils in the Sprites

The sprites are their own little shitshow. There isn't a naming convention to all of the files as they were done by several artists over the course of years. Most of the time they're named with substrings or abbreviations for a model's name, so some regex voodoo is able to recover about a third of the model to file associations but for most of it pfft I got to spend last night hand matching data file model IDs to sprite names.

I'm down to about 250 entities which aren't associated with a sprite, and I built a three page webapp to speed up building these associations, so it could be worse but it's still painful.

Why do this? Well because the XML file that's in the root of the VASSAL module and describes all the sprite to model associations isn't actually valid XML, it's some VASSAL specific nonsense that I can't seem to parse.

So yeah. XML and other people's data can go burn.

The question now before me is whether I try to do awful things to my XML file parser so I can continue consuming Sebastien's data file format non-deal as it is in the hope that as he maintains it I can "just" import changes or whether I want to for my own sanity just refactor these files and walk away from any future work of Sebastien's in the interest of quality and sanity of parsed results.

End Of Rant