Notes from building Raspberry Pi clusters
28 Nov 2020
A while ago I got it into my head to put a Raspberry Pi cluster together.
I’ve been doing a lot of reflecting lately on the last project I shipped - what went well, and what didn’t. A while back I tweeted out some half-baked thoughts, one of which was a reflection that while my team and the entire engineering organization beyond it were using a tremendously powerful toolset, we still got bogged down.
Version numbers are hard. We as programmers are awful at tracking what constitutes a breaking change versus what constitutes an invisible bugfix. From a formal semantics perspective there is no such thing as a bug, and every “bugfix” is, strictly speaking, a breaking change. And yet this notion of “fixing” behavior “broken” by shoddy implementations is central to modern notions of libraries, dependency management and deployment. I’ve previously written about versioning, its semantics and some theories about how we could do it better.
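To make that concrete, here’s a hypothetical sketch - the function and the version numbers are invented for illustration - of a patch release that is, observably, a breaking change:

```python
# Hypothetical sketch: version 1.0.0 of a library ships this function,
# documented to return a sorted, de-duplicated copy of its input.
def sorted_unique(xs):
    # Bug: duplicates were supposed to be dropped, but aren't.
    return sorted(xs)

# A caller written against the *observed* behavior comes to rely on the bug:
assert sorted_unique([3, 1, 1]) == [1, 1, 3]

# Version 1.0.1 "fixes" the bug...
def sorted_unique(xs):
    return sorted(set(xs))

# ...and the same caller now breaks, even though it shipped as a patch release:
assert sorted_unique([3, 1, 1]) == [1, 3]
```

This is Hyrum’s Law in miniature: with enough users, every observable behavior becomes part of the contract, whatever the version number claims.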
This post is somewhat meta - it concerns a whole bunch of the automation by which I go about writing this blog.
Previously, I talked at some length about the slightly heinous yet industrial-grade monitoring and PDU automation solution I deployed to keep my three so-called modes - ethos, logos and pathos - from locking up for good by simply hard resetting them whenever remote monitoring detected an incident. That post (somewhat deliberately, I admit) had some pretty gaping holes around configuration management for the restart script. The restart script is handwritten and hand-configured. It has no awareness of the Ansible inventory I introduced in my first Ansible post - which captures much of the same information. Why the duplication?
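One obvious way to close that gap would be for the restart script to read the Ansible inventory directly instead of carrying its own copy. A minimal sketch using Ansible’s Python API - the inventory path and the per-host `pdu_port` variable here are assumptions for illustration, not what the script actually does:

```python
# Sketch: have the restart script read host data straight from the Ansible
# inventory rather than maintaining a duplicate of it by hand. The inventory
# path and the hypothetical `pdu_port` hostvar are assumptions.
from ansible.parsing.dataloader import DataLoader
from ansible.inventory.manager import InventoryManager

loader = DataLoader()
inventory = InventoryManager(loader=loader, sources=["inventory/hosts.yml"])

for host in inventory.get_hosts("all"):
    hostvars = host.get_vars()
    print(host.name, hostvars.get("ansible_host"), hostvars.get("pdu_port"))
```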
In my first homelab post, I mentioned that I chose the AMD Ryzen 5 1600 processor to run my compute nodes. Unfortunately, the Ryzen series is vulnerable to random soft locks which don’t seem to have a workaround other than “don’t let them idle”, and I neglected to do my due diligence when I purchased this hardware platform because I trusted @cemerick, who owns a whole stack of nearly identical boxes.
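The mitigations usually cited for this are BIOS-level - setting “Power Supply Idle Control” to “Typical Current Idle”, or capping C-states - but a crude userspace reading of “don’t let them idle” might look like this sketch, where the duty cycle is an arbitrary guess rather than a tested fix:

```python
# Crude sketch of "don't let them idle": a lowest-priority loop that burns a
# sliver of CPU on a timer so the package never settles into deep idle.
# The 10ms-busy / 100ms-sleep duty cycle is an arbitrary assumption.
import os
import time

os.nice(19)  # lowest scheduling priority, so any real work preempts this

while True:
    deadline = time.monotonic() + 0.010
    while time.monotonic() < deadline:
        pass  # spin briefly to keep the core out of deep C-states
    time.sleep(0.100)
```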
Previously, I looked at using Ansible and its inventory capabilities to begin managing services and configuration on my homelab.
Previously, I wrote about the process by which I bootstrapped Arch Linux onto the nodes of my Homelab. Just running Linux is hardly interesting - millions of computers do that every day. The real trick is going from bare Linux installs to deployed services, which requires managing the configuration and state of the servers. In this post, I’m gonna introduce some basic concepts from Ansible, and point out some stuff I wish I’d known starting off.
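If you haven’t seen Ansible before, the core idea is that a task names a desired state and converges to it idempotently - running a playbook a second time should change nothing. A toy sketch of that contract, with the path and content made up for illustration:

```python
# Toy sketch of Ansible's core contract: declare desired state, act only
# when reality differs, and report whether anything changed.
import os

def ensure_file(path: str, content: str) -> bool:
    """Make `path` contain `content`; return True only if something changed."""
    if os.path.exists(path):
        with open(path) as f:
            if f.read() == content:
                return False  # already converged - nothing to do
    with open(path, "w") as f:
        f.write(content)
    return True

print(ensure_file("/tmp/motd", "welcome to ethos\n"))  # True, then False on rerun
```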
In September of 2018 I finally got around to writing up the hardware I’d deployed in April of that year. Then a bunch of life happened - a job change and another big move - so updates on the homelab stalled, although work continued.