Homelab: First steps

22 Jun 2019

In September of 2018 I finally got around to writing up the hardware I’d deployed in April of that year. Then a bunch of life happened - a job change and another big move - so updates on the homelab stalled even though the work continued.

As I’ve finally gotten a bunch of major issues sorted out I’ll be working to get my unfinished drafts out the door and write up some of the interesting stuff I’ve got going on now.

Previously, I introduced my homelab project and my three little Ryzen powered horrors: ethos, logos and pathos. I wrote some about the process of choosing the hardware and glossed over the purely mechanical assembly - but how do you make some computers useful? Once I got them all assembled I had to get some Linux installs on there.

Usually one would just deploy Ubuntu or some other Linux distribution. Going with a widely adopted, “stable” distribution makes it easy to just put your installations together and focus your efforts on more interesting things like application development.

Unfortunately, to my taste, “stable” distributions are a massive waste of time. In order to deliver stability, distributions like Ubuntu and CentOS purposefully don’t package the latest versions. This is acceptable in a large organization where the goal is to run finished software against mostly unchanging dependencies. For my personal use, though, not being able to refer to the latest developer documentation or to leverage the latest features is the more painful tradeoff.

This makes my distro of choice Arch Linux, an unstable rolling release distribution with absolutely fantastic documentation. Arch makes a bunch of decisions designed to favor user flexibility - one of which is the decision not to ship an installer.

In the past when I’ve done Arch installs, I’ve done them one at a time: building out a new desktop, a new laptop or, in rare circumstances such as the systemd migration, rebuilding an existing system 😞. It’s annoying to do an install with manual in hand every time, but at fewer than one a year it hasn’t been worth spending time on. For the homelab, however, I had three to stamp out right off the bat.

So how to automate the installation of a minimalist Linux distribution?

An Arch install has four core steps: first you boot an image, then from the bootstrap image you partition and mount the disks, install packages to make the system bootable, and then reboot into the real install for whatever else is required.

To get my boxes up and running, I adapted the strategy from this blog post; my full version is here. The basic strategy is that, rather than go to all the work of building a fully automated Arch image which will boot to a running sshd, we’ll just boot each node and start sshd ourselves. Then we’ll use a trio of scripts to relieve ourselves of the drudgery of actually installing the nodes. Not ideal, but a reasonable automation level at my lack of scale.
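
For reference, the by-hand step on each freshly booted node looks roughly like the sketch below. It assumes the stock Arch live image, where sshd is installed but not running and root has no password yet. The `run` wrapper, the `enable_live_ssh` function and the `DRY_RUN` variable are illustrative names, not part of the scripts that follow; by default the sketch only prints the commands.

```shell
#!/bin/bash
# Sketch: what gets typed at the console of a freshly booted live image.
# DRY_RUN, run() and enable_live_ssh() are hypothetical helpers.
# With DRY_RUN=1 (the default here) nothing is executed, only printed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"   # show the command instead of running it
  else
    "$@"
  fi
}

enable_live_ssh() {
  run passwd                 # give root a password so the first ssh login works
  run systemctl start sshd   # start the ssh daemon on the live image
  run ip -4 addr show        # find the address to feed to run.sh
}

enable_live_ssh
```

Setting `DRY_RUN=0` on the node's console would run the three commands for real; the address reported by the last one is what run.sh asks for.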

So what does the automation look like? We’re gonna have three scripts: run.sh, which provides the entry point; install.sh, which does all the work up to arch-chrooting into the new installation; and chroot.sh, which applies the finishing touches.

run.sh

This script is the entry point. Once you’ve stood up a node by hand and gotten it running sshd, this script wraps up shuffling ssh keys and executing the two scripts install.sh and chroot.sh on the target machine. We’ll use those two scripts to actually install the node.

#!/bin/bash

set -ex

read HOSTNAME
read HOST
PORT=22

echo PORT="$PORT", HOST="$HOST"
HOST_ROOT="root@$HOST"
PUBKEY=$(cat ~/.ssh/id_rsa.pub)

if [ ! -f mirrorlist.ranked ]; then
  awk '/^## United States$/{f=1}f==0{next}/^$/{exit}{print substr($0, 2)}' /etc/pacman.d/mirrorlist.pacnew > mirrorlist.unranked
  rankmirrors -n 6 mirrorlist.unranked > mirrorlist.ranked
fi

# copy your public key, so you can ssh without a password later on
ssh -o StrictHostKeyChecking=no -tt -p "$PORT" "$HOST_ROOT" "mkdir -p -m 700 ~/.ssh; echo $PUBKEY > ~/.ssh/authorized_keys; chmod 600 ~/.ssh/authorized_keys"

# copy install scripts from ./root folder
scp -o StrictHostKeyChecking=no -P "$PORT" ./root/* "$HOST_ROOT:/root"
scp -o StrictHostKeyChecking=no -P "$PORT" mirrorlist.ranked "$HOST_ROOT:/etc/pacman.d/mirrorlist"

# set the executable bits
ssh -o StrictHostKeyChecking=no -tt -p "$PORT" "$HOST_ROOT" "chmod +x *.sh"

## run the install script remotely
# ssh -o StrictHostKeyChecking=no -tt -p "$PORT" "$HOST_ROOT" "./install.sh $HOSTNAME chroot.sh"
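
The awk one-liner in the mirrorlist step is the least obvious part of run.sh: it starts printing at the `## United States` header, strips the leading `#` from every line it prints, and stops at the first blank line. A small sketch on a made-up mirrorlist shows the effect. The example.org URLs are placeholders rather than real mirrors, and the two function names are only for illustration.

```shell
#!/bin/bash
# Hypothetical excerpt in the format of /etc/pacman.d/mirrorlist.pacnew
sample_mirrorlist() {
  cat <<'EOF'
## Sweden
#Server = http://se.example.org/$repo/os/$arch

## United States
#Server = http://us1.example.org/$repo/os/$arch
#Server = http://us2.example.org/$repo/os/$arch

## Vietnam
#Server = http://vn.example.org/$repo/os/$arch
EOF
}

# Same filter as in run.sh, reading from stdin:
#   - set f once the United States header is seen
#   - skip everything before that
#   - exit on the first blank line after it
#   - otherwise print the line minus its first character
us_mirrors() {
  awk '/^## United States$/{f=1}f==0{next}/^$/{exit}{print substr($0, 2)}'
}

sample_mirrorlist | us_mirrors
```

On this sample the filter emits the `# United States` header followed by the two uncommented `Server` lines, which is the shape rankmirrors expects as input.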

install.sh

This script prepares the target machine’s filesystem(s) and uses pacstrap to install packages into the prepared filesystems before we chroot into the mostly finished installation for the finishing touches. Most of this script is incredibly specific to the partitioning layout I wanted for my compute nodes.

Based on prior experience with having to rebuild Linux boxes, I’m a strong believer in separating all your data from the OS. On physically separate disks. Just so in case something really goes sideways you can blow the entire OS away without fearing for your data. This was a guiding principle behind my node design, and the choice to have separate 1 TiB drives.

#!/bin/bash

set -ex

## Set to true if the EFI boot partition should be (re)created
# nuke_boot=true
## Set to true if the OS root partition should be wiped
# nuke_os=false
## Set to true if the data disk should be wiped
# nuke_data=false

# Size (in MB) for the boot partition
boot_size=1024

# $1 is the hostname to install

HOSTNAME="$1"
CHROOT_SCRIPT="$2"

# Partition the NVME device as the boot device
BOOT_DISK="/dev/nvme0n1"

BOOT_PARTITION="${BOOT_DISK}p1"
if [ "${nuke_boot:-false}" = true ]; then
  # A fresh GPT label discards every partition on this disk,
  # so nuke_os should be set alongside nuke_boot.
  parted -s "$BOOT_DISK" mklabel gpt
  parted -s -a optimal "$BOOT_DISK" mkpart primary 0 "${boot_size}"

  # Set the requisite flags
  parted -s "$BOOT_DISK" set 1 boot on
  parted -s "$BOOT_DISK" set 1 esp on

  # Make fat32
  mkfs.fat -F32 "$BOOT_PARTITION"
fi

ROOT_PARTITION="${BOOT_DISK}p2"
if [ "${nuke_os:-false}" = true ]; then
   parted -s -a optimal "$BOOT_DISK" mkpart primary "${boot_size}" 100%
   mkfs.ext4 -F "$ROOT_PARTITION"
fi

# Partition the data/scratch disk
DATA_DISK="/dev/sda"
DATA_PARTITION="${DATA_DISK}1" # NVME and SATA have different naming schemes >.>
if [ "${nuke_data:-false}" = true ]; then
  parted -s "$DATA_DISK" mklabel gpt
  parted -s -a optimal "$DATA_DISK" mkpart primary ext4 0% 100%
  mkfs.ext4 -F "$DATA_PARTITION"
fi

# Mount the root partition - we'll chroot into it in a second
mount "$ROOT_PARTITION" /mnt

mkdir -p /mnt/boot/efi
mount "$BOOT_PARTITION" /mnt/boot/efi

mkdir -p /mnt/data
mount "$DATA_PARTITION" /mnt/data

mkdir -p /mnt/root/.ssh

# Bootstrap into the new disk & install a bunch of stuff
pacman --noconfirm -Sy archlinux-keyring

if [ "${nuke_os}" = true ]; then
  # No OS left, re-pacstrap
  pacstrap /mnt base base-devel grub efibootmgr openssh ntp
  genfstab -p /mnt >> /mnt/etc/fstab
fi

cp "${CHROOT_SCRIPT}" /mnt/
# chroot.sh expects to find /authorized_keys inside the new root
cp ~/.ssh/authorized_keys /mnt/
#cp /etc/pacman.d/mirrorlist /mnt/etc/pacman.d/mirrorlist

# Chroot into the new disk and run the chroot part of this setup dance
arch-chroot /mnt "/${CHROOT_SCRIPT}"\
            "${BOOT_DISK}"\
            "${HOSTNAME}"

# Remove the chroot bits
if [ -f "/mnt/${CHROOT_SCRIPT}" ]; then
  rm "/mnt/${CHROOT_SCRIPT}"
fi

if [ -f /mnt/authorized_keys ]; then
  rm /mnt/authorized_keys
fi

umount -R /mnt
systemctl reboot
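
The comment about NVMe and SATA naming deserves a second look: the kernel puts a `p` between the disk name and the partition number whenever the disk name itself ends in a digit, so `nvme0n1` gives `nvme0n1p1` while `sda` gives `sda1`. As a sketch, a small helper could derive both names instead of hard-coding them. `partition_path` is a hypothetical function, not part of install.sh.

```shell
#!/bin/bash
# partition_path DISK N: print the device node for partition N of DISK.
# Hypothetical helper, not part of install.sh.
partition_path() {
  local disk="$1" number="$2"
  case "$disk" in
    *[0-9]) echo "${disk}p${number}" ;;  # e.g. /dev/nvme0n1 -> /dev/nvme0n1p2
    *)      echo "${disk}${number}" ;;   # e.g. /dev/sda     -> /dev/sda1
  esac
}

partition_path /dev/nvme0n1 2
partition_path /dev/sda 1
```

With something like this, `ROOT_PARTITION` and `DATA_PARTITION` could be computed from `BOOT_DISK` and `DATA_DISK`, which would matter if a node ever shipped with a different disk layout.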

chroot.sh

The chroot.sh script does everything that we can’t do until there’s a viable OS mostly built out. It’s still pretty minimalist - bootstrap scripts aren’t my long term config management plan, so all this has to do is finish standing up a box we can then ssh into and hand over to Ansible. More on my experience with Ansible later.

#!/bin/bash

set -ex

# $1 to this script is the boot device's name
# $2 is the hostname to deploy

HOST="$2"
USERNAME=arrdem
HOME_DIR="/home/${USERNAME}"

# grub as a bootloader
grub-install --target=x86_64-efi \
             --efi-directory=/boot/efi/ \
             --bootloader-id=GRUB \
             --recheck "$1"

# This makes the grub timeout 0, it's faster than 5 :)
#
# Skipping this so that it's still possible to physically get on the
# node without using custom boot media.
# sudo sed -i 's/GRUB_TIMEOUT=5/GRUB_TIMEOUT=0/g' /etc/default/grub

grub-mkconfig -o /boot/grub/grub.cfg

# enable these essential services by default
systemctl enable sshd.service
systemctl enable dhcpcd.service
systemctl enable ntpd.service

# Network configuration
echo "$HOST.apartment.arrdem.com" > /etc/hostname
cat <<EOF > /etc/hosts
127.0.0.1 localhost
::1 localhost
127.0.1.1 $HOST $HOST.apartment.arrdem.com
EOF

# adding your normal user with additional wheel group so can sudo
useradd -m -G wheel -s /bin/bash "$USERNAME"

# adding public key both to root and user for ssh key access
mkdir -m 700 -p "$HOME_DIR/.ssh"
mkdir -m 700 -p /root/.ssh
cp /authorized_keys "$HOME_DIR/.ssh/"
mv /authorized_keys /root/.ssh
chown -R "$USERNAME:$USERNAME" "$HOME_DIR/.ssh"

# adjust your timezone here
ln -f -s /usr/share/zoneinfo/America/Los_Angeles /etc/localtime
hwclock --systohc

# adjust your name servers here if you don't want to use google
# echo 'name_servers="8.8.8.8 8.8.4.4"' >> /etc/resolvconf.conf

# Set up the locale
echo en_US.UTF-8 UTF-8 > /etc/locale.gen
echo LANG=en_US.UTF-8 > /etc/locale.conf
locale-gen

# because we are using ssh keys, make sudo not ask for passwords
echo 'root ALL=(ALL) ALL' > /etc/sudoers
echo '%wheel ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers

# I like to use emacs :)
echo 'EDITOR=emacs' > /etc/environment

# auto-complete these essential commands
echo complete -cf sudo >> /etc/bash.bashrc
echo complete -cf man >> /etc/bash.bashrc

# Userland shite & services
pacman --noconfirm -S sudo wget vim emacs-nox ansible

systemctl enable sshd
systemctl enable ntpd
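
One detail of the network block is easy to trip over: bash sets its own `$HOSTNAME` variable to the name of the machine the script is currently running on, which inside the chroot is still the live image, so the node's name has to come from the script's arguments. As a sketch, the rendering of /etc/hosts could be pulled into a function that takes the name explicitly. `render_hosts` and its two parameters are illustrative, not from the original script.

```shell
#!/bin/bash
# render_hosts HOST DOMAIN: print /etc/hosts contents for the new node.
# Hypothetical helper; chroot.sh writes the file inline instead.
render_hosts() {
  local host="$1" domain="$2"
  cat <<EOF
127.0.0.1 localhost
::1 localhost
127.0.1.1 $host $host.$domain
EOF
}

render_hosts ethos apartment.arrdem.com
```

In chroot.sh the call would be redirected to `/etc/hosts`, with the first argument taken from `$2`.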

Uncharacteristically, there’s a yak unshaved here. While this was quite successful at mitigating the worst of the toil, I still have to manually stand up every node I want to install. Given some more effort, I could have assembled a custom Arch image configured with sshd and including some of my pubkeys. That would have made the entire process almost painless - it’d just be boot and go. There is even documentation on how to customize the Arch boot media, it’s just somewhat involved. If I was going to do hundreds of node installations it’d be worthwhile.

The next logical step, which is basically where Twitter and other close to autonomous infrastructures wind up, is to combine such a customized boot image with netbooting and a coordination service. This enables you to almost entirely automate away the process of standing up nodes, presuming that appropriate partitioning and other configuration scripts are available for any given compute platform.

Maybe I’ll get around to building one of those eventually, but my nodes have been stable enough that I haven’t had to reinstall them daily 😊