Automated Publishing with Jekyll

17 Sep 2019

This post is somewhat meta: it covers a whole bunch of the automation behind how I write and publish on this blog.

Alright nerds. You asked for natural numbers and zookeeper but you're gonna get trash world ruby first. Strap in.
— Reid; Yak Hunter (@arrdem) September 17, 2019

I’d like to be able to write more. At present, I’m running a CI/CD setup on one of my servers which - when I push to the blog repo - causes a deploy. Great! No touch deploys! Honestly, it’s served me well for a really long time now.

The not so great part is the semantics of Jekyll under a setup like this. Jekyll is a pretty tried-and-true static site generator, used for, among other things, GitHub’s Pages feature. Ruby ain’t my personal cup of tea, but it does a good enough job of taking Markdown, applying some CSS and emitting HTML, which is all I need.

But static is the operative word here. Jekyll only exists at $ jekyll build time: nothing re-renders the site when a post’s date finally rolls around.

Now Jekyll has a feature - date - which lets you tag a post you’re authoring with an effective date. If this date is in the future, Jekyll will skip the post unless you’re rendering in --future mode. See where I’m going with this?
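The date rule boils down to a single comparison at build time. Here's a minimal sketch of it, assuming nothing about Jekyll's internals: publishable? is a made-up helper name, and the timestamps are invented for illustration.

```ruby
require 'time'

# Hypothetical helper, NOT Jekyll's own code: would a post carrying
# this front-matter date be rendered by a build running at `now`?
def publishable?(post_date, now: Time.now, future: false)
  future || post_date <= now
end

build_time = Time.parse("2019-09-17 03:00:00 UTC")  # me, up too late
scheduled  = Time.parse("2019-09-17 09:30:00 UTC")  # when I want it live

puts publishable?(scheduled, now: build_time)                # false
puts publishable?(scheduled, now: build_time + 7 * 3600)     # true
puts publishable?(scheduled, now: build_time, future: true)  # true
```

The middle case is the whole trick: the post only shows up if a build actually runs after the scheduled time.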

The problem is I may be up at 2-3 a.m. writing or mucking with the lab because I never entirely kicked those habits. Y’all mostly aren’t, and by posting “late” at “night” I’m largely self-defeating. Y’all won’t see it until the morning, and by then it’ll have been pushed back because it’s “old” relative to the morning tweet flood.

What I’d like to do is to start using a real CMS style workflow for authoring content. I write a post, schedule it for 9-10 a.m. the next day and forget about it.

So let’s go throw some more script(s) at this. I’m gonna want some automation for rendering the site, and I’ll need more for announcing changes to the site.

Autopublish

Let’s start with the autodeploys. I’ll want a wrapper script. It’d be nice to be able to include the gem dependencies of my blog setup in the blog itself - and run a gem install when they change.

Because I’m doing this with Ansible as usual, all this is gonna be parameterized on the precise blog name. It’d be nice after all to run jaunt’s blog (dead) or ox’s blog (not alive yet) off of the same infra.

role/git-jekyll-domain/templates/jekyll-build.j2

#!/bin/bash
# Usage
#  bash jekyll-build <blogname>

set -ex

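# Sanity check: only run as the web user (set -e aborts otherwise).
# Assumes the template fills in the nginx user here.
[[ "$(whoami)" == "{{ distribution_nginx_user }}" ]]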

echo "[autodeploy] starting build for $1"

# Set by git when executing post-receive, would stop git from
# detecting it's in a repo
unset GIT_DIR

# Go to the argument site to build
cd "$HOME/$1"

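# Not technically race safe but close enough for now
if [ -e build.lock ]; then
  echo "Another build is in progress - soft aborting"
  exit 0
else
  touch build.lock
  # Drop the lock on any exit, so a failed build (set -e) can't wedge later ones
  trap 'rm -f build.lock' EXIT
fi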

before=$(git rev-parse HEAD)
git pull origin master && echo "[autodeploy] repo updated"
git checkout -f && echo "[autodeploy] reset complete"
after=$(git rev-parse HEAD)

# If the Gemfile has changed, install changes before rendering
if git diff --name-only "$before" "$after" | grep -q "Gemfile"; then
    echo "[autodeploy] dependency changes detected, installing"
    gem install --file Gemfile
    echo "[autodeploy] gem update completed"
fi

echo "[autodeploy] attempting to render"
## FIXME: this is a garbage path hack
JEKYLL=$(find ~/.gem -type f -name jekyll | sort | head -n 1)

## FIXME: how to do an atomic-mv cutover here instead of killing the
## file tree in place?
rm -rf _site
"${JEKYLL}" build && echo "[autodeploy] done rendering!"

echo "[autodeploy] done!"

rm build.lock
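On that second FIXME: one common answer is to render into a fresh directory and then repoint a symlink at it, since renaming a symlink over another is a single atomic step on POSIX filesystems. Below is a sketch of just the swap, in Ruby rather than bash. swap_symlink and the release-N directory names are hypothetical, and it assumes nginx's root would be the symlink and that the build would target the new directory (for instance via jekyll build --destination).

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: point `link` at `target` with a single rename(2),
# so readers see either the old tree or the new one, never a half-built one.
def swap_symlink(target, link)
  tmp = "#{link}.tmp.#{Process.pid}"
  File.symlink(target, tmp)
  File.rename(tmp, link)  # replaces the old symlink itself, atomically
end

Dir.mktmpdir do |root|
  %w[release-1 release-2].each do |name|
    dir = File.join(root, name)
    FileUtils.mkdir_p(dir)
    # Stand-in for the real render step
    File.write(File.join(dir, "index.html"), name)
    swap_symlink(dir, File.join(root, "_site"))
  end
  puts File.read(File.join(root, "_site", "index.html"))  # release-2
end
```

Old release directories would still need cleaning up afterwards; that part is left out here.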

Okay so that’s not bad - now we just need to lay down a couple other things. The git hook for instance. Git’s hooks are just shell scripts which get run after some event occurs. In this case I’m leveraging the post-receive hook which runs after objects have been pushed and refs have been updated. This means that the state I’ve pushed is fully in the repo, and the above build script will be able to pull it.

role/git-jekyll-domain/templates/post-receive.j2

#!/bin/bash

sudo -u {{ distribution_nginx_user }}\
    /srv/http/jekyll-build "{{ domain }}"

But I really don’t want to just grant my git user sudo, that’d be nuts. So let’s have a sudoers.d file that’ll allow this one command.

role/git-jekyll-domain/templates/10-jekyll-build.j2

# Grant the git user the right to the static site rebuild script as the http user
git ALL=({{ distribution_nginx_user}}:ALL) NOPASSWD: /srv/http/jekyll-build

Bolting all this together with an Ansible role doesn’t take too much more doing -

role/git-jekyll-domain/tasks/main.yml

# Expected parameters:
#   {{repo}} - the absolute path to the source repo
#   {{domains}} - a list of domains to serve
#   {{domain}} - (default {{domains[0]}}) the name of the domain to serve, also the name of its template
#   {{ssl}} - whether this is a "normal" domain or an SSL enabled domain
#   {{cron}} - whether to run the build on a 5min cron.
---
- name: Install system packages
  package: name={{ item }} state=present
  with_items:
    - python-pygments
    - ruby
    - rubygems
    - git

- name: Install ruby-dev
  when: "ansible_distribution == 'Ubuntu'"
  package: name={{ item }} state=present
  with_items:
    - make
    - build-essential
    - ruby-dev

- name: Clone site
  git:
    repo: "{{ repo }}"
    version: master
    dest: "/srv/http/{{ domain }}"
  become: yes
  become_user: "{{ distribution_nginx_user }}"

- name: check if Gemfile exists
  stat: 
    path: "/srv/http/{{ domain }}/Gemfile"
  register: gemfile

- name: Install gems
  when: gemfile.stat.exists
  become: yes
  become_user: "{{ distribution_nginx_user }}"
  command: "gem install -g /srv/http/{{ domain }}/Gemfile"

- name: Install post-receive
  template:
    src: post-receive.j2
    dest: "{{ repo }}/hooks/post-receive"

- name: Set executable bit
  file:
    dest: "{{ repo }}/hooks/post-receive"
    mode: "u+x"
    owner: git
    group: git

- name: Install sudoers entry
  template:
    src: 10-jekyll-build.j2
    dest: /etc/sudoers.d/10-git-http-jekyll
    mode: "0440"
    validate: /usr/sbin/visudo -cf %s

- name: Install build script
  template:
    src: jekyll-build.j2
    dest: /srv/http/jekyll-build

- name: Set the executable bit
  file:
    dest: /srv/http/jekyll-build
    mode: "u+x"
    owner: "{{ distribution_nginx_user }}"

- name: Initial site build
  command: "/srv/http/jekyll-build {{ domain }}"
  become: yes
  become_user: "{{ distribution_nginx_user }}"

- name: Create cron entry
  when: cron | default(false)
  cron:
    name: "Rebuild {{ domain }}"
    job: "sudo -u {{ distribution_nginx_user }} /srv/http/jekyll-build {{ domain }}"
    # FIXME (arrdem 2019-09-17):
    #   FFS pull these as real parameters
    minute: "*/5"

- name: Install nginx domain
  include_role:
    name: nginx-domain
  vars:
    body: |
      root /srv/http/{{ domain }}/_site;
      index index.html;
      charset utf-8;

      location ~* \.(css|js|gif|jpe?g|png)$ {
        expires 168h;
        add_header Pragma public;
        add_header Cache-Control "public, must-revalidate, proxy-revalidate";
      }

Alright awesome. Now with a simple playbook I can lay down all these files and get on with it.

play.yml

---
- hosts:
    - apartment_www
  vars_files:
    - "vars/{{ ansible_distribution }}.yml"
    - "vars/default.yml"
  roles:
    - role: git-jekyll-domain
      repo: /srv/git/arrdem/arrdem.com.git
      domains:
        - arrdem.com
        - arrdem.me
        - www.arrdem.com
        - www.arrdem.me
      ssl: true
      cron: true

And that’s all it takes for autodeploys!

We aren’t quite done yet however.

The other big feature that a CMS offers is automated announcement of newly posted material. If I just let this cronjob run, posts will go up and unless you’re subscribed to the Atom feed you’ll never notice them. And come on, this is 2019, nobody uses Atom anymore and I work for Twitter. I need tweets!

Autoannounce

So let’s build some announcement machinery!

One of Jekyll’s features is hooks. You can write Ruby code which will be executed at certain points in the lifecycle of your blog’s rendering. We’re gonna need two.

I don’t want to check the public and private keys for my Twitter account into git where y’all can see them. Sorry. So I’m gonna need a secret storage story, and then a way to post tweets so y’all see ‘em when the blog finally publishes.

Jekyll just loads whatever code it finds in the _plugins directory, so all we’re gonna have to do here is add a gem "twitter" line to the blog’s Gemfile and away we go.

Let’s do secrets first since it’s easy. This plugin attaches to the :after_init hook, and just tries to load up another file I’ve gitignored and chosen to manage by hand as if it were part of the site’s normal config.

_plugins/secrets.rb

# A way to load secrets from a peer of _config.yml

require 'yaml'

Jekyll::Hooks.register :site, :after_init do |site|
  if File.file?('_secret.yml') then
    site.config.update(YAML.safe_load(File.read('_secret.yml')))
  else
    STDOUT.print("Warning, no _secret.yml found! secrets not loaded.\n")
  end
end
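One thing worth knowing about that update call: it's a shallow merge, so any top-level key in _secret.yml replaces the same key from _config.yml wholesale. The sketch below shows that behaviour with a plain Hash standing in for the site config. The file contents are invented, and the key names under twitter are placeholders, not my actual settings.

```ruby
require 'yaml'

# Hypothetical _secret.yml contents; values and key names are placeholders
secret_yaml = <<~YAML
  twitter:
    consumer_key: "PLACEHOLDER"
    consumer_secret: "PLACEHOLDER"
    enabled: false
YAML

# Plain Hash standing in for the site config loaded from _config.yml
config = { "url" => "https://example.com", "twitter" => { "enabled" => true } }
config.update(YAML.safe_load(secret_yaml))

puts config["url"]                           # https://example.com
puts config["twitter"]["enabled"]            # false
puts config["twitter"].key?("consumer_key")  # true
```

So if both files define a twitter block, the one from the secrets file wins outright rather than being merged key by key.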

Now, we need to do tweets. Tweets are tricky because, well, we’re gonna be using the filesystem to store state between builds. In fact, some of y’all saw me fuck this up and spam about 30 tweets in half a second before I got ratelimited.

me: [[cursing loudly in the apartment ]]

y'all: pic.twitter.com/QBmo7WymFt
— Reid; Yak Hunter (@arrdem) September 17, 2019

Shout out to those of y’all who found some comedy in my testing on main.

So what we’re gonna do is maintain a _tweets.yml file, which maps the URL of a post to the URL of a tweet. When we see a “new” post - one which isn’t in the mapping - we’ll tweet it out and create the requisite map entry.
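Stripped of Jekyll and Twitter, that bookkeeping is tiny. Here's a sketch of the idea with a fake poster swapped in for the real API call. announce_new, fake_tweet and all the URLs are made up for illustration, and the real plugin differs in its details.

```ruby
require 'yaml'

# Hypothetical helper: `db` maps post URL => tweet URL. The block stands
# in for the real Twitter call and must return the new tweet's URL.
def announce_new(post_urls, db)
  post_urls.each do |url|
    next if db.key?(url)  # already announced, leave it alone
    db[url] = yield(url)
  end
  db
end

db = { "/2019/08/01/old-post.html" => "https://twitter.com/example/status/1" }
sent = []
fake_tweet = lambda do |url|
  sent << url
  "https://twitter.com/example/status/#{sent.size + 1}"
end

# Two "builds" in a row: only the first should announce anything
2.times do
  announce_new(["/2019/08/01/old-post.html", "/2019/09/17/new-post.html"], db, &fake_tweet)
end

puts sent.inspect  # ["/2019/09/17/new-post.html"]
puts db.to_yaml
```

The property that matters is that a second build over the same posts sends nothing, which is exactly what goes wrong if the mapping isn't there yet.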

_plugins/announce.rb

require 'twitter'
require 'yaml'

client = nil
post_to_tweets = {}

# Load the tweet DB and create the client
Jekyll::Hooks.register :site, :pre_render do |site|
  if File.file?('_tweets.yml') then
    post_to_tweets = YAML.safe_load(File.read('_tweets.yml')) || {}
  else
    STDOUT.print("Warning: no tweets database was found!\n")
  end

  client = Twitter::REST::Client.new(site.config['twitter'])

  # So there's an escape hatch for development
  if not site.config['twitter'].fetch("enabled", false) then
    STDOUT.print("Warning: Twitter publishing has been disabled\n")
  end
end

# For each post, if there's a tweet in the DB or in the YAML prefix
# use that with the YAML prefix winning. Otherwise create one and
# update the tweet database either way.
Jekyll::Hooks.register :posts, :pre_render do |post|
  site = post.site
  if post.data["layout"] == "post" then
    full_post_url = site.config["url"] + post.url
    tweet = post.data.fetch("twitter", post_to_tweets.fetch(post.url, nil))

    if tweet == nil and site.config['twitter'].fetch("enabled", false) then
      # Post a new tweet and compute its URL
      STDOUT.print("Found an unpublished tweet - publishing...\n")

      # convert all my tags to hashtags
      tags = post.data["tags"].map { |str| "#" + str.downcase }.join(" ")
      # make the tweet text
      tweet_text = "New blog post! - " + post.data["title"] + " " + full_post_url + " " + tags
      # lob it out and grab the URL
      tweet = client.update(tweet_text).url.to_s
      STDOUT.print("Published as " + tweet + "\n")
    end

    # Write the tweet back so that it can be used in rendering
    if tweet != "skipped" then
      post.data["twitter"] = tweet
    end

    post_to_tweets[post.url] = tweet
  end
end

# Dump the tweet DB back
Jekyll::Hooks.register :site, :post_render do |site|
  File.open('_tweets.yml', 'w') { |file| file.write(post_to_tweets.to_yaml) }
end

This totally works once it gets to a steady state. The problem is initialization. Some of my more recent blog posts had the twitter: entry in their front matter, and I didn’t want to re-post those tweets or pretend like they didn’t exist. Telling the difference between a really old blog post and a new blog post would be impossible here without bringing the date into consideration, and chuck that.

Instead the bootstrapping process (which I messed up) was to check in TWO versions of this plugin. The first version (should have) had the post_to_tweets.fetch(post.url, nil) snippet replaced with post_to_tweets.fetch(post.url, "skipped"). This will make Jekyll lay down a database with all the existing posts flagged so that they’ll be ignored in future. ‘course I didn’t do that and totally spammed my Twitter account, but I trust y’all to learn from my mistakes.

Then, swap that .fetch default value back to nil so that future new posts (like this one!) will be recognized as missing and automatically posted.
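The first, seeding pass amounts to something like the sketch below. It assumes the same URL-to-tweet mapping as the plugin, and seed_db and the example URLs are hypothetical. Posts that already have a real tweet keep it, and everything else gets the "skipped" marker.

```ruby
require 'yaml'

SKIPPED = "skipped"

# Hypothetical one-off: mark every existing post as already handled,
# without overwriting entries that point at a real tweet.
def seed_db(existing_post_urls, db = {})
  existing_post_urls.each { |url| db[url] ||= SKIPPED }
  db
end

db = seed_db(
  ["/2018/01/01/a.html", "/2019/06/01/b.html"],
  { "/2019/06/01/b.html" => "https://twitter.com/example/status/1" }
)
puts db.to_yaml
```

Once that database is written out, the plugin can go back to treating a missing entry as "new, go tweet it".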

And that’s “all” it takes! To prove the point, this post - when it airs - will have been published using this precise machinery. Check the git log if you don’t believe me!