Automated Publishing with Jekyll
17 Sep 2019
This post is somewhat meta - as it concerns a whole bunch of the automation by which I go about writing on this blog.
Alright nerds. You asked for natural numbers and zookeeper but you're gonna get trash world ruby first. Strap in.
— Reid; Yak Hunter (@arrdem) September 17, 2019
I’d like to be able to write more. At present, I’m running a CI/CD setup on one of my servers which - when I push to the blog repo - causes a deploy. Great! No touch deploys! Honestly, it’s served me well for a really long time now.
The not so great part is the semantics of Jekyll under a setup like this. Jekyll is a pretty tried-and-true static site generator, used for (among other things) GitHub’s Pages feature. Ruby ain’t my personal cup of tea, but it does a good enough job of taking Markdown, applying some CSS and emitting HTML, which is all I need.
But static is the operative word here.
Jekyll only exists at $ jekyll build time.
Now Jekyll has a feature - date - which lets you tag a post you’re authoring with an effective date.
If this date is in the future, Jekyll will ignore the post unless you’re rendering in --future mode.
See where I’m going with this?
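To make that rule concrete, here's a minimal sketch in Ruby. The visible_at_build? helper and the timestamps are made up for illustration; this is not Jekyll's actual implementation, just the behavior described above boiled down to one comparison.

```ruby
# Hypothetical helper (not part of Jekyll): a post shows up in a build only if
# its date has already passed at build time, or if future rendering is forced.
def visible_at_build?(post_date, build_time, future: false)
  future || post_date <= build_time
end

build_time = Time.utc(2019, 9, 17, 3, 0, 0)   # a 3 a.m. build
scheduled  = Time.utc(2019, 9, 17, 9, 30, 0)  # post dated for 9:30 a.m.

puts visible_at_build?(scheduled, build_time)                # false
puts visible_at_build?(scheduled, build_time + 7 * 3600)     # true (a 10 a.m. build)
puts visible_at_build?(scheduled, build_time, future: true)  # true
```

The upshot: a post dated for 9:30 only appears if something triggers another build after 9:30.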
The problem is I may be up at 2-3 a.m. writing or mucking with the lab because I never entirely kicked those habits. Y’all mostly aren’t, and by posting “late” at “night” I’m largely self-defeating. Y’all won’t see it until the morning, and by then it’ll have been pushed back because it’s “old” relative to the morning tweet flood.
What I’d like to do is to start using a real CMS style workflow for authoring content. I write a post, schedule it for 9-10 a.m. the next day and forget about it.
So let’s go throw some more script(s) at this. I’m gonna want some automation for rendering the site, and I’ll need more for announcing changes to the site.
Autopublish
Let’s start with the autodeploys.
I’ll want a wrapper script.
It’d be nice to be able to include the gem dependencies of my blog setup in the blog itself - and run a gem install when they change.
Because I’m doing this with Ansible as usual, all this is gonna be parameterized on the precise blog name. It’d be nice after all to run jaunt’s blog (dead) or ox’s blog (not alive yet) off of the same infra.
role/git-jekyll-domain/templates/jekyll-build.j2
#!/bin/bash
# Usage
#   bash jekyll-build <blogname>
set -ex

# Expect to be running as the web server user; set -e aborts otherwise
[[ "$(whoami)" == "{{ distribution_nginx_user }}" ]]

echo "[autodeploy] starting build for $1"

# Set by git when executing post-receive, would stop git from
# detecting it's in a repo
unset GIT_DIR

# Go to the argument site to build
cd "$HOME/$1"

# Not technically race safe but close enough for now
if [ -e build.lock ]; then
  echo "Another build is in progress - soft aborting"
  exit 0
else
  touch build.lock
  # Release the lock even if a later step fails under set -e
  trap 'rm -f build.lock' EXIT
fi

before=$(git rev-parse HEAD)
git pull origin master && echo "[autodeploy] repo updated"
git checkout -f && echo "[autodeploy] reset complete"
after=$(git rev-parse HEAD)

# If the gemfile has changed, install changes before rendering
if git log --name-only "$before..$after" | grep "Gemfile"; then
  echo "[autodeploy] dependency changes detected, installing"
  gem install --file Gemfile
  echo "[autodeploy] gem update completed"
fi

echo "[autodeploy] attempting to render"

## FIXME: this is a garbage path hack
JEKYLL=$(find ~/.gem -type f -name jekyll | sort | head -n 1)

## FIXME: how to do an atomic-mv cutover here instead of killing the
## file tree in place?
rm -rf _site
"${JEKYLL}" build && echo "[autodeploy] done rendering!"

echo "[autodeploy] done!"
rm build.lock
Okay so that’s not bad - now we just need to lay down a couple other things.
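About that "not technically race safe" comment: checking for the lock and then touching it are two separate steps, so two builds can both pass the check. One way to close the gap is to ask the OS to create the file only if it doesn't exist yet. Here's a rough sketch of the idea, written in Ruby rather than bash so it matches the plugins later in this post. The try_lock helper is hypothetical and is not what the deployed script does.

```ruby
require 'tmpdir'

# Hypothetical helper: create the lock file atomically. CREAT + EXCL makes the
# open fail with EEXIST if another process got there first, so the check and
# the creation happen in a single step.
def try_lock(path)
  File.open(path, File::WRONLY | File::CREAT | File::EXCL) do |f|
    f.write(Process.pid.to_s)
  end
  true
rescue Errno::EEXIST
  false
end

Dir.mktmpdir do |dir|
  lock = File.join(dir, 'build.lock')
  puts try_lock(lock)  # true  (we hold the lock)
  puts try_lock(lock)  # false (a second builder would soft abort here)
  File.delete(lock)    # release
end
```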
The git hook for instance.
Git’s hooks are just shell scripts which get run after some event occurs.
In this case I’m leveraging the post-receive hook which runs after objects have been pushed and refs have been updated.
This means that the state I’ve pushed is fully in the repo, and the above build script will be able to pull it.
role/git-jekyll-domain/templates/post-receive.j2
#!/bin/bash
sudo -u {{ distribution_nginx_user }} \
  /srv/http/jekyll-build "{{ domain }}"
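As written, the hook kicks off a build on every push, whatever branch it was. Git hands post-receive one line per updated ref on stdin, in the form "old-sha new-sha refname", so a pickier hook could look at those first. A small Ruby sketch of that check follows; the deploy_branch_pushed? helper and the sample input are made up for illustration and aren't part of my setup.

```ruby
# Hypothetical helper: given the text git writes to post-receive's stdin,
# report whether the branch we deploy from was among the updated refs.
def deploy_branch_pushed?(stdin_text, branch = 'refs/heads/master')
  stdin_text.each_line.any? do |line|
    _old_sha, _new_sha, ref = line.split
    ref == branch
  end
end

# Fake input with shortened placeholder SHAs.
sample = "0000000 1111111 refs/heads/drafts\n2222222 3333333 refs/heads/master\n"

puts deploy_branch_pushed?(sample)                                  # true
puts deploy_branch_pushed?("0000000 1111111 refs/heads/drafts\n")   # false
```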
But I really don’t want to just grant my git user sudo, that’d be nuts.
So let’s have a sudoers.d file that’ll allow this one command.
role/git-jekyll-domain/templates/10-git-http-jekyll.j2
# Grant the git user the right to the static site rebuild script as the http user
git ALL=({{ distribution_nginx_user}}:ALL) NOPASSWD: /srv/http/jekyll-build
Bolting all this together with an Ansible role doesn’t take too much more doing -
role/git-jekyll-domain/tasks/main.yml
# Expected parameters:
#  {{repo}} - the absolute path to the source repo
#  {{domains}} - a list of domains to serve
#  {{domain}} - (default {{domains[0]}}) the name of the domain to serve, also the name of its template
#  {{ssl}} - whether this is a "normal" domain or an SSL enabled domain
#  {{cron}} - whether to run the build on a 5min cron.
---
- name: Install system packages
  package: name={{ item }} state=present
  with_items:
    - python-pygments
    - ruby
    - rubygems
    - git

- name: Install ruby-dev
  when: "ansible_distribution == 'Ubuntu'"
  package: name={{ item }} state=present
  with_items:
    - make
    - build-essential
    - ruby-dev

- name: Clone site
  git:
    repo: "{{ repo }}"
    version: master
    dest: "/srv/http/{{ domain }}"
  become: yes
  become_user: "{{ distribution_nginx_user }}"

- name: check if Gemfile exists
  stat:
    path: "/srv/http/{{ domain }}/Gemfile"
  register: gemfile

- name: Install gems
  when: gemfile.stat.exists
  become: yes
  become_user: "{{ distribution_nginx_user }}"
  command: "gem install -g /srv/http/{{ domain }}/Gemfile"

- name: Install post-receive
  template:
    src: post-receive.j2
    dest: "{{ repo }}/hooks/post-receive"

- name: Set executable bit
  file:
    dest: "{{ repo }}/hooks/post-receive"
    mode: "u+x"
    owner: git
    group: git

- name: Install sudoers entry
  template:
    src: 10-git-http-jekyll.j2
    dest: /etc/sudoers.d/10-git-http-jekyll

- name: Install build script
  template:
    src: jekyll-build.j2
    dest: /srv/http/jekyll-build

- name: Set the executable bit
  file:
    dest: /srv/http/jekyll-build
    mode: "u+x"
    owner: "{{ distribution_nginx_user }}"

- name: Initial site build
  command: "/srv/http/jekyll-build {{ domain }}"
  become: yes
  become_user: "{{ distribution_nginx_user }}"

- name: Create cron entry
  # Only when cron is set and truthy; "is defined" alone would also match cron: false
  when: cron | default(false) | bool
  cron:
    name: "Rebuild {{ domain }}"
    job: "sudo -u {{ distribution_nginx_user }} /srv/http/jekyll-build {{ domain }}"
    # FIXME (arrdem 2019-09-17):
    #   FFS pull these as real parameters
    minute: "*/5"

- name: Install nginx domain
  include_role:
    name: nginx-domain
  vars:
    body: |
      root /srv/http/{{ domain }}/_site;
      index index.html;
      charset utf-8;

      location ~* \.(css|js|gif|jpe?g|png)$ {
        expires 168h;
        add_header Pragma public;
        add_header Cache-Control "public, must-revalidate, proxy-revalidate";
      }
Alright awesome. Now with a simple playbook I can lay down all these files and get on with it.
play.yml
---
- hosts:
    - apartment_www
  vars_files:
    - "vars/{{ ansible_distribution }}.yml"
    - "vars/default.yml"
  roles:
    - role: git-jekyll-domain
      repo: /srv/git/arrdem/arrdem.com.git
      domains:
        - arrdem.com
        - arrdem.me
        - www.arrdem.com
        - www.arrdem.me
      ssl: true
      cron: true
And that’s all it takes for autodeploys!
We aren’t quite done yet however.
The other big feature that a CMS offers is automated announcement of newly posted material. If I just let this cronjob run, posts will go up and unless you’re subscribed to the Atom feed you’ll never notice them. And come on this is 2019 nobody uses Atom anymore and I work for Twitter. I need tweets!
Autoannounce
So let’s build some announcement machinery!
One of Jekyll’s features is hooks. You can write Ruby code which will be executed at certain points in the lifecycle of your blog’s rendering. We’re gonna need two.
I don’t want to check the public and private keys for my Twitter account into git where y’all can see them. Sorry. So I’m gonna need a secret storage story, and then a way to post tweets so y’all see ‘em when the blog finally publishes.
Jekyll just loads whatever code it finds in the _plugins directory, so all we’re gonna have to do here is add a gem "twitter" line to the blog’s Gemfile and away we go.
Let’s do secrets first since it’s easy.
This plugin attaches to the :after_init hook, and just tries to load up another file I’ve gitignored and chosen to manage by hand as if it were part of the site’s normal config.
_plugins/secrets.rb
# A way to load secrets from a file that sits alongside _config.yml
require 'yaml'

Jekyll::Hooks.register :site, :after_init do |site|
  if File.file?('_secret.yml') then
    site.config.update(YAML.safe_load(File.read('_secret.yml')))
  else
    STDOUT.print("Warning, no _secret.yml found! secrets not loaded.\n")
  end
end
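For a sense of what that file might hold, here's a sketch with placeholder values. The key names are an assumption on my part: they're the credential names the twitter gem's client takes, plus the enabled flag the announce plugin checks. The merge_secrets helper is hypothetical and only mimics what the hook does to site.config.

```ruby
require 'yaml'

# Assumed shape of _secret.yml; every value here is a placeholder.
SECRET_YAML = <<~YAML
  twitter:
    enabled: true
    consumer_key: "PLACEHOLDER"
    consumer_secret: "PLACEHOLDER"
    access_token: "PLACEHOLDER"
    access_token_secret: "PLACEHOLDER"
YAML

# Hypothetical helper: merge the parsed secrets into a config hash in place.
def merge_secrets(config, yaml_text)
  config.update(YAML.safe_load(yaml_text))
end

# Stand-in for the settings that come from _config.yml.
config = {
  'url' => 'https://example.com',
  'twitter' => { 'enabled' => false, 'note' => 'public' }
}
merge_secrets(config, SECRET_YAML)

puts config['twitter']['enabled']    # true
puts config['twitter'].key?('note')  # false
puts config['url']                   # https://example.com
```

One thing worth knowing: Hash#update is a shallow merge. A top-level twitter key in the secrets file replaces the whole twitter section from _config.yml instead of blending with it.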
Now, we need to do tweets. Tweets is tricky because well we’re gonna be using the filesystem to store state between builds. In fact, some of y’all saw me fuck this up and spam about 30 tweets in half a second before I got ratelimited.
me: [[cursing loudly in the apartment ]]
— Reid; Yak Hunter (@arrdem) September 17, 2019
y'all: pic.twitter.com/QBmo7WymFt
Shout out to those of y’all who found some comedy in my testing on main.
So what we’re gonna do is maintain a _tweets.yml file, which maps the URL of a post to the URL of a tweet.
When we see a “new” post - one which isn’t in the mapping - we’ll tweet it out and create the requisite map entry.
_plugins/announce.rb
require 'twitter'
require 'yaml'

client = nil
post_to_tweets = {}

# Load the tweet DB and create the client
Jekyll::Hooks.register :site, :pre_render do |site|
  if File.file?('_tweets.yml') then
    # An empty file parses to a falsy value, so fall back to an empty map
    post_to_tweets = YAML.safe_load(File.read('_tweets.yml')) || {}
  else
    STDOUT.print("Warning: no tweets database was found!\n")
  end

  client = Twitter::REST::Client.new(site.config['twitter'])

  # So there's an escape hatch for development
  if not site.config['twitter'].fetch("enabled", false) then
    STDOUT.print("Warning: Twitter publishing has been disabled\n")
  end
end

# For each post, if there's a tweet in the DB or in the YAML prefix
# use that with the YAML prefix winning. Otherwise create one and
# update the tweet database either way.
Jekyll::Hooks.register :posts, :pre_render do |post|
  site = post.site
  if post.data["layout"] == "post" then
    full_post_url = site.config["url"] + post.url
    tweet = post.data.fetch("twitter", post_to_tweets.fetch(post.url, nil))

    if tweet == nil and site.config['twitter'].fetch("enabled", false) then
      # Post a new tweet and compute its URL
      STDOUT.print("Found an unpublished tweet - publishing...\n")
      # convert all my tags to hashtags
      tags = post.data["tags"].map { |str| "#" + str.downcase }.join(" ")
      # make the tweet text
      tweet_text = "New blog post! - " + post.data["title"] + " " + full_post_url + " " + tags
      # lob it out and grab the URL
      tweet = client.update(tweet_text).url.to_s
      STDOUT.print("Published as " + tweet + "\n")
    end

    # Write the tweet back so that it can be used in rendering
    if tweet != "skipped" then
      post.data["twitter"] = tweet
    end
    post_to_tweets[post.url] = tweet
  end
end

# Dump the tweet DB back
Jekyll::Hooks.register :site, :post_render do |site|
  File.open('_tweets.yml', 'w') { |file| file.write(post_to_tweets.to_yaml) }
end
This totally works once it gets to a steady state.
The problem is initialization.
Some of my more recent blog posts had the twitter: entry in their heading front matter, and I didn’t want to re-post those tweets or pretend like they didn’t exist.
Telling the difference between a really old blog post and a new blog post would be impossible here without bringing the date into consideration, and chuck that.
Instead the bootstrapping process (which I messed up) was to check in TWO versions of this plugin.
The first version (should have) had the post_to_tweets.fetch(post.url, nil) snippet replaced with post_to_tweets.fetch(post.url, "skipped").
This will make Jekyll lay down a database of all the existing posts flagged so that they’ll be ignored in future.
‘course I didn’t do that and totally spammed my Twitter account, but I trust y’all to learn from my mistakes.
Then, swap that .fetch default value back to nil so that future new posts (like this one!) will be recognized as missing and automatically posted.
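Here's the two-pass idea as a toy you can run without touching Twitter. The lookup_tweet helper is a hypothetical stand-in for the plugin's lookup line, and the post and tweet URLs are invented.

```ruby
# Hypothetical stand-in for the plugin's lookup: front matter wins, then the
# tweet database, then whatever default the current version of the plugin uses.
def lookup_tweet(front_matter, db, url, default)
  front_matter.fetch('twitter', db.fetch(url, default))
end

# Existing posts: one never tweeted, one with a (made-up) tweet in front matter.
old_posts = {
  '/2019/01/01/old-a/' => {},
  '/2019/08/01/old-b/' => { 'twitter' => 'https://twitter.com/example/status/1' }
}

# Pass 1 (bootstrap): the default is "skipped", so no old post looks new.
db = {}
old_posts.each { |url, fm| db[url] = lookup_tweet(fm, db, url, 'skipped') }

# Pass 2 (steady state): the default is nil again, so only unseen posts come back nil.
posts = old_posts.merge('/2019/09/17/new/' => {})
to_announce = posts.select { |url, fm| lookup_tweet(fm, db, url, nil).nil? }.keys

puts db['/2019/01/01/old-a/']  # skipped
puts to_announce.inspect       # ["/2019/09/17/new/"]
```

Skip pass 1 and every entry in old_posts without front matter comes back nil, which is exactly how you end up firing off a tweet per post.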
And that’s “all” it takes! To prove the point, this post - when it airs - will have been published using this precise machinery. Check the git log if you don’t believe me!
^d