Switching from WordPress to Jekyll

Over the course of a little less than a week, I migrated this blog from WordPress 3.8 to Jekyll. I used WordPress for years, probably since 2005 or 2006 when I switched away from Movable Type. I finally decided I wanted something more lightweight than WordPress, and in particular I wanted a blog that would load faster. Jekyll gives me both of those. The parts of the migration that took the longest were customizing the new layout, based on Dbyll, and formatting the code examples in my posts.

I wasn’t happy with how slowly my site loaded when I used WordPress. I had the latest version, and I used plugins like WP Super Cache, but every page still took several seconds before anything even started to appear. I switched from WordPress to Jekyll for my portfolio a while back, but that is a much simpler site. All I really need there is a single page listing my major projects and introducing myself. 3till7.net has always been mainly a blog, and it has several years’ worth of posts, but despite this, Jekyll still does what I need. Jekyll might have been a bad choice if I wanted to be able to post when I’m away from my laptop, but that’s never the case. My posts are usually long enough that I work on them for a while from my laptop; short, quick content goes to my Tumblr, Twitter, or other social networks.

Importing Posts

I first tried to use a plugin to download a copy of my WordPress database from within the WordPress admin dashboard. It seemed to download okay, but when I ran my Jekyll WordPress importer script (see below), it failed partway through with a duplicate key error. Not wanting to fuss with it, I went to phpMyAdmin and exported a copy of my database from there. Again, the Jekyll importer failed, but this time because it said it couldn’t find the wp_users table. Strange, since that table was definitely listed in phpMyAdmin. I imported the SQL dump into a local database and, sure enough, there was no wp_users table. A few other tables were missing as well. For whatever reason, I had to download another dump of my database from phpMyAdmin, this time with just the last few tables in the list included. I loaded both SQL dumps into a local database and this time the Jekyll importer finished successfully.

Jekyll WordPress importer

require "jekyll-import"

# I replaced the dbname, user, and password in my local copy. You can do
# that or pass them as command-line parameters to this script.
JekyllImport::Importers::WordPress.run({
  "dbname" => ARGV.shift, "user" => ARGV.shift, "password" => ARGV.shift,
  "host" => "localhost", "prefix" => "wp_", "clean_entities" => true,
  "comments" => true, "categories" => true, "tags" => true,
  "more_excerpt" => true, "more_anchor" => true, "status" => ["publish"]
})

To get jekyll serve to complete successfully, I had to go through the imported posts and adjust formatting on some of them. I have a lot of code examples in my blog and those had previously been styled inside <pre> tags. That was one of the reasons I wanted to move to Jekyll: writing posts wasn’t fun in the WordPress admin interface. It’s a nice interface, but I prefer to be in Sublime Text, like I am most of the day anyway for work and side projects. With Jekyll, not only am I writing inside my favorite editor, but I get to write in Markdown and skip a lot of HTML cruft.

Anyway, jekyll serve did not like posts that had a bunch of code within <pre> and <code> tags instead of indented or wrapped in backticks. So I fixed the most egregious formatting errors enough to get Jekyll to generate my site.

Disqus Comments

Then I wanted to import my comments. Comments were previously stored in my WordPress database, but since Jekyll produces static HTML pages, you need to use some external commenting system. I chose Disqus and used the WordPress admin dashboard to export all my WordPress comments into a big XML file. Disqus imported that file pretty quickly and all my comments were in. Despite adding their JavaScript to my post template in Jekyll, I wasn’t seeing comments for all my entries. This turned out to be caused by a couple of problems:

Disqus matches URLs

Within the JavaScript I added to make Disqus comments appear, you can provide a disqus_url variable that tells Disqus which comments go with the post. My WordPress URLs looked like http://www.3till7.net/2014/01/13/post-title/ while my Jekyll URLs were http://www.3till7.net/2014/01/13/post-title/index.html. I created a custom filter to strip out the trailing index.html:

module Jekyll
  module Filters
    def disqus_url full_url
      full_url.sub(/index\.html$/, '')
    end
  end
end

Then I used that filter in my post.html template:

  
  <div class="comments">
  <div id="disqus_thread"></div>
  <script type="text/javascript">
  /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
  var disqus_shortname = 'mySiteShortname';
  var disqus_url = 'http://www.3till7.net{{ page.url | disqus_url }}';

  /* * * DON'T EDIT BELOW THIS LINE * * */
  (function() {
    var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
    dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
    (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
  })();
  </script>
  <noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
  <a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a>
</div>

Jekyll had different dates for some posts

I used permalink: /:year/:month/:day/:title/index.html for my permalinks in Jekyll so I could preserve the same URLs as what I had in my WordPress blog, so I was surprised when some post URLs weren’t matching up. It turned out that when I ran the Jekyll importer, an additional property was set on each imported post:

date: 2012-09-16 22:36:49.000000000 -04:00

This was a date in addition to the date that was part of the post file name. I noticed that when the date property was at hour 19 or later, the URL generated for that post would be on the next day. So the file name might be 2012-09-16-my-post.markdown, and the date might be 2012-09-16 22:36 like above, but the URL in my generated Jekyll site would be /2012/09/17/my-post. I ended up writing a script (see below) to go through all my Jekyll posts and ensure that each one mapped to an existing URL on my WordPress site. I used this script before I took down the WordPress site and deployed Jekyll, of course. Once I had identified all posts whose permalinks were off by one day, I manually went through those posts and changed their date property so that its hour, minute, and second were 00:00:00. I left the file names alone, since they were correct all along.
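One plausible explanation, which I have not confirmed, is that the date was converted to UTC before the permalink was built: 22:36 at -04:00 is already 02:36 on the next day in UTC, and hour 19 at a -05:00 offset is exactly midnight UTC. Here is a minimal Ruby sketch under that assumption. The helper names `day_shifts_in_utc?` and `zero_out_time` are hypothetical; the second one mirrors the manual fix described above.

```ruby
require "time"

# True when the front-matter date lands on a different calendar day in UTC
# than the day encoded at the start of the post's file name.
# Assumption: the permalink day is derived from the UTC date.
def day_shifts_in_utc?(file_name, date_value)
  file_day = file_name[/\A\d{4}-\d{2}-\d{2}/]
  Time.parse(date_value).utc.strftime("%Y-%m-%d") != file_day
end

# Rewrites the time portion of a "date:" front-matter line to midnight,
# leaving the day and the UTC offset untouched.
def zero_out_time(line)
  line.sub(/\A(date:\s*\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2}(\.\d+)?/, '\1 00:00:00')
end

puts day_shifts_in_utc?("2012-09-16-my-post.markdown",
                        "2012-09-16 22:36:49.000000000 -04:00")
puts zero_out_time("date: 2012-09-16 22:36:49.000000000 -04:00")
```

After the rewrite, midnight at -04:00 is 04:00 UTC on the same day, so the file name and the generated URL agree again.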

Code Formatting

After I got comments showing up and permalinks matching, I went through my posts and adjusted their formatting further. I had to strip out WordPress shortcodes like [caption] because they just show up as plain text in Jekyll. I also had to change how code was presented, which was the majority of the formatting issues I had to fix. In WordPress, I would use <pre> tags with the lang attribute specifying which syntax highlighting rules to use. In Jekyll, you use the highlight block to add syntax highlighting to your code. I discovered you still need to indent your code by four spaces, even within the highlight block. I had the worst time trying to convert some posts with sample PHP code before I figured this out.
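I did these conversions by hand, but most of it could be scripted. The following is a rough sketch rather than what I actually ran: the method names are hypothetical, and it assumes code blocks look exactly like `<pre lang="php">...</pre>` and that [caption] shortcodes can simply be dropped while keeping whatever they wrap.

```ruby
# Removes [caption ...] and [/caption] shortcode tags, keeping the wrapped content.
def strip_caption_shortcodes(text)
  text.gsub(%r{\[/?caption[^\]]*\]}, "")
end

# Converts <pre lang="x">...</pre> into a highlight block whose body is
# indented by four spaces, as described above.
def pre_to_highlight(text)
  text.gsub(%r{<pre lang="([^"]+)">(.*?)</pre>}m) do
    lang = Regexp.last_match(1)
    code = Regexp.last_match(2).sub(/\A\n+/, "").rstrip
    indented = code.lines.map { |line| "    #{line}" }.join
    "{% highlight #{lang} %}\n#{indented}\n{% endhighlight %}"
  end
end

sample = %(<pre lang="php">\necho 1;\necho 2;\n</pre>)
puts pre_to_highlight(sample)
```

Posts with nested or oddly attributed `<pre>` tags would still need a manual pass.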

Having to indent code within the highlight block caused the code to appear as indented by four spaces in the final generated HTML page. To get around this, I added highlight_indent_fix.rb to _plugins/:

# See https://gist.github.com/zerowidth/5334029
Jekyll::Tags::HighlightBlock.module_eval do
  def render context
    code = strip_leading_space_from super
    if context.registers[:site].pygments
      render_pygments context, code
    else
      render_codehighlighter context, code
    end
  end

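  def strip_leading_space_from code
    code.split("\n").map do |line|
      # Use sub, not sub!: sub! returns nil when the line has no leading
      # indent, which would turn that line into an empty string after join.
      line.sub(/^\s{4}/, '')
    end.join("\n")
  end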
end
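The four-space stripping step can be tried on its own outside Jekyll. This is a minimal sketch with a hypothetical method name, `strip_four_spaces`. It uses the non-mutating `sub`, because `sub!` returns `nil` for lines where nothing is replaced, and those lines would come out blank after `join`.

```ruby
# Removes exactly four leading spaces from each line that has them and
# leaves every other line unchanged.
def strip_four_spaces(code)
  code.split("\n").map { |line| line.sub(/^ {4}/, "") }.join("\n")
end

puts strip_four_spaces("    def hello\n      puts 'hi'\n    end\nnot indented")
```

Lines indented deeper than four spaces keep their extra indentation, so nested code still lines up.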

Search

Search was something provided in WordPress that I wanted to have with Jekyll, too. Since it’s all static files, I went with a JavaScript solution. Jekyll + lunr.js is what powers the search bar right now. It generates a gigantic JSON file of your site’s content that it searches through.
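To give a rough idea of what such an index looks like, here is a small Ruby sketch that builds one from a list of posts. The field names (`title`, `url`, `body`) and the `entries` key are made up for illustration and are not necessarily what the plugin writes.

```ruby
require "json"

# Builds a JSON search index from an array of post hashes.
# HTML tags are replaced with spaces so only the text is indexed.
def build_search_index(posts)
  entries = posts.map do |post|
    {
      "title" => post[:title],
      "url"   => post[:url],
      "body"  => post[:body].gsub(/<[^>]+>/, " ").squeeze(" ").strip
    }
  end
  JSON.generate("entries" => entries)
end

posts = [
  { title: "Switching from WordPress to Jekyll",
    url: "/2014/01/13/post-title/",
    body: "<p>Hello <b>world</b></p>" }
]
puts build_search_index(posts)
```

Because every post body ends up in one file, the index grows with the site, which is why the real one gets so large.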

Generating Assets

For styling my site, I didn’t want to write plain CSS or JavaScript, so I set up LESS and CoffeeScript plugins. For LESS, I added gem 'therubyracer' and gem 'jekyll-less' to my Gemfile, ran bundle, and added bundler.rb to my _plugins/:

require 'rubygems'
require 'bundler/setup'
Bundler.require(:default)

Deploying

For deployment, I started to set up a Git post-receive hook on my server, but that means the server has to have Ruby, Pygments, Bundler, and all the gems I use installed. There’s no sense in that since I generate a copy of my site every time I test it locally. I followed Nathan Grigg’s advice about using rsync and set up the following deploy.sh script:

#!/usr/bin/env bash
# See http://nathangrigg.net/2012/04/rsyncing-jekyll/
rsync --compress --itemize-changes --recursive --checksum --delete _site/ myuser@mydomain.com:my_public_html_dir/

I made sure ~/.ssh/authorized_keys on my server had a copy of my public key. Now any time I want to deploy, I run jekyll serve locally to get an up-to-date copy of my _site folder. Then I run ./deploy.sh and my site gets updated with only modified files getting uploaded. The script prints out a list of which files it uploads.
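The two steps could also be wrapped in one Ruby script. This is only a sketch: the method names are hypothetical, it uses `jekyll build` instead of `jekyll serve` so that it exits when generation finishes, and it adds rsync's `--dry-run` flag as an optional preview, which deploy.sh above does not have.

```ruby
# Builds the rsync argument list, using the same flags as deploy.sh above.
def rsync_command(source, destination, dry_run: false)
  command = ["rsync", "--compress", "--itemize-changes", "--recursive",
             "--checksum", "--delete"]
  command << "--dry-run" if dry_run
  command + [source, destination]
end

# Regenerates the site, then syncs _site/ to the server.
# Aborts if either command exits with a failure status.
def deploy(destination, dry_run: false)
  system("jekyll", "build") or abort("jekyll build failed")
  system(*rsync_command("_site/", destination, dry_run: dry_run)) or abort("rsync failed")
end

# Example (placeholder destination, as in deploy.sh):
#   deploy("myuser@mydomain.com:my_public_html_dir/", dry_run: true)
puts rsync_command("_site/", "myuser@mydomain.com:my_public_html_dir/", dry_run: true).join(" ")
```

Passing the arguments to `system` as a list avoids shell quoting problems if a path ever contains spaces.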

WordPress Uploads

The easiest way to add images and other media to your WordPress posts is by using the media uploader built into the WordPress dashboard. I wanted to preserve all the files I had linked from my posts when I transitioned to Jekyll. I downloaded my wp-content/uploads directory and put it in my Jekyll site as assets/uploads. I did a search and replace across all my posts to update links. Since I had pruned down which posts I was migrating from my WordPress blog to the new Jekyll blog, though, a lot of files in the uploads directory were no longer used. I wrote the following script to delete files from assets/uploads that were not mentioned in any post:

class UnusedUploadFinder
  attr_reader :root_dir, :upload_dir, :posts_dir, :deleted_file_count,
              :deleted_dir_count

  def initialize
    @root_dir = File.expand_path(File.dirname(__FILE__))
    @upload_dir = File.join(@root_dir, 'assets', 'uploads')
    @posts_dir = File.join(@root_dir, '_posts')
    @deleted_file_count = 0
    @deleted_dir_count = 0
  end

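  def file_contains? path, str
    # Plain substring match, so characters such as "." or "+" in a file name
    # are not interpreted as regular expression syntax.
    File.read(path).include?(str)
  end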

  def post_contains? str
    Dir.glob(@posts_dir + '/**/*.markdown') do |post_path|
      return post_path if file_contains?(post_path, str)
    end
    false
  end

  def delete_unused_files
    Dir.glob(@upload_dir + '/**/*') do |item|
      if File.file?(item)
        file_name = item.sub(/\A#{Regexp.escape(@upload_dir)}/, '')
        if post_path=post_contains?(file_name)
          post_name = File.basename(post_path)
          puts "Post #{post_name} references #{file_name}"
        else
          puts "No post references #{file_name}, deleting it"
          File.delete item
          @deleted_file_count += 1
        end
      end
    end
  end

  def delete_empty_directories
    Dir.glob(@upload_dir + '/**/*') do |item|
      if File.directory?(item) && Dir[item + '/*'].empty?
        puts "Deleting empty directory #{item}"
        Dir.delete item
        @deleted_dir_count += 1
      end
    end
  end

  def process
    delete_unused_files
    delete_empty_directories
  end
end

finder = UnusedUploadFinder.new
finder.process
puts "----------------------"
puts "Deleted #{finder.deleted_file_count} files"
puts "Deleted #{finder.deleted_dir_count} directories"
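The search and replace that updated the upload links, mentioned before the script above, could be done along these lines. This is a sketch, not what I actually ran: the method names are hypothetical, and it assumes every link contains the literal prefix `wp-content/uploads/` and that posts use the `.markdown` extension.

```ruby
# Replaces the old WordPress upload prefix with the new Jekyll one.
def rewrite_upload_links(text, old_prefix: "wp-content/uploads/",
                         new_prefix: "assets/uploads/")
  text.gsub(old_prefix, new_prefix)
end

# Applies the rewrite to every post under posts_dir and returns the
# paths of the files that were actually changed.
def rewrite_posts(posts_dir)
  Dir.glob(File.join(posts_dir, "**", "*.markdown")).select do |path|
    original = File.read(path)
    updated = rewrite_upload_links(original)
    next false if updated == original
    File.write(path, updated)
    true
  end
end

puts rewrite_upload_links("![photo](/wp-content/uploads/2013/05/photo.png)")
```

Running it before the cleanup script matters, since that script only keeps files whose new paths appear in a post.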

URL Mapper

This URL mapper script can be used to check if your new Jekyll site will have the same URLs as the existing site you’re replacing. It expects you to have a sitemap.xml file in your Jekyll site whose entries point to your existing domain. That is, if you’re developing your Jekyll site on localhost:4000, the sitemap.xml file should not have http://localhost:4000 entries, but instead entries at your existing site URL. You can use Michael Levin’s sitemap generator.

require 'net/http'
require 'rexml/document'

class URLMapper
  attr_reader :sitemap_path, :jekyll_urls, :missing_urls, :total_urls

  def initialize
    @jekyll_urls = []
    @missing_urls = []
    @total_urls = 0
    @sitemap_path = File.join(File.expand_path(File.dirname(__FILE__)),
                              '_site', 'sitemap.xml')
  end

  def process
    extract_jekyll_urls
    check_jekyll_urls
    print_summary
  end

  private

  def extract_jekyll_urls
    xml_str = File.read(@sitemap_path)
    doc = REXML::Document.new(xml_str)
    doc.elements.each('urlset/url/loc') do |element|
      # Turn Jekyll-style permalinks into the permalinks used on WordPress
      @jekyll_urls << element.text.sub(/index\.html$/, '')
    end
  end

  def check_jekyll_urls
    @jekyll_urls.each do |url|
      print url
      uri = URI.parse(url)
      print '... '
      request = Net::HTTP.new(uri.host, uri.port)
      response = request.request_head(uri.path)
      if response.code == '200'
        puts "valid!"
      else
        puts "error #{response.code}!"
        @missing_urls << url
      end
      @total_urls += 1
    end
  end

  def print_summary
    puts '---------------------------'
    count = @missing_urls.size
    plural = count == 1 ? '' : 's'
    puts "Found #{count} unmapped URL#{plural} out of #{@total_urls}:"
    @missing_urls.each do |url|
      puts "\t#{url}"
    end
  end
end

URLMapper.new.process
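The check above assumes plain HTTP and a non-empty path. If the existing site is served over HTTPS, or the sitemap lists a bare domain, something like the following sketch may be needed instead. The method names `request_path` and `head_status` are hypothetical and not part of the script above.

```ruby
require "net/http"
require "uri"

# Returns the path to request, falling back to "/" for a bare domain,
# because a HEAD request needs a non-empty path.
def request_path(uri)
  uri.path.empty? ? "/" : uri.path
end

# Sends a HEAD request and returns the status code as a string, for
# example "200". TLS is enabled only for https URLs.
def head_status(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(request_path(uri)).code
  end
end

puts request_path(URI.parse("http://www.3till7.net/2014/01/13/post-title/"))
```

`head_status` is not called here because it needs network access; inside `check_jekyll_urls` it would take the place of the `request_head` call.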
