This is a continuation of my series on migrating my web site off WordPress and onto Jekyll
Cleaning up the mess left by WordPress
I've used the [WordpPress to Jekyll] exporter to convert the articles to Markdown. Once you export all of the markdown files you need might think that the work is done and you just need to drop them into your new Jekyll site and build it. Almost, but not quite.
Every single header of every blog post was messed up by some tag that each WordPress plugin or WordPress itself decided to add for its own reason. This was so bad that even the 'author' field was scrambled and Jekyll couldn't properly parse it so it wouldn't show the author in any of the articles if you tried publishing them if it would even build.
A normal Jekyll header looks like this:
--- layout: post title: Information overload date: 2012-02-05 18:19:03.000000000 +01:00 type: post published: true status: publish categories: - Tech author: Valentino Urbano ---
The headers from the WordPress export looked like this:
layout: post title: Information overload date: 2012-02-05 18:19:03.000000000 +01:00 type: post published: true status: publish categories: - Tech -tags: \[\] -meta: -\_wpcom\_is\_markdown: '1' -\_edit\_last: '1' -\_yoast\_wpseo\_focuskw: '' -\_yoast\_wpseo\_title: '' -\_yoast\_wpseo\_metadesc: '' -\_yoast\_wpseo\_meta-robots-noindex: '0' -\_yoast\_wpseo\_meta-robots-nofollow: '0' -\_yoast\_wpseo\_meta-robots-adv: none -\_yoast\_wpseo\_sitemap-include: '-' -\_yoast\_wpseo\_sitemap-prio: '-' -\_yoast\_wpseo\_canonical: '' -\_yoast\_wpseo\_redirect: '' -author: -login: Myshar -email: [email protected] -display\_name: Valentino Urbano -first\_name: '' -last\_name: '' ---
I made a really dumb and inefficient script to clean them all. I'm not going to share it for now, because it's tailored to my installation and it might do huge amount of damage to your posts if their structure is different from mine. Since things like installed plugins (and more) affect the structure of the export the chance of a mistake is quite high. I don't want that responsibility.
What I'm going to do is give you the structure of the script. It scans the file, and stops one line after
categories (I wanted to keep only one category per post, so I'm ditching all the others). It replaces everything before
author: Valentino Urbano. Nothing more.
Fortunately all of the other tags did export correctly so they didn't need any tweaking.