Converting WordPress Export File to Hugo

I have written about the Hugo static site generator before. I have now written a migration program in the Go programming language that converts the WordPress export format to Hugo format. The program, wp2hugo.go, is on GitHub. It can be freely downloaded and needs no dependencies other than Go itself. Go is packaged for Arch Linux and Ubuntu.

To convert a blog from WordPress you have to create an export file.

If the blog is not too large, you download a single XML file which contains all posts and pages. If the blog is larger, you will receive an e-mail from WordPress.com telling you that you can download a ZIP file containing two or more XML files. If you have such a ZIP file, unpack it, for example with p7zip. Then run

go run wp2hugo.go XML1 XML2 ...

This will produce the empty directories archetypes, data, layouts, static, and themes. It will create a directory content with the sub-directories page and post, and possibly private. This layout is similar to what hugo new site creates. It will furthermore produce two files, config.toml and attachm.txt. Converting this blog, for example, results in the following config.toml:


title = "Elmar Klausmeier's Weblog"
languageCode = "en"
baseURL = "https://eklausmeier.wordpress.com"
paginate = 20

[taxonomies]
   tag = "tags"
   category = "categories"
   archive = "archives"

[params]
   description = "Computers and Programming"

The file attachm.txt contains a list of all attachments, which in most cases are images. In my case this file looks like this:

https://eklausmeier.files.wordpress.com/2016/12/cablesurf-speed1.png    cablesurf-speed1.png
https://eklausmeier.files.wordpress.com/2013/06/load99.png      load99.png
https://eklausmeier.files.wordpress.com/2014/09/c10ktitles.jpg  c10ktitles.jpg
...

It lists all files which are actually referenced in your blog. You can download them like this:

cd static
mkdir img
cd img
perl -ane '`curl $F[0] -o $F[1]\n`' ../../attachm.txt

You don’t have to download the files, if you already have your images on your local machine. wp2hugo.go changes all your blog posts and pages so that they reference their images (attachments) in /img/, i.e., static/img/.

I have two blogs on WordPress.com:

  1. Elmar Klausmeier’s Weblog, more than 220 posts, 4 pages
  2. Collected Links, almost 3,000 posts, 2 pages

Converting the first one with wp2hugo.go takes less than 2 seconds for 220 posts; the second, larger blog takes less than 6 seconds for its 3,000 posts. These timings are on a desktop PC with an AMD octa-core FX 8120 clocked at 3.1 GHz.

wp2hugo.go splits the XML export file so that each post or page becomes a separate Markdown file under content. Additionally, it handles the following special cases:

  1. Tags and categories
  2. Converts [code] and <pre> to ```
  3. Converts YouTube videos to {{< youtube ... >}}
  4. Handles Google maps
  5. Handles $latex, posts with TeX math get math=true in their frontmatter
  6. Corrects corrupted code in [code] blocks where special characters like less than, greater than, or ampersand were erroneously transformed to HTML entities by WordPress
  7. http converted to https
  8. Draft posts in WordPress are drafts in Hugo

I experimented with the converted files and used the following themes, which showed results without too much fiddling in config.toml:

  1. hugo-academic
  2. hugo-theme-bootstrap4-blog
  3. hugo-tranquilpeak-theme

Hugo, not wp2hugo.go, is a CPU hog. When Hugo reads all 3,000 posts, all 8 cores in my machine are mostly busy.

/tmp/H: time hugo --theme=hugo-theme-bootstrap4-blog
Started building sites ...
Built site for language en:
0 draft content
0 future content
0 expired content
2904 regular pages created
11394 other pages created
0 non-page files copied
6209 paginator pages created
0 archives created
5690 tags created
1 categories created
total in 116501 ms

real    1m56.727s
user    8m5.703s
sys     0m1.877s

I.e., after 2 minutes Hugo has processed all files, but bills 8 minutes because it has used more than one core. I ran this in /tmp, so there is no actual writing to disk; /tmp is mounted as tmpfs in Arch Linux.

Currently wp2hugo.go has the following limitations:

  1. Password protected posts in WordPress have no password in Hugo
  2. No handling for the Vimeo shortcode [vimeo]
  3. Inlined TeX equations work, but displayed equations do not, e.g., On Differential Forms
  4. The highlight parameter in [code] is ignored
  5. When a post links to a page, this link will be a 404, while links to all other posts work fine

wp2hugo.go works as follows:

  1. Iterate over all filenames given as arguments
  2. Fill Go maps config[], frontmatter[], attachm[], etc.
  3. Find posts or pages within item-tag
  4. Use various regular expressions to change the body of posts and pages; these would fit nicely into a configuration file

The larger of the two blogs has previously been migrated from del.icio.us to Collected Links using a Perl script, which generated WordPress import/export format, see Migrating from delicious.com to WordPress.

How to actually publish the converted blog on GitHub, GitLab, Netlify, etc. deserves another article.

Exporting Exchange/Outlook GAL to vCard

This post is about the Microsoft Exchange GAL, i.e., the global address list. The task is to export the data in the GAL to vCard format.

Microsoft Outlook stores local caches of the GAL in %userprofile%\Local Settings\Application Data\Microsoft\Outlook, see Administering the offline address book in Outlook. On my computer they look like this:

 Listing of D:\Users\...\AppData\Local\Microsoft\Outlook\Offline Address Books\...

21.05.2016  20:44    <DIR>          .
21.05.2016  20:44    <DIR>          ..
21.05.2016  20:44         3.818.260 uanrdex.oab
21.05.2016  20:44           686.956 ubrowse.oab
21.05.2016  20:44        56.310.184 udetails.oab
21.05.2016  20:44                20 updndex.oab
21.05.2016  20:44         1.373.676 urdndex.oab
21.05.2016  20:44            25.915 utmplts.oab
               6 Files,      62.215.011 Bytes


GCC 6.1 Compiler Optimization Level Benchmarks

In Effect of Optimizer in gcc on Intel/AMD and Power8 I measured the speed ratio between optimized and non-optimized C code. For integer calculations the ratio was three on Intel/AMD and eight on Power8 (PowerPC). For floating-point calculations the factors were two and three, respectively.

Michael Larabel in GCC 6.1 Compiler Optimization Level Benchmarks: -O0 To -Ofast + FLTO measured various optimization flags of the newest GCC.

For a Poisson solver the speed ratio between optimized and non-optimized code was five.

[Figure: Himeno benchmark, GCC 6.1]

Convert ASCII to Hex and vice versa in C and Excel VBA

In Downloading Binary Data, for example Boost C++ Library I already complained about some company policies regarding the transfer of binary data. If the openssl command is available on the receiving end, then things are pretty straightforward, as the aforementioned link shows; in particular, you then have Base64 encoding. If that is not the case but you have a C compiler, or at least Excel, then you can work around it.

C program ascii2hex.c converts from arbitrary data to hex, and vice versa. Excel VBA (Visual Basic for Applications) ascii2hex.xls converts from hex to arbitrary data.

To convert from arbitrary data to a hex representation:

ascii2hex -h yourBinary outputInHex

Back from hex to ASCII:

ascii2hex -a inHex outputInBinary


Performance Comparison C vs. Lua vs. LuaJIT vs. Java

Ico Doornekamp on 20-Dec-2011 asked why a C version of a Lua program ran more slowly than the Lua program. The mentioned discrepancy cannot be reproduced, either on an AMD FX-8120 or on an Intel i5-4250U processor. Generally, a C program is expected to be faster than the equivalent Lua program.

Here is the Lua program called lua_perf.lua:

local N = 4000
local S = 1000

local t = {}

for i = 0, N do
        t[i] = {
                a = 0,
                b = 1,
                f = i * 0.25
        }
end

for j = 0, S-1 do
        for i = 0, N-1 do
                t[i].a = t[i].a + t[i].b * t[i].f
                t[i].b = t[i].b - t[i].a * t[i].f
        end
        print(string.format("%.6f", t[1].a))
end

It computes values for a circle.