Converting WordPress Export File to Hugo

I have written on the Hugo static site generator here. Now I have written a migration program in the Go programming language to convert from the WordPress export format to the Hugo format. This program, wp2hugo.go, is on GitHub. It can be freely downloaded and needs no further dependencies, except, of course, Go itself. Go is packaged for Arch Linux and Ubuntu.

To convert a blog from WordPress you have to create an export file.

If the blog is not too voluminous you download a single XML file which contains all posts and pages. If the blog in question is larger, you will receive an e-mail from WordPress.com saying that you can download a ZIP file which contains two or more XML files. If you have such a ZIP file, unpack it, for example with p7zip. Then run

go run wp2hugo.go XML1 XML2 ...

This will produce the empty directories archetypes, data, layouts, static,
and themes. It will create a directory content with the sub-directories page and post, and possibly private. This setup is similar to what hugo new site produces. It will furthermore produce two files, config.toml and attachm.txt. Converting this blog, for example, results in the following config.toml:


title = "Elmar Klausmeier's Weblog"
languageCode = "en"
baseURL = "https://eklausmeier.wordpress.com"
paginate = 20

[taxonomies]
   tag = "tags"
   category = "categories"
   archive = "archives"

[params]
   description = "Computers and Programming"

The file attachm.txt contains a list of all attachments, which in most cases are images. In my case this file looks like this:

https://eklausmeier.files.wordpress.com/2016/12/cablesurf-speed1.png    cablesurf-speed1.png
https://eklausmeier.files.wordpress.com/2013/06/load99.png      load99.png
https://eklausmeier.files.wordpress.com/2014/09/c10ktitles.jpg  c10ktitles.jpg
...

It lists all files which are actually referenced in your blog. You can download them like this:

cd static
mkdir img
cd img
perl -ane '`curl $F[0] -o $F[1]\n`' ../../attachm.txt

You don't have to download the files if you already have your images on your local machine. wp2hugo.go changes all your blog posts and pages so that they reference their images (attachments) in /img/, i.e., static/img/.
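
To illustrate the idea, here is a minimal sketch of such a URL rewrite in Go. It is not the actual wp2hugo.go code; the host pattern and the function name rewriteAttachments are my own assumptions.

package main

import (
    "fmt"
    "regexp"
)

// reAttachm matches WordPress attachment URLs such as
// https://eklausmeier.files.wordpress.com/2013/06/load99.png
// and captures the bare file name. The host pattern is an assumption.
var reAttachm = regexp.MustCompile(`https?://[\w.-]+\.files\.wordpress\.com/\d{4}/\d{2}/([\w.%-]+)`)

// rewriteAttachments points all attachment references to /img/,
// i.e., static/img/ in the Hugo directory layout.
func rewriteAttachments(body string) string {
    return reAttachm.ReplaceAllString(body, "/img/$1")
}

func main() {
    post := `<img src="https://eklausmeier.files.wordpress.com/2013/06/load99.png" />`
    fmt.Println(rewriteAttachments(post)) // <img src="/img/load99.png" />
}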

I have two blogs on WordPress.com:

  1. Elmar Klausmeier’s Weblog, more than 220 posts, 4 pages
  2. Collected Links, almost 3,000 posts, 2 pages

Converting the first one with wp2hugo.go takes less than 2 seconds for the 220 posts; the second, larger blog takes less than 6 seconds for its 3,000 posts. These timings are on a desktop PC with an eight-core AMD FX-8120 clocked at 3.1 GHz.

wp2hugo.go splits the XML export file so that each post or page becomes a separate Markdown file under content. Additionally it handles the following special cases (a sketch of items 2 and 6 follows after the list):

  1. Tags and categories
  2. Converts [code] and <pre> to ```
  3. Converts YouTube videos to {{< youtube ... >}}
  4. Handles Google Maps
  5. Handles $latex; posts with TeX math get math=true in their front matter
  6. Corrects corrupted code in [code] blocks where special characters like less than, greater than, or ampersand were erroneously transformed to HTML entities by WordPress
  7. Converts http links to https
  8. Draft posts in WordPress remain drafts in Hugo
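
As an illustration of items 2 and 6, here is a minimal sketch in Go of how a [code] block can be turned into a fenced code block and un-escaped. The regular expression and the helper convertCodeBlocks are assumptions, not the actual wp2hugo.go implementation.

package main

import (
    "fmt"
    "regexp"
    "strings"
)

// reCode matches WordPress [code language="..."]...[/code] blocks;
// the optional language attribute is captured, further parameters
// are skipped by [^\]]*.
var reCode = regexp.MustCompile(`(?s)\[code(?:\s+language="([^"]*)")?[^\]]*\](.*?)\[/code\]`)

// unescape undoes the HTML escaping WordPress applied inside [code] blocks.
var unescape = strings.NewReplacer("&lt;", "<", "&gt;", ">", "&amp;", "&")

// convertCodeBlocks rewrites every [code] block as a fenced code block.
func convertCodeBlocks(body string) string {
    return reCode.ReplaceAllStringFunc(body, func(m string) string {
        sub := reCode.FindStringSubmatch(m)
        lang, code := sub[1], sub[2]
        return "```" + lang + "\n" + strings.Trim(unescape.Replace(code), "\n") + "\n```"
    })
}

func main() {
    post := `[code language="c"]
if (a &lt; b &amp;&amp; b &gt; 0) f();
[/code]`
    fmt.Println(convertCodeBlocks(post))
}

Note that in this sketch further [code] parameters such as highlight are simply dropped, which matches the limitation listed below.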

I experimented with the converted files and used the following themes, which showed good results without too much fiddling in config.toml:

  1. hugo-academic
  2. hugo-theme-bootstrap4-blog
  3. hugo-tranquilpeak-theme

Hugo, not wp2hugo.go, is a CPU hog. When Hugo reads all 3,000 posts, all 8 cores of my machine are mostly busy.

/tmp/H: time hugo --theme=hugo-theme-bootstrap4-blog
Started building sites ...
Built site for language en:
0 draft content
0 future content
0 expired content
2904 regular pages created
11394 other pages created
0 non-page files copied
6209 paginator pages created
0 archives created
5690 tags created
1 categories created
total in 116501 ms

real    1m56.727s
user    8m5.703s
sys     0m1.877s

I.e., after 2 minutes of wall-clock time Hugo has processed all files, but it bills 8 minutes of CPU time because it used more than one core. I ran this in /tmp, so there is no actual writing to disk; /tmp is mounted as tmpfs in Arch Linux.

Currently wp2hugo.go has the following limitations:

  1. Password-protected posts in WordPress have no password in Hugo
  2. No handling for the Vimeo shortcode [vimeo]
  3. Inline TeX equations work, but displayed equations do not, e.g., in On Differential Forms
  4. The highlight parameter in [code] is ignored
  5. When a post references a page, this link will be a 404; references to other posts work fine

wp2hugo.go works as follows (a structural sketch is given after this list):

  1. Iterate over all filenames given as arguments
  2. Fill Go maps config[], frontmatter[], attachm[], etc.
  3. Find posts or pages within the item tags
  4. Use various regular expressions to change the body of posts and pages; these would fit nicely into a configuration file
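
Under the assumption that the export is a standard WXR file (RSS with one item element per post or page), a minimal structural sketch of these four steps could look like this. The struct fields and the XML namespace are my assumptions about the export format, not the actual wp2hugo.go code.

package main

import (
    "encoding/xml"
    "fmt"
    "os"
)

// item mirrors one <item> element of the WordPress export file;
// only two of its many child elements are shown here.
type item struct {
    Title   string `xml:"title"`
    Content string `xml:"http://purl.org/rss/1.0/modules/content/ encoded"`
}

type rss struct {
    Items []item `xml:"channel>item"`
}

func main() {
    frontmatter := make(map[string]string) // front matter entries per post
    attachm := make(map[string]string)     // attachment URL -> file name

    for _, fname := range os.Args[1:] { // 1. iterate over all XML files
        f, err := os.Open(fname)
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            continue
        }
        var doc rss
        if err := xml.NewDecoder(f).Decode(&doc); err != nil { // 3. items within channel
            fmt.Fprintln(os.Stderr, err)
        }
        f.Close()
        for _, it := range doc.Items {
            // 2. and 4.: fill the maps, run the regular expressions over
            // it.Content, and write one Markdown file per post or page
            // under content/ -- omitted in this sketch.
            fmt.Println("found:", it.Title)
        }
    }
    _, _ = frontmatter, attachm // keep the unused maps of this sketch quiet
}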

The larger of the two blogs had previously been migrated from del.icio.us to Collected Links using a Perl script which generated the WordPress import/export format, see Migrating from delicious.com to WordPress.

How to actually bring the converted blog to GitHub, GitLab, Netlify, etc. deserves another article.

Using Odroid as IP Router

I purchased an Odroid-XU4 for ca. 80 EUR, including power supply and case, from Pollin. The original manufacturer is Hardkernel. I intended to use this small ARM computer as a router and firewall. In the past I had used routers from multiple vendors, e.g., Linksys/Cisco, TP-Link, AVM/FritzBox, Netgear, and so on. There is a rule of thumb with all these devices: usually you have to reboot them once or twice a month, otherwise they misbehave somehow. At least three of these devices went completely catatonic. I had had enough of this; I also wanted a command line interface to the router, ideally a real Linux system with bash, cron, gcc, etc. Although I already own an Intel NUC and am very happy with this computer, an Intel NUC is a little too expensive to be used as just a router.

I recommend additionally purchasing an RTC backup battery. The Odroid has a real-time clock, but loses all date and time information once powered off, which garbles the timestamps in the computer's logs.


Exceeding 10,000 km with E-Bike

As described in Commuting to Work with an E-Bike, I ride to my workplace on an e-bike. As I found out that using a bike is a viable alternative to public transport, I bought a better e-bike after about a year. I bought my first e-bike in March 2015 and my second in December 2015, and have used the latter since January 2016. With this second e-bike I have now exceeded 10,000 km within a year, as shown in the photographs of the odometer below. In total I have now travelled about 20,000 km with my two bikes within two years, i.e., I have cycled half the earth's circumference.

Analyzing the data recorded by Nyon on my smartphone:

My second bike had three flat tires and a loose crank. I replaced the chain twice and the sprocket once; furthermore I replaced both pedals. Apart from that I had no further issues.

Wiping Disks

How fast can you wipe a complete disk? To find out, I dumped zeros onto an MS Windows partition on an SSD formatted with NTFS.

[root@i7 ~]# time dd if=/dev/zero of=/dev/sda1 bs=1M
dd: error writing '/dev/sda1': No space left on device
81920+0 records in
81919+0 records out
85898297344 bytes (86 GB, 80 GiB) copied, 182.699 s, 470 MB/s

real    3m2.703s
user    0m0.033s
sys     0m43.470s

It was a real pleasure to get rid of MS Windows after just 3 minutes 😉

Once more, this time for the second partition on the SSD, also formatted with NTFS.

[root@i7 ~]# time dd if=/dev/zero of=/dev/sda2 bs=1M
dd: error writing '/dev/sda2': No space left on device
147015+0 records in
147014+0 records out
154155352064 bytes (154 GB, 144 GiB) copied, 326.108 s, 473 MB/s

real    5m26.110s
user    0m0.097s
sys     1m21.503s

Now a hard disk, i.e., moving parts, also formatted with NTFS.

[root@i7 ~]# time dd if=/dev/zero of=/dev/sdb1 bs=4M
128459+0 records in
128459+0 records out
538796097536 bytes (539 GB, 502 GiB) copied, 5404.54 s, 99.7 MB/s     <--- from kill -10
dd: error writing '/dev/sdb1': No space left on device
238467+0 records in
238466+0 records out
1000202043392 bytes (1.0 TB, 932 GiB) copied, 11876.6 s, 84.2 MB/s

real    197m56.647s
user    0m0.000s
sys     15m49.763s

From the man-page of dd:

Sending a USR1 signal to a running ‘dd’ process makes it print I/O statistics to standard error and then resume copying.

For example, in the above scenario I used

kill -10 12523

Signal number 10 is SIGUSR1, see /usr/include/bits/signum.h, or

$ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX

Here is another test with a USB 3 Seagate 2 TB hard disk; previously it was formatted with ext4:

[root@C ~]# time dd if=/dev/zero of=/dev/sdc1 bs=1M
dd: error writing '/dev/sdc1': No space left on device
1907729+0 records in
1907728+0 records out
2000397795328 bytes (2.0 TB, 1.8 TiB) copied, 21845.9 s, 91.6 MB/s

real    364m5.890s
user    0m1.020s
sys     47m33.056s

Just for clarification: when you use dd on the whole disk, the previously used filesystem is irrelevant. dd will happily wipe everything and just write blocks to the disk.

Make USB Stick Bootable

In various forums one reads that one should use dd to copy an ISO image to a USB stick. Although this works, more often you do not want to use an ISO image but rather copy a Linux system you have at hand. First you mount the root filesystem of the new USB stick, then mount /boot within a chroot. Finally you use the grub commands. I.e., type

mount /dev/sdc2 /mnt/stick
arch-chroot /mnt/stick
mount /dev/sdc1 /boot           <--- /boot is local to chroot!
grub-install --target=i386-pc --boot-directory=/boot /dev/sdc
grub-mkconfig -o /boot/grub/grub.cfg
umount /boot                    <--- umount "local" /boot

Also see GRUB in the Arch Wiki.

To set the bootable flag in the partition table one either uses gparted, a graphical tool, or, as the task at hand is so simple, just parted.

parted /dev/sdc
set 1 boot on
print

Unrelated, but often useful: in case you changed something in the initial RAM disk, regenerate it with

mkinitcpio -p linux

Switching from ext4 to btrfs

Hearing all these wonder stories about btrfs, I decided to give it a try. Additionally I encrypted my SSD using LUKS.

The relevant entries in the Arch Wiki are: Setting up LUKS and btrfs. Here is the list of commands I used.

1. I saved my data from the SSD using tar and stored the tar file on a hard disk:

cd /
time tar cf /mnt/c/home/klm/ssd1.tar bin boot etc home opt root srv usr var

Alternatively one can use cp -a to copy directories with symbolic links. Option -a is the same as -dR --preserve=all.

2. I used gparted to partition the SSD, creating /boot and / (root). For /boot I directly enabled btrfs in gparted.

3. Encrypt the partition.

cryptsetup -y -v -s 512 luksFormat /dev/sdc2

Then open it and give it an arbitrary name:

cryptsetup luksOpen /dev/sdc2 cryptssd

4. Create the filesystem using btrfs. This is the reason for all this effort, although it is the easiest step.

mkfs.btrfs /dev/mapper/cryptssd

5. Adapt /etc/fstab, e.g., with genfstab -U /mnt >> /mnt/etc/fstab

UUID=3b8bb70c-390a-4a9e-9245-ea19af509282       /       btrfs   rw,relatime     0 0
UUID=a8d6c185-0769-4ec5-9088-2c7087815346       /boot   ext4    rw,data=ordered,relatime        0 2

Check results with lsblk -if.

6. Chroot into the new system using arch-chroot and put GRUB on it, as usual. Add the required directories first.

mkdir boot proc run sys tmp

Then edit the configuration file for GRUB:

vi /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="cryptdevice=UUID=5a74247e-75e8-4c05-89a7-66454f96f974:cryptssd:allow-discards root=/dev/mapper/cryptssd"

grub-install --target=i386-pc /dev/sdb
grub-mkconfig -o /boot/grub/grub.cfg

The keyword :allow-discards after cryptssd enables TRIM for the SSD. For a mechanical hard drive this keyword should be omitted.

7. Install the btrfs utilities and programs on the new system if not already installed. Add the btrfs executable to the initial RAM disk, i.e., set the BINARIES entry.

pacman -S btrfs-progs

vi /etc/mkinitcpio.conf
. . .
BINARIES="/usr/bin/btrfs"
. . .
mkinitcpio -p linux

8. Extract the data back from the tar file.

time tar xf /mnt/c/home/klm/ssd1.tar

9. Enable periodic TRIM for the SSD:

systemctl status fstrim
systemctl enable fstrim.timer
systemctl start fstrim.timer

Show timers (like crontab -l):

systemctl --type=timer

10. A simple benchmark, as indicated by the time prefix to tar above, does not show any performance benefits so far. But performance was not the main motivation; rather it was the added functionality, especially taking snapshots.