Generate list of used content tags for Pelican

If your Pelican-generated site uses lots of different tags for articles, it can be difficult to remember or use tag names consistently. Therefore I needed a quick method to print (comma separated) unique tags that were stored in text files.

This shell one-liner from within the content directory will sort and show all tags from reStructuredText ( *.rst ) files:

grep -h '^:tags:' *.rst | sed -e 's/^:tags:\s*//;s/\s*,\s*/\n/g' | sort -u

First grep will filter on the :tags: property and will only print out the matching line (without filename, thanks to the -h flag).

Then sed will remove the :tags: keyword (and trailing spaces), and all tags will be split using newline characters.

Finally, sort takes care of sorting and only printing unique entries.

Analogous, one can do the same for categories:

grep -h '^:category:' *.rst | sed -e 's/^:category:\s*//' | sort -u

As Pelican only allows one category, this is somewhat simpler.

For maximum readability, tr can convert the newlines into spaces, so that the output is one big line:

grep -h '^:tags:' *.rst | sed -e 's/^:tags:\s*//;s/\s*,\s*/\n/g' | sort -u | tr '\n' ' '; echo

The last echo is meant to end …

more ...

Convert WordPress to static site generator Pelican


After a number of years using WordPress as blogging software, I converted the site to a static site generator: Pelican.

Pelican converts reStructuredText into static HTML. No more PHP, no more databases, but straight static HTML.

The process of converting the site was relatively painless. The conversion tool did a great job of converting an XML export of WordPress into reStructuredText pages.

What needed (and still needs) some manual care were/are the code blocks (the biggest reason of the move from WordPress to Pelican) in articles, and the escaping of variables. WordPress gets pretty complex once you're trying to use it for code snippets and console outputs. The reStructuredText is much more flexible and allows you to edit the site using any text editor. There are tools to do that with WordPress and its API, but it always felt like a difficult workaround.

I thought about keeping the URLs as-is: Over the years the number of visitors of the site has steadily risen, as has the level of indexing by search engines. You don't want dead links - but on the other hand, a transition to another content management system would be the perfect moment to 'clean up' the category …

more ...