Generate list of used content tags for Pelican

If your Pelican-generated site uses lots of different tags for articles, it can be difficult to remember or use tag names consistently. Therefore I needed a quick method to print (comma separated) unique tags that were stored in text files.

This shell one-liner from within the content directory will sort and show all tags from reStructuredText (*.rst ) files:

grep -h '^:tags:' *.rst | sed -e 's/^:tags:\s*//;s/\s*,\s*/\n/g' | sort -u

First grep will filter on the :tags: property and will only print out the matching line (without filename, thanks to the -h flag).

Then sed will remove the :tags: keyword (and trailing spaces), and all tags will be split using newline characters.

Finally, sort takes care of sorting and only printing unique entries.

Analogous, one can do the same for categories:

grep -h '^:category:' *.rst | sed -e 's/^:category:\s*//' | sort -u

As Pelican only allows one category, this is somewhat simpler.

For maximum readability, tr can convert the newlines into spaces, so that the output is one big line:

grep -h '^:tags:' *.rst | sed -e 's/^:tags:\s*//;s/\s*,\s*/\n/g' | sort -u | tr '\n' ' '; echo

The last echo is meant to end the line correctly, or else it would be a partial line. Shells like zsh can automatically detect (and show) whether a line is correctly terminated, but it's nicer to make sure that the output is correctly formatted.

One-liners to rule 'em all...

Note

The \s character set is a sed GNU extension, so it'll only work on GNU sed. When using non-GNU compatible sed, you can replace \s with [[:space::] , which is the same whitespace character set.

Related Posts:

Comments