Compiling WordNet on Windows to use with Emacs

WordNet running on MSYS2 on Windows

As a non-native English speaker who reads, writes and reviews lots of English texts, I frequently look up definitions as well as synonyms of words. Of course there are numerous online sources available to do this, but I like to reduce my online 'footprint' for privacy reasons. It also takes extra time to switch to a browser window and enter a search query.

Fortunately, the fine folks at Princeton University compiled WordNet [1], a large lexical database of English that can be used offline, together with a tool to search that database. Even better, somebody wrote a package to use WordNet inside my favorite editor, Emacs [2]. This means that just by hovering the cursor over a word inside Emacs, its definition as well as its synonyms can be shown. The source code [3] is kindly provided by Princeton University.

Compiling WordNet using MSYS2 on/for Windows

As is usually the case, compiling on and for Windows using the MSYS2 subsystem [4] is possible with a few minor tweaks.

First, start an MSYS2 shell and install the required dependencies (build tools, as well as the programming language Tcl and its widget toolkit Tk):

pacman -Sy --noconfirm base-devel mingw-w64-x86_64-tcl mingw-w64-x86_64-tk

Then …


VirtualBox Does Not Automatically Resize Disk Image

I use VirtualBox [1] a lot as (local) virtualization software. It is a full-featured virtualization host, and supports multiple underlying disk image file types for guests.

One of those is VirtualBox's native Virtual Disk Image, or VDI, file type. An advantage of this type is that one can create a dynamically allocated image. This image will initially be very small and not occupy any space for unused virtual disk sectors, but will grow when a disk sector is written to for the first time. VirtualBox does this by checking for unused sectors.

However, this poses issues for disks with multiple partitions. If the last partition is, say, an (unused) swap partition, then VirtualBox does not automatically grow the underlying image. Even though the first partition is full, VirtualBox will not grow the image, and therefore the disk will be full without having reached its full potential.

To solve this issue, the machine needs to be partitioned using one big happy partition. Then VirtualBox will dynamically resize according to expectations.

I use packer [2] to prepare disk images for Debian, together with a preseed [3] file. Using preseeding to partition the disk is limited to what is supported by the partition tool …


Setting Up a New Sphinx Documentation Framework

When I have to write documentation for different formats, I always use the reStructuredText [1] (or reST) format. As this happens quite often, it made sense to put some effort into automating the setup of a new documentation framework with a reusable setup script.

Setting up a framework

The standard documentation framework that I use consists of Sphinx [2], which takes care of converting source pages written in reST into several formats: for example HTML, but also PDF or something more exotic like EPUB files. Note that Sphinx already comes with a setup script, sphinx-quickstart [3], but this doesn't take care of deploying files.

In order to be able to create a reusable framework, I split the necessary files into three groups:

  • The Sphinx configuration itself,
  • version information, and
  • a LaTeX formatting template.

The Sphinx configuration

This part consists of two different files: a generic Makefile [4] to build the different artifact types, and a Sphinx configuration file (conf.py [5]) containing basic information about the project and plugin details. These files rarely change after the framework has been initialized.
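Deploying these two files could look roughly like the sketch below. It is only an illustration, not the setup script from this article: the function name and both directory arguments are hypothetical, and it assumes a template directory that already holds a Makefile and a conf.py.

```shell
# Copy the rarely changing Sphinx files into a new project directory,
# without overwriting files that already exist there.
# Usage: deploy_sphinx_config TEMPLATE_DIR TARGET_DIR  (both hypothetical)
deploy_sphinx_config() {
    template_dir=$1
    target_dir=$2
    mkdir -p "$target_dir" || return 1
    for file in Makefile conf.py; do
        if [ -e "$target_dir/$file" ]; then
            # Keep whatever the project already has.
            echo "skipping existing $file" >&2
        else
            cp "$template_dir/$file" "$target_dir/$file" || return 1
        fi
    done
}
```

Skipping existing files makes it safe to re-run the function on a project that has already been customized.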

Version information

The version information (version, or build number) can change per release, and is therefore contained in a separate …


Customize and theme tmux the easy way

tmux

Terminal multiplexers allow you to view multiple separate terminal sessions within a single terminal window. Tmux is my terminal multiplexer of choice, as it has more features than the 'original multiplexer' GNU Screen. The default setup gives you some information, but its appearance is, well...

tmux-default

Fortunately, you can theme or customize pretty much everything, from the colors to the information being shown in the status bar.

tmux-dracula

In order to make it easier to theme tmux, I split the tmux configuration file into two separate files. One file contains the main configuration (~/.tmux.conf), and another file contains only theming (visual) variables (~/.tmux.THEMENAME.theme). This setup makes it easier to switch between themes without changing the main tmux configuration file.

As I wanted to automatically load a theme based on a shell environment variable, I added a small piece of code to the main tmux configuration file. This executes a shell command, which in turn loads the correct theme file.

run-shell "tmux source-file ~/.tmux.\${TMUX_THEME:-default}.theme"

The theme file is loaded dynamically, based on the environment variable $TMUX_THEME. If the environment variable is not set or empty, then the default theme is loaded: ~/.tmux.default.theme.
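The fallback relies on the shell's ${VAR:-default} expansion. A minimal sketch of the same file name resolution as a plain shell function (the function name is made up for illustration and is not part of the tmux configuration):

```shell
# Resolve the theme file name the same way the run-shell command does:
# use $TMUX_THEME when it is set and non-empty, otherwise use "default".
tmux_theme_file() {
    printf '%s\n' "$HOME/.tmux.${TMUX_THEME:-default}.theme"
}
```

Calling it as `TMUX_THEME=dracula tmux_theme_file` prints the path ending in .tmux.dracula.theme, while an unset or empty variable yields the path ending in .tmux.default.theme.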

Loading a different …


Improving cross-subsystem git workflow: The different git configuration files

Cross-platform

Git configuration settings can be stored in three different files: the system configuration file, the global configuration file and the repository's local configuration file. See git on Windows - location of configuration files [1] for their locations.

When you use multiple subsystems on Windows (like MSYS2, Cygwin or any of the Windows Subsystem for Linux distributions), it can be a chore to keep the git configurations synchronized. In other words: the fewer configuration files to maintain, the better.
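When the same setting appears in more than one of these files, the more specific file wins: local overrides global, and global overrides system. The sketch below demonstrates this for the global and local level only, as writing the system file usually requires elevated rights. The function name and the choice of core.autocrlf are just for illustration.

```shell
# Show that a repository's local setting overrides the global one.
# The body runs in a subshell with a throwaway HOME, so the real
# configuration files are left untouched.
git_precedence_demo() (
    demo_dir=$(mktemp -d) || exit 1
    HOME="$demo_dir/home"
    export HOME
    mkdir -p "$HOME"
    git init -q "$demo_dir/repo"
    # Global level: stored in the (throwaway) home directory.
    git config --global core.autocrlf input
    # Local level: stored in the repository's .git/config.
    git -C "$demo_dir/repo" config --local core.autocrlf false
    # Print the effective value as seen from inside the repository.
    git -C "$demo_dir/repo" config --get core.autocrlf
    rm -rf "$demo_dir"
)
```

To see which file each setting of a real installation comes from, `git config --list --show-origin` can be run inside a repository.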

Whether it's git for Windows, or one of the subsystem-specific git binaries:

Each of the git binaries that runs on Windows expands the tilde (~) to the home directory, and the path separator is always a slash (/).

These features can be used to our advantage in order to simplify the git configuration files across all subsystems.

Re-defining the system

The system configuration file is meant to store all system-specific configuration settings, which will be applied to all users and git repositories on the system.

If you're the only user of your workstation, it makes sense to re-define system as subsystem:

All subsystem-dependent git configuration settings should be set in the system git configuration file.

This means that settings depending on underlying binaries, like …


Diff binary files like docx, odt and pdf with git

conversion_tools

Working with binary file types like the Microsoft Word XML Format Document (docx), the OpenDocument Text format (odt) and the Portable Document Format (pdf) in combination with git has its difficulties. Out of the box, git only provides diffing for plain text formats. Comparing binary files as text is not supported.

With a simple configuration change and some open source, cross-platform tools, git can be adapted to diff those formats as well.

Installing the tools

First, one needs the tools which can convert the binary files to plain text formats. For most formats, like docx and odt, the open source tool Pandoc [1] will do the trick. It can even export those files to Markdown format, or (my personal choice) reStructuredText [2]. A markup language like reStructuredText makes it possible to make a detailed comparison between structured documents, for instance when the heading level changed.
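One common way to hook such a converter into git is a textconv diff driver: a pattern in .gitattributes assigns files to a named driver, and the driver's textconv command turns each file into text before diffing. The following is a sketch under assumptions and not necessarily the configuration used in this article: the function name and the driver name "pandoc" are arbitrary choices, and Pandoc is assumed to be on the PATH.

```shell
# Register Pandoc as text converter for docx and odt files in one repository.
# Usage: setup_pandoc_textconv /path/to/repository
setup_pandoc_textconv() {
    repo=$1
    # Map file patterns to a diff driver (the driver name is an arbitrary choice).
    printf '%s\n' '*.docx diff=pandoc' '*.odt diff=pandoc' >> "$repo/.gitattributes"
    # git appends the file name to this command and diffs what it prints to stdout.
    git -C "$repo" config diff.pandoc.textconv 'pandoc --to=rst'
}
```

After this, `git diff` on a changed docx or odt file in that repository shows the differences between the reStructuredText renderings instead of reporting that binary files differ.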

For PDF, there's the open source tool pdftotext, which is part of the Poppler [3] utils package and available for (almost) all operating systems. This can convert a PDF file to plain text.

There's a tiny catch with pdftotext, as it has issues using stdout as output, instead of writing to files. This is …


Generate list of used content tags for Pelican

If your Pelican-generated site uses lots of different tags for articles, it can be difficult to remember or use tag names consistently. Therefore I needed a quick method to print (comma separated) unique tags that were stored in text files.

This shell one-liner, run from within the content directory, will sort and show all tags from reStructuredText (*.rst) files:

grep -h '^:tags:' *.rst | sed -e 's/^:tags:\s*//;s/\s*,\s*/\n/g' | sort -u

First grep will filter on the :tags: property and will only print out the matching line (without filename, thanks to the -h flag).

Then sed will remove the :tags: keyword (and trailing spaces), and all tags will be split using newline characters.

Finally, sort takes care of sorting and only printing unique entries.

Analogously, one can do the same for categories:

grep -h '^:category:' *.rst | sed -e 's/^:category:\s*//' | sort -u

As Pelican only allows one category, this is somewhat simpler.
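To try the tag pipeline without touching real content, it can be wrapped in a small function and pointed at throwaway files. This is only a sketch: the function name and the sample files are made up, and the \s and \n escapes inside sed assume GNU sed.

```shell
# List the unique tags of all *.rst files in a directory (default: current one).
# Same pipeline as above; \s and \n inside sed are GNU sed extensions.
list_tags() {
    ( cd "${1:-.}" && grep -h '^:tags:' *.rst \
        | sed -e 's/^:tags:\s*//;s/\s*,\s*/\n/g' | sort -u )
}

# Two sample articles with overlapping, inconsistently spaced tags.
sample=$(mktemp -d)
printf ':tags: emacs, git\n' > "$sample/a.rst"
printf ':tags: git , tmux,zsh\n' > "$sample/b.rst"
list_tags "$sample"
```

For these two files, the output lists emacs, git, tmux and zsh, one per line, with the duplicate git entry collapsed.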

For maximum readability, tr can convert the newlines into spaces, so that the output is one big line:

grep -h '^:tags:' *.rst | sed -e 's/^:tags:\s*//;s/\s*,\s*/\n/g' | sort -u | tr '\n' ' '; echo

The last echo is meant to end …


Convert WordPress to static site generator Pelican

Pelican

After a number of years using WordPress as blogging software, I converted the site to a static site generator: Pelican.

Pelican converts reStructuredText into static HTML. No more PHP, no more databases, but straight static HTML.

The process of converting the site was relatively painless. The conversion tool did a great job of converting an XML export of WordPress into reStructuredText pages.

What needed (and still needs) some manual care are the code blocks in articles (the biggest reason for the move from WordPress to Pelican), and the escaping of variables. WordPress gets pretty complex once you're trying to use it for code snippets and console output. reStructuredText is much more flexible and allows you to edit the site using any text editor. There are tools to do that with WordPress and its API, but it always felt like a difficult workaround.

I thought about keeping the URLs as-is: Over the years the number of visitors of the site has steadily risen, as has the level of indexing by search engines. You don't want dead links - but on the other hand, a transition to another content management system would be the perfect moment to 'clean up' the category …


zsh shell inside Emacs on Windows

Configuring Emacs (on Windows) to use the zsh shell can be tricky, especially when you use (oh my zsh) plugins or fancy prompts. Emacs sets an environment variable when running a shell, which can be used to selectively disable plugins and change prompts. Configuring the SSH client and server to set and accept that variable makes ssh-ing inside Emacs to remote servers possible as well.
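The summary above does not name the variable; the sketch below assumes it is INSIDE_EMACS, which Emacs sets for the shells it starts. The function name and the two style names are made up for illustration.

```shell
# Decide which prompt style to use: a plain one inside Emacs, a fancy one elsewhere.
# Assumes Emacs exports INSIDE_EMACS to the shells it starts.
prompt_style() {
    if [ -n "${INSIDE_EMACS:-}" ]; then
        echo plain
    else
        echo fancy
    fi
}

# Possible use in ~/.zshrc (sketch): fall back to a minimal prompt inside Emacs.
# [ "$(prompt_style)" = plain ] && PS1='%~ %# '
```

For remote sessions, the same variable could be forwarded with the OpenSSH options SendEnv (client side) and AcceptEnv (server side), so that the check also works on the remote host.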


Use Emacs to create OAuth 2.0 UML sequence diagrams

OAuth 2.0 abstract protocol flow

It seems that the OAuth 2.0 framework is being used more and more by web (and mobile) applications. Great!

Although the protocol itself is not that complex, there are a number of different use-cases, flows and implementations to choose from. As with most things in life, the devil is in the detail.

When reviewing OAuth 2.0 implementations or writing penetration testing reports I like to draw UML diagrams. That makes it easier to understand what's going on, and to spot potential issues. After all, a picture is worth a thousand words.

This can be done extremely easily using the GPL-licensed open source Emacs editor, in conjunction with the GPL-licensed open source tool PlantUML (and optionally Graphviz, which is licensed under the Eclipse Public License).

Emacs is the world's most versatile editor. In this case, it's being used to edit the text and automatically convert the text to an image. PlantUML is a tool which allows you to write UML in human-readable text, and it does the actual conversion. Graphviz is visualization software; it is optional, and in this case it's used to show certain images.
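To give an impression of what UML in human-readable text looks like, the sketch below writes a PlantUML description of the OAuth 2.0 abstract protocol flow (steps A to F of RFC 6749, section 1.2) to a file. It is an illustration, not the diagram source from this article: the function name, the file name and the participant aliases are all made up.

```shell
# Write a PlantUML sequence diagram of the OAuth 2.0 abstract protocol flow
# to the file given as first argument.
write_oauth_flow() {
    cat > "$1" <<'EOF'
@startuml
participant Client
participant "Resource Owner" as RO
participant "Authorization Server" as AS
participant "Resource Server" as RS

Client -> RO : (A) Authorization Request
RO --> Client : (B) Authorization Grant
Client -> AS : (C) Authorization Grant
AS --> Client : (D) Access Token
Client -> RS : (E) Access Token
RS --> Client : (F) Protected Resource
@enduml
EOF
}

# Example (file name and jar location are assumptions):
# write_oauth_flow oauth-flow.puml
# java -jar plantuml.jar oauth-flow.puml
```

Running the jar on the resulting file outside of Emacs produces an image next to it, which is a quick way to check the text before wiring the conversion into the editor.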

Download the compiled PlantUML jar file and Emacs, and optionally download and install Graphviz.

Once you have Emacs installed and running …
