Diff binary files like docx, odt and pdf with git

conversion_tools

Working with binary file types like the Microsoft Word XML Format Document docx , the OpenDocument Text odt format and the Portable Document Format pdf in combination with git has its difficulties. Out of the box, git only provides diffing for plain text formats. Comparing binary files in textual format is not supported.

With a simple configuration change and some open source, cross-platform tools, git can be adapted to diff those formats as well.

Installing the tools

First, one needs the tools which can convert the binary files to plain text formats. For most formats like docx and odt , the open source tool Pandoc [1] will do the trick. It can even export those files to Markdown format, or (my personal choice) reStructuredText [2]. A markup language like reStructuredText makes it possible to make a detailed comparison between structured documents, for instance when the heading level changed.

For PDF, there's the open source tool pdftotext , which is part of the Poppler [3] utils package and available for (almost) all operating systems. This can convert a PDF file to plain text.

There's a tiny catch with pdftotext , as it has issues using stdout as output, instead of writing to files. This is …

more ...

Generate list of used content tags for Pelican

If your Pelican-generated site uses lots of different tags for articles, it can be difficult to remember or use tag names consistently. Therefore I needed a quick method to print (comma separated) unique tags that were stored in text files.

This shell one-liner from within the content directory will sort and show all tags from reStructuredText ( *.rst ) files:

grep -h '^:tags:' *.rst | sed -e 's/^:tags:\s*//;s/\s*,\s*/\n/g' | sort -u

First grep will filter on the :tags: property and will only print out the matching line (without filename, thanks to the -h flag).

Then sed will remove the :tags: keyword (and trailing spaces), and all tags will be split using newline characters.

Finally, sort takes care of sorting and only printing unique entries.

Analogous, one can do the same for categories:

grep -h '^:category:' *.rst | sed -e 's/^:category:\s*//' | sort -u

As Pelican only allows one category, this is somewhat simpler.

For maximum readability, tr can convert the newlines into spaces, so that the output is one big line:

grep -h '^:tags:' *.rst | sed -e 's/^:tags:\s*//;s/\s*,\s*/\n/g' | sort -u | tr '\n' ' '; echo

The last echo is meant to end …

more ...

Convert WordPress to static site generator Pelican

Pelican

After a number of years using WordPress as blogging software, I converted the site to a static site generator: Pelican.

Pelican converts reStructuredText into static HTML. No more PHP, no more databases, but straight static HTML.

The process of converting the site was relatively painless. The conversion tool did a great job of converting an XML export of WordPress into reStructuredText pages.

What needed (and still needs) some manual care were/are the code blocks (the biggest reason of the move from WordPress to Pelican) in articles, and the escaping of variables. WordPress gets pretty complex once you're trying to use it for code snippets and console outputs. The reStructuredText is much more flexible and allows you to edit the site using any text editor. There are tools to do that with WordPress and its API, but it always felt like a difficult workaround.

I thought about keeping the URLs as-is: Over the years the number of visitors of the site has steadily risen, as has the level of indexing by search engines. You don't want dead links - but on the other hand, a transition to another content management system would be the perfect moment to 'clean up' the category …

more ...

zsh shell inside Emacs on Windows

Configuring Emacs (on Windows) to use the zsh shell can be tricky, especially when you use ( oh my zsh) plugins or fancy prompts. Emacs sets an environment variable when running a shell, which can be used to selectively disable plugins and change prompts. Configuring the SSH client and server to set and accept that variable makes ssh-ing inside Emacs to remote servers possible as well.

more ...

Use Emacs to create OAuth 2.0 UML sequence diagrams

OAuth 2.0 abstract protocol flow

It seems that the OAuth 2.0 protocol is more and more being used by web (and mobile) applications. Great !

Although the protocol itself is not that complex, there are a number of different use-cases, flows and implementations to choose from. As with most things in life, the devil is in the detail.

When reviewing OAuth 2.0 implementations or writing penetration testing reports I like to draw UML diagrams. That makes it easier to understand what's going on, and to spot potential issues. After all, a picture is worth a thousand words.

This can be done extremely easy using the GPL-licensed open source Emacs editor, in conjunction with the GPL-licensed open source tool PlantUML (and optionally using Eclipse Public Licensed Graphviz).

Emacs is worlds' most versatile editor. In this case, it's being used to edit the text, and automatically convert the text to an image. PlantUML is a tool which allows you to write UML in human readable text and does the actual conversion. Graphviz is visualization software, and optionally - in this case, it's used to show certain images.

Download the compiled PlantUML jar file, Emacs and optionally download and install Graphviz.

Once you have Emacs installed and running …

more ...

It's about the journey: Compiling 64-bit Unison / GTK2 on Windows

Unison File Synchronizer

The excellent MSYS2 (mingw64 and mingw32) subsytem makes compiling native Windows compilations "as easy as compilation can be". However, as with everything in life, sometimes when trying to do one thing (compile a program), you end up chasing other vaguely related issues (one exotic compile error after another).

For synchronizing files between servers and workstations I use the open source GPLv3 licensed Unison File Synchronizer [1]. Although the text interface version of Unison compiles straight-out-of-the box on mingw64, the GTK2 interface proved to be a bit more cumbersome.

To compile Unison with the GTK2 interface, lablgtk [2] is needed, an OCaml interface to GTK.

So, the journey began with firing up a shell in a fresh mingw64 environment, and installing the build prerequisites:

pacman -Sy --noconfirm base-devel git \
mingw-w64-x86_64-{glib2,gtk2,ocaml,pango,toolchain}

After downloading the latest source (2.18.5 [3]) and trying to compile it (using make ) after running

./configure --prefix=/mingw64 --disable-gtktest

the first error message is shown:

mingw64/include/gtk-2.0/gdk/gdkwin32.h:40:36: fatal error: gdk/win32/gdkwin32keys.h: No such file or directory

It seems that GTK2 version 2.24.31 contains an error [4], and incorrectly still references the file …

more ...

Zen provisioning: Bootstrap the installation of Ansible using Vagrant

image0

I'm a big fan of the DevOps attitude of "cattle" versus "pets": machines should be built in a repeatable, automated and consistent way. If there's something wrong, don't be afraid to replace a sick "cow" instead of trying to revive your "pet".

This Zen mindset also helps when preparing for demo's, trainings and workshops: Usually I need a number of machines, and what better way than create them by using automation ? For that I'm using the tools Ansible, Packer, Vagrant and VirtualBox - they are all Open Source and can be used on a number of platforms (e.g. Windows, Linux and Mac OS X).

Ansible is a tool for managing systems and deploying applications, licensed under the GNU General Public License version 3 (my personal favorite).

Vagrant is a tool for managing virtual machines and is licensed under the MIT license.

VirtualBox is a virtualization environment for local use, licensed under the GNU General Public License version 2.

Packer creates a machine image by installing an operating system to a multitude of local and cloud platforms, for example VMWare, VirtualBox as well as Docker, Amazon EC2 and DigitalOcean. Packer is licensed under the Mozilla Public License Version 2.0.

How …

more ...

Compile Emacs for Windows using MSYS2 and mingw64

Emacs 25.1

Emacs 25.1 was officially released on September 17th, 2016. The excellent MSYS2 subsystem and the open source gcc compiler make it super-easy to build binaries on/for Windows (7, 8, 10). In three easy steps from source to binaries:

1: Install and prepare the MSYS2 subsytem

Download and run the installer at http://repo.msys2.org/distrib/msys2-x86_64-latest.exe After installing, run MSYS2 64bit which drops you in a Bash shell. Update all packages using the following command:

pacman -Syuu

Sometimes updates of the runtime/filesystem can cause update errors. This is no cause for panic - kill and restart the terminal. For building 64-bit Windows binaries, always use mingw64.exe to start the terminal.

Install all packages necessary for building:

pacman -Sy --noconfirm base-devel git \
mingw-w64-x86_64-{giflib,gnutls,jbigkit,lib{jpeg-turbo,png,rsvg,tiff,xml2},toolchain,xpm-nox}

2: Clone the Emacs source

To simplify building, you can define the environment variables BUILDDIR (where the binaries are built), INSTALLDIR (where the binaries will be installed to), and SOURCEDIR (where the source lives, the git repository). Note that since you're in the MSYS2 subsystem, paths are Unix-style, using forward slashes. This command creates SOURCEDIR if it doesn't exist yet, clones the …

more ...

Automating repetitive git / setup tasks

repetitite work

Imagine you work on a large number of projects. Each of those projects has its own git repository and accompanying notes file outside of the repo. Each git repository has its own githooks and custom setup.

Imagine having to work with multiple namespaces on different remote servers. Imagine setting up these projects by hand, multiple times a week.

Automation to the rescue ! Where I usually use Bash shell scripts to automate workflows, I'm moving more and more towards Python. It's cross-platform and sometimes easier to work with, as you have a large number of libraries at your disposal.

I wrote a simple Python script that does all of those things more or less 'automated'. Feed the script the name of the repository you want to clone, and optionally a namespace, patchfile and templatefile variable (either command-line or using a configuration file). The script will then:

  • clone the repository
  • modify the repository (e.g. apply githooks)
  • optionally modify the repository based on given variables
  • create a new notes file from a template
  • optionally modify the notes file based on given variables

The advantage is that you can use a configuration file containing the location of the remote git repository, the patchfile …

more ...

automatic XML validation when using git

XML
Recently I worked on a project which involved manually editing a bunch of XML files. Emacs is my favorite ~operating system~ editor, and it has XML validation built in (using the nXML mode). It highlights validation errors while-you-type. Unfortunately, even with Emacs showing potential issues in RED COLOR, I managed to commit a number of broken XML files to my local git repository. Subsequently when I pushed my errors to the remote 'origin' git repository, the errors broke builds.

Of course this can be completely prevented by locally using pre-commit hooks. If your local git repository validates XML files before you can commit them, and denies invalid XML files, then one part of the problem is solved.

A pre-receive hook on the receiving server side can do the same as a pre-commit hook locally: Validate XML files before letting somebody push a commit which can break the build process.

I looked around the Internet but couldn't find a lightweight quick script to do only and exactly that. That's the reason I whipped up a basic pre-commit and pre-receive hook, written in Python.

You can find the very basic and rough code at https://github.com/PeterMosmans/git-utilities.
By changing the …
more ...