Automating vimdiff's HTML diff (TOhtml)

Proofs of concept are often done quite differently from other kinds of work. The rules described in the (in)famous “Effective Engineer” are even more important for succeeding with PoCs. At the end of the day it all revolves around “value added divided by effort spent”.

I was working on a quick PoC project that lets the user paste a log fragment and see which other log fragments in the database are most similar to the pasted one. But apart from seeing the list, the user wants to inspect the differences too.

The simplest solution is to generate a diff in, say, HTML and present it to the user. But what is the best way to achieve that quickly? By using an existing tool!

This post is about automating vimdiff’s TOhtml using the headlessvim library. Apart from presenting the actual solution, I cover some details of how Vim loads configuration files, how we can pass commands to it, etc.

vimdiff in action

The problem that my PoC solves is Nokia-specific. No worries, though. We can use Linux kernel logs to explain what is going on.

Imagine there’s a bug in the kernel you’re using. You inspect dmesg and see some multi-line crash. Now, to make it hard, there’s no Google/DuckDuckGo. But you have access to gazillions of logs from runs on other machines, each with a description of the solution attached. What if you could search for similar crashes in these logs and check if the root cause is the same?

Describing the algorithm that finds the most similar log fragments is out of scope for this post. I’m just gonna describe the last step done by the user: inspecting the diff between the queried log fragment and the match that has been found.

Subprocess

The standard way to compare two files in vimdiff is:

vimdiff fileA fileB

and that is equivalent to:

vim -d fileA fileB

Now, the problem is that we need to somehow instruct vim to automatically execute some commands right after loading the files. Fortunately there’s the -c switch:

  -c {command}
               {command} will be executed after the first file has been read.  {command} is interpreted as an Ex
               command.  If the {command} contains spaces it must be enclosed in double quotes (this depends  on
               the shell that is used).  Example: Vim "+set si" main.c
               Note: You can use up to 10 "+" or "-c" commands.

So it becomes:

vim -d fileA fileB -c 'TOhtml' -c 'sav! output.html' -c 'qall!'
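
Since every Ex command needs its own -c switch, the argument list is easy to build programmatically. Here is a minimal sketch, assuming a hypothetical helper called ex_command_args (it is not part of Vim or of any library) and the limit of 10 commands quoted in the man page excerpt above:

```python
MAX_EX_COMMANDS = 10  # limit quoted in the vim man page excerpt above


def ex_command_args(commands):
    """Turn a list of Ex commands into alternating "-c", command arguments."""
    commands = list(commands)
    if len(commands) > MAX_EX_COMMANDS:
        raise ValueError(f"at most {MAX_EX_COMMANDS} -c commands are allowed")
    args = []
    for command in commands:
        # Each command is a single list element, so no shell quoting is
        # needed when the list is later handed to subprocess.
        args += ["-c", command]
    return args


argv = ["vim", "-d", "fileA", "fileB"] + ex_command_args(
    ["TOhtml", "sav! output.html", "qall!"]
)
print(argv)
```

Keeping the commands in a plain list makes it harder to forget the trailing qall!, without which vim would stay open.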

We also want it to work on “vanilla vim”, that is, without plugins and rc files. Additionally, it would be bad if the invocation had side effects on the disk. We achieve these goals by adding -i NONE -n -N -u vimrc, where:

-i NONE - disables reading and writing of the viminfo file
-n - disables swap files. Thanks to this we can run vim processes in parallel without the risk of seeing the recovery message
-N - disables vi compatibility
-u vimrc - instructs vim to load configuration from the vimrc file in the current working directory

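We need to create the vimrc file for this to work. If we passed NONE as the argument to -u, the TOhtml command wouldn’t work, because -u NONE also skips loading plugins and TOhtml is provided by one of the standard plugins. What we need to do instead is mimic how vim is initialized in our distro.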

I’m working on Debian, so my vimrc file consists of just a single line:

runtime! debian.vim

This debian.vim file comes from the runtime path, which is set to /usr/share/vim/vim82/ on my box. I’m not a magician: I copied that line from /etc/vim/vimrc.
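
If the diff is generated from a script, the one-line vimrc can be created on the fly instead of being kept next to the code. A small sketch, assuming a hypothetical helper named write_minimal_vimrc; the debian.vim line is Debian-specific, so on another distro the content would have to mirror that system's own vimrc:

```python
import tempfile
from pathlib import Path

# Same single line as in the vimrc described above (Debian-specific).
VIMRC_CONTENT = "runtime! debian.vim\n"


def write_minimal_vimrc(directory):
    """Write the minimal vimrc into `directory` and return its path."""
    vimrc_path = Path(directory) / "vimrc"
    vimrc_path.write_text(VIMRC_CONTENT)
    return vimrc_path


with tempfile.TemporaryDirectory() as tmp_dir:
    vimrc = write_minimal_vimrc(tmp_dir)
    print(vimrc.read_text(), end="")
```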

The final version is:

vim -i NONE -n -N -u vimrc -d fileA fileB -c 'TOhtml' -c 'sav! output.html' -c 'qall!'

And voilà! You can now open the output.html file and see what was generated. While the command runs, you’ll see the vimdiff interface for a fraction of a second.

Putting this into a Python script is pretty straightforward:

import subprocess

p = subprocess.Popen([
  "/usr/bin/vim", "-i", "NONE", "-n", "-N", "-u", "vimrc", "-d", "fileA", "fileB", "-c", "TOhtml",
  "-c", "sav! output.html", "-c", "qall!"
])
p.communicate()

But this solution is bad, because:

it’s fragile as hell
it explicitly uses subprocess, so it is literally asking for some thin wrapper
it requires a TTY and takes it over. Watching server logs looks really funny :)
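
For illustration, the thin wrapper hinted at in the list above could look roughly like the sketch below. The names build_vimdiff_argv and run_vimdiff_to_html are hypothetical, and the 30-second timeout is an arbitrary assumption. It only tidies up the subprocess call and fails loudly on errors; it does not remove the need for a TTY.

```python
import subprocess


def build_vimdiff_argv(file_a, file_b, output_html, vimrc="vimrc", vim="/usr/bin/vim"):
    """Build the same command line as above, with the paths as parameters."""
    return [
        vim,
        "-i", "NONE",   # no viminfo file
        "-n",           # no swap files
        "-N",           # no vi compatibility
        "-u", vimrc,    # load configuration from this vimrc
        "-d", file_a, file_b,
        "-c", "TOhtml",
        "-c", f"sav! {output_html}",
        "-c", "qall!",
    ]


def run_vimdiff_to_html(file_a, file_b, output_html, timeout=30):
    """Run vim; raise if it exits with an error or exceeds the (assumed) timeout."""
    argv = build_vimdiff_argv(file_a, file_b, output_html)
    subprocess.run(argv, check=True, timeout=timeout)
    return output_html


print(build_vimdiff_argv("fileA", "fileB", "output.html"))
```

Paths containing spaces or other special characters would need escaping inside the sav! command; the sketch does not handle that.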

headlessvim

Fortunately there’s a library that uses pyte (a terminal emulator implemented in Python) and wraps the Vim process in a handy API. Just look at how much better the code is when we use that library:

import headlessvim

with headlessvim.open(args=f"-N -i NONE -n -u ./vimrc -d {fileA_path} {fileB_path}") as vim:
    vim.command("TOhtml")
    vim.command(f"sav! {output_path}")

The library was initially developed to help write unit tests for Vim plugins, but it can be used in many more scenarios, including ours.