Automating vimdiff's HTML diff (TOhtml)
Proofs of concept are often done quite differently from other kinds of work. The rules described in the (in)famous “Effective engineer” are even more important for succeeding with PoCs. At the end of the day it all revolves around “value added divided by effort spent”.
I was working on a quick PoC project that lets the user paste a log fragment and see which other log fragments in the database are most similar to the pasted one. But apart from seeing the list, the user wants to inspect the differences too.
The simplest solution is to generate a diff in, e.g., HTML and present it to the user. But what is the best way to achieve that quickly? By using an existing tool!
This post is about automating vimdiff’s TOhtml using the headlessvim library. Apart from presenting the actual solution, I cover some of the details of how Vim loads configuration files, how we can pass commands to it, etc.
The problem that my PoC solves is Nokia-specific. No worries, though: we can use Linux kernel logs to explain what is going on.
Imagine there’s a bug in the kernel you’re using. You inspect dmesg and see some multi-line crash.
Now, to make it hard, there’s no Google/DuckDuckGo. But you have access to gazillions of logs from executions on other machines, each with a description of the solution attached. What if you could search for similar crashes in these logs and check if the root cause is the same?
Describing the algorithm that finds the most similar log fragments is out of scope for this post. I’m just going to describe the last step done by the user: inspecting the diff between the queried log fragment and the match that has been found.
Subprocess
The standard way to compare two files in vimdiff is:
vimdiff fileA fileB
and that is equivalent to:
vim -d fileA fileB
Now, the problem is that we need to somehow instruct vim to automatically execute some commands right after loading the files. Fortunately there’s the -c switch:
-c {command} {command} will be executed after the first file has been read. {command} is interpreted as an Ex command. If the {command} contains spaces it must be enclosed in double quotes (this depends on the shell that is used). Example: Vim "+set si" main.c Note: You can use up to 10 "+" or "-c" commands.
So it becomes:
vim -d fileA fileB -c 'TOhtml' -c 'sav! output.html' -c 'qall!'
We also want it to work on “vanilla vim”, so without plugins and rc files. Additionally, it would be bad if the invocation had side effects on the disk. We achieve these goals by adding -i NONE -n -N -u vimrc, where:
- -i NONE - disables writing the viminfo file
- -n - don’t use swap files. Thanks to this we can execute parallel vim processes without risk of seeing the recovery message
- -N - disables vi compatibility
- -u vimrc - instructs vim to load configuration from the vimrc file in the current working directory
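We need to create the vimrc file for this to work. If we used NONE as the argument to -u, the TOhtml command wouldn’t work, because -u NONE also skips loading plugins and TOhtml is defined by one of the plugins shipped with Vim. What we need to do instead is to mimic how vim is initialized in our distro.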
I’m working on Debian, so my vimrc file consists of just a single line:
runtime! debian.vim
This debian.vim file comes from the runtime path, which is set to /usr/share/vim/vim82/ on my box. I’m not a magician: I’ve copied that line from /etc/vim/vimrc.
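If the script that drives vim should not depend on a vimrc file already lying around in the working directory, the file can be created on the fly. Here is a minimal sketch of that idea. The helper name write_vimrc and the DEBIAN_VIMRC constant are made up for this example, and the one-line content is the Debian-specific line quoted above, so other distros will most likely need something different.

```python
import tempfile
from pathlib import Path

# Assumption: this is the line copied from Debian's /etc/vim/vimrc.
# Other distributions initialize vim differently.
DEBIAN_VIMRC = "runtime! debian.vim\n"


def write_vimrc(directory, contents=DEBIAN_VIMRC):
    """Write a minimal vimrc into `directory` and return its path."""
    path = Path(directory) / "vimrc"
    path.write_text(contents, encoding="utf-8")
    return path


# Usage: create the file in a throwaway directory and pass its path to `-u`.
with tempfile.TemporaryDirectory() as workdir:
    vimrc_path = write_vimrc(workdir)
    print(vimrc_path.read_text(encoding="utf-8"), end="")
```

The returned path can then be used in place of the plain vimrc argument of the -u switch.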
The final version is:
vim -i NONE -n -N -u vimrc -d fileA fileB -c 'TOhtml' -c 'sav! output.html' -c 'qall!'
And voilà! You can now open the output.html file and see what was generated. Upon running the command you’ll see the vimdiff interface for a fraction of a second.
Putting this into a Python script is pretty straightforward:
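import subprocess

# Run vimdiff non-interactively: convert the diff to HTML, save it and quit.
subprocess.run(
    [
        "/usr/bin/vim", "-i", "NONE", "-n", "-N", "-u", "vimrc",
        "-d", "fileA", "fileB",
        "-c", "TOhtml", "-c", "sav! output.html", "-c", "qall!",
    ],
    check=True,  # raise CalledProcessError if vim exits with an error
)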
But this solution is bad, because:
- it’s fragile as hell
- it explicitly uses subprocess, so it literally asks for some thin wrapper
- it requires a TTY and takes it over. Watching server logs looks really funny :)
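For illustration, the “thin wrapper” from the list above could look roughly like the sketch below. The function names build_vimdiff_command and vimdiff_to_html are invented for this example, and it assumes that the paths contain no characters that are special on Vim’s command line (spaces, %, # and so on). It only tidies up the call site: the TTY problem stays exactly as it was.

```python
import subprocess


def build_vimdiff_command(file_a, file_b, output, vimrc="vimrc", vim="/usr/bin/vim"):
    """Return the argv list for a non-interactive vimdiff-to-HTML run."""
    return [
        vim,
        "-i", "NONE",          # do not write the viminfo file
        "-n",                  # do not use swap files
        "-N",                  # disable vi compatibility
        "-u", str(vimrc),      # load only this configuration file
        "-d", str(file_a), str(file_b),
        "-c", "TOhtml",
        # Assumption: `output` needs no escaping for Vim's command line.
        "-c", f"sav! {output}",
        "-c", "qall!",
    ]


def vimdiff_to_html(file_a, file_b, output, **kwargs):
    """Run vim; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_vimdiff_command(file_a, file_b, output, **kwargs), check=True)
```

Keeping the argument list in a separate function makes it possible to check the command in a test without starting vim at all.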
headlessvim
Fortunately there’s a library that uses pyte (a fake terminal implemented in Python) and wraps the Vim process in a handy API. Just look at how much better the code is when we use that library:
import headlessvim

with headlessvim.open(args=f"-N -i NONE -n -u ./vimrc -d {fileA_path} {fileB_path}") as vim:
    vim.command("TOhtml")
    vim.command(f"sav! {output_path}")
The library was initially developed to help write unit tests for Vim plugins, but it can be used in many more scenarios, including ours.