This article is my response to the Ministry of Testing’s blogging challenge: How we hacked a tool to make it work for us. First, I’ll go into tools in general a bit, and then give two examples of how I have used tools in slightly non-standard ways. I’ve written a bit about tools already, but not in this context. Using tools in interesting ways or combining them is something that happens in the physical world, so it’s not surprising that we carry the idea over to software. Think of combinations like saw + mitre block or hammer + chisel – the tools enhance each other.
There are tools such as Emacs, Visual Studio and VS Code that are like Swiss Army knives, in that they try to solve many problems on their own. I’ve used all three and like all of them, and often they’re just the tool for the job. But sometimes they have a gap, or it’s quicker or easier to use a different approach. This is often where tool hacking comes in.
It’s not a term I’ve used before, but I’m taking it to mean keeping software tools as they are (rather than re-writing them, for instance) but using them in interesting ways. In my experience, this has usually been through combining two tools to solve a problem better than either tool could do on its own.
This fits with the UNIX philosophy of tools that can be composed, where each tool is focussed on doing one job well and can play nicely with other tools. In the case of UNIX, this playing nicely with others is a standard set of input / output interfaces between a tool and other tools / the file system / the screen / the keyboard. Tools can be combined easily into pipelines, where the output of tool A becomes the input to tool B and so on. This philosophy has been carried over into PowerShell, which can make it nice to use.
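The shape of a UNIX-style filter is easy to sketch. The snippet below is my own illustration (the function name and sample lines are made up), showing the core of a stage that could sit in the middle of a pipeline, reading lines in and writing transformed lines out:

```python
import sys

def shout(lines):
    """A pipeline stage: upper-case each line it is handed."""
    for line in lines:
        yield line.upper()

# In a real pipeline this would read stdin and write stdout, e.g.
#     sys.stdout.writelines(shout(sys.stdin))
# so it could sit between two other tools:  cat notes.txt | shout.py | sort
demo = list(shout(["hello\n", "world\n"]))  # → ["HELLO\n", "WORLD\n"]
```

The point is the interface, not the transformation: because the stage only touches lines of text on the way through, it composes with any other tool that speaks the same interface.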
The two examples below don’t take advantage of the UNIX or PowerShell architecture, but they still have to deal with the issue of the interface between tools.
Before I describe the tools for this, I need to explain the problem the tools were helping to solve. In a previous job we had a large code base written in C, divided up into many separate libraries (.so and .a files). The code assumed that text could be stored and processed correctly if each character in the text were represented by a single byte. This is great for English and a few other languages, but doesn’t work for all countries and languages.
In order to sell our software in more countries, we had to convert the code so it could use many bytes per character of text, i.e. we had to make it multibyte code. The difficulty lies in the compatibility of single byte and multibyte code. Single byte code can call multibyte code, but not vice versa, as single byte code’s behaviour is a subset of multibyte code’s. If we had library A calling library B:
- If we converted B to multibyte and then A, the code would continue to build and pass its tests.
- If we converted A and then B, the code wouldn’t compile until both A and B were converted.
So we had to identify all the dependencies between libraries, in terms of A calls B. Not only that, we effectively had to work out a conversion order from these dependencies – a topological sort. If A calls B and B calls C, we have to do C first, then B, then A. We need to know which libraries are at the bottom of the heap, in that they don’t call any other library. Then, when they’re converted, we need to know which libraries call only libraries that have been converted, and so on.
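The ordering described above is small enough to sketch. This isn’t what we used at the time (we read the order off a diagram, as described below), but Python’s standard library can compute it directly; the library names here are invented, and `graphlib` needs Python 3.9+:

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each library maps to the libraries it calls.
# "A calls B" means B must be converted to multibyte before A.
calls = {
    "A": {"B"},
    "B": {"C"},
    "D": {"C"},
    "C": set(),
}

# static_order yields a node only after everything it depends on,
# so libraries that call nothing at all come out first.
order = list(TopologicalSorter(calls).static_order())
```

Here `order` always puts C before B and D, and B before A – exactly the “bottom of the heap first” sequence the conversion needed.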
With that description of the problem out of the way, here is how we tackled it. We wrote a small C program that examined the metadata (symbol table) in each library to find which code in other libraries it called. Each library was examined in isolation, rather than worrying about the big picture in any way. The output of this tool was a text file giving the name of each library and A -> B pairs for all the other libraries it called.
The format of this text file was the input format for GraphViz. GraphViz doesn’t know anything about C libraries, multibyte or single byte code etc. What it can do is take a text file describing blobs and the lines between them, and lay those blobs and lines out on a page so that the blobs aren’t on top of each other and the lines are as short as possible.
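That input format (the “dot” language) is simple enough that generating it is almost trivial. This is a sketch rather than our original tool, and the pairs are made up, but it shows the shape of the text file we produced:

```python
# Each pair (caller, callee) means "library caller calls library callee".
pairs = [("A", "B"), ("B", "C"), ("D", "C")]

# Emit GraphViz's dot language: a digraph where each dependency
# becomes an edge between two quoted node names.
lines = ["digraph deps {"]
for caller, callee in pairs:
    lines.append(f'    "{caller}" -> "{callee}";')
lines.append("}")
dot_text = "\n".join(lines)
```

Feeding a file like `dot_text` to the `dot` command produces the laid-out diagram.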
Once we had a text file showing the libraries and the dependencies between them in GraphViz format and fed it into GraphViz, the output was effectively a PERT chart for the project. We knew that libraries at the bottom of the diagram could be tackled first, then libraries in the row above that, and so on.
Being able to wrangle data – organise and shape it – is useful surprisingly often. You might be looking at the output from some code to see why it hasn’t behaved the way you expected. Or you might be preparing the input to some code and it’s more complicated than just a few words of text. You might have tools that can do exactly the wrangling you need, but in my experience I’ve had to roll my own (or hack) tools quite often.
I often find myself reaching for Notepad++ and Excel, including as a pair as they can each do things that the other can’t. Sometimes it’s easier to select the data you want to keep, and other times it’s easier to delete the data you don’t want. The tools let you work in either way.
Notepad++’s strengths are its light weight, macros and find and replace. Because it’s lightweight, it starts quickly and can cope with large files. Macros let you press Record, perform a series of normal operations with keyboard and/or mouse, and then press Stop. The sequence of operations between Record and Stop can then be played back N times, including as many times as necessary to get the cursor to the end of the file.
The find and replace has three modes – normal, extended and regular expression. Sometimes I use find and replace as find and delete by leaving the replace box empty – whatever matches the find box is replaced with nothing, which means it’s deleted.
Extended lets you put “\n” in the find and/or replace boxes, to represent a line break. This means you can easily match the end or start of lines, and join or split lines by adding or removing line breaks. For instance, if you have some XML or HTML that is all on one line, you can do quick and dirty pretty printing by replacing “><” with “>\n<”. This will put a line break between the end of one element and the start of the next, so each element will be on its own line.
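The same quick and dirty pretty printing is a one-liner in most languages; here it is in Python, using a made-up scrap of HTML (the second version does the same job with a regular expression, which is roughly what Notepad++’s regular expression mode does):

```python
import re

# One-line markup, as it often arrives from logs or APIs.
html = "<ul><li>one</li><li>two</li></ul>"

# Extended-mode style: replace "><" with ">\n<" so each element
# ends up on its own line.
pretty = html.replace("><", ">\n<")

# Regular-expression mode does the same job here.
pretty_re = re.sub(r"><", ">\n<", html)
```

Both give the same result – a line break between the end of one element and the start of the next.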
Regular expressions are a world of their own (sometimes a world of pain) but give you a lot of flexibility.
Excel works well once your data is separated into rows and columns, e.g. separated by line breaks and commas. If it’s not already in this structure, Notepad++ can often get it into the structure fairly easily. Excel is good for deleting or reordering columns (it can also temporarily hide columns). It can also sort rows, and use auto filtering to show / hide rows. I sometimes go back and forth between the tools, and the only thing to remember if you’re doing this is to make sure Excel saves the file in CSV format rather than e.g. XLSX format.
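For completeness, the same column and row operations can also be scripted. This sketch uses invented data and Python’s standard `csv` module to mirror the Excel steps described above – filtering rows and reordering columns:

```python
import csv
import io

# A hypothetical CSV, already separated into rows and columns.
raw = "name,team,score\nana,red,7\nbob,blue,3\ncho,red,9\n"

rows = list(csv.DictReader(io.StringIO(raw)))

# Auto-filter equivalent: keep only the "red" team.
# Column reordering equivalent: emit score before name, dropping team.
kept = [(r["score"], r["name"]) for r in rows if r["team"] == "red"]
```

The interactive tools are usually quicker for one-off jobs; a script like this earns its keep when the same wrangling has to be repeated.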
It can be frustrating when a tool doesn’t solve all of a problem. However, combining tools or filling in the gaps around tools can be creative and rewarding. Designing tools so that they tackle one problem well and can be composed with other tools will help to give you a toolbox that’s greater than the sum of its parts, due to the ways in which the tools can easily help each other.