In this article I’ll go into two related operations, or kinds of query, that you can run on data, both of which involve grouping things: aggregation and window functions. I’ll describe how they both work, how they’re similar but different, and give examples of when you might use them including how you might … Continue reading Aggregation and window functions for data
Category: Data processing
Mental models for data engineering and data science
For programmers like me, it can be a bit of a wrench when you get more into data work, particularly data engineering and data science. You’re used to data being around (in the background) and so think everything will be OK. This wasn’t the case for me, and so here are some mental models (glorified metaphors) that … Continue reading Mental models for data engineering and data science
Testing a data pipeline
There are several approaches to testing a data pipeline, e.g. one built using an ETL tool such as SSIS or Azure Data Factory. In this article I will go through three, plus refer to another (unit testing components of the pipeline). For simplicity’s sake I will refer to only database tables, but other forms … Continue reading Testing a data pipeline
Finite state machines
This is the second article in a series about some classic computer science: Regular expressions; Finite state machines; Comparing regular expressions and finite state machines. Finite state machines are a way of checking that a series of inputs is valid, potentially doing some actions while you’re doing this checking. They’re not something I use all that often, … Continue reading Finite state machines
How permanent is your data?
This article was inspired by a video from the British Museum, where a conservator discusses a 500-year-old khipu. A khipu is a document, used for keeping records or accounts, made of knotted strings. https://www.youtube.com/watch?v=-mvjiMjZf-4 I recommend you watch the video – I found it really interesting and well-presented. I hadn’t come across khipus before, and … Continue reading How permanent is your data?
Using tools in interesting ways (tool hacking)
This article is my response to the Ministry of Testing’s blogging challenge: How we hacked a tool to make it work for us. First, I’ll go into tools in general a bit, and then give two examples of how I have used tools in slightly non-standard ways. I've written a bit about tools already, but … Continue reading Using tools in interesting ways (tool hacking)
Looking for copyright music in live streams
My friend Ted has recently started exercising in earnest, to get fighting fit for when he can go back into schools, museums etc. to blow children's minds about creative writing. While he exercises, he plays music from CDs to motivate him and he live streams it to Facebook for accountability. Sometimes the live stream is … Continue reading Looking for copyright music in live streams
The compounding value of information
Information is one of those things where sometimes the whole is greater than the sum of the parts. That is, you get extra value from combining bits of information, on top of the value from the separate bits of information on their own. I’ll illustrate this with an example to do with spies, but then … Continue reading The compounding value of information
User experience (UX) and data quality
Someone I know was moaning recently about a lot of tedious electronic form filling they had to do for work. It was something that happened once a year, but it was much more lengthy and tedious this year than before. It struck me that this was a sharply focused example of when user experience (UX) … Continue reading User experience (UX) and data quality
GB roads with gaps
I was driving recently, and realised that I was near a road that appeared to have a large gap. By that I mean: road A joins road B and stops, but some distance further along B there’s a bit more road A. It’s as if A has been chopped into two by B, and the … Continue reading GB roads with gaps