In a data pipeline (an ETL or ELT pipeline that feeds a data warehouse, data science model etc.) it is often a good idea to copy input data to storage that you control as soon as possible after you receive it. This is sometimes known as copying the data to a staging table (or other … Continue reading Staging input data to improve testability in data pipelines
Improving testability and observability of look-ups in data pipelines
Often in data pipelines (ETL or ELT pipelines for feeding a data warehouse, data science model etc.) we need to look up reference data that relates to the main flow of data through the pipeline. If this isn't done carefully, it can be hard to check how the system is running. Before the system is … Continue reading Improving testability and observability of look-ups in data pipelines
The big and small idea
I was talking with a Cambridge University student recently, in particular about their University Card. It’s a very useful card that, in one way, can be described very simply. As far as I understand, the card lets students, academics and staff across the university access rooms and services, by proving their identity electronically. That’s something … Continue reading The big and small idea
Analogies and objectives for testing
I guess if I had to define my role at work it would be: programmer. However, I have learned a lot from people who wouldn't call themselves programmers, such as testers (Michael Bolton, Jerry Weinberg, the Ministry of Testing community etc.), user experience experts (Paul Boag, Jared Spool, Don Norman etc.), and data people of … Continue reading Analogies and objectives for testing
Testing a data pipeline
There are several approaches to testing a data pipeline, e.g. one built using an ETL tool such as SSIS or Azure Data Factory. In this article I will go through three, and refer to a fourth (unit testing components of the pipeline). For simplicity's sake I will refer only to database tables, but other forms … Continue reading Testing a data pipeline
The seven (or four) ages of man
This article is mostly about visualising some data from the 15th and 16th centuries, about how someone's lifespan can be divided up into stages. It happened because my friend Tamsin Lewis (a historical music expert) pointed me at a tweet by Dr Alun Withey (a history lecturer). The tweet had a photo of some lovely … Continue reading The seven (or four) ages of man
‘Roughly’ and ‘better’ can help usability
In this article I'll go into some fuzziness that we often encounter in the everyday world, but that's often missing in the world of computers. Unfortunately, we're used to this fuzziness, and so its lack can make computers hard to use. Star Trek shields: I remember watching episodes of Star Trek where the Enterprise was in … Continue reading ‘Roughly’ and ‘better’ can help usability
An introduction to parameterised types
This article is about parameterised types, which are also known as generics or parametric polymorphism. I first came across them in the functional programming language ML, but they have spread beyond the functional programming world, to languages like Java, C#, and TypeScript. Parameterised types let you define a family of similar but different types. What … Continue reading An introduction to parameterised types
Encapsulation
This is the second of the things requested by Jesper. To me, the software engineering term encapsulation is part of the bigger term modularisation. Modularisation is chopping a big lump of code into smaller parts or modules. It’s important to get the boundaries between parts in the right place. Once there are modules, they can … Continue reading Encapsulation
Modularisation – cohesion at many levels
This article builds on the previous article, so if you are new to the terms coupling and cohesion as they apply to software, please look at that first. In this article I’m going to look at cohesion as it applies to methods (or functions, if that’s what you call such things). Specifically, I’m going to … Continue reading Modularisation – cohesion at many levels