This is part of a short series of articles about computer science while doing the laundry: Merge sort, Bin sort. In the previous article I used doing a lot of laundry to illustrate merge sort, which is probably an impractical way of doing the laundry. In this article I will suggest a way that might actually … Continue reading Computer science while doing the laundry 2: Bin sort
Computer science while doing the laundry 1: Merge sort
This is part of a short series of articles on computer science involving laundry: Merge sort, Bin sort. In this article I will explain merge sort, which is a way of sorting things when there are so many of them it’s awkward or impossible to use other approaches. I’ll use doing the laundry as a way of explaining … Continue reading Computer science while doing the laundry 1: Merge sort
Introduction to Azure Data Factory
Azure Data Factory (ADF) is a tool from Microsoft that lets you move data from one place to another, optionally changing it too. This activity is sometimes described as data engineering or ETL (Extract Transform Load) or ELT. There’s an older tool from Microsoft that also does ETL, called SQL Server Integration Services (SSIS). They … Continue reading Introduction to Azure Data Factory
Visualising sauces in French cuisine
Classic French cuisine, as defined by e.g. Escoffier, has a set of base sauces such as velouté from which other sauces like normande can be derived. This article is an attempt at visualising the sauces and the relationship between them. The motivation behind it is someone I know who is studying catering, and as part … Continue reading Visualising sauces in French cuisine
Connecting Azure Data Factory code to an external database table
In this article I will talk about how to connect Azure Data Factory (ADF) to a database table. This can be surprisingly complex, so I will start with the simplest version and work towards more complex versions. I won't go into connecting ADF to other types of data store such as APIs, blob storage etc, … Continue reading Connecting Azure Data Factory code to an external database table
Staging input data to improve testability in data pipelines
In a data pipeline (an ETL or ELT pipeline, to feed a data warehouse, data science model etc.) it is often a good idea to copy input data to storage that you control as soon as possible after you receive it. This can be known as copying the data to a staging table (or other … Continue reading Staging input data to improve testability in data pipelines
Improving testability and observability of look-ups in data pipelines
Often in data pipelines (ETL or ELT pipelines for feeding a data warehouse, data science model etc.) we need to look up reference data that relates to the main flow of data through the pipeline. If this isn't done carefully, it can be hard to check how the system is running. Before the system is … Continue reading Improving testability and observability of look-ups in data pipelines
The big and small idea
I was talking with a Cambridge University student recently, in particular about their University Card. It’s a very useful card that, in one way, can be described very simply. As far as I understand, the card lets students, academics and staff across the university access rooms and services, by proving their identity electronically. That’s something … Continue reading The big and small idea
Analogies and objectives for testing
I guess if I had to define my role at work it would be: programmer. However, I have learned a lot from people who wouldn't call themselves programmers, such as testers (Michael Bolton, Jerry Weinberg, the Ministry of Testing community etc.), user experience experts (Paul Boag, Jared Spool, Don Norman etc.), and data people of … Continue reading Analogies and objectives for testing
Testing a data pipeline
There are several approaches to testing a data pipeline, e.g. one built using an ETL tool such as SSIS or Azure Data Factory. In this article I will go through three, plus refer to another (unit testing components of the pipeline). For simplicity's sake I will refer to only database tables, but other forms … Continue reading Testing a data pipeline