Connecting Azure Data Factory code to an external database table

April 29, 2022 (updated May 5, 2022) ~ Bob

In this article I will talk about how to connect Azure Data Factory (ADF) to a database table. This can be surprisingly complex, so I will start with the simplest version and work towards more complex versions. I won't go into connecting ADF to other types of data store such as APIs, blob storage etc, …

Staging input data to improve testability in data pipelines

April 24, 2022 (updated October 23, 2022) ~ Bob ~ 3 Comments

In a data pipeline (an ETL or ELT pipeline, to feed a data warehouse, data science model etc.) it is often a good idea to copy input data to storage that you control as soon as possible after you receive it. This can be known as copying the data to a staging table (or other …

Improving testability and observability of look-ups in data pipelines

April 23, 2022 ~ Bob ~ 1 Comment

Often in data pipelines (ETL or ELT pipelines for feeding a data warehouse, data science model etc.) we need to look up reference data that relates to the main flow of data through the pipeline. If this isn't done carefully, it can be hard to check how the system is running. Before the system is …

The big and small idea

April 16, 2022 ~ Bob

I was talking with a Cambridge University student recently, in particular about their University Card. It’s a very useful card that, in one way, can be described very simply. As far as I understand, the card lets students, academics and staff across the university access rooms and services, by proving their identity electronically. That’s something …

Analogies and objectives for testing

April 9, 2022 (updated April 10, 2022) ~ Bob ~ 2 Comments

I guess if I had to define my role at work it would be: programmer. However, I have learned a lot from people who wouldn't call themselves programmers, such as testers (Michael Bolton, Jerry Weinberg, the Ministry of Testing community etc.), user experience experts (Paul Boag, Jared Spool, Don Norman etc.), and data people of …

Testing a data pipeline

April 6, 2022 ~ Bob

There are several approaches to testing a data pipeline, e.g. one built using an ETL tool such as SSIS or Azure Data Factory. In this article I will go through three, and refer to another (unit testing components of the pipeline). For simplicity's sake I will refer only to database tables, but other forms …

The seven (or four) ages of man

April 4, 2022 (updated July 2, 2022) ~ Bob

This article is mostly about visualising some data from the 15th and 16th centuries, about how someone's lifespan can be divided up into stages. It happened because my friend Tamsin Lewis (a historical music expert) pointed me at a tweet by Dr Alun Withey (a history lecturer). The tweet had a photo of some lovely …

‘Roughly’ and ‘better’ can help usability

March 24, 2022 ~ Bob ~ 2 Comments

In this article I'll go into some fuzziness that we often encounter in the everyday world, but that's often missing in the world of computers. Unfortunately we're used to this fuzziness, and so its lack can make computers hard to use. Star Trek shields: I remember watching episodes of Star Trek where the Enterprise was in …

An introduction to parameterised types

January 27, 2022 (updated January 28, 2022) ~ Bob

This article is about parameterised types, which are also known as generics or parametric polymorphism. I first came across them in the functional programming language ML, but they have spread beyond the functional programming world, to languages like Java, C#, and TypeScript. Parameterised types let you define a family of similar but different types. What …

Encapsulation

January 20, 2022 ~ Bob

This is the second of the things requested by Jesper. To me, the software engineering term encapsulation is part of the bigger term modularisation. Modularisation is chopping a big lump of code into smaller parts or modules. It’s important to get the boundaries between parts in the right place. Once there are modules, they can …