Building computer systems via problems rather than solutions

When it comes to building a computer system, even something as simple as storing the names and addresses of universities can be surprisingly complicated and messy. While the mess and complication often can’t be avoided, knowing what the end users’ needs are can help you come up with the best way of tackling them. “Just” …

Fuzzy matching – context and testing

This is the third article in a short series on fuzzy matching: Introduction, Example algorithms, and Testing and context. In this article I will consider the difference between context-dependent and context-independent fuzziness, and think about how fuzzy matching systems can be tested.

Context-dependent and context-independent fuzziness

If you are trying to do fuzzy matching of strings, …

Improving testability and observability of look-ups in data pipelines

Often in data pipelines (ETL or ELT pipelines feeding a data warehouse, a data science model, etc.) we need to look up reference data that relates to the main flow of data through the pipeline. If this isn't done carefully, it can be hard to check how the system is running. Before the system is …