Comparing and aligning data models

Sometimes you are trying to bring two worlds together – two bits of code, each with its own data model. The two worlds could be a computer system you work on and the system from a partner you’re trying to integrate with. The data models can be the most prominent bits of this integration, but also the most confusing to compare and align. In this article I will give some suggestions that might be helpful when you try to do this.

In this context, data model could mean the logical or physical data model of a relational database, the resources returned by a set of REST APIs, etc. It’s which bits of data the system has chosen to store, and the relationships between those bits.

What’s in a name? That which we call an Account table by any other name would smell as sweet

It might be tempting to start with the data model from one side of the comparison and see if you can find the same names in the other data model. I have an Account table, so I’ll look for your Account table etc. I suggest that you don’t do this. Often, things with the same name can have unhelpfully different meanings for the two different systems. Things with the same meaning could have different names in the two different systems. This is an area where there can be an unhelpful thicket of synonyms, e.g. customer, account, subscriber, service, product, package, user, installation, bundle, etc.

Instead, I suggest you use a version of duck typing (if it walks like a duck and talks like a duck, it’s a duck). I’ll describe these aspects in more detail below, but the main ways that parts of a data model can walk/talk the same as each other relate to:

Business requirements
Structure
Uniqueness

A white duck walking on grass — Often it’s a quackers idea to rely on names when trying to compare data models
Image by Ernst Vikne, licenced under Creative Commons Attribution-Share Alike Generic 2.0 licence.

Business requirements

These can often be unearthed by asking “Why do you have table / field X?” – you might need to ask Why? several times until you get to something in the world of business requirements rather than technical requirements. Example answers could include things like:

This is the thing that incurs debt, that must be paid off or will be chased
This is the smallest / biggest separate thing that the user can choose to pay for
This represents the smallest / biggest thing that is allowed to use thing X, e.g. the smallest thing that is allowed to use the mobile phone network.

Taking the last one, one side of the comparison might assume everything is a simple mobile phone, and so has one entity that contains details of the phone and its SIM card together. However, the other side might allow for dual SIM mobile phones, or SIM cards in other things like vending machines, smart meters or cars. So in that world the smallest thing that can access the mobile phone network is the SIM card, which might be contained in other things like phones etc.

Structure

It’s often not a good idea to worry about the fine details of structure to start with, such as the maximum length that strings can be. Instead, it’s better to concentrate on things like:

X is contained in Y (You must have a Y before you can have an X.)
An X can have 0 or 1 / 0+ / 1+ / etc. Ys
What can vary over time? What can / must vary with what else?

Uniqueness

This often relates to properties rather than whole entities. E.g. just an account number, rather than all of a row in the account table. Sometimes a single property must be unique; other times a set of properties must be combined to ensure uniqueness. For instance, just a mobile phone number, or the combination of (service type, mobile phone number) where service type is something like: phone call, text message, WhatsApp etc.

Another important consideration is the scope of uniqueness. Some possibilities include:

Something bigger than the system, e.g. all the mobile phones in a country
The system as a whole
A tenant of the system (if it’s a multi-tenant system)
Some entity within the data for a tenant, e.g. a tenant may define as many reports as they like, but all the reports’ names for a given tenant must be unique.

Set your expectations low

In my experience, there are often limitations to what you can get when you bring two systems together. For instance, there is a constraint on one side that can’t be removed, and this forces the other side to be similarly constrained if the two systems want to work together. So, for example, in one system you can have as many Ys per X as you like, but the other system needs each A to have 0 or 1 B (and A <-> X, B <-> Y), so the first system is contrained to allowing only 0 or 1 Y per X.

Another example is where one system has a string property that’s unique on its own, but there’s no single string property in the other system that’s guaranteed to match its uniqueness. Instead, this system has to concatenate two or more string properties to have a single value that can be relied on to be unique.

Summary

Bringing two systems, particularly their data models, together can be hard. Names can lead you astray, and so a more useful approach can be based on duck typing. Why is this entity here, how does this entity fit into the bigger data model, and what needs to be unique? By matching up the answers to these kinds of questions, you’re more likely to find the important relationships (or their lack) between the two systems.

One thought on “Comparing and aligning data models”

MilesT says:

October 15, 2022 at 7:01 pm

Then there is the consideration of what kind of outcome do you need? Do you just need an agreement of equivalence/alignment within the context of a domain of systems or part of the business, or pair of domains (with a language meaningful only locally), or whether there will ultimately be a need to generate an alignment across the enterprise, aligning to an enterprise data model, in which case there starts to be a need to create a canonical data model that models the enterprise in as unconstrained a way as possible, and all current and future systems align to the canonical model (and, hence, transitively, achive alignment to other systems across the enterprise or industry).

When creating wider alignments, it is worth researching if there is an industry standard model which should be used as a basis (e.g. OAGIS)

LikeLiked by 1 person