How permanent is your data?

This article was inspired by a video from the British Museum in which a conservator discusses a 500-year-old khipu. A khipu is a document, used for keeping records or accounts, made of knotted strings.

I recommend you watch the video – I found it really interesting and well-presented. I hadn’t come across khipus before, and they raised all sorts of interesting ideas and questions. So, in this article I’ll use the khipu as a jumping-off point for a more general waffle about data.

The problems with the khipu

There are several problems with the khipu:

When the conservator first saw it, the strands were tangled together so it couldn’t be read.
The strings are decaying – turning to powder in places.
While the pattern of knots, colour of strings etc. is clear, the meaning attached to those knots, colours etc. is lost.
It’s a complex physical object that’s quite big:
1. It can’t be in two places at once.
2. It would take time and effort to copy, and the copying could introduce errors.
3. It would take time and effort to move from e.g. one end of the Inca world to the other, and care would need to be taken to stop it getting tangled or broken en route.

The third one reminds me of the OSI network model. Sending a stream of 1s and 0s correctly over a network is only part of the problem. You also need to know whether those 1s and 0s are part of a streaming video or part of a PDF file, so that you can interpret them correctly. If you don’t know how to interpret the 1s and 0s, it’s like having the khipu without knowing what the knots and colours mean. (It’s also like having an encoded message without knowing how to decode it.)
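To make the point concrete, here’s a small Python sketch of my own (nothing like it appears in the video). The same four bytes give you a word, or two quite different numbers, depending on which agreed rule you use to read them. The function name `three_readings` is just something I made up for illustration. These particular bytes happen to be how a PDF file begins, which is one of the clues software uses to guess how to interpret a file.

```python
import struct


def three_readings(raw: bytes) -> tuple[str, int, int]:
    """Read the same four bytes as ASCII text, then as big- and little-endian unsigned integers."""
    as_text = raw.decode("ascii")
    (as_big_endian,) = struct.unpack(">I", raw)
    (as_little_endian,) = struct.unpack("<I", raw)
    return as_text, as_big_endian, as_little_endian


# These four bytes are how a PDF file starts ("%PDF").
raw = bytes([0x25, 0x50, 0x44, 0x46])
text, big, little = three_readings(raw)
print(text, hex(big), hex(little))  # %PDF 0x25504446 0x46445025
```

The bits never change; only the agreement about what they mean does. Lose that agreement and you’re back to staring at knots.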

It’s easy to consider the khipu a relic of a more primitive time, look at its list of problems and feel a bit smug. By contrast, we have developed sophisticated tools for storing, copying and moving information around. IT = Information Technology, after all.

Not so fast. Can you still play all the video games you bought when you were much younger? Do you know what will happen to your online photo albums, music collections etc. when you die? Are there important physical objects in and around your home that you think you can rely on, but that will stop working if you stop paying a subscription? Have you ever forgotten a password to something that doesn’t have a handy facility to send you a password reset email?

These are interesting and important questions, with complicated answers that involve materials science, economics and law, as well as computing.

The Old World

By “old” I don’t mean the time when the khipu was made, but the pre-Cloud computing era. In order to access some information you probably needed all of these:

Information storage – USB stick, CD / DVD, floppy disk, VHS tape, game cartridge etc.
Hardware – PC, floppy drive, VHS machine, games console etc.
Software – the operating system and the relevant application

You needed to own them or otherwise have access to them, and they needed to work OK. You might also need passwords or other credentials as part of the access.

The information is encoded in the physical world in some way – regions of different magnetism, regions where some dye is burned, etc. The physical stuff could get damaged, e.g. by a power surge. Or the hardware or software could have some kind of bug such that information is stored in an invalid way. This is the equivalent of someone tying an unexpected kind of knot in the khipu. The knot hasn’t got jumbled in transit – it’s in the form that the person made it – but it doesn’t make sense to someone else trying to read it.

What makes things worse is that you need a combination of things to work before you can access the information, i.e. several things need to play nicely with each other. A file saved by a program running on a PC being read by another computer running UNIX? Nope. Betamax tape in a VHS machine? Sorry. A program that used to run fine on version X of an operating system, trying to run on version X+1 or X+2? That might work.

To guard against problems with the stored information, you might take back-ups. However, it’s important to remember that these are simply more bits of information created by hardware and software. They could have problems of their own, e.g. a corrupt or incomplete back-up, or backing up the wrong stuff. It’s not the back-up that’s valuable – it’s the ability to successfully restore from back-up that is. Unless you test this, you don’t know how good your back-up is.
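Here’s a minimal Python sketch of “test the restore, not the back-up”. It assumes the back-up is a single zip archive of one directory, which is my simplification; real back-up tools and formats vary a lot. It restores the archive into a throwaway directory and checks that every file is present with identical contents, comparing SHA-256 checksums. The function names are hypothetical, my own invention.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_under(root: Path) -> set[Path]:
    """Relative paths of every regular file below root."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def restore_matches(originals: Path, backup_archive: Path) -> bool:
    """Restore the archive into a scratch directory and compare it file by file with the originals."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch)
        shutil.unpack_archive(backup_archive, restored)
        expected = files_under(originals)
        if files_under(restored) != expected:
            return False  # files missing from, or extra in, the restored copy
        return all(sha256_of(originals / rel) == sha256_of(restored / rel) for rel in expected)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        photos = Path(work) / "photos"
        photos.mkdir()
        (photos / "holiday.txt").write_text("sunny")
        # Take the back-up: a zip of the photos directory, stored outside it.
        archive = Path(shutil.make_archive(str(Path(work) / "backup"), "zip", root_dir=photos))
        print(restore_matches(photos, archive))  # True
        (photos / "holiday.txt").write_text("edited after the back-up")
        print(restore_matches(photos, archive))  # False
```

Note that the second check fails not because the back-up is broken, but because the originals have moved on since it was taken. That’s the “how often do you take them?” question showing up in code.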

There are other problems with back-ups. How often do you take them? (How much are you prepared to lose if you have to rely on restoring from back-up?) If you are paranoid (e.g. guarding against some physical problem like theft, flood or fire) and keep several back-ups in different physical places, how closely are those in sync with each other? How do you get them away from the computer to somewhere safe?

The New World 1 – the Cloud

The old world sounds like a quaint bygone era to many ears. It’s important to remember that even though you see something newer and shinier, there’s still the old world behind the scenes. The big difference is that you pay someone else to take away some of the costs and risks of the old world. This introduces new risks and costs:

What if your hosting provider is incompetent and deletes lots of your data with no back-up?
What if they go bust?
How secure are the systems you’re renting?
What data does your service provider have access to about you, and what are they doing with that?
How well does the service work if there’s no network connection? How gracefully do things recover or complete if an action is interrupted by a dropped network connection?

So it’s probably worth having an offline back-up of your blog posts, photos, podcasts etc. The back-up might not be a perfect replica, but you’ll at least have something in the worst case. Or, if you’re building a service on top of the rented stuff, it might be worth building something like Netflix’s Simian Army, to make sure your code can cope with some of the residual risks.
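The Simian Army tools (Chaos Monkey and friends) work at the level of whole servers, but the underlying idea, deliberately injecting failures and checking that your code copes, can be sketched in a few lines. The Python below is my own toy illustration, not anything from Netflix. The hypothetical `with_injected_failures` wraps a call so that it randomly fails like a dropped connection, and the equally hypothetical `fetch_with_fallback` retries a few times before falling back to an offline copy.

```python
import random
from typing import Callable


def with_injected_failures(call: Callable[[], str], failure_rate: float,
                           rng: random.Random) -> Callable[[], str]:
    """Wrap a call so it sometimes raises ConnectionError, as a dropped network link might."""
    def wrapped() -> str:
        if rng.random() < failure_rate:
            raise ConnectionError("injected failure")
        return call()
    return wrapped


def fetch_with_fallback(fetch: Callable[[], str], offline_copy: str, attempts: int = 3) -> str:
    """Try the remote service a few times; if every attempt fails, use the offline back-up."""
    for _ in range(attempts):
        try:
            return fetch()
        except ConnectionError:
            pass  # a real system might wait (back off) before retrying
    return offline_copy


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the run is repeatable
    flaky_service = with_injected_failures(lambda: "photos from the cloud", 0.5, rng)
    results = [fetch_with_fallback(flaky_service, "photos from the offline back-up") for _ in range(1000)]
    # The code should always return *something* usable, whichever path it took.
    assert set(results) <= {"photos from the cloud", "photos from the offline back-up"}
    print(results.count("photos from the offline back-up"), "of 1000 fetches used the back-up")
```

The point isn’t the retry loop itself; it’s that you find out how your code behaves when the rented stuff misbehaves before your users do.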

Also, it’s worth noting who has power in this new world. Who is depended on by whom, and is their trust well-placed? Is there a liquid enough marketplace or, in its absence, regulation that encourages good action and fair pricing by providers? (I’m not saying it’s any better or worse than in the old world, but it’s not the same and doesn’t have to be any particular way.)

The New World 2 – Information-enabled physical things

This is an interesting case, and a different one from the previous section. Here, there’s something in the physical world, other than a computer or phone, that has been blended with information in some way. The physical object might be a car, a personal fitness monitor, a gas or electricity meter etc.

The physical object has some obvious function, e.g. getting you from one place to another, telling you how much activity you’re getting, or allowing gas or electricity into your home and then helping a supplier bill you for how much you’ve used. The information, and the IT systems around it, are meant to improve or extend these functions.

What I find interesting is when information stops the physical object doing what you expect it to do. The physical object still works fine – the moving parts still move correctly, electrical circuits are fine and there’s charge in batteries etc. But information says no. Examples include a smart meter being remotely told to disconnect your gas or electricity supply, an otherwise working car that won’t start because of overdue bills, or an activity tracker that won’t let you see your activity.

I’m not going into the rights and wrongs of this – contracts might have been signed, but were those contracts ethical? Who has power and how is it held to account? Also, not all such cases, even the ones I’ve described above, are alike in their rights and wrongs.

I highlight this because it’s the reverse of the old world. In the old world, a physical fault (e.g. a bad sector on a disk) would block information. In the new world, information can block a physical object.

Sometimes this could be unintentional. You expect a physical object to work without information flowing correctly, but it has been designed so that it relies on this flow. For instance, there might be a device in your home that normally talks to your phone via the internet, with both of them connecting to it. Without the internet they can’t talk to each other, because the device doesn’t support Bluetooth and has no physical interface, such as screens, buttons or sockets, that would let you interact with it directly by plugging things in. This might make the device more secure, but it also makes it brittle. The lack of a direct interface might simply have been a way to save costs.

Mitigations – migration, emulation and repair

One of the many things I liked about the video that inspired this article was that it included data migration. For me this usually means copying data from one set of database tables (A) to another (B), because A supports an older version of some software and B supports a newer one. (The newer software can’t read data in the old form, so the data needs to be copied and adapted to the new form.) In the video this meant recording the details of the strings and knots in a spreadsheet, so that the information represented by the strings could be accessed without needing the khipu in front of you.
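In code, the database-table version of this might look something like the Python sketch below. Everything in it is hypothetical: the table names `accounts_v1` and `accounts_v2`, the crude rule for splitting names, and the two date formats are invented for illustration. It uses SQLite from Python’s standard library so that it runs on its own. The habits it tries to show are adapting each row to the new form rather than copying it blindly, committing the copied rows together, and checking that nothing went missing along the way.

```python
import sqlite3
from datetime import datetime


def split_name(full_name: str) -> tuple[str, str]:
    """Old form has one name field; new form wants two. Split on the last space (a crude rule)."""
    first, _, last = full_name.rpartition(" ")
    return first, last


def to_iso(old_date: str) -> str:
    """Old form stored dates as DD/MM/YYYY text; new form uses ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(old_date, "%d/%m/%Y").date().isoformat()


def migrate(conn: sqlite3.Connection) -> int:
    """Copy every row from accounts_v1 into accounts_v2, adapting it; return the number copied."""
    conn.execute(
        "CREATE TABLE accounts_v2 ("
        "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, opened TEXT)"
    )
    rows = conn.execute("SELECT id, full_name, opened FROM accounts_v1").fetchall()
    with conn:  # commit the inserts together, or roll them back if anything raises
        conn.executemany(
            "INSERT INTO accounts_v2 VALUES (?, ?, ?, ?)",
            [(row_id, *split_name(name), to_iso(opened)) for row_id, name, opened in rows],
        )
        (copied,) = conn.execute("SELECT COUNT(*) FROM accounts_v2").fetchone()
        if copied != len(rows):
            raise RuntimeError(f"expected {len(rows)} rows, copied {copied}")
    return copied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts_v1 (id INTEGER PRIMARY KEY, full_name TEXT, opened TEXT)")
    conn.executemany(
        "INSERT INTO accounts_v1 VALUES (?, ?, ?)",
        [(1, "Ann Smith", "01/03/2009"), (2, "Raj Patel", "23/11/2015")],
    )
    conn.commit()
    migrate(conn)
    print(conn.execute("SELECT * FROM accounts_v2 ORDER BY id").fetchall())
    # [(1, 'Ann', 'Smith', '2009-03-01'), (2, 'Raj', 'Patel', '2015-11-23')]
```

Like the khipu spreadsheet, the migration embeds decisions about meaning (where does a first name end?), and getting those wrong is a quiet way to lose information.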

In other contexts it can mean buying your favourite music first on vinyl, then on cassette tape, then on CD etc. The information (the song) is the same, but it’s being migrated from one form to another to keep up with the changing hardware you use to access it. You might miss the crackle of the vinyl, but to most ears it’s the same information. However, it might be less valuable to you after migration if it had some value beyond the information itself. That LP might be important to you because:

It was signed by a member of the band;
It was a gift from someone special;
It’s a rare edition;
It was the first thing you bought with your pay from a weekend job.

So even though the vinyl’s scratched, to you it’s the best version.

Another thing that can help is emulation. You can’t play that ZX Spectrum game any more because your ZX Spectrum died long ago. However, you can get your laptop running Windows to pretend to be a ZX Spectrum, and once more you can waste time playing Jet Set Willy.

Finally, if you rely on physical things that can go wrong, can they be repaired? Have they been made so that it’s easy to repair them – can parts be accessed and replaced separately, are there instructions, are there spare parts? Or is it like the screen of a smartphone, which takes specialist tools and skills that mere mortals don’t have? Ease of repair helps to extend the life of an object, potentially reducing its lifetime environmental costs. Is a right to repair protected by law where you live?

Economics

I’m aware that this is already a long and rambling article, but I think it’s important to continue where the previous section ended. Law can protect (or not) your access to goods that you think you own, via things like right to repair. This is a non-technical issue, but it’s a force acting on the design of technical information systems.

Another non-technical issue is economics. If software is delivered in the physical world (e.g. installed on your own device, rather than used via a website), then there will be a series of versions out in the wild. The further this series extends beyond the version a user is on, the less profitable it becomes for the supplier to make changes to that version.

I’ve been on both sides of this relationship – working for a software provider, and using software that someone else has made. I’ve noticed that, if I’m honest, I have double standards. As a provider of software I think that it’s perfectly reasonable to concentrate resources on the new stuff. As a user of software I’d like the software I’ve bought to be decent, so any bugs in it need fixing (regardless of how many versions have been produced since I bought it). I shouldn’t be forced to move to a newer version, migrating data in the process.

If there’s a physical device involved, then the razor-and-blades business model might apply. The razor is sold for not much money (maybe even at a loss), but the manufacturer makes money from selling replacement blades over a long time. In the same way, the physical device arrives for little or no money, so that the user is motivated to start a subscription to the information-based service, from which the supplier hopes to make money over a long time. The cheaper the physical object can be made, the sooner the supplier will make a profit. However, this is likely to mean that the physical object will do less on its own, without a connection to the service and its information.

What’s the point?

I’m not trying to make you a Luddite. I’m also not encouraging you to take a homestead approach to information, where you run a hand-crafted application that uses a home-made web server, database and operating system, running on chips you built yourself and so on. (Although open source is a relatively painless way to get part of the way there for at least some areas.)

I guess I’m encouraging you to think again about the costs, benefits and risks associated with information, and to acknowledge that everyone, including you and me, has to make trade-offs. It’s worth making sure you have an accurate picture of your situation, and checking that you’re happy with the trade-offs you’ve made.