# Sankey diagrams to explain Coronavirus and Covid19

There’s a kind of diagram, called a Sankey diagram, that shows how quantities flow between things.  I will briefly introduce it, and then use one to illustrate Coronavirus and Covid19 in the UK.  The diagram simplifies things, but I hope it will still help you understand how the various numbers you’ve heard about Coronavirus and Covid19 fit together.

Please note that I’m not a doctor, an epidemiologist, a public health policy maker, or anyone else remotely qualified to make pronouncements on the virus and the harm it can do.  So please continue to follow the laws and recommendations that apply where you live.

Also, while this is an example of data visualisation, it’s also about people.  As of when I’m writing this, about 98,000 people have died in the UK of Covid19, which is 98,000 too many.  Other people have survived but suffer either directly from the disease, or from the wider harms such as losing their job etc.  All I can say to you if you’re in this position is I’m sorry to hear that, and I hope you recover soon.

## Sankey diagrams

A Sankey diagram is a collection of blobs connected by lines.  The height of each blob shows how many of something belong to that blob.  The width of each line shows how many of the things in the blob at the start of the line go on to be in the blob at the end of the line.  I hope this becomes clearer shortly, if it isn’t already clear.

One example of a Sankey diagram is to show how far people make their way through a sales funnel, such as negotiating the various pages involved in buying from an ecommerce website.  In this case the blobs in the diagram represent the pages involved in the process, and their height shows how many people visit each page.  The line between page A and page B shows how many people who were on page A went to page B.

Sankey diagrams can be useful for pinpointing which step is the problem in a series of steps.  For instance, if lots of people arrive at the home page of your website, but only a small number of people end up checking out successfully, where are they dropping out?  Which page is the problem?  A Sankey diagram could show you this information via a nice thick line (a lot of people) arriving at a page, but only a much thinner line (fewer people) moving on to the next page.
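The same question can be asked of the raw numbers behind the diagram.  Here is a minimal Python sketch, assuming a simple funnel where everyone on a page came from the page before it.  The page names and visitor counts are made up purely for illustration.

```python
# Hypothetical visitor counts for each page of a checkout funnel, in order.
# None of these numbers come from a real website.
funnel = [
    ("home", 10_000),
    ("product", 6_000),
    ("basket", 3_000),
    ("payment", 600),
    ("confirmation", 500),
]


def drop_offs(pages):
    """Return (from_page, to_page, fraction_lost) for each consecutive pair."""
    result = []
    for (src, n_src), (dst, n_dst) in zip(pages, pages[1:]):
        result.append((src, dst, 1 - n_dst / n_src))
    return result


steps = drop_offs(funnel)

# The step that loses the largest share of its visitors is the thick line
# that turns into a thin one in the diagram.
worst = max(steps, key=lambda step: step[2])

for src, dst, lost in steps:
    print(f"{src} -> {dst}: {lost:.0%} lost")
print(f"Biggest drop: {worst[0]} -> {worst[1]}")
```

With these invented numbers the biggest drop is between the basket and payment pages, so that is where I would look first.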

This kind of information for a website can be displayed in a Sankey diagram using Google Analytics, among other options.  In the diagram I will show below about Coronavirus and Covid19 there will be a main flow from end to end, and then things will branch off that flow and immediately stop.  Sankey diagrams don’t have to be that simple – in general, flows can split and merge in any combination at any stage in the diagram.
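To make splitting and merging a bit more concrete, here is a small Python sketch of the data behind a Sankey diagram: a list of lines, each with a start blob, an end blob and a size.  The blob names and numbers are invented, and sizing each blob by the larger of what flows in and what flows out is a common convention that I’m assuming here, not a description of any particular tool.

```python
from collections import defaultdict

# Each link is (start blob, end blob, number of things flowing along the line).
# All names and numbers are hypothetical.
links = [
    ("home", "product", 600),
    ("home", "search", 300),       # "home" splits three ways
    ("home", "left site", 100),
    ("search", "product", 200),    # two flows merge into "product"
    ("search", "left site", 100),
    ("product", "checkout", 250),
    ("product", "left site", 550),
]


def blob_totals(links):
    """Add up the flow into and out of every blob."""
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for src, dst, weight in links:
        outflow[src] += weight
        inflow[dst] += weight
    return inflow, outflow


inflow, outflow = blob_totals(links)


def blob_height(name):
    """Size a blob by the larger of what enters it and what leaves it."""
    return max(inflow[name], outflow[name])


for name in ["home", "search", "product", "checkout", "left site"]:
    print(name, blob_height(name))
```

In this example everything that enters a middle blob such as search or product also leaves it, which is why the lines on either side of a blob add up to the same height.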

## Coronavirus and Covid19

This is where the caveats resume.  I will be presenting a simplified view of things, to make things clearer.  For instance, I will assume that everyone who has died of Covid19 was admitted to hospital first, rather than e.g. dying in a care home.  This is because data on where people died was too hard to find.

I’ve had to make a decision about what data to show – is it a current snapshot, or totals over time?  I decided to show totals over time.  Neither is perfect – while totals over time gathers up as much data as possible, the problem is that the world has changed in important ways since the beginning of the pandemic.  New variants of the virus have appeared, but also medical treatments have improved.

I’m also making huge assumptions about susceptibility – if you’ve had it already, can you get it again?  How much protection do you get after one or two doses of the vaccine, and so on?  (I’m assuming that there’s 100% protection from even one dose, which I’m fairly sure is unfortunately an exaggeration.)  Also, some people who have been vaccinated have previously been infected (and so were in the “has been infected” blob and possibly blobs that flow from it), which I’m not showing.

Despite my assumptions above, I’ve tried to get the best data I can from the UK government website, but it must be remembered that, early on at least, even the best data was not as accurate as we’d like.  This is just because in the real world it’s impossible to know e.g. how many people have been infected by a novel virus but displayed no symptoms.  Also, I tried splitting “has been admitted to hospital” into those who needed mechanical ventilation and those who didn’t but, while I could find current numbers for that, I couldn’t find a good total value.  So I am treating all hospital admissions as the same, which I know isn’t accurate.

Finally, as I mentioned in the introduction, there’s more to Covid19 than a binary survived / died, and I’m not showing anything beyond that distinction.  Even limiting things to just the direct effects of the disease, rather than indirect things like mental health problems due to isolation, some poor people, such as the poet, author and educator Michael Rosen, suffer for ages once they have left hospital.

With all that out of the way, here is the Sankey diagram of Coronavirus and how it has affected the UK as of late January 2021. I created it using Google Charts.

Even with all the assumptions and caveats, I think it’s worth pointing out a few things.

The “had one / two vaccinations” blobs will grow over time, and the “unvaccinated” blob will shrink, as more and more people get vaccinated.  Note that being vaccinated possibly just means that you won’t get sick; you might still be able to infect others, so please continue to follow your government’s recommendations, even after vaccination.  Also, the diagram unfortunately makes assumptions about the effectiveness of vaccines against novel variants of the virus.

The other big split (after the vaccinated / not one) is the infected / not split.  This is the one most affected by hand washing, mask wearing, social distancing, staying at home and so on.  It’s not totally under your control, but your behaviour can influence it.  By way of contrast, your behaviour probably can’t influence e.g. whether you get an asymptomatic vs. a symptomatic infection.  Note that your behaviour influences both whether you get into the “has been infected” blob and (if you’re infected) whether other people get into it too.

## Percentages and rates

Another thing that I hope the diagram shows is that it’s important to keep track of which numbers you use to calculate statistics.  For instance, if you want to know how lethal coronavirus is, you probably want to compare the number of deaths with the number of infections (to compute a percentage).  You probably don’t want to compare the number of deaths with the size of the population (the “everyone” blob).
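Here is a small Python sketch of how much the choice of denominator matters.  The deaths figure is the one quoted earlier in this article.  The infections figure is a made-up round number, and the population is only a rough rounded value for the UK, so please treat the results as an illustration of the arithmetic and not as real statistics.

```python
deaths = 98_000          # figure quoted earlier in this article
infections = 4_000_000   # HYPOTHETICAL round number, for illustration only
population = 67_000_000  # rough, rounded UK population (an assumption)


def percentage(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole


# Deaths compared with infections: how lethal is the disease if you catch it?
deaths_per_infection = percentage(deaths, infections)

# Deaths compared with everyone: a much smaller number answering a
# different question.
deaths_per_person = percentage(deaths, population)

print(f"Deaths as % of infections: {deaths_per_infection:.2f}%")
print(f"Deaths as % of population: {deaths_per_person:.2f}%")
```

The same number of deaths gives two very different percentages, so it is worth checking which blob a quoted percentage was divided by.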

Also, to compare across countries you probably want to compare normalised admissions (number of people who needed hospital treatment divided by population size) or normalised deaths (number of deaths divided by population size), to take account of the fact that countries have different-sized populations.
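As a rough sketch of that normalisation in Python, here are two imaginary countries.  The names and numbers are entirely hypothetical, and expressing the result per 100,000 people is just one common way to make the figures readable.

```python
# Two imaginary countries; every number here is invented for illustration.
countries = {
    "Country A": {"population": 60_000_000, "deaths": 90_000},
    "Country B": {"population": 5_000_000, "deaths": 10_000},
}


def per_100k(count, population):
    """Normalise a raw count to a rate per 100,000 people."""
    return count / population * 100_000


rates = {
    name: per_100k(data["deaths"], data["population"])
    for name, data in countries.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.0f} deaths per 100,000 people")
```

Country B has far fewer deaths in total, but once population size is taken into account its rate is higher than Country A’s.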

## Summary

This is definitely an area where life is messy, and so data about it is messy too.  I hope that you and those whom you love don’t suffer from Coronavirus and all its harms, and I’m sorry if you have.  I also hope that this article has helped you to have a slightly clearer view of the big picture.