Introduction
In this post I analyse a pack of Top Trumps. I’ve long had an itch in my brain that it would be interesting to do this. Playing with several different packs as a child, I built up an impression that there was a general pattern in the packs, and I wanted to see whether this pattern actually held. I also wanted to try to visualise it.
I analysed one pack and found the pattern I predicted (more on that below), but the details of the result contained an unexpected outlier. This made me go back over the data and do supplementary analyses to check I hadn’t made a mistake. This is a pattern of work I’ve found myself following several times when doing analysis in my (previous) day job.
The details of how I did this are in the next post.
Top Trumps
Top Trumps is a card game for 2 or more players. One pack will have about 30 cards on a given theme – cars, chemistry, sportspeople, TV or film characters etc. Each pack has 5 or so characteristics relevant to its theme, e.g. for cars it might be acceleration, top speed, fuel efficiency etc. Each card shows one member of the set of things for the theme, e.g. the car pack would have cards for the VW Beetle, the Tesla Model X etc. Each card has a picture and text describing the thing (the car etc.) and gives the thing’s values for the pack’s characteristics.
Your aim is to have all the cards. The player whose turn it is will pick a characteristic, and all players read out the value for that characteristic on the top card in their hand. Usually the highest value wins, although there are some exceptions such as 0-60 mph time for a car. The winner takes all the top cards, puts them all at the back of their pack and has the next turn.
The feeling I’d built up from playing with several different packs is that there are 1 or 2 cards that beat most other cards, 1 or 2 cards that lose to most other cards, and then the majority of cards are somewhere in the middle – winning or losing depending on what the other cards are in the round and what characteristic is chosen.
The pack in question
The pack I happened to find lying around the house was the 2012 Olympics pack. There might have been a few cards missing – I had 30. The cards showed Olympic gold medal winners from 1920 to 2008, and the characteristics were:
- Number of Olympic gold medals
- Year of first medal
- Height (cm)
- Hall of fame
- Number of Olympic games competed in
The rules said that the highest number won for all characteristics. This is a bit weird, but it’s what I went with. It meant that tall people won, which would mean that on average men would beat women. It would also favour more modern athletes, whose first medal was more recent.
How to visualise the cards
Unless you have a wins-against-everything card or a loses-against-everything card, a given pair of cards A and B will have some ways for A to win (by picking its best characteristics) and some ways for B to win. This means that for a given pair of cards (A, B), there are two bits of information – how often A will win and how often B will win.
Having two bits of information per pair of cards initially made me think of using a force directed graph in D3 with a pair of links between each pair of nodes, like this excellent visualisation of mobile phone law suits.
After thinking about it a bit more, I realised that this wouldn’t be helpful because in a pack of N cards, each card will be paired up against all N-1 other cards. In the mobile phone law suits graph, not all companies are suing all other companies, which means the graph isn’t too dense and cluttered. If I used this approach for Top Trumps there would be too many lines to read anything – a bit like the Death Star diagrams for large micro-services systems.
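To put a rough number on "too many lines", here is a minimal sketch of the arithmetic. The function names are my own, purely for illustration. Each of the N cards is paired with the N-1 others, and each pair needs a link in both directions.

```python
def unordered_pairs(n: int) -> int:
    """Number of distinct pairs of cards {A, B} in a pack of n cards."""
    return n * (n - 1) // 2


def directed_links(n: int) -> int:
    """Number of links if every pair gets one link in each direction."""
    return n * (n - 1)


if __name__ == "__main__":
    pack_size = 30  # the number of cards I had
    print(unordered_pairs(pack_size))  # 435
    print(directed_links(pack_size))   # 870
```

For a 30-card pack that would be 870 links between 30 nodes, which is far denser than a graph where most nodes are not connected to each other.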
Time for a plan B. What I did was use Excel to do the following:
- List all the cards as a table
- Work out another table where each card is compared to each other card. For a given pair of cards (A, B), I calculated the number of characteristics where A would beat B.
This table included the pair the other way around – (B, A) – on purpose. For characteristics with a limited range of values, e.g. number of games competed in, there’s a decent chance that a pair of cards will tie. This means that if there are 5 characteristics and A beats B on 3 characteristics, we can’t assume that B will beat A on the other 2 characteristics. So we need to calculate each pair both ways around.
- For a given pair of cards (A, B), calculate the net number of characteristics in which A beats B. This is the number of characteristics in which A beats B minus the number in which B beats A.
- For a given card A, calculate its average net number of winning characteristics. I.e. find all the pairs (A, x) and average the net number across all x. A wins-against-everything card would have a big positive value, a loses-against-everything card would have a big negative value, and a card that sometimes wins and sometimes loses would have a value nearer to zero.
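I did this in Excel, but the same steps can be sketched in a few lines of Python. This is only an illustration under assumptions: the card names and values below are made up (they are not the real pack data), and the function names are my own. Highest value wins on every characteristic, and ties count for neither card.

```python
from typing import Dict, List

# Hypothetical cards: values for 5 characteristics, highest wins on each.
CARDS: Dict[str, List[int]] = {
    "A": [9, 2000, 190, 95, 4],
    "B": [1, 1960, 180, 70, 1],
    "C": [3, 1996, 170, 70, 3],
}


def wins(a: List[int], b: List[int]) -> int:
    """Number of characteristics on which card a strictly beats card b."""
    return sum(1 for x, y in zip(a, b) if x > y)


def net_wins(a: List[int], b: List[int]) -> int:
    """Wins of a over b minus wins of b over a (ties cancel out)."""
    return wins(a, b) - wins(b, a)


def average_net_wins(cards: Dict[str, List[int]]) -> Dict[str, float]:
    """For each card, the mean net wins against every other card."""
    scores = {}
    for name, values in cards.items():
        nets = [
            net_wins(values, other)
            for other_name, other in cards.items()
            if other_name != name
        ]
        scores[name] = sum(nets) / len(nets)
    return scores


if __name__ == "__main__":
    for name, score in average_net_wins(CARDS).items():
        print(name, score)
```

In this toy pack, B beats C on only 1 characteristic while C beats B on 3, because they tie on the fourth. That tie is why the pair has to be counted both ways around rather than assuming the two counts add up to 5.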
The results
I tried a few different options for this, including a bar chart. I picked this approach because it showed how the scores fall into clusters:
- A low outlier (Muhammad Ali – see below)
- A cluster between this and 0
- A cluster between 0 and +1
- 1-3 clusters above 1, depending on where you draw the lines
I have already spent more time on this than I planned to, which is why I didn’t go further with analysing the clusters, e.g. using k-means in R.
“I am the greatest” is the opposite of true for this pack – why?
Muhammad Ali claimed “I am the greatest”. Why is he such a low outlier in this pack? Is the analysis wrong?
The stats on the card for Muhammad Ali are:
- Golds: 1
- First medal: 1960
- Height: 187 cm
- Hall of fame: 81
- Number of games: 1
Remember, this is looking at his record as an Olympian, not at all of his boxing career or cultural impact. The values for many of his characteristics put him at the low end of the pack.
The pack is skewed towards modern athletes, which means that 74% of the pack has a first medal more recent than 1960 (and so beats him on this characteristic), and he will beat only 23% of the pack. (He is the only person with 1960 – as there are 30 cards in the pack, one person is roughly 3% of it.)
Despite his general significance and boxing championships, Muhammad Ali won only 1 Olympic gold medal. This puts him in the bottom 17% of the pack, i.e. 83% of the pack will beat him on this characteristic.
I picked stacked bar charts for both of these for a reason. A pie chart would also show how a total set (the whole pack) is broken up into different-sized pieces (e.g. by number of gold medals). However, in both of these cases there is an order to the pieces, and so stacking them up like this helps you see how much is above or below a piece.
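The split behind a stacked bar like this is just a count of how many cards sit above, level with, or below a given value. Here is a minimal sketch, using made-up gold medal counts rather than the real pack and a function name of my own choosing.

```python
from typing import List, Tuple


def split_pack(values: List[int], own: int) -> Tuple[float, float, float]:
    """Fractions of the pack with a value above, equal to, and below `own`.

    `values` covers the whole pack, including the card being checked,
    so the "equal" fraction always counts that card itself.
    """
    total = len(values)
    higher = sum(1 for v in values if v > own)
    equal = sum(1 for v in values if v == own)
    lower = total - higher - equal
    return higher / total, equal / total, lower / total


if __name__ == "__main__":
    # Hypothetical gold medal counts for a 10-card pack.
    golds = [1, 1, 2, 3, 3, 4, 5, 6, 8, 9]
    higher, equal, lower = split_pack(golds, own=1)
    print(higher, equal, lower)  # 0.8 0.2 0.0
```

Here a card with 1 gold loses to 80% of the toy pack and ties with the remaining 20%, which includes itself.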
I checked the other characteristics by eye but don’t have graphs for them. I also checked the highest outlier – Carl Lewis – and he does really beat most other cards most of the time, quite often on all 5 characteristics. So I’m confident enough that the analysis is correct.
This pattern of working is one I have followed several times:
- Think about what final visualisation you want
- Work through a process of transforming the data you have into a form that supports that visualisation
- Notice something unexpected about the visualisation
- Go back to earlier stages of the data processing and perform extra analysis on parts of it to check if the unexpected thing is definitely in the data or an error produced by a bug in the processing.
Conclusion
The pattern in the analysis does confirm the prediction I had about how cards are distributed in the pack. The shiny visualisation I first had in mind turned out not to be suitable for this application, and so I went for a more humble approach using Excel. This seemed to produce acceptable results for a reasonably small amount of effort.
The visualisation produced an unexpected outlier. I went back to earlier stages in the data processing that supported the visualisation, and did some extra analysis to check how genuine the outlier was. The outlier was confirmed as genuine, so I am comfortable with the main visualisation.
I hope that this was interesting, and possibly a useful example of how you might do some gentle analysis on everyday data to get an insight into it.