International languages?

There are several things that are either official or de facto international languages. (I’m using a more elastic meaning of the word language than you might expect.) It’s interesting to look at the history and influences behind these languages, because they often leave fingerprints on the language. Who are the languages for? Who decides on what’s in a language and what’s not?

The languages I’ll be looking at briefly are:

Symbols for chemical elements, as in the periodic table of elements
The International Phonetic Alphabet – the odd-looking symbols in a dictionary that say how to pronounce a word
Internet domain names
Country codes as part of an international phone number
Computer programming languages such as JavaScript

These might not all strike you as languages, but I’m taking language to mean a set of symbols with meaning behind them and rules for combining them to make other meanings, that together help people communicate or get a job done. For instance, in chemistry there are symbols such as H, S and O (meaning hydrogen, sulphur and oxygen), and rules that let you combine them into things like H₂SO₄ (meaning sulphuric acid).
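To make the "symbols plus rules for combining them" idea concrete, here is a minimal Python sketch that reads a formula such as H₂SO₄ and counts the atoms in it. The function name and the tiny three-element table are my own choices for illustration, and it only copes with flat formulas (no brackets).

```python
import re
from collections import Counter

# Symbol table limited to the three elements mentioned in the text.
ELEMENT_NAMES = {"H": "hydrogen", "S": "sulphur", "O": "oxygen"}

# Map Unicode subscript digits to ordinary digits so that H₂SO₄ becomes H2SO4.
SUBSCRIPT_DIGITS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")


def count_atoms(formula: str) -> Counter:
    """Count the atoms in a flat formula such as 'H₂SO₄' (no brackets)."""
    plain = formula.translate(SUBSCRIPT_DIGITS)
    counts = Counter()
    # Rule: one capital letter, an optional lower-case letter, an optional count.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", plain):
        if symbol not in ELEMENT_NAMES:
            raise ValueError(f"unknown symbol: {symbol}")
        counts[ELEMENT_NAMES[symbol]] += int(number) if number else 1
    return counts


print(dict(count_atoms("H₂SO₄")))
# {'hydrogen': 2, 'sulphur': 1, 'oxygen': 4}
```

The regular expression is the grammar of this little language in miniature: it is what lets S followed by O be read as two elements rather than one.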

Chemical elements

The periodic table lists all the chemical elements. By that I mean: each time you add a proton to the nucleus of an atom (and a matching electron, and possibly some neutrons), you get a new element and hence a new cell in the table. The periodic table is an excellent example of the conscious choices that go into representing data. Isotopes aren’t listed (for instance, deuterium and tritium for hydrogen), because they would make it even more complicated. Once the contents have been fixed, their arrangement highlights the trends in properties of the elements.

If you look at the periodic table, the cells in the table have abbreviations for the elements, such as H, rather than their full names, such as hydrogen. As you scan over the table, you might find yourself trying to remember or work out the full names from the symbols.

Some full names are quite easy to guess or remember: H = hydrogen, He = helium etc. However, there are some that are harder: Na = sodium, K = potassium, Hg = mercury, Au = gold, Pb = lead, W = tungsten etc. This might trigger two questions:

Why are some of those symbols so weird?
Why are only some of those symbols weird, and how weird are the symbols in general for people from other countries?

Part of the weirdness comes from practical things, which I’ll get out of the way first. All the symbols are up to two letters, and sometimes taking the first two letters from a name will lead to clashes. Ca is the beginning of both calcium and caesium, so calcium gets the symbol Ca and caesium is Cs. But that doesn’t explain the weird examples I listed above.
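The clash is easy to demonstrate. The sketch below applies a deliberately naive rule (take the first two letters of the name) to a handful of element names and reports any symbol claimed by more than one element. The naive rule and the function names are assumptions for illustration, not how symbols are actually assigned.

```python
from collections import defaultdict

# A handful of element names from the text, not the full table.
NAMES = ["calcium", "caesium", "hydrogen", "helium"]


def naive_symbol(name: str) -> str:
    """First two letters, capitalised, e.g. 'calcium' -> 'Ca'."""
    return name[:2].capitalize()


def find_clashes(names):
    """Return the naive symbols that more than one element would claim."""
    by_symbol = defaultdict(list)
    for name in names:
        by_symbol[naive_symbol(name)].append(name)
    return {symbol: owners for symbol, owners in by_symbol.items() if len(owners) > 1}


print(find_clashes(NAMES))
# {'Ca': ['calcium', 'caesium']}
```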

I’ll briefly give the history of those symbols, before looking at the more general point.

Na is short for the Latin word natrium, which comes from the Egyptian word natron, which refers to a natural mineral salt that contains sodium compounds.
K comes from kali, which comes from alkali, which comes from the Arabic word al-qalyah meaning plant ashes. (The full name, potassium, is separately derived from potash because potassium salts came from adding water (in a pot) to ash from burning wood or leaves.)
Hg is an abbreviation of hydrargyrum, which is the Latin form of the ancient Greek hydrargyros meaning water-silver.
Au and Pb come from the Latin words aurum and plumbum respectively.
W comes from wolfram, an older name for tungsten that survives in wolframite, a mineral containing tungsten. Wolfram comes from the German wolf rahm, meaning wolf’s soot or cream. The name arose because the mineral interfered with the smelting of tin ore and reduced the amount of tin recovered, as if wolfram were a hungry wolf devouring the tin.

The general question is: why are most of the elements’ names in English, with most of the exceptions derived from Greek or Latin? There is a long history of Indian, Chinese, and Arabic scientists, and yet there’s very little evidence of their influence in the periodic table.

International Phonetic Alphabet

The International Phonetic Alphabet (or IPA) is a way of representing the sounds in human speech. It aims to cope with all variations in all languages. The important thing to remember with it is that it records sounds (in a regular way – there’s only one way to record a given sound, and each symbol in IPA represents only one sound). It doesn’t necessarily match up to letters.

It also reveals details that we often ignore or take for granted. For instance, there are two groups of vowels – monophthongs and diphthongs. Monophthongs are things like the vowel in meet. The sound doesn’t really change over time, and you can stretch it out as long as you like without it being all that odd (for instance, if you’re singing and hold a note for a long time).

Diphthongs are a bit like two monophthongs stuck together. These are things like the vowel sounds in boy, night, loud etc. They start off as one sound and then change to another. This becomes more obvious if you try stretching them out for a long time – do you stretch out the first sound, the second, or both? Do the different ways of stretching it sound equally good to you?

The consequence of all this is that IPA looks like words from an alien language. There might be bits that sort of look familiar, but most of it looks odd. That’s because it includes things that look like normal letters (p, t, etc.) but also things that don’t (as you can see below). Rather than go for a completely new set of symbols that aren’t in any language (for instance, a collection of different geometric shapes like triangles and rectangles), IPA uses letters or things like letters.

In IPA the words meet, boy, night and loud are written (assuming British English) as miːt, bɔɪ, naɪt, and laʊd.
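As a rough way of seeing how much of IPA is "normal letters" and how much isn’t, this short Python sketch picks out the symbols in those four transcriptions that aren’t plain a to z, and asks Python’s standard unicodedata module for their official Unicode names. The function name is mine, and "not a to z" is only a crude stand-in for "unfamiliar to an English reader".

```python
import unicodedata

# The four British English transcriptions given in the text.
IPA = {"meet": "miːt", "boy": "bɔɪ", "night": "naɪt", "loud": "laʊd"}


def unfamiliar_symbols(transcription: str) -> list:
    """Return the symbols that are not plain lower-case a-z letters."""
    return [ch for ch in transcription if not ("a" <= ch <= "z")]


for word, transcription in IPA.items():
    extras = unfamiliar_symbols(transcription)
    # unicodedata.name gives the official Unicode name of each character.
    names = [unicodedata.name(ch).lower() for ch in extras]
    print(word, transcription, extras, names)
```

Every transcription keeps at least two ordinary letters, which fits the point that IPA borrows from existing alphabets where it can.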

This comes to the crux of this section – which symbol do you use to represent which sound? In fact, one symbol was the inspiration for this whole article. It is the symbol j. The symbols jel represent the sound of the word yell, because j is the representation of the sound of y.

Why isn’t the symbol y, and why is it j? If you look at Swedish, Norwegian, German, and Dutch, words like fjord and ja have the letter j making a sound like the English y. So the question could equally be: why can’t it be j? Why should English be the default?

Internet domain names

Here are some websites:

U.S. websites:
www.army.mil – U.S. Army
web.mit.edu – Massachusetts Institute of Technology (U.S.A.)
www.usa.gov – U.S. Federal government

UK websites:
www.army.mod.uk – British Army
www.manchester.ac.uk – Manchester University (UK)
www.gov.uk – UK government

French websites:
www.defense.gouv.fr – French Ministry of the Armed Forces
www.sorbonne-universite.fr – Sorbonne University (France)
www.gouvernement.fr – French government

You might notice that the British websites all end in .uk and all the French ones end in .fr. The American ones end in a variety of things – .mil, .edu and .gov. The last part of the host name (the part of the web address before any slash, question mark etc.) is the most important part, and must be a valid Internet Top-Level Domain (TLD).
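Picking out the top-level domain is mechanical enough to sketch in a few lines of Python. This uses urlsplit from the standard library. The function name is my own, and the sketch doesn’t check the result against the official list of TLDs.

```python
from urllib.parse import urlsplit


def top_level_domain(address: str) -> str:
    """Return the last label of the host name, e.g. 'uk' for 'www.gov.uk/help'."""
    # urlsplit only recognises the host name when a scheme or '//' is present.
    if "//" not in address:
        address = "//" + address
    host = urlsplit(address).hostname or ""
    # Drop any trailing dot, then keep whatever follows the last remaining dot.
    return host.rstrip(".").rsplit(".", 1)[-1]


for site in ["www.army.mil", "www.army.mod.uk", "www.defense.gouv.fr",
             "web.mit.edu", "https://www.gov.uk/browse?page=1"]:
    print(site, "->", "." + top_level_domain(site))
```

Run over the sites listed above, it gives a country for the British and French sites, but only a kind of organisation for the American ones.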

The internet grew out of a project by the US Advanced Research Projects Agency. It wasn’t designed as a country-neutral international network, but has grown out of a US one. This is at least one reason behind the US-centrism of the top-level names.

It’s similar to postage stamps, which were first produced by the UK. By tradition, the UK is the only country that doesn’t put its name on its stamps.

International phone numbers

This is even further from a normal language than the other examples in this article, but I think it’s still useful to look at. If you are in one country and want to phone another, you need to add a number to the start of the number you want to dial, that says which country that number’s in. For instance, the USA and Canada share 1, the UK is 44, Ukraine is 380, and Morocco is 212.

The list of which number goes with which country is managed by the International Telecommunication Union (ITU), which is part of the UN. How are the numbers allocated? You’ll notice from the examples above that some countries have short numbers (like 1) and others have longer ones (like 380). The numbers can be sorted in the normal way – which numbers are near the top of the list and which nearer the bottom?

The numbers are broadly geographic, but within that there’s still a bit of oddness. The US / Canada is 1 and Russia is 7, but China is 86 and India is 91. So, the shortness of the number doesn’t relate to how many people live in the corresponding area.
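The varying lengths work because no country code is the start of another country code, so a phone exchange (or a program) can read digits from the left until it finds a match. Here is a minimal Python sketch of that idea, using only the codes mentioned in this section. The function name and the phone numbers are made up for illustration.

```python
# Only the codes mentioned in the text; the real list is much longer.
CALLING_CODES = {
    "1": "USA / Canada",
    "7": "Russia",
    "44": "UK",
    "86": "China",
    "91": "India",
    "212": "Morocco",
    "380": "Ukraine",
}


def split_number(digits: str):
    """Split an international number (digits only) into (country code, rest)."""
    # Country codes are one to three digits long, so try each length in turn.
    for length in (1, 2, 3):
        code = digits[:length]
        if code in CALLING_CODES:
            return code, digits[length:]
    raise ValueError("country code not in the sample table")


for number in ["442079460000", "380441234567", "12025550100"]:
    code, rest = split_number(number)
    print(number, "->", CALLING_CODES[code], code, rest)
```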

Programming languages

Programming languages act as a bridge between humans and computers. On one side of the bridge, computers understand only numbers. For instance, the Z80 chip (found inside the ZX Spectrum, among other machines) treats the number 201 (C9 in hexadecimal) as an instruction to finish the set of instructions it’s currently working through and go back to where it was previously. On the other side of the bridge, humans are used to communicating using words and punctuation. Attempts have been made in the past to make the human side of the bridge quite similar to everyday language. For instance, in COBOL you can say:

MULTIPLY OLD-SALARY BY PAY-RISE GIVING NEW-SALARY.

In a more modern language such as Java or C#, the same thing would usually be written more like

new_salary = old_salary * pay_rise;

Programming languages come with a program behind the scenes – a compiler or interpreter that will translate the words of the programming language into the numbers that the computer understands.
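A toy version of that translation step fits in a few lines of Python. The sketch below is a very small "assembler" that turns instruction names into Z80 numbers. The three opcodes are real Z80 ones, but the function and its one-instruction-per-line input format are my own simplification, and a real assembler or compiler does far more.

```python
# Three real single-byte Z80 instructions; a real assembler knows hundreds.
OPCODES = {"NOP": 0x00, "HALT": 0x76, "RET": 0xC9}


def assemble(source: str) -> bytes:
    """Translate one instruction name per line into machine-code bytes."""
    return bytes(
        OPCODES[line.strip().upper()]
        for line in source.splitlines()
        if line.strip()  # skip blank lines
    )


program = assemble("nop\nret")
print(list(program))   # [0, 201]
print(hex(201))        # 0xc9
```

Even here the human side is English: NOP, HALT and RET are abbreviations of "no operation", "halt" and "return".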

The computer side of things seems fairly universal (as long as you understand Arabic numerals). However, the human side of things seems to be English a lot of the time, rather than e.g. Arabic, Serbo-Croat, Mandarin, or Swahili. If you’re a Welsh speaker learning about e.g. the C# keyword while, you first need to imagine it written as tra (the Welsh for while). That’s before you get to all the fiddly computer stuff about brackets etc.
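Nothing technical forces the keywords to be English. As a thought experiment, here is a Python sketch (Python rather than C#, to keep it short) that swaps the Welsh tra for while before handing the code to the interpreter. The translation table holds just the one word from the paragraph above, and a plain find-and-replace like this would also mangle the word tra inside strings or longer programs, so treat it as an illustration rather than a real tool.

```python
import re

# "tra" is the only translation taken from the text; others would be added here.
WELSH_KEYWORDS = {"tra": "while"}


def translate(source: str) -> str:
    """Replace whole-word Welsh keywords with their Python equivalents."""
    pattern = r"\b(" + "|".join(map(re.escape, WELSH_KEYWORDS)) + r")\b"
    return re.sub(pattern, lambda match: WELSH_KEYWORDS[match.group(1)], source)


welsh_source = """
count = 0
tra count < 3:
    count += 1
"""

namespace = {}
exec(translate(welsh_source), namespace)  # run the translated program
print(namespace["count"])  # 3
```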

JavaScript is, in some ways, an extreme example of this. It powers the behaviour of websites all around the world, and has been standardised as ECMAScript. It’s governed by Ecma, which used to be known as ECMA. ECMA is short for the European Computer Manufacturers Association. A body originally set up to standardise computer technology across Europe has its name associated with a programming language that needs users to know English (if you speak only French, German, Spanish etc. you’re a bit stuck). One way to see this clearly is to look at the French Wikipedia article on JavaScript – most of the content is in French, but then there’s a snippet of code and it’s all in English.

Summing up

This could all be viewed as grumpy old man shakes fist at the sky. These are big things, used by many people, many backed by important organisations – I have as much chance of changing them as I do of changing the clouds by shaking my fist at them.

I don’t think I’m trying to change them, just to point out a pattern. It’s easy to think of these languages as somehow like platonic solids – an inevitable part of the universe that transcends humanity. I think it’s more accurate to describe them as reflecting their human origin; like people they are wonderful but flawed.

Some, like Arpanet becoming the Internet, are things that solved local problems so well that they were adopted to solve global problems. The limitations or irregularities of the global solution were baked into the local solution, but when the scope was restricted to a local problem this was fine. Many attempts at global products or solutions fail, and I can see the sense in starting small and then growing, because it allows risks to be managed more easily.

However, it still irks me when English or the West is seen as the default. This puts unnecessary barriers in the way of many people, and suggests that many people are less important than those who happen to fit the default.