# Why isn't English the world's most irregularly spelled language?



## Encolpius

Hello, I ran into this article about English spelling. Here is a passage: ...and found that 60 percent of them had one or more unpredictably used letters. *No one knows for sure, but the Spelling Society speculates that English may just be the world’s most irregularly spelled language.*
I really cannot understand why they are afraid to say: English IS the world's most irregularly spelled language.
No other language is mentioned. Since we are talking about written languages, I don't think we can use the excuse that we do not know all the "6000" languages spoken on Earth.
Do you know of any other language with more irregular spelling? Or at least with similarly difficult spelling (French? Irish? Danish?)
And I think we can leave out languages of extraterrestrials. 
Thanks.


----------



## Nino83

Japanese? 
こうこう ‎(_romaji_ kōkō)
高校: high school, 工高: industrial education high school, 口腔: oral cavity, 坑口: pithead; minehead, entrance to a coal mine, 航行: sailing, 孝行: filial piety, 膏肓: incurable disease, 交媾: sexual intercourse.

Among alphabetic writing systems, French.
sans = s'en = c'en = (je, tu) sens = (il) sent = sang = cent = [sɒ̃]
final [ɛ] = -ais, -ait, -aie, -ep (salep, cep), -êt (acquêt), -et, -ev (Lev)
final [e] = -er, -ez, -é, -és, -ée, -ées, -es, est, -ed (pied)
final [o] = -eau, -eaux, -aux, -aut, -au, -aud (cabillaud), -ot (cabillot), -op (trop), -os


----------



## Mr.Dent

I found the following image on Quora, under the question "Does English have the most inconsistent pronunciation system in the world?"
Perhaps it helps answer your question.


----------



## Nino83

You can predict very well how a word is pronounced looking at writing in French if you know some "rules" (like the "CaReFuL" rule), but I think it's very difficult to write down a word, because you cannot know if that word has a final silent consonant or if it's written with "eau, au, o", "an, en" and so on.


----------



## Englishmypassion

I wonder why Sanskrit and Hindi haven't been included as highly regular languages in the table linked to in post #3. Hindi, Sanskrit, Nepali and Kumauni, all using the Devanagari script, are highly regular languages as far as spelling is concerned.


----------



## Stoggler

I suppose because it's a Eurocentric view. Or they limited the number of languages for brevity's sake, or confined it to a few languages that use the Latin alphabet for ease of study.


----------



## M Mira

Tibetan wants a word.


----------



## Dib

Indeed, Tibetan spelling is also very inconsistent. Irish as well.

But having grown up with Bengali and English as my main languages (though English as a distant second), I have never figured out which spelling system is more inconsistent. I suppose (purely based on impression), spelling to pronunciation _may_ be a bit more predictable in Bengali(A), but pronunciation to spelling in Bengali is horrible. To take a (somewhat trumped up) example, the word "sanniddho" (সান্নিধ্য = proximity, company) could arguably be spelt in thousands(B) of ways without changing the pronunciation. But of course only one of them is actually "valid".

----------

(A) But still it is not all that predictable. The nominal spelling sɔrɔlɔ (সরল) stands for both /sɔrol/ (easy, straight) and /sorlo/ (it moved). The spelling kɔrɔ (কর) may stand for /kɔr/ ("do!" present imperative 2nd person intimate), /kɔro/ ("you do" present simple/imperative 2nd person neutral) and /koro/ ("do!" future imperative 2nd person neutral). Though the orthography allows for disambiguating some of them, it is not done regularly or consistently.

(B) Here is how I get to the "thousands":
1) The initial "s": There are 3 different letters pronounced "s" [ʃ], and in this context it is possible to have two different silent letters (b,m) after it. So, that gives 9 different ways of spelling the initial "s".
2) The geminate "nn": There are 2 different "n" letters, and the gemination can be signaled by the spellings hn, ny, nb, nn (and even trigraphs hnb, nny, etc. which I am not counting just to keep it a bit saner). So that's 8 ways of spelling it.
3) There are 2 "i" letters.
4) The geminate "ddh" could be spelt like that or dhy, dhb => 3 ways (Ignoring again the trigraphs ddhy, ddhb. dh is a single letter and sound).
5) Final o can be spelt in 2 ways.
=> Total 864

Counting the trigraphs for the geminates (which are not that uncommon actually), and more arcane features, which are strictly speaking possible but not usually used, e.g. using the vowel nasalization diacritic around "nn" (contexts where nasalization is nonphonemic), the spelling options could come close to 10,000.
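The figure of 864 in footnote (B) is simply the product of the independent choices listed in points 1) to 5). A minimal sketch of that arithmetic in Python follows; the dictionary keys are labels made up for this illustration, and only the digraph spellings counted above are included (no trigraphs or nasalization diacritics).

```python
from math import prod

# Number of spelling options per segment of "sanniddho", as counted in footnote (B).
# The key names are illustrative labels only.
options = {
    "initial s": 3 * 3,    # 3 sibilant letters x (no silent letter, silent b, silent m)
    "geminate nn": 2 * 4,  # 2 "n" letters x 4 digraph patterns (hn, ny, nb, nn)
    "vowel i": 2,          # 2 "i" letters
    "geminate ddh": 3,     # ddh, dhy, dhb
    "final o": 2,          # 2 ways of spelling the final o
}

# The choices are independent, so the total is their product.
total = prod(options.values())
print(total)  # 864
```

Adding further options per segment (for example the trigraphs) only means raising the corresponding factors, which is how the estimate can climb towards 10,000.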


----------



## Encolpius

Are YOU guys also afraid to say it IS English?
I have been considering French, too, but dropped it.


----------



## Dib

Encolpius said:


> Are YOU guys also afraid to say it IS English?



As for me personally, more than afraid, I'd feel stupid to say that. There are clearly other contenders like Bengali, Tibetan, Irish, maybe even French (Remember "Les poules couvent souv(ent) au couv(ent)" from the film, _Amélie_?), and likely others, even if we ignore (partial) logographic scripts, like Chinese, Japanese, etc. But I know of no quantitative comparison between English and these languages (rather, orthographies). So, taking a stand would be premature - at least for me.


----------



## Wilma_Sweden

I'm rather pleased that English spelling is so irregular. Consider 'station' - it is clearly recognisable in writing to anyone familiar with Romance languages and has usually been borrowed into other languages as is with the original spelling. Now, imagine a spelling 'reform' where you decide to convert all English spelling to phonemic spelling, what would it look like: stayshon? staysh'n (to honour the schwa)? What if the pronunciation is 'wildly' different in some dialects - you would have to either enforce the Queen's English or some other variant upon them, or end up with several acceptable spellings! I don't think so...!

The Norwegians have 'reformed' their spelling of station to 'stasjon', and 'central' to 'sentral' and it may make sense to Norwegians, but it makes slightly less sense to tourists looking for Oslo's central railway station: sentralstasjon.

I might seem conservative or even reactionary for suggesting that English speakers retain their irregular spelling, but with the number of English speakers around the world, it would be the most practical solution.

For the same reason, the French and the Danes might as well keep their current spelling, too - it can only get worse!


----------



## Encolpius

Well, English spelling reform is a different cup of tea. But language reform does not mean you change all the words. Station is a very common word, so it is OK, but as we have mentioned in another thread, reforming gaol to jail was a good step. I think *rare words* or rare spellings could be reformed, e.g. duiker. It really took me a while to find the word in the dictionary. Station is not such an irregular word. Some nations reform even names, e.g.: Szekspir (Polish).


----------



## Wilma_Sweden

Encolpius said:


> Well, English spelling reform is a different cup of tea. But language reform does not mean you change all the words. Station is a very common word, so it is OK, but as we have mentioned in another thread, reforming gaol to jail was a good step. I think *rare words* or rare spellings could be reformed, e.g. duiker. It really took me a while to find the word in the dictionary. Station is not such an irregular word. Some nations reform even names, e.g.: Szekspir (Polish).


This could be an endless discussion, but I believe that loan words that violate the phonetic or spelling pattern of the receiving language tend to get adapted to fit their new "home", particularly common words. E.g. E-mails in Sweden went from mail to mejl, i.e. the pronunciation has been retained but the spelling has been adjusted to suit Swedish spelling rules. I didn't know that duikers existed until you mentioned them, and they may as well be called diver antelopes, they dive for cover as the original Afrikaans/Dutch word suggests.


----------



## iezik

Wilma_Sweden said:


> Consider 'station' - it is clearly recognisable in writing to anyone familiar with Romance languages. ... The Norwegians have 'reformed' their spelling of station to 'stasjon', and 'central' to 'sentral' and it may make sense to Norwegians, but it makes slightly less sense to tourists looking for Oslo's central railway station: sentralstasjon.


If somebody searches for "railway station" in an old-fashioned paper phone directory, the precise word "station" is not very useful. Germans use Bahnhof, the French Gare, etc. The Swedish järnvägsstation and the Norwegian sentralstasjon are about equally incomprehensible to a Chinese speaker who is not used to Germanic spelling customs: the part "station/stasjon" is hidden in the middle.

It's useful to see the amount of variation in the spelling adaptations of the word "station" in different languages:

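| Code | Form of "station" | Language | Other word for a railway station |
|---|---|---|---|
| az | stansiyası | Azeri | |
| cat | estació | Catalan | |
| de | | German | Bahnhof |
| en | station | English | |
| eo | stacidomo | Esperanto | |
| es | estación | Spanish | |
| fr | station | French | gare ferroviaire |
| gl | estación | Galician | |
| id | stasiun | Indonesian | |
| it | stazione | Italian | |
| la | statio | Latin | |
| lmo | stazion | Lombard | |
| nl | station | Dutch | |
| no | stasjon | Norwegian (both spellings) | |
| pt | estação | Portuguese | |
| ro | stație | Romanian | gară |
| sv | station | Swedish | |
| vec | stažion | Venetian | |
| vls | stoatie | West Flemish | |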

So the Romance languages add, remove or change letters, e.g. Portuguese adds an initial "e" and removes the final "n". Instead of the final "n", there is the nasalization sign "~". The letter after "sta" has changed into many variants, /czçțž/, so there are 5 possibilities. The stress is mostly not marked, so for the easternmost representative, stație, it's best to look it up in a dictionary. The stress in the English version is not marked either, so I can imagine a Spaniard who is starting to learn English saying something like /itrein isteisyón/. Spelling and pronunciation are similar only to an extent between the languages.

So I guess that adding a letter, changing a letter or removing a letter from "station" would leave a word that is about equally recognizable.


----------



## Wilma_Sweden

iezik said:


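> | Code | Form of "station" | Language | Other word for a railway station |
> |---|---|---|---|
> | az | stansiyası | Azeri | |
> | cat | estació | Catalan | |
> | de | | German | Bahnhof |
> | en | station | English | |
> | eo | stacidomo | Esperanto | |
> | es | estación | Spanish | |
> | fr | station | French | gare ferroviaire |
> | gl | estación | Galician | |
> | id | stasiun | Indonesian | |
> | it | stazione | Italian | |
> | la | statio | Latin | |
> | lmo | stazion | Lombard | |
> | nl | station | Dutch | |
> | no | stasjon | Norwegian (both spellings) | |
> | pt | estação | Portuguese | |
> | ro | stație | Romanian | gară |
> | sv | station | Swedish | |
> | vec | stažion | Venetian | |
> | vls | stoatie | West Flemish | |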
> 
> So the Romance languages add, remove or change letters, e.g. Portuguese adds an initial "e" and removes the final "n". Instead of the final "n", there is the nasalization sign "~". The letter after "sta" has changed into many variants, /czçțž/, so there are 5 possibilities. The stress is mostly not marked, so for the easternmost representative, stație, it's best to look it up in a dictionary. The stress in the English version is not marked either, so I can imagine a Spaniard who is starting to learn English saying something like /itrein isteisyón/. Spelling and pronunciation are similar only to an extent between the languages.
> 
> So I guess that adding a letter, changing a letter or removing a letter from "station" would leave a word that is about equally recognizable.


I stand corrected. I was thinking mainly of the word station imported to non-Romance languages, but I didn't express that idea clearly. It still backfired in the railway context as I see from your impressive list. 

My theory was that in a linguistic/etymological context, we would expect the Romance languages to have their own equivalents of the suffix -tion as it is native to them, they are related and have evolved, while non-Romance languages would import the suffix and be more likely to retain the original spelling. This is true for English, German, Dutch, Swedish and Danish. In *writing*, it's easily recognisable across these languages. It's less helpful to tourists, but possibly helpful to some learners...


----------



## Red Arrow

I want to point out that back when I was in Norway with a group, everyone could understand the word 'sentralstasjon'. Of course, this might not be true for tourists who can't read Dutch or a Scandinavian language.



Wilma_Sweden said:


> It's less helpful to tourists, but possibly helpful to some learners...


Not at all. It doesn't make learning Swedish easier, but it doesn't make it any harder either. I suppose it's only a burden for Swedish children who have to learn about fifteen ways to write the normal sj-sound and about ten ways to write the light sj-sound [ʂ or ʃ].

The only thing that bugs me about Swedish spelling is the use of o. I don't mind multiple ways to write a certain sound as long as there is one predictable way to pronounce written words. That is not the case with the Swedish o.


----------



## arn00b

Any reform would need to think of what the main priority(ies) is(are).

Spelling simplicity/predictability vs pronunciation predictability.  (It's not always the same thing)  Wymin (for women) is predictable to pronounce, but not to spell.  Women makes more sense (Man > men, wo-man, wo-men).  Child (chayld) is also strange, when the plural is childrin.

Simplification (stayshin) would sever links between other languages (Romance, for example).
It would also sever internal links (for natives and learners alike) between words (station, static, status;  man, woman, men, women = wymin?) and create false links in people's minds (stayshin < stay?  staydyum < stay?). 

The spelling can never be 100% phonetic, since there are so many variants of English.  Some versions of English would have Rs appearing all over the place and others missing them, same with H (herb vs erb), tomato/tomahto variations (vayz vs vaaz), diphthong vs no (root, rowt; ee-thur, ay-thur, leežur, ležur), etc.  There's no one way of pronouncing T (glottal stop, alveolar flapping, etc).  Then there's vowel length and quality.

Then there's a whole issue with homophones - yes, "two" is weird, but using something like "tu" for "to" "two" and "too" is crazy, (made/maid, genes/jeans).

I once read a paper about using diacritics for English which would make pronunciation entirely predictable without changing any of the spelling.

It was something like this:

stäṭıøn
çentràl
wïnẹ
čḥééșẹ
dõg
rôgụẹ
šugàr

Or something of that sort.

It would take the guessing out of the pronunciation (but not the spelling).  Native speakers and advanced learners would simply not use diacritics, just as Arabic newspapers don't use tashkeel to mark short vowels and gemination, or how 90's and early 2000's Serbo-Croatian and French SMS users texted without diacritics (çøàáâ etc.; čđćžš)

Having said that, English spelling is not that complicated. It's just the influx of foreign words that are used as-is without changing the spelling (pizza, attaché, Schadenfreude) that creates the more complex irregularities.

Anyway, I'm surprised no one mentioned Akkadian or Middle Persian. MP would write "KLB" and pronounce it as "sāg", "MLK" as "shāh", "LḤM" as "nān". These are called heterograms, a type of logogram, but I'm not sure if readers recognized "KLB" as a single symbol (like this: 狗) that is pronounced as "sāg" or if they translated foreign words as they read them. Such as "The viande is not fresh", with viande pronounced as "meat".


----------



## Red Arrow

@arn00b: I think it is perfectly possible to make one spelling for all English varieties as long as you keep the letter A untouched. You don't need any diacritics. It would also be possible to keep links between words, again, if you don't change the letter A.


arn00b said:


> Simplification (stayshin) would sever links between other languages (Romance, for example).


What's wrong with English spelling looking less like French?

I am not a fan of a full reform, though. The words you have mentioned are written perfectly fine. But I would appreciate it a lot if spelling would be slightly simplified. Why is there a difference between American and British spelling, anyway?


----------



## Hulalessar

Red Arrow :D said:


> Why is there a difference between American and British spelling, anyway?



Noah Webster.


----------



## Penyafort

Verba volant, scripta manent. When an old spelling system remains pretty much the same, even if the shifts in the language have been many, the logical result is a highly irregular spelling. 

If your language had its spelling fixed -or an important reform happened- in the last two centuries (many have), or if the spelling is old but the changes in the language have not been too radical (Spanish, for instance), then you are lucky. But English and French use old spellings and have changed a lot, so the irregularities are just logical.


----------



## M Mira

arn00b said:


> Anyway, I'm surprised no one mentioned Akkadian or Middle Persian. MP would write "KLB" and pronounce it as "sāg", "MLK" as "shāh", "LḤM" as "nān". These are called heterograms, a type of logogram, but I'm not sure if readers recognized "KLB" as a single symbol (like this: 狗) that is pronounced as "sāg" or if they translated foreign words as they read them. Such as "The viande is not fresh", with viande pronounced as "meat".


"lb" is pronounced "pound"

-----
I don't think English spelling can be fully "regularized" due to how derivation works, with cases like "record" (n.) & (v.) and "nation" & "national" standing in the way.


----------



## Dib

M Mira said:


> "lb" is pronounced "pound"



Thank you, Mira. I think this is a perfect parallel to Pahlavi heterograms.


----------



## iezik

M Mira said:


> I don't think English spelling can be fully "regularized" due to how derivation works, with cases like "record" (n.) & (v.) and "nation" & "national" standing in the way.


I believe that English cannot be modified by decree, as there is no central authority for the English language. There are some languages that changed their spelling fast and to a large extent: Turkish around 1930, Chinese ~1950, Greek ~1980. In all such cases, there was an authority. English is a world language.

There are no obstacles to using the methods of surrounding languages for English as well. With similar methods, similar results can be achieved. I see no problems in making English spelling as good as e.g. Spanish. So, let's see.

1) *record (n)* and *record (v)*. English often reduces vowels in unstressed syllables, so that from the full number of simple vowels (about 10-15, depending on how you count), a smaller number of distinct vowels is left. There are other languages with such a feature: Russian, Portuguese, Catalan. A spelling can be such that it's only necessary to know the stressed syllable. Spanish is good at marking stress. In Spanish, there is a rule for finding the stressed syllable, starting from just the written form. A similar rule for English could be "Put an acute mark on the stressed vowel of a word if it has at least two syllables and the first syllable is not stressed". Then, the letter /c/ that is pronounced as /k/ can also be written that way, as it already is in words of Germanic origin. Then, these two words are *rekord (n)* and *rekórd (v)*. For the rhotic and non-rhotic variants, see below.

2) *nation* and *national* show alternation between a traditionally long vowel and a short vowel. For several centuries the short vowels /æ, e, i, o/ had long counterparts /ææ, ee, ii, oo/ that are often written the same way and are nowadays pronounced /ei, ii, ai, ou/. Chomsky once tried to devise a way of distinguishing between the two cases using only the current spelling, but I don't think it's possible. Other languages have usually introduced some additional marks: grave, circumflex, etc. We can then write these letters as /à, è, ì, ò/ when the stress does not need to be written and as /â, ê, î, ô/ when the stress needs to be written. Given that -tion, -sion and -ssion are usually pronounced /ʃən/ and occasionally /ʒən/, we can avoid having to learn the /t/s/ss/ difference by borrowing the customs of another language, Czech, and using -šon and -žon. /š/ and /ž/ can be used for writing /ʃ, ʒ/ when a single consonant (possibly doubled) is currently used. The sequences /sh/ and /zh/ are just fine. The two words are then *nàšon* and *našonal*. It's fine that the previous pair of words shows the stress being marked. Adding a diacritical mark to a related word is very similar to the marking of some French verbs that alternate between /é/ and /è/, such as céder.
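The stress rule proposed in point 1) can be tried out in a few lines. This is only a rough sketch: the function name `mark_stress` is made up for the illustration, the syllable split and the position of the stress are assumed to be given as input (nothing here derives them from the spelling), and only the plain vowel letters a, e, i, o, u are handled.

```python
import unicodedata

VOWELS = "aeiou"
ACUTE = "\u0301"  # combining acute accent

def mark_stress(syllables, stressed):
    """Join the syllables into a word.

    An acute is added to the first vowel of the stressed syllable only if the
    word has at least two syllables and the first syllable is not stressed.
    """
    if len(syllables) < 2 or stressed == 0:
        return "".join(syllables)
    marked = list(syllables)
    syllable = marked[stressed]
    for i, ch in enumerate(syllable):
        if ch.lower() in VOWELS:
            # Insert the combining accent right after the vowel.
            marked[stressed] = syllable[:i + 1] + ACUTE + syllable[i + 1:]
            break
    # Compose letter + combining accent into a single character where possible.
    return unicodedata.normalize("NFC", "".join(marked))

print(mark_stress(["re", "kord"], 0))  # noun: rekord
print(mark_stress(["re", "kord"], 1))  # verb: rekórd
```

The point of the sketch is that the writer only needs to know where the stress falls; the unmarked form is reserved for first-syllable stress.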

Did I mention Catalan? For my taste, they have the best method of marking that two consecutive letters should be read separately and not as a digraph. /ll/ in Catalan usually has the IPA value [ʎ]. If there's a long /ll/ to pronounce, the spelling inserts a separator, a middle dot, so the spelling is /l·l/. English could use it for the occasional /ph/ sequence of letters, so *shepherd* could be marked as *shep·herd*.

I'm not sure if the limitations of e.g. Spanish spelling are known. From the written form to the pronounced form, the way is nearly uniform. The other way, from pronunciation to spelling, is harder.
- Letter /h/ is inserted according to the history of the language
- letters /bv/ are pronounced (if I recall correctly) the same, the distinction is made according to the history and/or the neighbouring languages
- letters /szc/ are in American Spanish pronounced the same. The letters /zc/ are written according to the next vowel. Here, the European standard Spanish user has an easier task, as the pronunciation of the consonant and the next vowel specifies the writing.

The last point is similar to /r/ pronunciation in American and European English. The dialects are different here and one version is to be taken. As seen from the first example pair, American (more distinguishing) version can be used.

It's fine that Mira's native language is Mandarin, a language that can easily produce thousands of characters using only the 26 letter keys on a keyboard. Such a system for English is not yet created, but let me explain it. It would work like entering Chinese in word mode, where e.g. 妈妈 can be entered by typing the four keys /m,a,m,a/, pressing a non-letter key (space), and the computer uses a dictionary to convert four Latin letters to a Chinese word. So, the English examples above could be entered as e.g. record, record2, nason, nasonal, shepherd. The computer would provide the additional information.
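To make the input-method idea concrete, here is a toy sketch of such a dictionary lookup. The names `IME_DICTIONARY` and `convert` are invented for the illustration, and pairing the key "record" with the noun and "record2" with the verb is an assumption; the post only lists the keys and the reformed spellings.

```python
# Hypothetical lookup table: plain typed keys -> reformed spelling.
# Which key maps to which form is an assumption made for this sketch.
IME_DICTIONARY = {
    "record": "rekord",       # noun
    "record2": "rekórd",      # verb
    "nason": "nàšon",
    "nasonal": "našonal",
    "shepherd": "shep·herd",
}

def convert(typed):
    """Replace each space-separated key by its dictionary entry.

    Keys that are not in the dictionary are passed through unchanged.
    """
    return " ".join(IME_DICTIONARY.get(key, key) for key in typed.split())

print(convert("record2 nason"))  # rekórd nàšon
```

A real input method would also have to rank candidates and offer a choice between them, which this sketch leaves out.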

So, just making sure that similar sounds, similar inflected forms, similar roots, similar dialects also look similar on a screen or paper, this is rather easy. But it's very costly to teach one more spelling to all the English-speaking (writing) people.


----------



## Nino83

Would it be _assòciâšon_ or _ësòsjâšën_? 
Too many diacritics would be heavy.


----------



## Dymn

I think Thai is another contender for the most irregularly spelled language.


----------



## Encolpius

So, *Bengali and Thai* mentioned, but I know nothing about them, so cannot make comments.


----------



## Dib

Encolpius said:


> So, *Bengali and Thai* mentioned, but I know nothing about them, so cannot make comments.



And Modern Standard *Tibetan*.


----------



## Nino83

Encolpius said:


> So, *Bengali and Thai* mentioned, but I know nothing about them, so cannot make comments.


Maybe Diamant7 is referring to the fact that some consonants that once had different pronunciations now sound the same. 
See this table. At the same time, many consonants that are different in initial position merge in final position (see how many letters represent [k̚], [t̚] and [p̚]).


----------



## Lugubert

Another vote for Tibetan as the worst one. It's bad enough that the 'b' is silent in _blama_, and that the yak is _gyag_, but I find it really difficult that written _dbus-gtsang_ is pronounced Ü-Tsang.


----------



## iezik

Nino83 said:


> Would it be _assòciâšon_ or _ësòsjâšën_?
> Too many diacritics would be heavy.


In line with the examples above and to allow quick reading, *asòçiâšon*. The list that arn00b showed above is more an exercise in typing Unicode characters. Existing spellings minimize the number of symbols.

A note on using /ç/: I've read plenty of articles using such spellings. Changing the base characters of a word slowed down my reading, especially if the base letters are not from the same interchangeable set. E.g. we're used to changing /s/ to /z/; I'm often not aware of the -ise or -ize forms of the verbs that I read. So converting /s/ to /z/ doesn't slow down the reading. But converting /c/ to /s/ would make many words unrecognizable: çell/sell, Çhikágò/Shikágò, çenter/senter, çent/sent. So it's better either to use a well-established convention that keeps the same base letter (here French ç) or to invent something different from what exists. It also seems fine not to retain a /c/ whose sound depends on the next sound. Such dependencies are fine e.g. for German devoicing at the end of a word, as long as Germans do it regularly, as a phonetic rule. As soon as it's possible to pronounce /c/ in two ways independently of the following vowel, the phoneme split is finished and the time has arrived for a new letter.


----------



## Delvo

I don't know of any Bengali issues, but here's a description of what happened to Thai:


----------



## Lugubert

Delvo,

Thanks a lot for the entertainment. Anyway, despite for example the multiple Thai t's, I still think Tibetan is waaay worse.


----------



## Delvo

This guy agrees:


----------



## Delvo

(English might not be the worst... just the worst that foreigners ever need to deal with.)


----------



## Encolpius

Yes, that's the point, English is very popular and difficult.


----------



## Kevin Beach

M Mira said:


> "lb" is pronounced "pound" .....



So is "£".

Both being abbreviations of Latin _libra_.

I mean, really, if we hadn't already learned it, could we ever have guessed it?


----------



## Stoggler

Kevin Beach said:


> I mean, really, if we hadn't already learned it, could we ever have guessed it?



Nope! 

(Greetings from the western end of Sussex!)


----------

