# Orthographic depth



## Villeggiatura

The whole idea of spelling competitions is predicated on orthographic depth: the deeper the orthography, the more challenging the spelling and the more interesting the competition.

A contest for the most phonemic orthography could be just as interesting.

Having peculiar spellings is fine as long as they're phonemically representative and consistent (like _gli_ in Italian, _nh_ in Portuguese). Their being butchered by foreigners is irrelevant, and so is the fact that native speakers pronounce phonemically unrepresentative and inconsistent spellings correctly.

I'll explore only three areas in which three serious contenders (Italian, Portuguese and Spanish) outscore one another; please feel free to expand.

1. Proparoxytones
with diacritic: basílica ópera pélago (Portuguese); basílica ópera piélago (Spanish)
without diacritic: basilica opera pelago (Italian)

Without prior knowledge of the correct stress, proparoxytones can be mistaken for paroxytones in the Italian orthography because the stress defaults to the penult.
Thus, Portuguese and Spanish outscore Italian.

2. Oxytones with diphthongs at the end
with diacritic: dinastía geografía melodía (Sp.)
without diacritic: dinastia geografia melodia (P.& I.)

Without prior knowledge of the correct stress, such oxytones can be mistaken for paroxytones in the Italian orthography for the same reason; however, they can't be mistaken for paroxytones in the Portuguese orthography, because every paroxytone ending in a diphthong takes an accent mark (like _dicionário_, _pronúncia_).
Thus, Portuguese and Spanish outscore Italian.

3. Psilosis
Initial _h_'s are largely preserved in Portuguese and Spanish orthographies (like _historia_, _héroe_, _herói_) but are phonemically irrelevant; the same goes for intervocalic _h_'s in Spanish orthography (like _cohorte_, _vehículo_; I think intervocalic _h_'s are mostly obsolete in Portuguese). In contrast, initial and intervocalic _h_'s have disappeared almost completely from Italian orthography.
So Italian scores the highest here, Spanish scores the lowest, Portuguese somewhere in between.
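Points 1 and 2 above boil down to a stress-assignment rule. The following is a minimal Python sketch of the Spanish rules (a written acute always marks the stress; otherwise words ending in a vowel, -n or -s stress the penult and all others the last syllable) next to the Italian situation, where an unmarked word gives only a default guess. It assumes the word has already been split into syllables, with any hiatus separated, which sidesteps the hardest part of the problem; `stressed_syllable` is a purely illustrative name.

```python
# Minimal sketch of written-stress rules, assuming the word is already split into
# syllables (with any hiatus separated) and written in lowercase.
SPANISH_ACCENTS = set("áéíóú")
ITALIAN_ACCENTS = set("àèéìòóù")


def stressed_syllable(syllables: list[str], lang: str) -> tuple[int, bool]:
    """Return (index of the stressed syllable, True if the spelling alone fixes it)."""
    if lang not in ("es", "it"):
        raise ValueError("lang must be 'es' or 'it'")
    if len(syllables) == 1:
        return 0, True
    accents = SPANISH_ACCENTS if lang == "es" else ITALIAN_ACCENTS
    for i, syllable in enumerate(syllables):
        if any(ch in accents for ch in syllable):
            return i, True  # a written accent always marks the stressed syllable
    if lang == "es":
        # Unmarked Spanish words: final vowel, -n or -s -> penult; otherwise the last syllable.
        if syllables[-1][-1] in "aeiouns":
            return len(syllables) - 2, True
        return len(syllables) - 1, True
    # Unmarked Italian words are read with penultimate stress by default, but an
    # unmarked proparoxytone looks exactly the same, so the spelling does not decide.
    return len(syllables) - 2, False


print(stressed_syllable(["ba", "sí", "li", "ca"], "es"))  # (1, True)
print(stressed_syllable(["ba", "si", "li", "ca"], "it"))  # (2, False): the wrong syllable
print(stressed_syllable(["me", "lo", "dí", "a"], "es"))   # (2, True)
```

The Italian branch can only return a default guess flagged as undetermined, which is exactly the gap described for _basilica_, _opera_ and _pelago_.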


----------



## bearded

I agree with Villeggiatura on the above. As with many languages (e.g. Arabic), in Italian you cannot guess the correct pronunciation of every word just by looking at its spelling. Anyhow, in those cases where (for us) there is doubt, we do mark the stress (e.g. _sùbito_ 'at once' vs. _subìto_ 'suffered').


----------



## Dan2

Hi Villeggiatura,

You start by mentioning spelling competitions (where contestants are given the pronunciation of a word and must produce its correct spelling), but your points 1 and 2 seem to focus on the opposite challenge (given the spelling, determine the pronunciation). (I agree with you that Spanish is more transparent than Italian in this respect. Italian also fails to specify whether the letter 'z' is voiced or voiceless.)

Then in 3 we seem to be back to the "given pronunciation, what is spelling" problem, where you point out that Spanish h presents a difficulty in this regard.

Another problem for Spanish in the pronunciation -> spelling direction is the homophony of 'b' and 'v' and (in Latin American varieties) of 's' and 'z' (and 'c' before 'i' and 'e').  So the word for "kiss", given its pronunciation, might be spelled "beso", "veso", "bezo" or "vezo".  You often see errors based on these ambiguities on the part of Spanish speakers without much formal education.
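The "beso" example can be reproduced mechanically by combining every spelling that each homophonous sound allows. A minimal Python sketch follows; the `SPELLINGS` table and the `candidate_spellings` function are hypothetical, cover only b/v, s/z and c before e and i, and treat plain letters as stand-ins for phonemes.

```python
from itertools import product

# Hypothetical, heavily simplified table for a variety with seseo and betacism:
# each sound (written here as a plain letter) maps to its possible spellings.
SPELLINGS = {"b": ["b", "v"], "s": ["s", "z"]}


def candidate_spellings(phonemes: str) -> list[str]:
    """All spellings the homophonies allow; the real word is just one of them."""
    options = []
    for i, p in enumerate(phonemes):
        alternatives = list(SPELLINGS.get(p, [p]))
        following = phonemes[i + 1] if i + 1 < len(phonemes) else ""
        if p == "s" and following in ("e", "i"):
            alternatives.append("c")  # /s/ can also be written c before e and i
        options.append(alternatives)
    return ["".join(combo) for combo in product(*options)]


print(candidate_spellings("beso"))  # ['beso', 'bezo', 'veso', 'vezo']
```

It over-generates on purpose: nothing in the pronunciation tells the writer which of the four candidates is the correct one.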

This question of "which direction" you're examining is particularly important in the case of French, which is _extremely_ difficult to spell correctly given the pronunciation, but not too bad in the opposite direction ("chaud" and "eaux" are bizarre spellings for [ʃo] and [o], but given those spellings, the pronunciations are clear).

Anyway, no criticism intended, but I just wanted to make explicit, for any continuation of the thread, that there are two "directions" to consider.
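To make the two directions concrete, here is a toy measure (not an established metric of orthographic depth): over a list of (spelling, pronunciation) pairs, count the share of spellings with more than one pronunciation and the share of pronunciations with more than one spelling. The tiny lexicons are hand-made with simplified transcriptions; _subito_ comes from the post above, while _pesca_, _vaca/baca_ and _tubo/tuvo_ are added only for illustration.

```python
from collections import defaultdict


def ambiguity(lexicon: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (reading ambiguity, spelling ambiguity) for (spelling, pronunciation) pairs.

    reading ambiguity  = share of spellings with more than one pronunciation
    spelling ambiguity = share of pronunciations with more than one spelling
    """
    readings = defaultdict(set)   # spelling -> pronunciations
    spellings = defaultdict(set)  # pronunciation -> spellings
    for spelling, pronunciation in lexicon:
        readings[spelling].add(pronunciation)
        spellings[pronunciation].add(spelling)
    reading_amb = sum(len(p) > 1 for p in readings.values()) / len(readings)
    spelling_amb = sum(len(s) > 1 for s in spellings.values()) / len(spellings)
    return reading_amb, spelling_amb


# Toy lexicons; 'cane' and 'casa' are unambiguous controls.
ITALIAN = [("subito", "ˈsubito"), ("subito", "suˈbito"),
           ("pesca", "ˈpɛska"), ("pesca", "ˈpeska"), ("cane", "ˈkane")]
SPANISH = [("vaca", "ˈbaka"), ("baca", "ˈbaka"),
           ("tubo", "ˈtubo"), ("tuvo", "ˈtubo"), ("casa", "ˈkasa")]

print(ambiguity(ITALIAN))  # (0.6666666666666666, 0.0): harder to read, easy to spell
print(ambiguity(SPANISH))  # (0.0, 0.6666666666666666): easy to read, harder to spell
```

The same orthography can score well in one direction and badly in the other, which is why the direction has to be stated before comparing languages.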


----------



## Nino83

Yes, in Spanish and Portuguese it is easier to know where the stress is.
In Italian and Portuguese paroxytones, you can't tell from the spelling whether a stressed _e_ or _o_ is open-mid or close-mid.

In Portuguese you can't tell whether the /w/ is pronounced in _qu_ + _e, i_: for example _frequente_ /kwe/ but _quente_ /ke/ (the _trema_ was abolished by the 1990 Orthographic Agreement; before that, the Brazilian spelling was _freqüente_).

As Dan2 said, in Portuguese and Spanish you can't tell whether a word is written with _c_ (before _e, i_) or _s_: for example [ˈsĩⁿtɐ] can be both _sinta_ (feel, 1st person singular present subjunctive) and _cinta_ (belt). Likewise you can't tell whether to write _s_ or _x_, _ct_ or _t_: [iˈzatu] could be written _esacto_, _exacto_, _esato_ or _exato_. The same goes for /ʃ/ (you write _macho_ but also _mexe_) and for /s/ (_poço_ and _posso_, _moça_ and _mossa_).


----------



## francisgranada

....

4. Assimilation

An example: the adjective _Serbian_ is spelled _sr*p*ski_ in Serbian/Croatian (phonetically), while the noun _Serbs_ (plural) is spelled _Sr*b*i_. In Slovak, however, both are spelled with _*b*_ (etymologically): _sr*b*ský_ and _Sr*b*i_, even though the adjective is pronounced [ˈsrpskiː] because _b_ assimilates to the following voiceless consonant _s_. So an illiterate Slovak might spontaneously write the "incorrect" *_srpskí_.

There are also words of this kind in Slovak that have different meanings and are pronounced the same but spelled differently, e.g. _plo*d*_ (fruit) and _plo*t*_ (fence). Now, which scores higher in this case, Serbian or Slovak? ...
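The Slovak spellings above are etymological, while the pronunciation follows two voicing rules: a voiced obstruent becomes voiceless before a voiceless consonant (_srbský_) and at the end of a word (_plod_). A minimal Python sketch, using a hypothetical and deliberately incomplete rule table, shows that the reading direction is mechanical while the spelling direction is not: _plod_ and _plot_ collapse to the same output.

```python
# Hypothetical, deliberately tiny rule table: only these voiced/voiceless pairs are
# handled (no v, h, dz, dž), and voicing across word boundaries is ignored.
DEVOICE = {"b": "p", "d": "t", "g": "k", "z": "s", "ž": "š"}
VOICELESS = set("ptkfsšcč")  # 'c' also covers the first letter of the digraph 'ch'


def approx_pronunciation(word: str) -> str:
    """Apply regressive devoicing and word-final devoicing to a lowercase spelling."""
    chars = list(word)
    # Walk right to left so that a devoiced consonant can devoice the one before it.
    for i in range(len(chars) - 1, -1, -1):
        following = chars[i + 1] if i + 1 < len(chars) else None
        if chars[i] in DEVOICE and (following is None or following in VOICELESS):
            chars[i] = DEVOICE[chars[i]]
    return "".join(chars)


for spelling in ("srbský", "srbi", "plod", "plot"):
    print(spelling, "->", approx_pronunciation(spelling))
# srbský -> srpský
# srbi -> srbi
# plod -> plot
# plot -> plot
```

Going the other way, from "plot" back to a spelling, requires knowing the word, which is the price of the etymological choice.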


Villeggiatura said:


> Initial _H_'s are largely preserved in Portuguese and Spanish orthographies ...  So Italian scores the highest here ...


Yes, but e.g. for a European foreigner who is used to the (partially) etymological spelling of many words of Latin origin in his own mother tongue (or other languages, e.g. English) it is not necessarily true. For example for a Hungarian/Czech/German etc ...the words spelled _heroe, texto, srb_ ... are surely more easily "decipherable" than _eroe, testo, srp_ ... because of the presence of words like _heroism(us)_, _text_, _Serbia_, etc ...  in their own (or other "known") languages.


----------



## Villeggiatura

Dan2 said:


> Hi Villeggiatura,
> 
> You start by mentioning spelling competitions (where contestants are given the pronunciation of a word and must produce its correct spelling), but your points 1 and 2 seem to focus on the opposite challenge (given the spelling, determine the pronunciation). (I agree with you that Spanish is more transparent than Italian in this respect. Italian also fails to specify whether the letter 'z' is voiced or voiceless.)
> 
> Then in 3 we seem to be back to the "given pronunciation, what is spelling" problem, where you point out that Spanish h presents a difficulty in this regard.
> 
> Another problem for Spanish in the pronunciation -> spelling direction is the homophony of 'b' and 'v' and (in Latin American varieties) of 's' and 'z' (and 'c' before 'i' and 'e').  So the word for "kiss", given its pronunciation, might be spelled "beso", "veso", "bezo" or "vezo".  You often see errors based on these ambiguities on the part of Spanish speakers without much formal education.
> 
> This question of "which direction" you're examining is particularly important in the case of French, which is _extremely_ difficult to spell correctly given the pronunciation, but not too bad in the opposite direction ("chaud" and "eaux" are bizarre spellings for [ʃo] and [o], but given those spellings, the pronunciations are clear).
> 
> Anyway, no criticism intended, but I just wanted to make explicit, for any continuation of the thread, that there are two "directions" to consider.



Great clarification, absolutely.
The (un)predictability in look-then-read and in listen-then-spell determines orthographic depth.


----------



## Nino83

Villeggiatura said:


> The (un)predictability in look-then-read and in listen-then-spell determines orthographic depth.



And for the latter, Italian orthography seems to be easier than the Portuguese and Spanish ones.


----------



## Penyafort

Villeggiatura said:


> I'll explore only three areas in which three serious contenders (Italian, Portuguese and Spanish) outscore one another; please feel free to expand.
> 
> 1. Proparoxytones
> with diacritic: basílica ópera pélago (Portuguese); basílica ópera piélago (Spanish)
> without diacritic: basilica opera pelago (Italian)
> 
> Without prior knowledge of the correct stress, proparoxytones can be mistaken for paroxytones in the Italian orthography because the stress defaults to the penult.
> Thus, Portuguese and Spanish outscore Italian.



In *Catalan*, as in Portuguese and Spanish, all proparoxytones are marked (à, è/é, í, ò/ó, ú):
_llàgrima, època, feréstega, basílica, òpera, góndola, esdrúixola_,
_Himàlaia, Hèlsinki, Sicília, Colòmbia_, etc.
Words ending in unstressed -ia, such as _Sicília_ and _Colòmbia_ above (or _dàlia, Natàlia, Grècia, Alícia, història, Glòria_, etc.), are considered proparoxytones in Catalan (and in Portuguese, I think), but paroxytones in Spanish, which consequently does not use a diacritic for them.



Villeggiatura said:


> 2. Oxytones with diphthongs at the end
> with diacritic: dinastía geografía melodía (Sp.)
> without diacritic: dinastia geografia melodia (P.& I.)
> 
> Without prior knowledge of the correct stress, such oxytones can be mistaken for paroxytones in the Italian orthography for the same reason; however, they can't be mistaken for paroxytones in the Portuguese orthography, because every paroxytone ending in a diphthong takes an accent mark (like _dicionário_, _pronúncia_).
> Thus, Portuguese and Spanish outscore Italian.



That is exactly what happens in Catalan too. There is no diacritic in _dinastia, geografia, melodia_, etc., because the stress is not on any of the a's. Generally speaking, Catalan spelling looks somewhat close to Portuguese, except that we make heavy use of the apostrophe and the diaeresis and don't use the circumflex at all.



Villeggiatura said:


> 3. Psilosis
> Initial _h_'s are largely preserved in Portuguese and Spanish orthographies (like _historia_, _héroe_, _herói_) but are phonemically irrelevant; the same goes for intervocalic _h_'s in Spanish orthography (like _cohorte_, _vehículo_; I think intervocalic _h_'s are mostly obsolete in Portuguese). In contrast, initial and intervocalic _h_'s have disappeared almost completely from Italian orthography.
> So Italian scores the highest here, Spanish scores the lowest, Portuguese somewhere in between.



Catalan, unlike its sister Occitan and unlike Italian, but like the other languages discussed here, also preserves written h's, and does so more etymologically than Spanish, so the two sometimes differ:
_Helena/Elena, hivern/invierno, ham/anzuelo, hendecasíl·lab/endecasílabo, hissar/izar, subhasta/subasta, filharmònica/filarmónica, cacauet/cacahuete, orfe/huérfano, orxata/horchata, ou/huevo, os/hueso, Osca/Huesca_, etc.
(Note: In all honesty, I will always be surprised at the elimination of h's in Italian. For someone who appreciates etymology like me, seeing, say, _omosessuale, eterosessuale, esagono, orticoltura..._ without an h is somewhat funny.)

Catalan also uses h to represent the aspirated sound in _aha, ha ha ha, ehem_, for which Spanish writes a _jota_ (_ajá, ja ja ja, ejem_).



Villeggiatura said:


> Having peculiar spellings is fine as long as they're phonemically representative and consistent (like _gli_ in Italian, _nh_ in Portuguese). Their being butchered by foreigners is irrelevant, and so is the fact that native speakers pronounce phonemically unrepresentative and inconsistent spellings correctly.



One of the inconsistencies in Catalan, much debated back in the day, was the use of LL for the palatal lateral while NY is used for the palatal nasal. That forced the creation of a new peculiarity for the double l sound, the so-called geminated l (L·L, l·l), which is very common (col·legi, intel·ligent, aquarel·la) even though speakers tend to pronounce a simple l.

Problems may also arise with the use of S, C, SS and Ç for the voiceless s, the use of G/J and TG/TJ for fricatives and affricates, the use of IG and TX for /tʃ/ at the end of words, the writing of a silent -R at the end of words, as well as with B and V wherever they've merged into one sound.


----------



## merquiades

The elimination of H in Italian is also something I will never understand.  Knowing etymology it seems wrong, even tragic to see Eroe, Armonia, Arpa, Uomo, Oggi, Ostile, Ospedale, Orribile.....


----------



## Nino83

If we wanted to safeguard etymology, Spanish should write masculo instead of macho. This argument doesn't make sense.


----------



## Villeggiatura

(Visual) Aesthetics of spelling, a very interesting topic.
I do have my own, but not necessarily etymology-oriented.


----------



## Hulalessar

Nino83 said:


> If we wanted to safeguard etymology, Spanish should write masculo instead of macho. This argument doesn't make sense.



The etymological elements of Spanish orthography are very limited. Etymology is not allowed to interfere with the simple rules that predictably assign values to letters (other than <x>) or require phonemes to be represented in a particular and very limited number of ways.

Not always consistently:

· Where Latin, Greek, Germanic or Arabic has /h/ or where Latin /f/ became /h/ a written <h> has been preserved.

· <b> and <v> represent the same sound and etymology decides which is used.

The elimination of <h> and <b> or <v> would not worry native Spanish speakers any more than the (almost complete) absence of <h> worries native Italian speakers.

I am not sure there is an outright winner between Spanish and Italian in any competition as to which has the shallowest orthography.


----------



## Angelo di fuoco

Spanish has canta*b*a (etymological), whereas Catalan and Italian have canta*v*a, which is unetymological, but at least in Italian it's consistent with the pronunciation, whereas in Catalan that's not true for all dialects. In both Spanish and Catalan it could be otherwise, given the betacism of both languages (OK, I know that some Catalan dialects distinguish b and v in pronunciation).
Spanish has viga and Catalan has biga. Who's right there?


----------



## Penyafort

Nino83 said:


> If we wanted to safeguard etymology, Spanish should write masculo instead of macho. This argument doesn't make sense.



Why? Writing _másculo_ for _macho_ (or for _maschio_) wouldn't make sense at all.



Villeggiatura said:


> (Visual) Aesthetics of spelling, a very interesting topic.
> I do have my own, but not necessarily etymology-oriented.



I agree that it is mostly an aesthetic thing, but the h is useful as a diacritic in many cases. In fact, even Italian uses h as a diacritic in a few words.

The etymological thing is a different issue. What I meant about seeing things like _omosessuale_ without an h is that it looks funny to me, because it etymologically means 'shoulder-lover' (compare the prefixes homo- and omo-). We could say the same thing about Spanish _sicología_ instead of _psicología_. Both forms are accepted, but I prefer to always write it with the p, even though it is not pronounced, because without the p it etymologically means 'study of figs' instead of 'study of souls' (prefixes psycho- and syko-). All in all it's an aesthetic thing, no doubt, as most people are unaware of etymology.



Angelo di fuoco said:


> Spanish has canta*b*a (etymological), whereas Catalan and Italian have canta*v*a, which is unetymological, but at least in Italian it's consistent with the pronunciation, whereas in Catalan that's not true for all dialects. In both Spanish and Catalan it could be otherwise, given the betacism of both languages (OK, I know that some Catalan dialects distinguish b and v in pronunciation).



Old Castilian used v's for the imperfect tense too. In that sense, medieval Spanish was just like the rest of Romance languages (_cavallo_, _haver_, _provar_, etc.)

Betacism in Catalan, especially in Eastern Catalan, is a relatively modern thing, probably due to Spanish and Occitan influence. In Valencia, I guess it had much to do with Aragonese influence.



Angelo di fuoco said:


> Spanish has viga and Catalan has biga. Who's right there?



Catalan, as it comes from Latin BIGA. 

Generally speaking, even if Spanish went through a process of relatinization in its spelling, Catalan is closer to etymology, except in words traditionally pronounced with a v (_cavall_, _llavi_, etc.). I wrote some examples above where Catalan h's are more etymological; Spanish has several unetymological h's. The same happens with b's and v's. Spanish uses unetymological b/v in 'grandfather' _a*b*uelo_ (Catalan _avi_, Portuguese _avô_) < AV(IOL)US, 'forget' _ol*v*idar_ (Cat. _oblidar_) < OBLITARE, 'mobile' _mó*v*il_ (Cat. _mòbil_) < MOBILIS, 'rubbish' _*b*asura_ < VERSURA, 'sweep' _*b*arrer_ < VERRERE, etc.

There are also a few examples in which Spanish is more etymological than Catalan, though: 'sheath' Cat. _beina_ / Sp. _vaina_ (< VAGINA), 'overturn' Cat. _bolcar_ / Sp. _volcar_ (< *VOLVICARE), 'man/male' Cat. _baró_ / Sp. _varón_ (< VARONE), 'bald' Cat. _calb, calba_ / Sp. _calvo, calva_ (< CALVUS, CALVA), etc.


----------



## Nino83

Penyafort said:


> Why? Writing _másculo _for _macho _(or for _maschio_) wouldn't make sense at all.



Someone was complaining of the fact that in Italian we write _eroe_ instead of _heroe_. 
If /h/ is not pronounced in Italian, it doesn't make sense to write it for merely etymological reasons (it would be like writing _masculo_ instead of _macho_).


----------



## francisgranada

Nino83 said:


> ... If /h/ is not pronounced in Italian, it doesn't make sense to write it for merely etymological reasons (it would be like writing _masculo_ instead of _macho_).


Hi Nino, I think I understand you perfectly well; nevertheless, it is not the same case... We can state a simple rule that _the letter "h"_ is _never pronounced_ (in both Italian and Spanish), but we can hardly state a rule that _"scul" has to be pronounced the same way as "ch"_ in Spanish without violating other "rules"... Otherwise _músculo_ would have to be pronounced _mucho_ and _mayúsculo_ _mayucho_ (for example)...

I think the question/solution is a certain _practical/optimal equilibrium_ between strictly etymological spelling and strictly phonetic spelling (both impossible to achieve in a _really perfect_ way).


----------



## Nino83

Hi Francis, obviously I was exaggerating.
The point is that if you keep asking for a bit more etymology, silent letter after silent letter, you end up with an orthography similar to that of English or French.

One can't have at the same time both respect for etymology and a phonetic orthography.


----------



## sotos

Nobody has mentioned yet the irrelevance of French accents to the actual pronunciation.

Another area for contest is the double R in the middle of words. Why "diarrhea" and not "diar(h)ea"?


----------



## apmoy70

sotos said:


> Nobody has mentioned yet the irrelevance of French accents to the actual pronunciation.
> 
> Another area for contest is the double R in the middle of words. Why "diarrhea" and not "diar(h)ea"?


It's the transliteration of -ῤῥ- in διάῤῥοια.
Of course, no one seems to be bothered by the fact that the first half of the word retains the Classical pronunciation of ῥ as /rh/, while the second half, with the diphthong -οι-, follows the Byzantine pronunciation; the transliteration diarrhoea [dīəˈroiə] is more consistent with the Classical Greek pronunciation.


----------



## Nino83

sotos said:


> Nobody has mentioned yet the irrelevance of French accents to the actual pronunciation.



They are useful:
- in unstressed syllables, to tell whether /e/ is mute (_e muet/caduc_) or closed: t*é*l*é*phoner [t*e*l*e*foˈne], app*e*ler [ap*ə*ˈle] or [apˈle]
- in stressed syllables, to tell whether /e/ is open or closed: compl*è*te [kɔ̃ˈpl*ɛ*t], all*é* [aˈl*e*]

Yes, the circumflex accent is used:
- for homophones: sur/sûr, du/dû and so on
- for etymological reasons: château < Latin ca*s*tellum

Trema is used to distinguish between a single vowel and a hiatus: f*ai*t [f*ɛ*] na*ï*f [naˈ*i*f]

In French there are very few proparoxytones, since the stress falls on the last or penultimate syllable; compare Italian _àbito_ and French _habìt_.


----------



## Nino83

From pronunciation to the written language I'd say that there are these ambiguities:

Spanish:
- equal pronunciation of /b/ and /v/
- silent initial h from Latin /f/: e.g. hoja < folium/folia
- equal pronunciation (except some rural zones in Castilla y León and in Andean Spanish) of /ll/ and /y/: calló/cayó
- confusion between /s/ and /ce, ci, z/ in American and Andalusian Spanish
- aspiration of implosive /s/ and of the jota /x/ in Andalusian and Caribbean Spanish: mimo < mi[s]mo, hoa < ho[x]a. In Spanish the jota is always prevocalic while implosive /s/ is preconsonantal, but this could still lead to misspellings such as mijmo instead of mismo

Portuguese:
- equal pronunciation of /s, -ss-/ and /ce, ci, ça, ço, çu/: *c*inta/*s*inta, mo*ç*a/mo*ss*a
- equal pronunciation (sometimes) of /-ch-/ and /-x-/: ma*ch*o/me*x*e
- equal pronunciation of /-s-/ and /-z-/: me*s*a/bele*z*a

Italian:
Words ending in unstressed -cia can form their plural in -cie or -ce, but both are pronounced the same way: [ʧa] in the singular and [ʧe] in the plural. Formerly, the presence or absence of the "i" was etymological, so there was no rule for knowing how to spell these words.
In 1949 a rule was introduced (by Migliorini): if the /c/ is preceded by a consonant, the plural is -ce, while if it is preceded by a vowel, the plural is -cie, so it is now possible to know when the plural has the "i" (the same rule applies to words ending in unstressed -gia).
Example: provì*n*cia/provì*n*ce, cam*ì*cia/cam*ì*cie
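The rule described above is mechanical enough to write down. A minimal Python sketch follows; `plural_cia_gia` is a hypothetical name, the function assumes the input is a singular whose final i is unstressed, and it ignores any variant plurals that usage may still tolerate.

```python
VOWELS = set("aeiouàèéìíòóù")


def plural_cia_gia(singular: str) -> str:
    """Plural of a noun in unstressed -cia/-gia, following the consonant/vowel rule.

    Assumes the i is unstressed; words with stressed -ìa fall outside this rule.
    """
    if len(singular) < 4 or not singular.endswith(("cia", "gia")):
        raise ValueError("expected a word ending in unstressed -cia or -gia")
    stem = singular[:-2]   # drop "ia" but keep the c/g: "provinc", "camic"
    before = singular[-4]  # the letter right before the c/g
    if before in VOWELS:
        return stem + "ie"  # vowel + c/g: the i stays (camicie, valigie)
    return stem + "e"       # consonant + c/g: the i goes (province, spiagge)


for word in ("provincia", "camicia", "spiaggia", "valigia"):
    print(word, "->", plural_cia_gia(word))
# provincia -> province
# camicia -> camicie
# spiaggia -> spiagge
# valigia -> valigie
```

Note that the written i in the plural is decided purely by the preceding letter, not by pronunciation, which is why a reform toward always writing -ce/-ge would make the spelling shallower.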

Some grammarians, like Serianni, say that a reform is needed and that we should write these plurals as they are pronounced, i.e. with -ce.

Now, because Spanish and Portuguese have undergone many phonological innovations, I think these spelling ambiguities will remain: if the orthography were to follow pronunciation, these mergers would produce a high number of homographs.


----------



## Hulalessar

Nino83 said:


> From pronunciation to the written language I'd say that there are these ambiguities:
> 
> Spanish:
> - equal pronunciation of /b/ and /v/
> - silent initial h from Latin /f/: e.g. hoja < folium/folia
> - equal pronunciation (except some rural zones in Castilla y León and in Andean Spanish) of /ll/ and /y/: calló/cayó
> - confusion between /s/ and /ce, ci, z/ in American and Andalusian Spanish
> - aspiration of implosive /s/ and of the jota /x/ in Andalusian and Caribbean Spanish: mimo < mi[s]mo, hoa < ho[x]a. In Spanish the _jota_ is always prevocalic while implosive /s/ is preconsonantal, but this could still lead to misspellings such as _mijmo_ instead of _mismo_




The first two points always apply. They show that Spanish orthography is not 100% phonemic. (For the record all h's are silent.)

When it comes to the remaining points if we are asking how phonemic a script is we have, I think, to restrict ourselves to considering one variety. That variety will inevitably be whichever is considered standard. Accordingly, even if only a minority of Spanish speakers distinguish between _calló _and _cayó_, we still have to accept that the standard makes the distinction.

We can get into a bit of difficulty when talking about standards, especially when a pluricentric language like Spanish is involved. In the case of Spanish all varieties have the same orthography, but different phonologies. Where there has been a merger of phonemes in a variety we can, if so minded, say that the orthography is less phonemic with respect to that variety. The point is, though, that if an orthography sets out to be phonemic it can only be useful if based on one variety. These considerations, however, only apply to languages with a shallow orthography. With a language like English the polyvalence is so complex (over 1000 ways of writing 40 phonemes) that any question of what sounds a grapheme or combination of graphemes represents rather pales into insignificance. It can be noted in passing that the minor differences between American and British spelling do not in fact represent differences in pronunciation.

We can take another angle by remembering that whilst speech and writing are connected, they are two distinct things. We have the thing, say "cup". This is represented in speech by /kʌp/. We take it that when we write <cup> we are representing /kʌp/, that is, that <cup> is a representation of a representation. And of course it is, because every alphabetic writing system has as its basic premise that its graphemes represent phonemes. However, we can also say that <cup> directly represents "cup" in the same way that a picture of a cup represents "cup", even if no picture can ever represent speech. So, whilst writing can be articulated, the articulation of writing is not the same thing as speech. It is perfectly possible to learn a (written) language without knowing how it is articulated, and if you do, any question of whether the orthography is deep or shallow is irrelevant. Every orthography can be regarded as a sort of platonic realm which includes all possible pronunciations of every word.


----------

