# Similarity between languages (in percentage)



## Kraus

Hello! I'd like to know if there are any sites that provide statistical data about the similarity - with respect to vocabulary - between Neo-Latin and/or Indo-European languages (or languages of other families), e.g. French-Italian 89%, Spanish-Portuguese 90%, Swedish-Danish 92% and so on.

Thanks in advance for your help!


----------



## robbie_SWE

Buona sera Kraus! 

I don't know if it helps, but this site might have some answers. 

The article on Romanian shows that the Romanian language is closest to Italian (77%), French (75%) and Sardinian (74%). 

 robbie


----------



## sokol

No, I don't, but I wouldn't post anything here if I couldn't contribute to your question: the thing is that *it is almost impossible to attribute percentages of similarity* - or, if one does so, the result can hardly be taken seriously.

And why is this?

Take the *Nordic languages:* Norwegian, Swedish and Danish are similar enough that speakers can talk to each other while each uses his or her own dialect of the respective language: so if a Norwegian, a Swede and a Dane meet (and not necessarily from border regions of each country), it is possible for them to communicate to a degree if all three of them show willingness.

But then take *German* or *Italian:* each is considered *one* language, but what would you get:
- imagine a Swiss German from Wallis, a Northern German from Holstein and an Austrian from Southeastern Styria meet and talk to each other in their own dialects; even if all three of them were willing to communicate, they would fail completely: the only way for these three persons to communicate would be either the German standard language (or an approximation of it) or another language (English, for example: probably English would even be the easier solution here ...)
- and as you are Italian, I guess you could imagine a similar situation for Italy, probably three people from Venezia, Roma and Sardegna respectively (or take Sicily, as Sardinian could be considered a separate language): would they have any chance of communicating if speaking their dialects?

Another example, extreme in the other direction, would be the *Slavic languages* with so many different standard languages.
My guess would be that even Slovenes and Russians would be able to communicate if speaking their own dialects, if only they were prepared to take the pains (certainly some misunderstandings would occur, but in principle they'd get by, I think).
There is also a piece of Austrian literature (I don't remember the author anymore, a Carinthian woman whose mother tongue was Slovenian, besides German) in which a Slovene woman from Carinthia living in Vienna was able to communicate with Russian soldiers during the occupation after World War II.

So, what percentages should one give here?

Languages like Italian and German couldn't even be considered 'one' language if one applies percentages, and a language like Letzeburgisch (the former German dialect of Luxembourg which developed into a standard language) is so homogeneous (as it is based on an extremely small dialect region) that it is still vastly different from the German language as a whole, while Letzeburgisch is still hardly different at all from bordering 'German' dialects (= from the dialects just west of the border of Luxembourg).


Well, Robbie, you were faster:


robbie_SWE said:


> The article on Romanian shows that the Romanian language is closest to Italian (77%), French (75%) and Sardinian (74%).


And before you think about phrasing an objection: what I said above does not necessarily mean that any given percentages like these are not correct. I will add the following:
They could very well be 'correct' in a scientific sense insofar as, e.g., they could show percentages of similarities in the lexicon of both languages (77% similarity between Romanian and Italian could, for example, mean that this is a percentage of similarity based on a basic vocabulary of probably 2,000 words and 100 grammatical features).
However, these percentages (any percentages, really) could never show the whole picture.


----------



## robbie_SWE

sokol said:


> No, I don't, but I wouldn't post anything here if I couldn't contribute to your question: the thing is that *it is almost impossible to attribute percentages of similarity* - or, if one does so, the result can hardly be taken seriously.
> 
> *[snipped by mod]*


 
I fully agree with your statement! I don't think that these surveys contribute to a better understanding of languages and I doubt that they are accurate. I only posted the numbers available for Romanian, excluding my personal opinions. 

The problem is, as you stated in your post, that they only use a limited number of words when they do these surveys, so they don't give the whole picture.

 robbie


----------



## sokol

robbie_SWE said:


> I only posted the numbers available for Romanian, excluding my personal opinions.



Well yes, it wasn't meant as a criticism - and after all it seems we are of one mind concerning this question.


----------



## Outsider

One thing I've never been able to understand (but then I've never researched it) is how exactly one compares two lexicons (lexica?) Just by finding how many letters the words have in common in the two languages? (_porta - porte - puerta, uomo - homme - hombre_) But different languages use different spelling conventions! Or is it by comparing common sounds? Then surely Spanish should be more similar to Italian than to Portuguese! And which dialect do you pick for the comparison, anyway?...
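One naive way to make "how many letters the words have in common" concrete is a normalized edit distance over spellings. The sketch below is only an illustration of that idea, using the example words above; the function names are made up for this post, and nothing here claims that any published percentage was actually computed this way. It also shows the weakness raised above: the score depends entirely on each language's spelling conventions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-letter insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def spelling_similarity(a: str, b: str) -> float:
    """Hypothetical score in [0, 1]: 1.0 means identical spelling."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# The word pairs mentioned in the post (Italian, French, Spanish).
pairs = [("porta", "porte"), ("porta", "puerta"),
         ("uomo", "homme"), ("uomo", "hombre")]
for left, right in pairs:
    print(f"{left} / {right}: {spelling_similarity(left, right):.2f}")
```

A measure like this treats _porta_ and _porte_ as close and says nothing about pronunciation or meaning, which is exactly why it can only be a rough proxy.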


----------



## Kraus

robbie_SWE said:


> Buona sera Kraus!
> 
> I don't know if it helps, but this site might have some answers.
> 
> The article on Romanian shows that the Romanian language is closest to Italian (77%), French (75%) and Sardinian (74%).
> 
> robbie


Tack så mycket Robbie! I'm aware that these stats are not the Bible; however, I was interested in them out of curiosity... Thanks again for the link, it's very useful.


----------



## JGreco

I have always wondered how they arrive at those percentage numbers. Do you compare shared grammatical structures, root words, ancestral derivation, derivation from Latin, mutual intelligibility, or unintelligibility? There are so many ways to compare: you could basically say that, when comparing mutual intelligibility, Spanish and Italian are the closest of the Romance languages, but if you make a phonetic comparison then Spanish and Portuguese or Portuguese and Galician win.


----------



## Tolovaj_Mataj

sokol said:


> Another example, extreme in the other direction, would be the *Slavic languages* with so many different standard languages.
> My guess would be that even Slovenes and Russians would be able to communicate if speaking their own dialects, if only they were prepared to take the pains (certainly some misunderstandings would occur, but in principle they'd get by, I think).


I don't agree with your example. Do you want to say that Russian dialects and Slovene dialects are closer than German dialects among themselves?
Believe me, there are quite some differences even among the dialects inside Slovene and, here I'm talking from my own experience, I, being from Ljubljana, had a hard time understanding the dialect of Prekmurje. I understood the context, but was unable to get the meaning of each word. With a dialect from Resija it's even harder - even the context is incomprehensible to me.
On the other hand, I cannot imagine somebody from Bled or Jesenice talking to an average Russian. I bet that Russian person would not even consider the speech of Gorenjska as being Slavic.
And finally: my understanding of Russian is limited to the recognition of the grammatical forms. Words? Yes, some sound the same, some sound similar, but the majority mean nothing to me. Sorry, Sokol.


----------



## Kraus

Outsider said:


> One thing I've never been able to understand (but then I've never researched it) is how exactly one compares two lexicons (lexica?) Just by finding how many letters the words have in common in the two languages? (_porta - porte - puerta, uomo - homme - hombre_) But different languages use different spelling conventions! Or is it by comparing common sounds? Then surely Spanish should be more similar to Italian than to Portuguese! And which dialect do you pick for the comparison, anyway?...


Actually that's not an easy question. I think they decide there is similarity if the words involved have the same origin (so "hombre" is not so far from "uomo" or "homme", as well as "работать" from "arbeiten"). It's a questionable method, but IMHO it's the only (more or less) scientific one...
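To show what "same origin" counting could look like in practice, here is a minimal sketch. It assumes someone has already judged, word by word, which forms go back to the same ancestor; the five-word table and its class numbers are hand-assigned purely for illustration (a real comparison would need a much longer list and an expert's judgements), and the function name is made up.

```python
# Hand-assigned cognate classes for a tiny illustrative word list.
# Within one meaning, languages with the same number share an origin:
# e.g. uomo / homme / hombre all continue Latin "homo", while Spanish
# "perro" is unrelated to Italian "cane" and French "chien".
COGNATE_CLASSES = {
    "man":  {"it": 1, "fr": 1, "es": 1},  # uomo, homme, hombre
    "door": {"it": 1, "fr": 1, "es": 1},  # porta, porte, puerta
    "dog":  {"it": 1, "fr": 1, "es": 2},  # cane, chien, perro
    "eat":  {"it": 1, "fr": 1, "es": 2},  # mangiare, manger, comer
    "head": {"it": 1, "fr": 1, "es": 2},  # testa, tête, cabeza
}


def shared_cognate_percentage(lang_a: str, lang_b: str,
                              table: dict = COGNATE_CLASSES) -> float:
    """Percentage of meanings whose words in both languages
    were assigned to the same cognate class."""
    comparable = [row for row in table.values()
                  if lang_a in row and lang_b in row]
    if not comparable:
        raise ValueError("no meanings attested in both languages")
    shared = sum(1 for row in comparable if row[lang_a] == row[lang_b])
    return 100.0 * shared / len(comparable)


for a, b in [("it", "fr"), ("it", "es"), ("fr", "es")]:
    print(f"{a}-{b}: {shared_cognate_percentage(a, b):.0f}%")
```

With only five words the resulting percentages mean nothing about the real languages; the point is just that the number depends completely on which words are on the list and on who decides what counts as "the same origin".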


----------



## Outsider

So, pronunciation is completely ignored? Yet pronunciation is crucial to determine whether, or how much, two languages are mutually intelligible!

And how do they control for words that are cognates, but whose meaning has changed, or that are used with different frequency?

It seems like a scientific, but very limited way of making comparisons.


----------



## Athaulf

Outsider said:


> So, pronunciation is completely ignored? Yet pronunciation is crucial to determine whether, or how much, two languages are mutually intelligible!


 
I don't think the purpose of these studies is to predict mutual intelligibility, but merely to provide some raw statistical data that might be useful for other purposes (or at least I hope so). Pronunciation is certainly far more important for mutual intelligibility than the raw percentage of cognates, except when it comes to reading a related language whose spelling minimizes the differences in pronunciation (an effect drastically apparent if you speak some Spanish and then try to understand both spoken and written Portuguese and Italian). 



> And how do they control for words that are cognates, but whose meaning has changed, or that are used with different frequency?


This is definitely a huge problem with this methodology. To see how problematic it really is, one just needs to imagine it applied to two very distantly related languages, like for example a Slavic and a Germanic language. Depending on how liberal the criteria are for admitting "relevant" cognate words, the percentage of "shared vocabulary" might turn out to be anywhere from zero to a respectable two-digit percentage.

Another problem is whether one should count only true cognates or also common borrowings, but that depends on what one wants to do with these data.



Tolovaj_Mataj said:


> I don't agree with your example. Do you want to say that Russian dialects and Slovene dialects are closer than German dialects among themselves?
> Believe me, there are quite some differences even among the dialects inside Slovene and, here I'm talking from my own experience, I, being from Ljubljana, had a hard time understanding the dialect of Prekmurje. I understood the context, but was unable to get the meaning of each word. With a dialect from Resija it's even harder - even the context is incomprehensible to me.
> On the other hand, I cannot imagine somebody from Bled or Jesenice talking to an average Russian. I bet that Russian person would not even consider the speech of Gorenjska as being Slavic.


 
Actually, in my experience, this is a very curious feature of Slavic languages. South Slavic dialects can become very different across distances as small as a few dozen kilometers, but when you move more than a thousand kilometers north/northeast, you may find that the local language isn't nearly as different as you might expect based on distance. In other words, the differences definitely don't increase linearly with distance -- they rise sharply at first, but much more slowly as you go further.



> And finally: my understanding of Russian is limited to the recognition of the grammatical forms. Words? Yes, some sound the same, some sound similar, but the majority mean nothing to me. Sorry, Sokol.


When it comes to Slovenian or Croatian vs. Russian, I'd say that similarities are indeed greater than between the most distant German dialects. Sure, you won't be able to establish much more than some rudimentary pidgin-level communication, and even that will require lots of effort -- slow speech and careful listening, rephrasing, guessing, pointing, etc. However, some intelligibility definitely exists. From what I've heard about the differences between distant German dialects, they seem to be really greater, perhaps as great as between a South Slavic language and Polish. (And even in Polish, I can understand some bits and pieces.)

However, this discussion should probably be continued in the Slavic forum (in which there has already been a mutual intelligibility thread).


----------



## modus.irrealis

Kraus, you might want to take a look at the Swadesh list. It has often been used in attempts to calculate similarity and I have seen percentages based on it. A very quick google search found things like this but there should be more. Now, I don't think it's all that valuable a scientific tool but it should at least lead you to some numbers.


----------



## sokol

Tolovaj_Mataj said:


> I don't agree with your example. Do you want to say that Russian dialects and Slovene dialects are closer than German dialects among themselves?


Actually a German philologist claimed that exactly this was the case: Claus Jürgen Hutterer in his _Die germanischen Sprachen_ (1975/1990; my translation into English): 'In fact [or probably even: _it is an undisputed fact_], the differences between the individual Slavic languages are, for example, smaller than the ones between the different dialects of the German language (*).' _('Es ist eine Tatsache, daß z. B. die Unterschiede zwischen den slawischen Einzelsprachen geringfügiger sind als jene zwischen den einzelnen Mundarten der deutschen Nationalsprache.')_ p. 369
(*) I didn't translate this as 'national language' because Hutterer's use of terms is rather outdated - the use of 'Nationalsprache' in connection with German carries a presupposition of Pan-Germanism.

And although I do not agree with everything Hutterer has to say (and not even with this statement), he has a point here.
As for the example of an Austrian woman communicating with Russian soldiers, I think it _could_ have been Ingeborg Bachmann - but I am not sure.

As for me: I learned Slovenian from scratch and nevertheless, afterwards, found it rather easy to make sense of written Russian even though I never learned it; equally, I've also read Bulgarian, Czech and Polish - certainly with difficulty and not without a phrasebook, but nevertheless I was able to catch the intended meaning. Slavic languages really have *very* much in common with each other.



Tolovaj_Mataj said:


> Believe me there are quite some differences even among the dialects inside Slovene and, here I'm talking from my own experience, that me being from Ljubljana had a hard time understanding the dialect of Prekmurje.


Well yes, I was exposed to Slovenian dialects for two months and I especially had problems understanding a guy from Škofja Loka and another one from Primorje.
And with Carinthian Slovene included it becomes even worse. Nevertheless, communication is possible if one is willing.
With Prekmurci, however, the problem is sometimes that Ljubljančani seem to not like the dialect at all - the condition of willingness is not completely fulfilled.

Difficulties in understanding people from different dialect regions of course occur in Austria (inside Austria!) on a level rather similar to that of Slovenia.
It gets much, much worse if one counts in the rest of the German-speaking area.
These (= the Slovenian ones, and the ones _inside_ Austria), however, are really rather 'minor' communication problems.



Athaulf said:


> In other words, the differences definitely don't increase linearly with distance -- they rise sharply at first, but much more slowly as you go further.



Exactly, and many of the differences are phonetic and phonological - there always remains a rather strict set of grammatical rules that aren't touched at all, or only slightly: e.g. declension and conjugation, and especially verbal aspect. (And even though Bulgarian and Macedonian stand out with their loss of declension and retention of the classical tense system, many similarities still remain.)

But yes, to discuss mutual intelligibility of Slavic languages here would be off topic, so let's not make the life of our mods harder than necessary.


----------



## robbie_SWE

modus.irrealis said:


> Kraus, you might want to take a look at the Swadesh list. It has often been used in attempts to calculate similarity and I have seen percentages based on it. A very quick google search found things like this but there should be more. Now, I don't think it's all that valuable a scientific tool but it should at least lead you to some numbers.


 
I'm sorry to say it, Modus.Irrealis, though the thought is nice, the site you linked to in your post is filled with mistakes in the Romanian column, making it dreadfully inaccurate.

But I do think that the Swadesh lists are more interesting, because you can really compare the most basic elements of a language on sight (no math involved).

 robbie


----------



## modus.irrealis

robbie_SWE said:


> I'm sorry to say it, Modus.Irrealis, though the thought is nice, the site you linked to in your post is filled with mistakes in the Romanian column, making it dreadfully inaccurate.


To be honest, I didn't look closely at the site, and on second look, it seems to be some kind of school project and their link to their database doesn't seem to work either, so, not the best site, no.


----------



## robbie_SWE

modus.irrealis said:


> To be honest, I didn't look closely at the site, and on second look, it seems to be some kind of school project and their link to their database doesn't seem to work either, so, not the best site, no.


 
No harm done. I actually think that this Wikipedia page would be of help, and for the major Romance languages see this.

 robbie


----------



## Alijsh

@Kraus: Visit this page and click on any of the available languages to see its relative transparency with some other languages. For example, this one is for Italian.


----------



## Kraus

Kheyli mamnunam Alijsh!


----------



## Stiklas

In conclusion, I think the first question to be considered should be "for what purpose is the comparative data collected?"

Let me use my own language as an example:

Lithuanian is said to have its origins in Sanskrit, and for those who wish to earn high degrees in Indo-theology it is required to study Lithuanian in its natural environment (i.e. to go to Lithuania). However, that does not mean that it would be easier for me to learn Sanskrit, as the two languages at first glance are very different. This I would call the historical approach, or a method that looks at how the different languages spread around the continents. It takes roots of basic archaic words that have remained unchanged for thousands of years, such as "mama" (mom) and "namas" (house) and "saule" (sun).

On the contrary, when I was learning Italian, I found my knowledge of Lithuanian grammar a great advantage that English-only students did not have. Even though Italian is not usually cited as a language similar to Lithuanian by the first method, it is similar by this second one: how similar the actual current/standard grammar, syntax and pronunciation of certain letter combinations are. This method looks at the language more as a whole, and less historically.

Method one is useful for historians, archaeologists and people trying to understand certain cultures. Method two is useful for linguists, students and tourists. I'm sure there are other methods of comparison.

Did I understand this right? Please correct me if I have mistaken something...


----------



## Athaulf

Stiklas said:


> Lithuanian is said to have its origins in Sanskrit, and for those who wish to earn high degrees in Indo-theology it is required to study Lithuanian in its natural environment (i.e. to go to Lithuania).



Sorry to disappoint, but Lithuanian is most definitely _not_ descended from Sanskrit, even though these languages are very distantly related, and share Proto-Indo-European as their common ancestor. Lithuanian belongs to the Baltic branch of IE languages, while Sanskrit belongs to the Indo-Iranian branch. You can find a more general diagram of relationships between IE languages, for example, at this link. As you can see, Lithuanian is no more closely related to Sanskrit than any other modern Baltic, Germanic, Romance, or Slavic language, although it did preserve some ancient IE grammatical features better than most, which makes it more similar in certain regards to Latin, Sanskrit, and other ancient IE languages.

I'm also really curious about this Lithuanian requirement for "Indo-theology", but I'd rather not test the nerves of the moderators with off-topic inquiries.


----------



## Stiklas

Athaulf,

Please take a closer look at these pages...
http://postilla.mch.mii.lt/Kalba/baltai.en.htm
http://eeuropeanhistory.suite101.com/article.cfm/history_of_lithuanian_language
http://1000petals.wordpress.com/2007/08/30/the-mysterious-beauty-of-lithuanian-language/
and this one

With regard to Indo-theology, I believe you have to draw the connecting line from understanding a language, to understanding sacred scriptures written in that language, to understanding the religion.


----------



## Athaulf

Stiklas said:


> Please take a closer look at these pages...
> http://postilla.mch.mii.lt/Kalba/baltai.en.htm
> http://eeuropeanhistory.suite101.com/article.cfm/history_of_lithuanian_language
> http://1000petals.wordpress.com/2007/08/30/the-mysterious-beauty-of-lithuanian-language/
> and this one



Which talk about _similarities_ between Lithuanian and Sanskrit, without claiming that Lithuanian has "its origins in Sanskrit" or anything similar. Any IE language is related to Sanskrit, however distantly, and will have at least some similarities with it. Those in Lithuanian might be more plainly obvious than in some other IE languages, and they might be particularly interesting for some technical reasons, but nothing more than that. Those bombastic statements by Meillet and other old linguists about Lithuanian being greatly similar to Sanskrit or Proto-IE should be understood as greatly exaggerated hyperbole and taken with a huge grain of salt. Their goal was probably to arouse scientific interest in what most Western academics back then saw (ignorantly and unfairly) as faraway peasant languages unworthy of serious academic study. 

Otherwise, from what I've seen at a glance, the stuff written on these pages is more or less correct, except for the claims that Lithuanian is somehow more "archaic" or "ancient" than other modern IE (or any other, for that matter) languages. Such claims have no scientific basis for any living natural language. Discussion of such nationalist romanticism properly belongs to the recently opened thread titled "Etymology, nationalism, religion..."


----------



## Stiklas

OK, I agree that maybe the original claim was a bit exaggerated. Maybe it was due to my teachers being a bit old-fashioned or a little too patriotic, probably, as you have pointed out, because our language was unfairly ignored in the recent past...

I was just quoting what any present-day Lithuanian would quote, but this thread made me look more closely into the whole thing myself. Thank you...


----------



## Frank06

Hi,


Stiklas said:


> Lithuanian is said to have its origins in Sanskrit, and for those who wish to earn high degrees in Indo-theology it is required to study Lithuanian in its natural environment (i.e. to go to Lithuania).


Who says something like that? Not one single serious linguist would make such a claim! At best, this seems to be a part of pseudo-linguistic folklore...

Lithuanian can be said to be a very conservative IE language, i.e. one conserving quite a few archaic traits, but that is something completely different from saying that it has "its origins in Sanskrit".
See for example here. But anyway, this probably deserves a thread on its own.



> On the contrary, when I was learning Italian, I found my knowledge of Lithuanian grammar a great advantage that English-only students did not have. Even though Italian is not usually cited as a language similar to Lithuanian by the first method, it is similar by this second one: how similar the actual current/standard grammar, syntax and pronunciation of certain letter combinations are. This method looks at the language more as a whole, and less historically.


I fail to spot the method...



> Method one is useful for historians, archaeologists and people trying to understand certain cultures. Method two is useful for linguists, students and tourists.


How do you mean?

Groetjes,

Frank


----------



## Stiklas

Never mind - I guess I took too literally the "pseudo-linguistic folklore..." that I have been taught since I was a child, without checking more carefully into it... I'll be more careful next time...

I resign from this conversation - although this has been a very interesting thread, and I do appreciate learning new things. That's what WR is for, right?

Visogero,
-A-


----------



## Athaulf

Frank06 said:


> Hi,
> 
> 
> 
> *Stiklas:*
> On the contrary, when I was learning Italian, I found my knowledge of Lithuanian grammar a great advantage that English-only students did not have. Even though Italian is not usually cited as a language similar to Lithuanian by the first method, it is similar by this second one: how similar the actual current/standard grammar, syntax and pronunciation of certain letter combinations are. This method looks at the language more as a whole, and less historically.
> 
> 
> 
> I fail to spot the method...


I suppose the point is that when it comes to similarities between languages, linguists and language learners are often interested in quite different things. The sorts of similarities that language learners find useful in practice often don't reflect the actual genetic relationships between languages. To use an example from languages I'm familiar with, if you're a Spanish speaker, your knowledge of Spanish will likely be a much more useful asset for learning English than for learning Latin, even though from the point of view of historical linguistics, Spanish is a relatively recent offshoot of Latin, and only very distantly related to English.

The above example with Lithuanian and Sanskrit vs. Italian is of course factually untrue, as we've already mentioned.


----------



## MarX

Kraus said:


> Hello! I'd like to know if there are any sites that provide statistical data about the similarity - with respect to vocabulary - between Neo-Latin and/or Indo-European languages (or languages of other families), e.g. French-Italian 89%, Spanish-Portuguese 90%, Swedish-Danish 92% and so on.
> 
> Thanks in advance for your help!


Hi!

I'd say that the similarity between Indonesian and Malaysian would be somewhere around 98%.
I wanted to write less, but if for Swedish - Danish it's 92%, then Indonesian - Malaysian should reach 98%.

Salam,


MarX


----------



## boyo

It really surprised me that Spanish speakers would find it easier to learn English than Italian. There are obvious similarities between French and English thanks to the Norman invasion and the dual layer of vocabulary that formed in English as a result. French is a Romance language, as is Spanish, but so is Italian.


----------

