# Genetic classification



## Frank06

*Partially split off and partially inspired by this thread (and dozens of other threads). The examples given here (Persian-Arabic, Germanic languages) are just examples. Members who want to discuss those specific topics in detail are asked, no, are urged to search the archives of WR first and continue the discussion in the respective threads.*
*Frank*
*Moderator EHL*


Hi,
A lot of threads and posts in EHL and other WR sub-forums are concerned with "connections". More than a few of those posts boil down to the question "What's the connection between language A and B?".
It's very difficult to reply to this kind of question, since "connection" can mean almost anything one wants it to mean. 
Another classic question involves "similarity", yet another term which excels in vagueness. 
Just one extreme example: I once went to a supermarket with a few Chinese students who had just arrived in Belgium, and to them, the Dutch, English, Italian, Turkish and even the Russian texts on the packages of the products looked _incredibly_ similar.

The same goes for the word "relation". "What's the 'relation' between Persian and Arabic?" has been asked more than once in the long history of WR, and it's not always clear what is meant by "relation": it's often a connection, sometimes (a) similarity, or sometimes a _genetic_ relation (see below).

This leads me sometimes to wonder where this pre-occupation, and in some cases, obsession, with relations, similarities and connections comes from. But that's not the topic of my post.

Almost without exception, questions of this kind lead to long, sometimes very long, discussions on _language classification_ and _genetic relations_, and on the practical use of this kind of classification, or the lack thereof. 

In the rest of this post I will use "(language) classification" and "relation(ship)s" as shorthand for respectively "genetic classification" and "genetic relations". I won't use the word "connection", since it doesn't bear any similarity to a useful linguistic term. 
A few parts of this post will probably sound incredibly snotty, but, oh well, that shouldn't come as too big a surprise :-D.



> Well, who cares about the classifications.


I think it's difficult not to classify and I wouldn't be surprised if humans were hard-wired for it. Apart from that, a lot of people did (and still do) care about it. 

The Ancient Greek classification system was very simple: they had a very keen sense for the Greek dialects, but basically it was "Greek" on the one hand, "anything else" on the other. No need for the online version of ethnologue.com in those days.

Over the centuries, not only did the classification of languages become slightly more complex, but so did the reasons why, or rather, the criteria by which languages are classified into different groups.
Lots of scholars in medieval and post-medieval times studied the relation between the Romance languages and Latin, and the Germanic languages (e.g. the study of Gothic in the 17th century in the Teutonic context, though a lot of the knowledge was acquired in relative isolation and hence got a bit "lost").
They all cared, and some of the scholars prior to the 19th century, who alas have remained hidden in the shadow of Sir William Jones, did a tremendous job. Most of them, however, concentrated on words and letters (not sounds).


> As long as 30% of the Persian words is a Arabic words. Languages are words per se.


[For the *context* of this post, see this thread.]
A language is not a collection of words. I don't think that needs further explanation. One does only have to think about the concept of "grammar book" to know why. Rather than: about book concept does grammar have know of one only the think to to why.

I saw numbers higher than 30%, but actually, it doesn't really matter how high the percentage of loan words in a specific language is in order to classify it. 
To put it bluntly: despite being very visible, the lexicon doesn't matter. 
Okay, it does matter, but not _that_ much.

I think one can defend the idea that modern comparative historical linguistics started when scholars began to abandon the idea that the lexicon was the prime criterion for classifying languages. In this respect, it's both sad and funny to see that it is still the most popular (and often the only) criterion used by many linguists on the fringe, more commonly known as pseudo-linguists or full-fledged linguistic nutters.

It took philologists and Sanskritists such as Jones to give an enormous boost to historical linguistics, for various reasons. Although the importance of morphology and other grammatical aspects in determining relationships didn't originate with Jones, he's best known for his eloquent and often quoted third annual discourse before the Asiatic Society (1786), in which he proposed the basics of the language family which we now call Indo-European. And he stressed that "no philologer could examine them all three [Latin, Greek, Sanskrit], without believing them to have sprung from some common source, which, perhaps, no longer exists". A few decades later, the first drafts of Proto-Indo-European were published by other scholars.

Another boost was the (Western) discovery and translation of old Sanskrit writings on grammar and sound systems, which led quite directly to the study of what we now call phonology and phonetics and to the study of (regular) sound changes (Grimm being one of the first to do so in modern times).

To illustrate all this, I'd like to refer to this short outline on the Germanic language family: "Seven distinct features of Germanic". 
Seven, that's not a lot. I hope it's clear from this short example that the lexicon is not the primary concern of somebody who classifies languages.

The classification is often represented as a tree, the famous language tree model. And it is what it says it is: a model, an outline. Nothing more, nothing less. A model which represents the criteria mentioned above by simple lines connecting simple labels on various levels (e.g. Proto-Indo-European - Germanic - West-Germanic etc.) isn't the most elaborate explanation. But to be honest, one must be very short-sighted to expect more from a model that claims to be nothing more than a model. It's a bit like complaining that one cannot take an elevator in a very basic blueprint in order to go to the highest storey of a skyscraper to enjoy the panoramic view in the rotating four-star restaurant. And yet, that blueprint is very useful.

So, in short: common ancestry, common phonological, morphological and lexical features (and a few things more) *together* are used as criteria to classify a language.
What's not used as a criterion is anything else. That's the weakness of the model, but also its strength. It's simple and basic. 
But the classification, the tree model, should never be interpreted on its own, in isolation, without context. And I guess here things go wrong quite often.


----------



## Frank06

[continued]



> Well, who cares about the classifications.


I cannot help but wonder why it's quite often people who don't really care that are so vocal about it. 
But maybe it's better to wonder first who _doesn't_ have to care about it. (It's a rare species in the realms of WR where I normally dwell, but I heard that people who don't care about it exist :-D.)

I cannot imagine that the average language learner who's not particularly interested in linguistic issues but incredibly interested in mastering language X or Y should wonder much about language families and relationships. I don't think it would affect the language skills if one doesn't know that Persian is classified as Southwestern Iranian < Western Iranian < Iranian < Indo-Iranian < Indo-European as opposed to Kurdish, which is Northwestern Iranian < Western Iranian < Iranian < Indo-Iranian < Indo-European. 
Language classification cannot be said to be the primary concern of somebody who learns a language; they have other, more practical and urgent concerns. 
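
For readers who do like to see the tree model as data, here is a minimal sketch (in Python) of what the two label chains above encode. The dictionary and the function name `lowest_shared_node` are made up for this illustration; only the labels themselves come from the classification quoted above.

```python
# Lineages as listed above, from the most specific branch up to the family.
LINEAGES = {
    "Persian": ["Southwestern Iranian", "Western Iranian", "Iranian",
                "Indo-Iranian", "Indo-European"],
    "Kurdish": ["Northwestern Iranian", "Western Iranian", "Iranian",
                "Indo-Iranian", "Indo-European"],
}


def lowest_shared_node(lang_a, lang_b, lineages=LINEAGES):
    """Return the most specific label both lineages have in common, or None."""
    labels_b = set(lineages[lang_b])
    # Walk upwards from the most specific branch of lang_a.
    for label in lineages[lang_a]:
        if label in labels_b:
            return label
    return None


print(lowest_shared_node("Persian", "Kurdish"))  # Western Iranian
```

That single shared label is all the tree claims about the two languages. It says nothing about loan words, mutual intelligibility or anything else.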

So, who _does_ care about it: people interested in linguistics. Simple as that. The tree model is a tool by and for (comparative) linguists of whatever kind, from professional to amateur. And yes, I think one has to learn how to use this tool, how much value to attach to this tool (not too much, but enough) by studying historical comparative linguistics. 
Just the same way it takes a medical _student_ (or a member of The Lennon Sisters) to interpret Dem Dry Bones. 


> But, I think, for a learner, or a non-linguist, the lexicon is more important at the end of the day than the grammar. Languages are classified as belonging to groups of languages with whom they share a "genetic" relationship, yet, the whole issue of "borrowing" (a stupid term, I think) confuses things. So, you can end up with a situation where Farsi is classified as being more related to English, yet, for all practical purposes, especially in the lexis, it is more similar to Arabic.


[This post is taken a bit out of context. For the full *context* of this post, see this thread.]
I choose to divide the animal world in two primary categories: "Food" (almost anything with four legs or less) and "Non-Food" (almost anything with 5 legs or more). My secondary categories, for equally practical purposes, are "Cute" (pandas and penguins), "Ouch" (tigers and bears), "GTP (Grab the paper)" for spiders and cockroaches, and "Panic" for scorpions and snakes.
I don't think one has to be a biologist to see the difference with the standard classification of animals in the average biology handbook. But should *my* classification for *my* very practical purposes be a reason to dismiss the scientific, standard classification used by biologists and shout "Who cares about the classification"?
If I did this on a biology message board, I bet I'd get a long and tedious reply...

Groetjes,
Frank


----------



## Abu Rashid

> Languages are classified as belonging to groups of languages with whom they share a "genetic" relationship, yet, the whole issue of "borrowing" (a stupid term, I think) confuses things



This is an excellent point from the other thread. It got me thinking: perhaps 'borrowing' is akin to marriage. You might become one family in a sense, but no actual genetic relationship is established between the husband and wife because of that. They will begin to borrow, imitate and adapt to one another's mannerisms, but they will not cross the divide into the other's genetic family.


----------



## Ghabi

Frank06 said:


> So, who _does_ care about it: people interested in linguistics. Simple as that. The tree model is a tool by and for (comparative) linguists of whatever kind, from professional to amateur. And yes, I think one has to learn how to use this tool, how much value to attach to this tool (not too much, but enough) by studying historical comparative linguistics.
> Just the same way it takes a medical _student_ (or a member of The Lennon Sisters) to interpret Dem Dry Bones.


That's a good point. My impression is that people tend to attach too much importance to the tree model and the comparative method, as if that were _the only way_ for thinking about how languages interact and evolve. Thus we see that will-o'-the-wisp that is called PIE pop up all the time (which always makes my mouth water) with all those asterisks flying around (for *unattested forms, they hesitate to add). No one can deny (and I don't think anyone will try to) that the comparative method is a huge intellectual achievement, but the danger is to deify it. An 18th-century German writes:


> [Just as the] followers of Herr Kant always reproach their opponents for not understanding him, so too some believe that Herr Kant is right because they understand him.


Of course it's not just Herr Kant's philosophy. Once we succeed in mastering something rather complicated, we tend to take it too seriously. Who can resist the temptation to show off all that PIE stuff (my mouth waters again) once he's learnt all those brain-racking sound laws? But we do need to learn (perhaps the hard way) how much value we should attach to the thing we've learnt.


----------



## CapnPrep

Frank06 said:


> I cannot imagine that the average language learner who's not particularly interested in linguistic issues but incredibly interested in mastering language X or Y should wonder much about language families and relationships.


It has been amply proven that students who first spend one year memorizing every detail of the language-tree model then go on to learn every foreign language at least twice as fast. (However, all such studies were conducted by members of the sect_*-like*_ Comparativist movement.)
 [to avoid misunderstandings, I'll add la verda stelo: ٭]



> So, who _does _care about it: people interested in linguistics. Simple as that.


Don't forget that the people who are possibly the most obsessed with classification issues are not sincerely interested in linguistics at all, but motivated by regionalist/nationalist (or anti-regionalist/anti-nationalist) ideologies. Just like biology, nuclear physics, etc., the science of historical linguistics can be applied in ugly, destructive ways…


----------



## clevermizo

CapnPrep said:


> Don't forget that the people who are possibly the most obsessed with classification issues are not sincerely interested in linguistics at all, but motivated by regionalist/nationalist (or anti-regionalist/anti-nationalist) ideologies. Just like biology, nuclear physics, etc., the science of historical linguistics can be applied in ugly, destructive ways…



Definitely! We see this especially in paleontology. In the early 20th century, taxonomy in paleontology was often construed to support racist ideology. Before genetic data suggested that all modern humans have a recent African origin, it was common to classify (Western-defined, of course) so-called "races" almost as separate species or subspecies, and then make claims about inherited traits, like "intelligence."

Frank is definitely right of course, that languages are not just collections of words as was suggested by someone in the previous thread to this. 

I'm not sure though. I know instinctively that it's not valid to classify languages based on lexicon (alone), but I'm not sure why. I mean if the poster here is correct, and Persian has 30% Arabic-derived lexicon... does that not still leave an overwhelming 70% non-Arabic-derived lexicon? 

Or consider Maltese, which has been receiving my attention. I think because Persian is still overwhelmingly Persian, it's difficult to say that based on lexicon it should be classified as similar to Arabic. But Maltese has only about 40% Arabic roots, yet it is still Semitic and akin to Arabic. This is based on syntax and morphology obviously, but is there a good reason not to call Maltese Romance considering half of its vocabulary is Romance? Furthermore, there are some morphological features which are also Romance, such as some rules of pluralization and gemination in verbs which it has inherited from Southern Italian/Sicilian. 

I'm just playing Devil's Advocate here. In biology, I know that a shark is a fish and a whale is a mammal, even though the shark fin and the dolphin flipper look analogous. If you peel away the fin or the flipper, you find inherently different structures underneath. Under the skin the entire animal is different.

Now, in biology, there's a reason to classify things genetically. A dolphin has more in common with a dog than with a shark, and this affects how you treat it, how you set up a pair of dolphins to mate, how you perform surgery on a dolphin, perhaps even how you would test medications intended for a dolphin. Not that we commonly have dolphins in the hospital and not to say there aren't still big differences between dolphins and dogs, but its genetic relationship to other mammals is important in biology. Testing lab rats has relevance for dolphins. It does not for fish.

What is the real utility in cladistic relationships of languages? If it's simply for pleasure and interest, then that's fine. If knowing correct relationships aids something, like computational analysis, machine learning and translation, then that makes sense to me. If it aids second language acquisition, as CapnPrep mentions above, then fantastic. Is the only reason we care about reconstructing language family relationships the fact that _we want to know_?


----------



## Frank06

Hi,


CapnPrep said:


> It has been amply proven that students who first spend one year memorizing every detail of the language-tree model then go on to learn every foreign language at least twice as fast. (However, all such studies were conducted by members of the sect_*-like*_ Comparativist movement.)


Hey, that's _exactly_ what they told me in the Biblioteko Kompara Lingvoscienco Kaj Historio de Lingvo...
[to avoid misunderstandings, I'll add a green smiley]


> Don't forget that the people who are possibly the most obsessed with classification issues are not sincerely interested in linguistics at all, but motivated by regionalist/nationalist (or anti-regionalist/anti-nationalist) ideologies. Just like biology, nuclear physics, etc., the science of historical linguistics can be applied in ugly, destructive ways…


In my experience with linguists on the fringe over the years (among them the Turian heavyweight Polat Kaya and lightweight Biblical literalists), the family tree model (FTM) is the first thing that comes under attack. Or at least their very idiosyncratic interpretation of it. 

Their way of reasoning (simplified): 
- the FTM says that e.g. English comes from Germanic (this works with almost every language/language family), 
- this cannot be true given the large amounts of e.g. Romance words, 
- ergo the FTM is wrong, 
- ergo the whole field of historical linguistics is wrong.

Needless to say, straw man arguments, false dichotomies and, in the case of Kaya, worldwide, centuries-old, Dan Brown-like conspiracies against Turkish dominated the debate.

The second part in those debates consisted of isolating their favourite language, giving it a special status, while using a kind of methodology (what's in a name) which cannot be falsified. In this part, it is often "established" that their favourite language is morphologically perfect, pure, ideal, glorious, divine, you name it.

In the last part, a new tree model is set up (ignoring their own previous objections against the traditional tree model mentioned under point one). Would it come as a surprise that their favourite language sits on top of that model?

Groetjes,

Frank


----------



## Athaulf

CapnPrep said:


> It has been amply proven that students who first spend one year memorizing every detail of the language-tree model then go on to learn every foreign language at least twice as fast.



Do you know by any chance what metrics have been used to calculate this ratio? How exactly do they quantify the speed of language-learning?


----------



## Frank06

Hi,


Ghabi said:


> That's a good point. My impression is that people tend to attach too much importance to the tree model and the comparative method, as if that were _the only way_ for thinking about how languages interact and evolve.


Indeed, it's a model. Comments can be found here, for example.



> Thus we see that will-o'-the-wisp that is called PIE pop up all the time (which always makes my mouth water) with all those asterisks flying around (for *unattested forms, they hesitate to add)


I fail to see your point. 
In a subforum called EHL, there is indeed a fair chance that PIE forms pop up. Almost as big as the chance that French words pop up in the French-English forum. 
The Proto in "Proto-Indo-European", by definition, refers to the fact that PIE is reconstructed (see for example the _Oxford concise dictionary of linguistics_). Reconstructed, by definition, means unattested (otherwise it doesn't need to be reconstructed). 
So every single time somebody writes *Proto*-Indo-European, that person indicates in an incredibly explicit way, and without hesitation, that the form is not attested. 
Furthermore, the asterisk is a symbol which, by general agreement, _also _indicates that the form is reconstructed.


> No one can deny (and I don't think anyone will try to) that the comparative method is a huge intellectual achievement, but the danger is to deify it.


Why would anybody deify a scientific theory or a scientific tool such as the comparative method (which is not the only one used in mainstream historical linguistics)? 
If tomorrow the linguistic equivalent of a rabbit fossil were discovered in the linguistic equivalent of the Precambrian, making linguists aware of serious flaws in their methodology, then those linguists would come up with a new model and a new methodology.
I am wondering about the word 'deify' you used. This is not a weak attempt to start a semantic word game, but I think there isn't a lot of _belief_ involved in mainstream historical linguistics. General acceptance, yes (until circumstances force linguists to come up with something better).



> Once we succeed in mastering something rather complicated, we tend to take it too seriously. Who can resist the temptation to show off all that PIE stuff (my mouth waters again) once he's learnt all those brain-racking sound laws? But we do need to learn (perhaps the hard way) how much value we should attach to the thing we've learnt.


All in all, it's not that complicated. It takes some basic reading, remembering where to look up this or that sound change, and some logical thinking. It's less complicated than Kant...

Groetjes,

Frank


----------



## Hulalessar

Frank06 said:


> I think it's difficult not to classify and I wouldn't be surprised if humans were hard-wired for it.



I think they are. Language is often cited as the thing that separates humans from all other animals and language itself is an act of classification. It has to be because the number of phenomena is infinite.

Apart from language itself, humans use language to effect all sorts of classifications. Some are informal, but still useful, whilst others are or aim to be scientific. The classifications made by linguists aim to be scientific. Linguistics, though, is for the most part a social rather than a hard science. If the classifications made by hard science are not free from problems, then the classifications made by the social sciences are even more prone to them, since they are dealing with what humans do, and what humans do can only rarely be neatly classified.

The genetic classification of languages seems to work best in what may be termed the middle range. Whether one believes in the monogenesis of language or not, it does seem likely that some language families are related, but that the relationship is unlikely to be demonstrated. It is interesting to note that some groupings have been split, whilst others have been brought together.

Leaving aside how some extinct languages like Hittite fit in, whether any language should be included or excluded from the Indo-European family is uncontroversial as is which modern language belongs to the Satem or Centum group. At the next level down, there is some argument about whether groupings such as Balto-Slavonic and Italo-Celtic are valid. However, when we go down to the next level we are on safer ground, with no disagreement as to what the main branches of Indo-European are: Germanic, Slavic, Celtic etc. Looking at each branch you start to encounter difficulties trying to decide how many sub-divisions, let alone languages each contains. Within Germanic there is no problem distinguishing between North Germanic and West Germanic, but within both subdivisions whilst we have our "army and navy" languages there is a continuum of dialects.

The classification of Romance languages has been exercising the minds of foreros recently. Any introductory book on language is going to tell you that it includes at least: French, Spanish, Portuguese, Italian and Romanian - all "army and navy" languages. It may include Catalan and Occitan and that will be because both have established and admired literature. A mention may be made of Rhaeto-Romance because Romansh is one of the national languages of Switzerland. A more detailed book may mention any one or more of Galician, Sardinian, Corsican, Francoprovençal and Gascon, not to mention the various vernaculars spoken in Italy.

Apart from deciding how many languages there are, there is the question of how you group them together. No grouping is entirely satisfactory as there are too many overlaps, the most cited being that Catalan is included in Ibero-Romance and Occitan in Gallo-Romance when the two are clearly closely related. There is also the thorny problem of the languages of Italy. "Italian" is not the mother tongue of the majority of Italians and many of the languages differ more from Italian than Italian differs from Spanish. We can then go on to mention that there is wide disagreement among linguists - for example some say that there is no such a thing as Rhaeto-Romance and/or Francoprovençal whilst some suggest that the Occitan spoken in the extreme south-west of France should be classified with Ligurian.

We can see that to an extent some of the classification is based on geographical or political divisions - the two sometimes being close if not identical. That is perhaps not very scientific, but inevitable when we accept that, much as they may want to, linguists cannot discuss language in isolation.


----------



## XiaoRoel

Some languages form *diasystems*: Galician and Portuguese, Catalan and Occitan, to confine myself to the territory of Hispania. Italian must also be regarded as a diasystem. So must French and its varieties, some of which, such as Franco-Provençal, are clearly differentiated from the official French language and have a written standard.
To keep it short, since it is late: unless we take the concept of *diasystem* into account, _we will never be able to classify_ languages properly, either synchronically or diachronically.


----------



## Hulalessar

"Diasystem" is essentially a socio-cultural rather than purely linguistic concept. It helps linguists to describe situations where speakers A and B speak/write what a linguist would regard as varieties of the same language but speaker A, or it could be B, or then again both, insist they speak/write different languages. Amongst other things it enables "new" languages to pop up from nowhere.

Imagine a country Nambulonia where everyone is regarded as speaking Nambulonian, though they speak it a bit differently in the southern region called Lantonia. Lantonia achieves independence and decides its citizens all speak Lantonian. A new orthography is proposed and what was previously considered a variety/dialect of Nambulonian is written to take account of its slightly different phonology and morphology. Nothing has actually changed except that the Lantonian variety of Nambulonian has acquired its own orthography (and an army and navy), but a linguist suddenly finds that he has to describe the situation in Nambulonia and Lantonia as a diasystem.

Valencian is really only called Valencian because the citizens of Valencia do not wish to come under the political or cultural hegemony of Barcelona. Valencian/Catalan is described as a diasystem. On the other hand the citizens of the US have no fear of being dominated by the British and have no problem in referring to the language they speak and write as English. Standard American English and Standard British English are not regarded as forming a diasystem.

Any scientific system of classification needs to be based on "observable characteristics". Whatever the science may be, there may be discussion as to what observable characteristics are relevant or valid. When it comes to languages it is not only difficult to set down hard and fast criteria as to what observable characteristics should be taken into account and what weight should be given to different characteristics, but it is also difficult to find ways of measuring differences. This is why "dialect" has to be a relative concept. In deciding how to classify languages, linguists would, I am sure, like to be able to just "look at the language" and not have to take into account factors such as number of speakers; whether a variety is written or not; the socio/economic/political/cultural status of a variety's speakers; history and other non-linguistic factors. But of course they do, because language is a human activity and cannot be considered in isolation. In the end, no system of classification is going to please everyone.


----------



## Frank06

Hi,



XiaoRoel said:


> Some languages form *diasystems*: Galician and Portuguese, Catalan and Occitan, to confine myself to the territory of Hispania. Italian must also be regarded as a diasystem. So must French and its varieties, some of which, such as Franco-Provençal, are clearly differentiated from the official French language and have a written standard.
> To keep it short, since it is late: unless we take the concept of *diasystem* into account, _we will never be able to classify_ languages properly, either synchronically or diachronically.


For the classification of the Romance languages, one doesn't have to know what a "diasistema" is, one has to realise what classification in this context means and one needs to understand what is getting classified and on which basis.

I don't know which ideas are triggered in your head by the simple label "Paris" and "Brussels" (or by the simple label "Germanic", "Slavic", "German" and "Polish"), but I think that (1) we all have a fairly good idea of what Paris is (probably less when thinking about Brussels) and that (2) we can all agree that it doesn't matter that much when using a compass as an indicator of a direction. 
I think we have to understand the "simple labels" in a language family tree on this level.

The story changes when we'd like to go from one city to another; then we can use a gps system (rather than a compass). We'd also need very precise addresses, or at least we'll have to have an idea where in Paris we want to arrive, what we mean by "Paris": do we mean the very city of Paris (if so, which department), do we mean the city +/- banlieues, and if so, where exactly.

The story _and_ the tools change once again if we'd like to get informed about the social, economic and political life in both cities, and the way both cities are "connected" socially, politically and economically (especially economically). To find out we take a book, visit the archives of a few newspapers and we start to read and learn. Neither a compass nor a gps will provide a lot of help. 

These contexts not only require a different interpretation of the simple labels, but also a set of different tools. And I think it's more or less the same when talking about languages, language classification and relations between languages.

Now let's look again at the post quoted above: Let's imagine that I am interested in the Romance languages and that I have some basic knowledge about those languages and their histories. Let's also imagine it's the first time I come across the simple label "Galego" and that I have no idea how or where to situate that language. I google and I find the labels (or classification):

Italic > Romance > ... > Gallo-Iberian > ... West-Iberian

If I know what the combination "Italic > Romance" means, then I understand immediately that *Galego comes from Latin*. Imagine that I already came across the term West Iberian in connection with Portuguese, so I can *expect Galego to be "similar", in one or another way, to Portuguese* rather than to French or Romanian. 
Even if I still don't have a clue what Galego looks or sounds like, I used a simple classification tool to get myself orientated.

Do I know by now that Galego has some (minor) lexical influence from Germanic languages? Do I know by now that it has fewer loans from Arabic than, let's say, Spanish or Portuguese? Do I know anything about the dialect variation within Galego? Does it matter what a diasistema is? Do I know anything about Galego apart from the pieces of information (or expectations) I marked in *bold grey*? *No*.
Does it matter what exactly Galego is (a language, a dialect, a part of a dialect continuum)? No, not really. For a basic orientation, I need a basic, simple label and a classification tool (which *looks* basic and simple on the surface). 

Would it be incredibly weird if I'd expect to find more information than *those marked items* when reading a language family tree, which is a list of labels on several levels, connected by black lines? Answer it for yourself.


My second example concerns Syldavian, a language spoken in the Balkans/Eastern Europe. The corpus of texts in this language is quite limited and only to be found in the specialised literature. It will be clear from the examples that the language is Germanic (that's the classification suggested by Mark Rosenfelder anyway, who relied a bit too much on the lexicon for this).
Looking at the texts, we can also be sure that it is Western, Continental, non-High German (for this we have to turn to phonetic/phonological features, as far as they are represented by the spelling). There are no traces of the High German consonant shift or, _very_ simply put, we find k or p where German cognates have ch or pf (f).

Comparing it to other languages that are West Germanic, continental and non-High German, we can postulate that it is probably related, in one way or another, to Low German or Low Franconian.
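
The spelling-based test for the High German consonant shift mentioned above can be sketched in Python. This is a crude heuristic under stated assumptions: the sample pairs are Dutch and Standard German cognates picked by hand (no Syldavian data is used), and the function name `keeps_unshifted_stop` and the correspondence table are hypothetical, covering only k ~ ch and p ~ pf/f.

```python
# (unshifted stop, shifted High German reflex); only the correspondences
# named in the text are covered.
CORRESPONDENCES = [("k", "ch"), ("p", "pf"), ("p", "f")]

# Dutch / Standard German cognates, hand-picked for illustration.
SAMPLE_PAIRS = [
    ("maken", "machen"),
    ("ik", "ich"),
    ("appel", "Apfel"),
    ("slapen", "schlafen"),
]


def keeps_unshifted_stop(word, german_cognate):
    """Crude spelling check: True if `word` has k or p where the German
    cognate shows ch or pf/f. Works on spelling only, without alignment."""
    word = word.lower()
    # German "sch" spells a different sound, so it must not count as "ch".
    german = german_cognate.lower().replace("sch", "s")
    return any(
        shifted in german and stop in word and shifted not in word
        for stop, shifted in CORRESPONDENCES
    )


for word, german in SAMPLE_PAIRS:
    print(word, german, keeps_unshifted_stop(word, german))
```

A check like this only looks at letters, so it inherits every quirk of the spelling. A serious comparison would align cognates sound by sound.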

If you want to find out how a Germanic language ended up in Eastern Europe (and in the Balkans), you'll have to do some further reading (for example here and here), where you will find out that Syldavian has some minor Slavic influences.
People who didn't know Syldavian will also find out that it is a fictional language created by Hergé for his Tintin comic books, and that it's based upon a dialect spoken in a very specific part of Brussels where Hergé grew up as a kid.

Apart from all this, I'd like to know whether or not Syldavian forms a diasystem with Bordurian...


Groetjes,

Frank


----------



## XiaoRoel

I was not talking about sociolinguistics (a subject that interests me little as a philologist, but a great deal as a bilingual citizen), nor about a "label" for Google searches, which I trust little.
I was referring to a linguistic concept: languages that belong to the same language family, that share more than 98 or 99 percent of their morphosyntactic mechanisms, and that are, in their educated register and in certain dialect areas, mutually intelligible almost 100%. Moreover, their linguistic drifts run in common directions (while other, newer ones run in opposite directions), and borrowing between them is possible without the loan sounding foreign.
Within the Ibero-Romance group there is one obvious diasystem: the one formed by Galician, the diatopic varieties of Portuguese and, in some interpretations, the Fala of Xálima in Cáceres.
Another language, Catalan, forms a diasystem with Occitan, with the Pyrenees as a barrier; this is why the Occitan languages of France are influenced by the French superstrate, and Catalan by Spanish (so-called Valencian is not a language but a western dialect of Catalan, heavily influenced by Aragonese and, in modern times, by Spanish).
Mozarabic must have had dialect areas, judging by what has been preserved. The _*northern dialects*_ of Latin or, better, already languages by the 8th century, were: *Galician*, *Astur-Leonese*, and a *diasystem* comprising *Castilian*, Riojan-Navarrese and Navarro-Aragonese, which ended up converging, within the Middle Ages themselves, into a single language, Castilian.
Then there are Aragonese and Catalan, the latter belonging to the Occitan buffer diasystem (it should not be forgotten that Catalan was forged, to become the language it is today, north of the Pyrenees; its area of origin is Roussillon and Cerdanya).
All of this is measured by _phonetic, lexical and usage isoglosses_ (idiolects), and in this way one can *draw precise boundaries between diasystems and languages, between the languages within a diasystem, between the dialects within a language, and between the idiolects within a dialect.*
All very linguistic.


----------



## sokol

I'd like to link to the Wave Model entry in Wiki, as well as the entry on the Tree Model.

The Tree Model (see e.g. Austro-Asiatic) is good for demonstrating genetic relationships; it is also very handy if you want to include a timeline which indicates approximately when languages split.
So the tree model is supposed to show which languages are more closely related to each other, but not necessarily which are more similar (even though the two coincide quite often).

The Wave Model (see e.g. Romance), however, is not only or primarily concerned with genetic relationship but also with mutual influence within the dialect continuum. Such mutual influences, spreading from political and cultural centres, are well known and attested in both the Romance and the German dialect continua.
If you take a look at the Romance model linked to above you will easily recognise which groups would be referred to as "diasystems" as used by XiaoRoel, but it also shows that the way Romance linguists use the term "diasystem" isn't necessarily precise: they use the term "Italian diasystem" when Italian dialects (without Corsican/Sardinian) at least should be sub-divided into two (if not more) diasystems.

However, this model of Romance languages I'm linking to here does not show the historical perspective: Island Romance (with Old Corsican and Sardinian) is grouped as a separate group, as opposed to all other Romance dialects.
This is so because, historically, it is believed that Island Romance was the first group to split from Vulgar Latin; it is not more different from mainland Italian dialects than Romanian is if you compare them on a synchronic level (quite the contrary!), but it is more conservative in some respects (Latin "c" is retained as "k"), and it shows some innovations of its own (like the definite article being taken from "ipse" rather than "ille").
So Island Romance stands apart very much genetically, but not so much structurally; if you compared Romance languages on a structural level, then I'm sure we would have to classify the eastern (Romanian) group as standing apart (with postposed article, "Balkanic" vowel system, case system with vocative, and neuter retained as gender).

So while Romanian is actually genetically linked rather closely to Northern Italian dialects (more closely than the Island Romance languages are), it is more different in structural terms.

I hope this illustrates that genetic relationship and structural similarity aren't necessarily the same thing (even though there surely will be plenty of cases where both go more or less together).
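
The contrast can be made concrete with a toy Python sketch. Everything in it is a simplification: the branch label "Mainland" and the function names are hypothetical, the paths only encode the claim that Island Romance split off first, and the feature sets list only the traits mentioned above (so the empty set for Northern Italian is an assumption, not a description of those dialects).

```python
# Root-first branch paths: Island Romance splits off first (per the post);
# "Mainland" is a hypothetical label for everything else.
BRANCH = {
    "Island Romance": ["Romance", "Island"],
    "Northern Italian": ["Romance", "Mainland"],
    "Romanian": ["Romance", "Mainland"],
}

# Only the structural traits named in the post are tracked.
FEATURES = {
    "Island Romance": {"Latin c kept as k", "article from ipse"},
    "Northern Italian": set(),  # assumption: none of the tracked traits
    "Romanian": {
        "postposed article",
        "Balkan vowel system",
        "case system with vocative",
        "neuter gender",
    },
}


def shared_branch_depth(a, b):
    """Number of branch labels two varieties share, counted from the root."""
    depth = 0
    for x, y in zip(BRANCH[a], BRANCH[b]):
        if x != y:
            break
        depth += 1
    return depth


def structural_difference(a, b):
    """Number of tracked traits found in one variety but not the other."""
    return len(FEATURES[a] ^ FEATURES[b])


for other in ("Romanian", "Island Romance"):
    print(
        other,
        shared_branch_depth("Northern Italian", other),
        structural_difference("Northern Italian", other),
    )
```

In this toy data Northern Italian shares more of its branch with Romanian, yet differs from it in more tracked traits than it does from Island Romance. Counting every trait equally is itself a choice, and a different weighting could change the picture.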

The problem with establishing genetic relationships usually is that our sources are scarce, and that there are many cases where it is difficult to tell whether similarities are due to a genuine genetic relationship or to influence from neighbouring languages and/or dialects.

And the problem with structural relationships is how to measure them, and which linguistic level (from phonetics and phonology to morphology and syntax and finally lexicology) we should consider "more" relevant.

In my experience, lexicology is not a good marker for genetic relationship at all - words easily cross borders; and neither are they a good indicator for structural relationship (except in a few cases where their morphological integration gives some clues).
Despite all this, lexicology has helped establish the Indo-European genetic group; so even words may be important for genetic classification - _*all*_ linguistic levels are indeed.


----------



## Frank06

XiaoRoel said:


> I was not talking about sociolinguistics (a subject that interests me little as a philologist, but a great deal as a bilingual citizen), nor about a "label" for Google searches, which I trust little.


This thread indeed is not about sociolinguistics. You may also pick any source of information you do trust. 


> I was referring to a linguistic concept; [...]


Yes. Okay.


> Within the Ibero-Romance group there is one obvious diasystem: the one formed by Galician, the diatopic varieties of Portuguese and, in some interpretations, the Fala of Xálima in Cáceres.
> [...]
> All of this is measured by _phonetic, lexical and usage isoglosses_ (idiolects), and in this way one can *draw precise boundaries between diasystems and languages, between the languages within a diasystem, between the dialects within a language, and between the idiolects within a dialect.*
> All very linguistic.


I understand you and follow you more or less (because my Spanish is not that good, I'm sorry). But I don't see any contradiction with what has been written before in this thread. Thanks for the additional information.

Groetjes,

Frank


----------

