# The vocabulary of dead languages



## slowik

How many words are there in these languages?


- Latin
- Ancient Greek
- Old English

These languages are dead (well, actually Ancient Greek and Old English aren't), so it is possible to count all the words, right?


----------



## Frank06

slowik said:


> How many words are there in these languages?
> 
> 
> - Latin
> - Ancient Greek
> - Old English
> 
> These languages are dead (well, actually Ancient Greek and Old English aren't), so it is possible to count all the words, right?


I understand the question as follows: we take all the Old English texts available and count all the words.

If so, then there is still a rather fundamental problem: What do you mean by Latin, Ancient Greek and Old English? What do you include/exclude and why?

I mean, is Old English the language used between January 1, 881 and December 31, 1066 (to give just two completely random dates)? If not, why not?

I think it will be even more difficult for Latin (Pre-Classic, Classic, Vulgar, Medieval, plus probably variants in post-Roman Gaul/France, Italy, Iberia, etc. etc.).

Groetjes,

Frank


----------



## slowik

Well, I admit it's a bit vague, but you can buy dictionaries of these languages, right? You can say with certainty that a word is an Old English word, right? I agree that it's a difficult (and perhaps a bit weird) question, but you can learn these languages, so you can also learn their vocabulary, and my question is how large these vocabularies actually are.
But after thinking about it, Latin and Ancient Greek were used for too long to single out one version of each. I guess we could stick to the language used in what we have now, which is written texts.


----------



## 0m1

How are Old English and Ancient Greek not dead, out of interest?


----------



## slowik

They evolved into modern English and Greek, and Latin didn't (at least not in the same way).


----------



## Frank06

0m1 said:


> How are Old English and Ancient Greek not dead, out of interest?



I think this part of the Wiki-article on dead languages summarises it well:


> This has happened to Latin, which (through Vulgar Latin) eventually developed into the family of Romance languages. Such a process is normally not described as "language death", because it involves an unbroken chain of normal transmission of the language from one generation to the next, with only minute changes at every single point in the chain. There is thus no one point where "Latin died".


It would be the same as saying "early Modern English is a dead language"...

Old English, Latin, Ancient Greek are just labels, quite often based upon a convention. 
I cannot quote them right now, but I have a few Old English grammar books, and they all define Old English in a slightly different way. "Define" is the wrong word here; they use different conventions.
Off the top of my head, my Greek dictionaries mention anything "between" Homeric Greek and New Testament Greek, but ignore the language rendered in Linear B.

It would be possible, I guess, for a language such as Old Persian: a very strict definition, a small corpus of texts in a specific script (well, these three items are part of the normal definition of Old Persian). The same goes more or less for Gothic (and no, I am not going to count the words ;-). Possible, if we could come to an agreement about the definition of the word "word", obviously.



slowik said:


> They evolved into modern English and Greek, and Latin didn't (at least not in the same way).


What's the difference between, let's say, Old English > Middle English > Modern English and Latin > Old French > French, apart from the fact that we replace the label Latin with the label French? I don't see a technical problem with labeling French as Modern French Latin.


Frank


----------



## DenisBiH

slowik said:


> They evolved into modern English and Greek, and Latin didn't (at least not in the same way).




Not in the same way? Which way then?


----------



## slowik

Old English and Ancient Greek evolved into modern English and Greek and Latin evolved into the whole Romance language group, I guess.

To be honest I don't want to start a discussion, I'm just curious: if you can say that a word is an Old English word, could you count them, and if you could how many are there. That's it. Whether the languages I mentioned are dead or not is not important to me. I do understand the constant development of languages. Still, there is a label 'Old English', 'Ancient Greek' etc. and there are dictionaries of such languages so...

But I've read your answers and yes, these are just labels and it's all based on conventions. So never mind.


----------



## phosphore

The problem is that to count the vocabulary makes sense only on the synchronic level. So you may count the vocabulary of Classical Latin and I am sure someone already did that, but to count the vocabulary of Latin in all epochs would hardly make any sense.


----------



## Alxmrphi

You've also got the problem of words that were never written down, or dialect forms. There's a word in my dialect of English that means "mean / sly"; I have no idea how it is spelled because I've only ever heard it spoken. It's not a word you'd write, but I guess by definition it is a word. It's pronounced /ɑːlaːs/, but you wouldn't find it in any dictionary. Forms like these (and I expect there were a lot more that were never written) will be omitted, so it would never be a true count of all the words in the language.

Given the linguistic situation in Old English, the variation in word preferences might have been very high. Works were often translated into the Wessex dialect and the older versions lost, so there was a heavy tendency for other forms to be converted into more prestigious ones, and words used in the other dialects are not available to us now.


----------



## sokol

phosphore said:


> The problem is that to count the vocabulary makes sense only on the synchronic level. So you may count the vocabulary of Classical Latin and I am sure someone already did that, but to count the vocabulary of Latin in all epochs would hardly make any sense.



Yes, to mix different epochs of the development of the Latin language would be to mix "different languages", in a way - as Medieval Latin already differs significantly from Classical Latin, and in the case of Latin new words are still being created continuously (not only for the benefit of the Vatican and the Pope - there are actually Latin words for Communism, to give an example; a political concept which didn't exist in Caesar's time).

And there's also another problem - we have no means to establish a full corpus of ancient Latin: even though the corpus for Classical Latin is huge, it still isn't complete. (Take a look at the Realencyclopädie - this isn't a dictionary, and it isn't about Latin culture alone, but it comprises more than a hundred volumes ...).
You can only count words based on a specific corpus.

Take Hittite - the corpus of written Hittite documents is relatively small, and not even all of it has been published, so if you try to count all Hittite words based on the published corpus, that'd be rather easy.
However, this does not include the unpublished cuneiform tablets gathering dust in the store-rooms of museums, and even more importantly it doesn't include those words which have never been written down.
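
As a rough illustration of what "counting words based on a specific corpus" amounts to, here is a minimal Python sketch. The function name `count_word_forms`, the tiny two-phrase "corpus" and the regex-based splitting are all assumptions made up for the example, not an established method.

```python
import re
from collections import Counter


def count_word_forms(texts):
    """Count lower-cased word forms across a list of texts (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        # \w+ matches runs of letters, digits and underscores: a crude stand-in
        # for a real definition of "word".
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


# Toy corpus standing in for "everything that has been published".
corpus = ["Gallia est omnis divisa in partes tres", "Arma virumque cano"]
forms = count_word_forms(corpus)
print(len(forms))  # 10 distinct forms in this toy corpus

# Adding one more text changes the count: the number is only valid
# for the corpus it was computed from.
print(len(count_word_forms(corpus + ["Veni vidi vici"])))  # 13
```

Note that the sketch counts "virumque" as one form, although it is "virum" plus the enclitic "-que", so even this toy count already depends on how "word" is defined.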


----------



## phosphore

The whole idea of counting the vocabulary, be it a dead or a living language, is rather problematic. For a living language, for example, is the technical vocabulary part of the vocabulary? Are dialectisms and regionalisms part of the vocabulary of the language as a whole? And what about doublets, would you count the words or the ideas? And for a dead language, for example, what about taboo words, some of which were perhaps never written? The point is that the question of size of the vocabulary poses much more trouble than insight into a particular language and any number you might be given would have to be followed by a detailed description of what had been taken into account.


----------



## sokol

I also agree with this, phosphore - yes indeed, to count vocabulary in principle is problematic.
(Also, I do not quite see the point of even trying to: why count, for which reasons? - but that's just my opinion.)


----------



## DenisBiH

Well, absolute numbers may not mean much, but ratios (say, native words versus borrowed) or clusters (groups of words denoting a particular concept or belonging to a certain field) could be interesting.

Things like English has x% of borrowings, Eskimos have y words for ice, etc.


----------



## slowik

Actually I wondered if anyone could learn the whole (or close to the whole) vocabulary of a language such as Latin or Ancient Greek.


----------



## sokol

slowik said:


> Actually I wondered if anyone could learn the whole (or close to the whole) vocabulary of a language such as Latin or Ancient Greek.


Radio Yerevan says, in principle yes, but it is more fun if you limit yourself to Seneca*). 

*) And even more fun if you'd limit yourself to Seneca's _Apocolocyntosis_ alone. 


The thing is, certainly you could learn the whole known corpus of Latin of a certain period, but it would take awfully long - and when you've finished you'd possibly wonder why you've gone to all that trouble in the first place.


----------



## Alxmrphi

> The thing is, certainly you could learn the whole known corpus of Latin of a certain period, but it would take awfully long - and when you've finished you'd possibly wonder why you've gone to all that trouble in the first place.


In our pre-Time Machine era, maybe


----------



## miguel89

sokol said:


> The thing is, certainly you could learn the whole known corpus of Latin of a certain period, but it would take awfully long - and when you've finished you'd possibly wonder why you've gone to all that trouble in the first place.



Its only usefulness would lie in making it into the Guinness Records... so, not really worth the time.


----------



## Rallino

I think some people believe that the size of the vocabulary shows how rich a language is. They want to see numbers.

By the way, there was a thread a few days ago where some guy provided a link to some site that I don't recall now, claiming that English had reached 1 million words and was considered to be the richest language.

Now, out of curiosity, how do you think they counted it?


----------



## Frank06

Rallino said:


> By the way there was a thread, a few days ago, where some guy provided a link to some site that I don't recall now, that English had reached 1 million words, and was considered to be the richest language.


*We have discussed the 1 million words claim in this thread.*

*Frank*
*Moderator EHL*


----------



## bibax

I disagree that it is impossible to count all known words of Classical Latin. Classical Latin has been studied for centuries by many (Hittite is a different story), and the number of known ancient Latin texts is finite. The number of words in these texts must be finite as well.

The biggest dictionaries of Classical Latin contain about 60,000 words, including proper names and adjectives derived from proper names.

My estimate (if we don't count the proper names):
~ 40,000 (definitely fewer than 50,000) words in all preserved Classical Latin texts

IMHO it is not impossible to learn (nearly) all known words of Classical Latin (a retentive memory recommended).


----------



## miguel89

Yes, but then you'd still have to face the problem of defining what you'll be considering a word, as Frank06 has pointed out. If you decide to think of a word as an entry in a dictionary, then the values you give would be correct, but someone could argue that this definition is nonetheless arbitrary.


----------



## bibax

The definitions are usually arbitrary. You can define what you want, for example that brevis/brevior/brevissimus form one "word". I think it could be interesting to know how many Latin words are preserved from ancient times. However, it is still possible that many Classical Latin words were preserved in later texts but are absent from the ancient texts that we know today.
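
To show how much the chosen definition moves the number, here is a small Python sketch that counts the same list twice: once by distinct form and once by dictionary headword. The table `LEMMA_OF` and the function `count_forms_and_lemmas` are hypothetical, hand-made for this example; a real count would need a full lemmatized lexicon.

```python
# Hypothetical hand-made table mapping inflected/derived forms to a headword.
LEMMA_OF = {
    "brevior": "brevis",
    "brevissimus": "brevis",
}


def count_forms_and_lemmas(forms, lemma_of):
    """Return (number of distinct forms, number of distinct headwords)."""
    distinct_forms = set(forms)
    # A form without an entry in the table is treated as its own headword.
    distinct_lemmas = {lemma_of.get(form, form) for form in distinct_forms}
    return len(distinct_forms), len(distinct_lemmas)


forms = ["brevis", "brevior", "brevissimus", "longus"]
print(count_forms_and_lemmas(forms, LEMMA_OF))  # (4, 2)
```

Same four strings, but the answer is 4 or 2 depending on whether the comparative and superlative count as separate words.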


----------



## Frank06

bibax said:


> I disagree that it is impossible to count all known words of Classical Latin.


Please note that we started with the rather vague word "Latin". 
The narrower we define (or limit) "Latin", in this case "Classical Latin", the easier it will become to determine which texts are available and hence to count the words.

But if we are to accept a definition of Classical Latin as e.g. given on Wikipedia, viz.:


> The term refers to the *canonicity of works of literature* written in Latin in the late Roman republic and the early to middle Roman empire: "that is to say, that of belonging to an exclusive group of authors (or works) that were considered to be emblematic of a certain genre."


then we can only conclude that it has more to do with philology than with linguistics. 

Frank


----------

