# Using Google "counts" as authority



## Kevin Beach

I have noticed an increasing frequency of citing the number of times a particular word, phrase or usage appears in the listings produced by a Google search as authority for the proposition that the usage is correct.

I know that language is a living phenomenon and that previous errors can become the norm after a long time, but have we _really_ reached the stage where mere abundance is the conclusive criterion?

After all, a frequently repeated error is none the less an error.


----------



## Paulfromitaly

I agree with you.
Although the number of hits a word/expression has on Google can be indicative of its spread, we have to bear in mind that personal blogs, websites hosted by non-native speakers and all those kinds of sources _are often not reliable_ from a linguistic point of view.


----------



## Cagey

On the other hand, I find searches of edited sources, such as _Google News_, _Google Books_ and _Google Scholar_, useful as indications of how words and phrases are actually used by people who have a concern for grammatical conventions.

They are also helpful for finding examples from real life.  Sentences contrived to be grammatical examples often sound strange and stilted, although some people are better at this than I am.     

What the _number of hits_ represents, however, is another question. If you go to the last page of citations, you will often (always?) find a discrepancy between that number and the number of hits claimed on the first page. They can be off by factors of 10 or even higher. And it sometimes happens that the usage that claims the greater number of hits has, in fact, fewer citations. Thus, the numbers probably give you a sense of what is used "lots" and what is rarely used. They are less useful for closer comparisons of usages.

Not to mention that Google only looks for words in a certain pattern. It ignores punctuation and cannot sort out syntax.  So high numbers of hits may include a large proportion of citations with completely unrelated structures or usages.
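The punctuation problem described above can be illustrated with a small sketch. It does not reproduce how Google matches text (that is not public); it only assumes a punctuation-blind matcher and shows how hits gathered that way could be sorted by hand into real occurrences and "false" ones. The function name `classify_hits` and the sample text are hypothetical.

```python
import re

def classify_hits(phrase, text):
    """Split punctuation-blind matches of `phrase` into true and false hits.

    A "true" hit has the words separated by whitespace only; a "false" hit
    has punctuation (e.g. a full stop) between them.
    """
    words = [re.escape(w) for w in phrase.split()]
    # Loose pattern: any run of non-word characters may separate the words.
    loose = re.compile(r"\b" + r"\W+".join(words) + r"\b", re.IGNORECASE)
    # Strict pattern: only whitespace may separate the words.
    strict = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)

    true_hits, false_hits = [], []
    for match in loose.finditer(text):
        found = match.group()
        if strict.fullmatch(found):
            true_hits.append(found)
        else:
            false_hits.append(found)
    return true_hits, false_hits

sample = "I went to bed early. He stayed in bed. Early birds sing."
true_hits, false_hits = classify_hits("bed early", sample)
print(len(true_hits), "true hit(s),", len(false_hits), "false hit(s)")
```

In this sample, a punctuation-blind count would report two occurrences of "bed early", although only one of them is the phrase in question.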

(Clearly, I have spent far too much time contemplating Google searches.)


----------



## lsp

I'm always reminded of this post when we revisit the validity of using Google counts to prove language "correctness." Google merely counts. It counts examples of improper and proper use with the same weight, likewise the posts of poor spellers and bad typists, among others likely to contribute to high counts of incorrect spelling and grammar.


----------



## cuchuflete

Google hits appear to have some correlation with use, whether bad, good, or absolutely absurd. If a search for a word or term shows very few occurrences, that is usually a hint that it is either rare or a mistaken spelling or contorted syntax.

For example, the word _bogus_ is widely used: Results *1* - *10* of about *17,300,000* for *bogus*,
while a misspelled form, _boguiss_, is not: Results *1* - *9* of *9* for *boguiss*.

Kevin Beach is correct, in that search engine results require some human interpretation.  By themselves, they are only pointers towards patterns of use, and do not speak to what is or is not correct.


----------



## TimLA

I believe that the number of Google hits can be used as "information" but not as anything definitive.

I use Google hundreds of times a day, and have a completely *arbitrary* standard to determine if I'm on the right track.
I've found that if there are fewer than 100 hits,
either the word or phrase in question is EXTREMELY uncommon,
or there is something wrong with the data I've entered into the engine (often misspellings or just bad grammar).

One of my "tricks" in finding text is:
"phrase in language 1" (words in language 2) (words in language 2)
Often it will help me zoom in on what I call "direct translations" - side by side translations done by experts.

Interestingly, I've recently noted that I'll get many hits with a query such as the one above, but when I click on the page, it's often in only one language.
I wonder if those webpages (or perhaps even Google) have/has an automatic translator that allows these hits to be added to the list -
OFTEN they are VERY misleading.

If you get 1,000,000 hits on a word or a phrase, yes, it exists - but whether it is correct, or appropriate in the context in question - should be left to "non-computers".


----------



## cycloneviv

I do occasionally use Google results to explain something, but I make sure to browse through at least the first couple of pages of results and comment on whether these seem to be written by native English speakers and the context in which they appear. For example, if the term/phrase only appeared in blogs and/or forum posts written in extremely casual/largely incorrect language, I would mention that and probably draw the conclusion that we cannot rely on the Google results. I also look for "false" hits, for example where a two-word expression is broken between the first and second word by a full stop, a semicolon or the like.

The rules advise members to do an internet search before posting their question. Given this, I don't think we can really discount such results out of hand.


----------



## Frank06

Hi,



Kevin Beach said:


> I have noticed an increasing frequency of citing the number of times a particular word, phrase or usage appears in the listings produced by a Google search as authority for the proposition that the usage is correct.


Depends on how you look upon 'correct' versus 'incorrect' and on how the data are presented.
But it would be silly to consider or to present those numbers as authoritative. They can only indicate (at best) a tendency.



> I know that language is a living phenomenon and that previous errors can become the norm after a long time, but have we _really_ reached the stage where mere abundance is the conclusive criterion?


If you look at the history of grammars and of language description / prescription, you'll notice that we're at the end of a roughly 500-year-long period during which the opinion of one enlightened spirit (or group of them) dictated what's correct or incorrect. As far as I am concerned, at last.

But no, we didn't _reach_ the stage where mere abundance is the _conclusive_ criterion. The first reason is simple: there are hardly any conclusive criteria. The second reason is simple too: abundance _always_ has been a criterion, so why would it be different now?

Groetjes,

Frank


----------



## bibliolept

When I saw this thread's title, I decided to step in just so I could be excoriated for my past offenses. However, I only use Google results, and sparingly, at that, as a guide to whether a phrase is idiomatic or in order to determine the commonality of a specific construction or collocation. I would never consider using Google results as prima facie evidence of correctness.



Frank06 said:


> But no, we didn't _reach_ the stage where mere abundance is the _conclusive_ criterion. The first reason is simple: there are hardly any conclusive criteria. The second reason is simple too: abundance _always_ has been a criterion, so why would it be different now?



Prevalence or popularity must be the determining factor in a language without a central authority. Though dictionaries are guides, what we have is essentially an ochlocratic language. Nonetheless, surveys of usage in the past wouldn't have included reviews of graffiti, which is about the level to which some forums, blogs, and other "user-generated" and "social networking" spaces can only hope to aspire.


----------



## timpeac

Kevin Beach said:


> I have noticed an increasing frequency of citing the number of times a particular word, phrase or usage appears in the listings produced by a Google search as authority for the proposition that the usage is correct.
> 
> I know that language is a living phenomenon and that previous errors can become the norm after a long time, but have we _really_ reached the stage where mere abundance is the conclusive criterion?
> 
> After all, a frequently repeated error is none the less an error.


Google is an indication, that's all. "Correctness" does not equal abundance (or at least it doesn't always) and so a Google result should always be taken with a grain of salt. That said, if there was a usage that was criticised by some grammarians but had, say, a Google count in the millions I would be very disposed to consider that the grammarians were out of date on this point rather than several million people wrong. After all, correct usage is a fashion not a scientific fact.


----------



## Singinswtt11

bibliolept said:


> When I saw this thread's title, I decided to step in just so I could be excoriated for my past offenses. However, I only use Google results, and sparingly, at that, as a guide to whether a phrase is idiomatic or in order to determine the commonality of a specific construction or collocation. I would never consider using Google results as prima facie evidence of correctness.


 
I agree with you here, biblio. Another thing I try to do when checking for idiomaticity is, depending on the audience that will be reading the translation, to put in my query and then add something like site:.gob.mx to restrict the domain that is searched. I'd like to think that documents published by a governmental agency in Mexico are going to be rather more reliable than fulanito's blog.
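For anyone who builds such searches often, the domain restriction above can be sketched in a few lines of Python. This is only an illustration: the helper names `build_query` and `search_url` are made up for this example, the sample phrase is arbitrary, and the base URL is assumed to be the ordinary Google search address. The `extra_terms` argument also covers the "phrase in language 1" plus words in language 2 trick mentioned earlier in the thread.

```python
from urllib.parse import urlencode

def build_query(phrase, site=None, extra_terms=()):
    """Compose a search query: an exact phrase, optional extra words,
    and an optional site: restriction (e.g. ".gob.mx")."""
    parts = [f'"{phrase}"']          # quotation marks request the exact phrase
    parts.extend(extra_terms)        # e.g. words in a second language
    if site:
        parts.append(f"site:{site}") # restrict results to one domain
    return " ".join(parts)

def search_url(query, base="https://www.google.com/search"):
    """Turn a query string into a URL (the base address is an assumption)."""
    return base + "?" + urlencode({"q": query})

query = build_query("informe anual", site=".gob.mx")
print(query)
print(search_url(query))
```

The restriction narrows where the phrase is looked for; it says nothing about whether the pages found are well written, so the results still need to be read.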


----------



## giovannino

I agree with practically everything that has been said in this thread, especially about using Google hits as evidence of the correctness of a phrase or usage. I, too, only discovered recently that the number of results is drastically reduced when you go to the last page [does anybody know why that happens?] so now, when I do quote Google results, I quote the results on the last page.

However, having said that, I still find Google searches very useful. Because linguistic research is underfunded in Italy, we do not have the comprehensive corpus-based descriptions of oral and written usage available for English. Therefore I find it very useful to search Google to find out, for example, whether a phrase which I think is more commonly used than another (both being perfectly correct) only seems so to me because it's more widespread _in my region._ So Google is a powerful learning tool for me and helps me to be more accurate in my replies.

Of course I don't just look at the number of results but go over many of the quotes, taking their source and context into account.
That is where a Google search can also help in debates over "correctness". I've sometimes been able to quote countless examples of a particular usage, all drawn from the works of some of our most distinguished writers, spanning several centuries, or from scholarly essays and textbooks -- a usage that was being objected to in the forum based on arbitrary rules introduced by purists in the 19th century and nowadays discarded by language scholars, though still perpetuated by some schoolteachers (e.g. "you shouldn't start a sentence with _E (and)_").


----------



## Cagey

giovannino said:


> I agree with practically everything that has been said in this thread, especially about using Google hits as evidence of the correctness of a phrase or usage. I, too, only discovered recently that _the number of results is drastically reduced when you go to the last page [does anybody know why that happens?]_ .... (emphasis added)



Apparently, the answer is that no one outside Google knows.  This information seems to be a trade secret; if they explained this, they would be explaining how their search and ranking work.  

There is an interesting discussion of the question in the lower half of this post in Language Log.  Because the linguists at _Language Log_ use frequency studies professionally, they are very interested in how searches work, and are knowledgeable about statistics in general.  Below is a relevant quote. (Here _strings_ are series of words in sequence.)

> _The numbers google gives in response to a query are not counts of the number of pages with the given string. Rather, they are estimates based on a formula that, so far as I know, is not public. For simple searches, the estimate is presumably based on a calculation of the probability of the page having all the search terms based on the number of pages in the google caches for each of the component terms. But once you start doing string searches, this sort of approach becomes very unreliable._

The source describes concrete examples of the peculiarities of Google searches.  If you are interested in the question, you might like to read the original article.
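To make the quoted guess concrete, here is a minimal sketch of the kind of estimate it describes: treating each search term as independent and multiplying the per-term probabilities. This is not Google's formula, which the quote itself says is not public; the function name and all the numbers are invented for illustration.

```python
def independence_estimate(term_counts, total_pages):
    """Estimate how many pages contain ALL terms, assuming the terms
    occur independently of one another (an assumption, not Google's method).

    term_counts: number of pages containing each individual term
    total_pages: number of pages in the (hypothetical) index
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    estimate = float(total_pages)
    for count in term_counts:
        # Probability that a random page contains this term.
        estimate *= count / total_pages
    return estimate

# Invented figures: an index of 1,000,000 pages in which one word appears
# on 100,000 pages and another on 50,000 pages.
print(independence_estimate([100_000, 50_000], 1_000_000))
```

An estimate of this kind knows nothing about word order or adjacency, which is one way to see why it would become unreliable for exact strings: two very common words may almost never occur side by side in a given order, yet the product of their probabilities would still be large.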


----------



## Etcetera

I often use Google to check if my translation of this or that phrase into English is acceptable. But I always make sure that the site on which "my" phrase occurs is a British or American website (1), and/or can at least boast good English (2).


----------

