# Top 100 Catalan verbs - recursos



## soupdragon78

Does anyone know if there is a list of the most frequent verbs in Catalan?
After a couple of hours of searching, I haven't found anything.
I'm trying to put together a mini-dictionary of conjugated verbs as the one I bought at the shops uses verbs that seem a little uncommon....
Thanks in advance 
Soup


(and sorry about my spelling/grammar.)


----------



## soupdragon78

If anyone knows of an online word frequency list or corpus, that would be even better...
Soup


----------



## ajohan

There's a corpus in the 'recursos' section but you can only consult it, you can't open it and make word frequency lists. One solution might be to get hold of a Spanish verb list, like the one at Lingolex (I believe we are not allowed to post links to commercial websites), because Spanish and Catalan people tend to talk about the same things, don't they? 
The problem with this type of word list is that if it is not based on a varied corpus, it tends to be wrong because it concentrates too much on action verbs like 'swim' rather than everyday verbs that describe feelings. In the one in question, 'sentir' is absent.


----------



## soupdragon78

Cheers Ajohan.
I found another good corpus: http://ramsesii.upf.es/cgi-bin/cucweb/search-form.pl?lang=ca_ES
But still no word frequency lists.
I've emailed Dacco but they'll probably take a while to get back to me.
I might resort to using a Castilian list, I suppose.
Thanks for the help.
Soup


----------



## ajohan

I realise that using a Spanish one might cause a few raised eyebrows. To check there are no alarming omissions, you might want to put together a small Catalan corpus of a few thousand words yourself and run it through a programme like WordSmith Tools or KWiCFinder.
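If you don't have a concordancer to hand, the same check can be roughed out in a few lines of Python. This is only a minimal sketch: the function names, the sample text and the borrowed list are all made up for illustration, and the tokeniser is naive (it counts every word, not just verbs, and it splits on apostrophes and on the middle dot in words like "col·legi").

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count lowercase word tokens (letters only, accented letters included)."""
    # [^\W\d_]+ matches runs of Unicode letters; apostrophes and "·" split words.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(tokens)

def missing_from_list(freqs, borrowed_list, top_n=10):
    """Most frequent corpus words that do not appear in the borrowed list."""
    known = set(borrowed_list)
    return [word for word, _ in freqs.most_common() if word not in known][:top_n]

# Hypothetical stand-in for a small Catalan corpus.
sample = "No sé què sentir. Vull sentir la música i sentir el mar."
freqs = word_frequencies(sample)

print(freqs.most_common(1))  # [('sentir', 3)]

# Hypothetical borrowed list that, like the one mentioned above, lacks 'sentir'.
print(missing_from_list(freqs, ["vull", "mar"], top_n=1))  # ['sentir']
```

With a real corpus of a few thousand words you would read the text from a file instead of a string, and the second function would flag frequent words that the borrowed list leaves out.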


----------



## soupdragon78

Thanks again Ajohan. I'll give it a crack. This whole idea seemed so simple a few hours ago...


----------



## soupdragon78

Long live CuCWeb!
(Relax, mods, it's free and comes from the Universitat Pompeu Fabra...)
Here are the top 14:

* ser
* haver
* poder
* anar
* fer
* estar
* caldre
* veure
* trobar
* realitzar
* donar
* dir
* presentar


Search for CuCWeb stats, put *Verb* in the "Categoria morfològica" field and choose *Freqs. Lema* where it says "Mostra com a resposta". Click *Fes l'analisi* and that's it!

Am I a nerd to be so excited by this? Yes, I probably am.

Here is a link to the full list of the top 200:
http://ramsesii.upf.es/cgi-bin/cucw...anning=5000000&nresults=500&minfreq=&maxfreq=


----------



## ajohan

It's very interesting but I have my reservations. It only seems to be counting the infinitives and not all the inflected forms of each lemma, which would explain why 'saber' is so far down the list and 'presentar' so far up. The technology might not be in place to relate 'sé' to 'saber', for example. I might be wrong, of course, because the introduction shows that it's a very thorough piece of development. The other difficulty is that it doesn't reflect spoken language, which might explain why 'realitzar' is so far up.
Even so Soupdragon, like you I'm going to spend hours playing and experimenting with it and when I understand it better, I'll report back.
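To make the worry about 'sé' and 'saber' concrete, here is a toy comparison of the two ways of counting. It is a sketch under loud assumptions: the form-to-lemma table is hand-made and tiny, the sentence is invented, and none of this says anything about how CuCWeb actually works.

```python
from collections import Counter

# Hypothetical hand-made table; a real tool would use a full lemmatiser.
FORM_TO_LEMMA = {
    "saber": "saber", "sé": "saber", "saps": "saber", "sap": "saber",
    "presentar": "presentar", "presenta": "presentar",
}

def infinitive_counts(tokens):
    """Count only tokens that are themselves infinitives (lemma forms)."""
    infinitives = set(FORM_TO_LEMMA.values())
    return Counter(t for t in tokens if t in infinitives)

def lemma_counts(tokens):
    """Map every known inflected form to its lemma before counting."""
    return Counter(FORM_TO_LEMMA[t] for t in tokens if t in FORM_TO_LEMMA)

# Invented sentence: 'saber' appears three times, but never as an infinitive.
tokens = "no sé si ho sap però cal presentar i presentar el que saps".split()

print(infinitive_counts(tokens))  # Counter({'presentar': 2})
print(lemma_counts(tokens))       # Counter({'saber': 3, 'presentar': 2})
```

Counting infinitives only puts 'presentar' ahead of 'saber', while counting by lemma reverses the order, which is the kind of distortion to look out for in a frequency list.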


----------



## soupdragon78

Have fun with it. The results do represent all forms of the verb (click the blue arrows under _exemples_) as long as you specify *Freqs. lema* in the search. But I do have to agree with you that this list does have its limitations as, unlike other similar corpora, it is only based on internet articles. Like you say, it doesn't 100% reflect spoken language.
I'm going to play around and see if I can do the same trick with the corpus you told me about in the resources thread. I understand that it is the "big one" when it comes to Catalan. If it's like the big English ones it should also reference transcripts of natural speech and radio.
Fingers crossed. I'll keep you posted.
Good luck.
Soup


----------



## soupdragon78

Oh well so much for that idea...
There is no access to the statistics on the CTILC site. Worse still, I couldn't post any results anyway as:
"The extraction, reuse and reproduction in any kind of medium, or the computer processing, of the content of the CTILC data are strictly prohibited without the written authorisation of the copyright holders."


----------

