
Most Common Esperanto Words

by Alkanadi, 2015-05-07

Messages: 20

Language: English

1Guy1, 2015-05-07 19:04:17

Kirilo81: Why do Esperantists always tend to reinvent the wheel?
(BTW: Inhaltsverzeichnis = table of contents)
I wouldn't call offering a free resource instead of a paid one reinventing the wheel.

Leke, 2015-05-08 14:21:48

Alkanadi:
Leke: alicio?
Yes, but I made all the words lowercase before sorting them, because I don't want La to be counted as a different word from la.

In the text the name appears as Alicio
Sorry, in my insomniac daze, I didn't even realise this was from the Alice in Wonderland book. I'm curious though: how many unique (or at least root) words are there in that book?

Venkistido, 2015-05-08 14:52:35

I think it is self-evident that word frequency tables are very useful for learners beginning to read a language, but the benefit beyond reading is another question. When texts are analysed in this way they produce "league tables" for the written language and not the spoken language. Frequency tables for the spoken language would also be useful, but would involve a research project considerably trickier and more expensive to execute.

Alkanadi, 2015-05-10 06:50:20

Leke: I'm curious though, how many unique (or at least root) words are there in that book?
That would be a tough program to write because of how Esperanto creates words.

I guess the program would look something like this:
- Read an exhaustive dictionary and remove endings to get roots
- Use regex to match the roots
- Count each instance
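A rough sketch of those steps in Python, under loud assumptions: the ending list is a simplified set of grammatical endings, `strip_ending` and `count_roots` are made-up helper names, and the dictionary is just a list of headwords passed in by the caller. The regex is only used to split the text into words; the root matching itself is a plain set lookup. It does not split compounds or affixes (malbona stays as malbon), and it mangles correlatives such as tiu, so treat the numbers as approximate.

```python
import re
from collections import Counter

# Simplified grammatical endings (an assumption), longest first
# so that "ojn" is tried before "o".
ENDINGS = ("ojn", "ajn", "oj", "aj", "on", "an", "en",
           "as", "is", "os", "us", "o", "a", "e", "i", "u")

def strip_ending(word):
    """Remove one grammatical ending, keeping at least two letters of stem."""
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= 2:
            return word[:-len(ending)]
    return word

def count_roots(text, dictionary_words):
    """Count how often each dictionary root occurs in the text."""
    # Step 1: reduce the dictionary headwords to roots.
    roots = {strip_ending(w.lower()) for w in dictionary_words}
    counts = Counter()
    # Step 2: split the text into lowercase words (Unicode letters only).
    for token in re.findall(r"[^\W\d_]+", text.lower()):
        stem = strip_ending(token)
        # Step 3: count only stems that are known roots.
        if stem in roots:
            counts[stem] += 1
    return counts

if __name__ == "__main__":
    demo = count_roots("Ŝi diris, li diras kaj la hundoj diros.",
                       ["diri", "hundo", "kaj"])
    print(demo.most_common())
```

With a real dictionary file you would pass its headwords as `dictionary_words`; words whose stem is not in the dictionary are simply skipped.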

Alkanadi, 2015-05-11 09:48:27

Leke: I'm curious though, how many unique (or at least root) words are there in that book?
From the source file that I used, there are a little over 5,000 unique words in the text. My program counts diris, diras, and diros as 3 different words. Maybe I will try to find the roots of each when I get the time and motivation.
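For anyone who wants to try this kind of count, here is a minimal sketch, not the actual program used for the numbers in this thread. It lowercases everything first (so La and la are one entry), counts each word form separately (so diris, diras and diros stay distinct), and also reports what share of all words the top N forms cover. The function names and the sample sentence are made up for illustration, and the way punctuation is stripped is an assumption.

```python
import re
from collections import Counter

def word_counts(text):
    """Count lowercase word forms, so 'La' and 'la' are the same entry."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))

def coverage(counts, n):
    """Share of all words accounted for by the n most frequent word forms."""
    total = sum(counts.values())
    top = sum(c for _, c in counts.most_common(n))
    return top / total if total else 0.0

# Tiny made-up sample; a real run would read the whole book from a file.
sample = "La kato diris: la hundo diras, kaj la birdo diros."
counts = word_counts(sample)

print("distinct word forms:", len(counts))
print("most common:", counts.most_common(3))
print("coverage of top 1:", coverage(counts, 1))
```

To run it on a full text, read the file with `open(path, encoding="utf-8").read()` and pass the result to `word_counts`.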

morico, 2015-05-16 21:45:08

There are several lists if you search with Google (Guglo):

Most common words in Esperanto - Wikipedia, the free ...
en.wikipedia.org/.../Most_common_words_in_Espera...
This is a list of the 200 most frequently used words in

Esperanto-English Glossary - Esperanto - Panorama (550 words)
esperanto-panorama.net/angla/vortaro.htm

650 Most Common Words in Esperanto (segment 1/13 ...
https://quizlet.com/.../650-most-common-words-in-e......
Vocabulary words for 650 Most Common Words in Esperanto (segment 1/13). Includes studying games and tools such as flashcards.

Esperanto at Stanford - Graded Vocabulary Lists
www.esperanto.org › ... › Handouts
The vocabulary lists are currently broken up into 2 broad groups. ... Level 1 through 5 - 500 most commonly used roots; Level 2; Level 3; Level 4; Level 5; Level 6

etc.

eshapard, 2015-05-25 07:14:33

Alkanadi: After much toil and trouble (because I am not the best programmer in the world), I successfully created a nice Python program that can read a text and output the number of times each word appears.

Using the Esperanto Alice in Wonderland text from Project Gutenberg, these are the top 10 most used words and the number of times that they appear:

la - 2153 times
kaj - 659 times
mi - 517 times
ŝi - 507 times
ne - 457 times
vi - 370 times
alicio - 347 times
diris - 331 times
al - 314 times
en - 296 times

There are a total of 24562 words in the source text that I used. That means that if someone learns the word la then they will be able to understand over 8% of the text.

I bet if someone memorized the top 100 words then they would be able to understand about 80% of the text.

What is your opinion? Should a language course begin by teaching the most used words?
I think if you start with articles, pronouns, the verb for 'to be' (if your language has it), common conjunctions, enough verbs to learn the most common conjugation paradigms, enough nouns and adjectives to learn the most common declension patterns and the most common prepositions, you'd have the most common words in the language; plus a small vocabulary that would allow you to learn quite a lot of grammar.

There tend to be a lot of prepositions in some languages, so you could use text analysis and a table or list of prepositions to figure out the most common ones. The same goes for conjunctions.

It might help to learn words of the same type (nouns, verbs, etc.) together since they are used in similar ways. Your text analysis combined with a regex search for the noun, verb, adjective endings would help with that.
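A minimal sketch of that ending-based grouping, assuming Esperanto text. The patterns are deliberately crude, `guess_word_class` is a hypothetical helper name, and the function-word set is a short illustrative sample rather than a complete list. Without that set, short words like pri or tra would be mistaken for a verb or an adjective, and correlatives such as tiu are still misfiled.

```python
import re

# Small, incomplete sample of function words (an assumption for illustration).
FUNCTION_WORDS = {"la", "kaj", "al", "en", "de", "pri", "pro", "tra", "por",
                  "kun", "sur", "ne", "mi", "vi", "li", "ŝi", "ĝi", "ni", "ili"}

# Crude ending patterns; the optional j and n cover plural and accusative.
PATTERNS = {
    "noun": re.compile(r"\w{2,}oj?n?"),
    "adjective": re.compile(r"\w{2,}aj?n?"),
    "verb": re.compile(r"\w{2,}(?:as|is|os|us|i|u)"),
    "adverb": re.compile(r"\w{2,}en?"),
}

def guess_word_class(word):
    """Guess a word class from the ending; a heuristic, not a parser."""
    word = word.lower()
    if word in FUNCTION_WORDS:
        return "function"
    for label, pattern in PATTERNS.items():
        if pattern.fullmatch(word):
            return label
    return "other"

if __name__ == "__main__":
    for w in ["hundojn", "bela", "diris", "rapide", "pri", "dum"]:
        print(w, "->", guess_word_class(w))
```

Combined with a frequency count, this would let you pull out, say, the most common verbs or nouns as separate study lists.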

makis, 2015-05-25 17:26:54

Alkanadi: I bet if someone memorized the top 100 words then they would be able to understand about 80% of the text.
I agree. That's why I made an Anki deck with the top 500 common words a while back.

Still waiting for some kind of feedback on it...

https://ankiweb.net/shared/info/293843977

lagtendisto, 2015-05-25 18:09:53

AntConc (YouTube) seems to be a very useful freeware tool for analyzing text corpora. The author's YouTube channel has lots of tutorials about the software.

lagtendisto, 2015-05-25 18:49:14

Venkistido: I think it is self-evident that word frequency tables are very useful for learners beginning to read a language, but the benefit beyond reading is another question. When texts are analysed in this way they produce "league tables" for the written language and not the spoken language.
Very good point.
