
Most Common Esperanto Words

by Alkanadi, May 7, 2015

Messages: 20

Language: English

Alkanadi, May 7, 2015, 3:22:56 PM

After much toil and trouble (because I am not the best programmer in the world), I successfully created a nice Python program that can read a text and output the number of times each word appears.

Using the Esperanto Alice in Wonderland text from Project Gutenberg, these are the top 10 most used words and the number of times that they appear:

la - 2153 times
kaj - 659 times
mi - 517 times
ŝi - 507 times
ne - 457 times
vi - 370 times
alicio - 347 times
diris - 331 times
al - 314 times
en - 296 times

There are a total of 24562 words in the source text that I used. That means that if someone learns the word la then they will be able to understand over 8% of the text.
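For readers who want to try this, here is a minimal sketch of that kind of count. It is not the program from this post: the names `count_words` and `share` are made up for illustration, and splitting the text on runs of letters is an assumption about how words are separated.

```python
import re
from collections import Counter

def count_words(text):
    """Lowercase the text and count each run of letters as one word."""
    # [^\W\d_]+ matches runs of Unicode letters, so ŝ, ĉ, ŭ stay inside words.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)

def share(counts, word):
    """Fraction of all word tokens accounted for by `word`."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

# Toy sentence, not taken from the Alice text.
sample = "La kato vidis la hundon kaj la hundo vidis la katon."
counts = count_words(sample)
for word, n in counts.most_common(3):
    print(f"{word} - {n} times")
print(f"share of 'la': {share(counts, 'la'):.1%}")
```

With the figures above, 2153 / 24562 comes to roughly 8.8%, which is where the "over 8%" comes from.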

I bet if someone memorized the top 100 words then they would be able to understand about 80% of the text.
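The guess could be checked with a short sketch like the one below. The function name `coverage` is hypothetical, and the numbers are only the ten counts and the total quoted in this post, so it says nothing yet about the top 100. It measures the share of word tokens covered, which is not the same thing as comprehension.

```python
from collections import Counter

def coverage(counts, top_n):
    """Share of all word tokens covered by the top_n most frequent words."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(top_n))
    return covered / total

# The ten counts quoted in the post; total_words is the reported text length.
top10 = Counter({
    "la": 2153, "kaj": 659, "mi": 517, "ŝi": 507, "ne": 457,
    "vi": 370, "alicio": 347, "diris": 331, "al": 314, "en": 296,
})
total_words = 24562

print(f"top 10 cover {sum(top10.values()) / total_words:.1%} of the text")
```

The ten listed words add up to 5951 occurrences, or about 24% of the 24562 words. Running `coverage(counts, 100)` on the full count would show how close the top 100 get to 80%.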

What is your opinion? Should a language course begin by teaching the most used words?

Leke, May 7, 2015, 3:32:33 PM

alicio?

Tempodivalse, May 7, 2015, 3:40:18 PM

Alkanadi:There are a total of 24562 words in the source text that I used. That means that if someone learns the word la then they will be able to understand over 8% of the text.

I bet if someone memorized the top 100 words then they would be able to understand about 80% of the text.

I'm not sure this is the way text comprehension works. The percentage of intelligible words does not correlate with the percentage of information you actually understand. What matters is that the words give clues to context.

Suppose that, without any learning, you understood 0% of an Esperanto text. Now you are taught that "la", which makes up X% of the words, means "the". You will find that you still understand nothing of what the text is about, because "la" doesn't give you information if you don't know what it attaches to.

To better illustrate, consider the following philosophical jargon:

--> Supererogatory choice-options are inherent in most deontological but not utilitarian axiologies.

Most people don't know technical philosophical terms. The average person would say, "I do not understand at all what this sentence means," despite obviously recognising the words "are", "in", "not", etc. This is because very large amounts of context are left unclear.

Alkanadi:What is your opinion? Should a language course begin by teaching the most used words?
I think yes, because without the most common words you cannot build or begin to understand any sentence: they are the fundamental "building blocks." However, a foundation alone doesn't make a building, so it's good to expand one's vocabulary in short order.

Alkanadi, May 7, 2015, 3:55:47 PM

Leke:alicio?
Yes, but I made all the words lowercase before sorting them, because I don't want La to be counted as a different word from la.

In the text the name appears as Alicio.

Alkanadi, May 7, 2015, 3:59:23 PM

Tempodivalse:I'm not sure this is the way text comprehension works. The percentage of intelligible words does not correlate with the percentage of information you actually understand. What matters is that the words give clues to context.
I agree. The most common words are glue-words that hold a sentence together. We need to understand a high percentage of the words (maybe about 95%) before we can understand the meaning of a story.

Clarence666, May 7, 2015, 4:06:10 PM

> Using the Esperanto Alice in Wonderland text from Project
> Gutenberg, these are the top 10 most used words

la - 2153 times

> if someone learns the word la then they will
> be able to understand over 8%

Overuse of "la" ... a sloppy translation from English.

diris - 331 times

Maybe "diras", "diri", "diros" ... are the same word???

alicio - 347 times

???

ŝi - 507 times

See: http://akademio-de-esperanto.org/fundamento/antaup...

> I bet if someone memorized the top 100 words then they
> would be able to understand about 80% of the text.

Good, give me your bucks ... Fine, hand over your steloj to me in advance.

> language course begin by teaching the most used words?

YES / JES

1Guy1, May 7, 2015, 4:36:44 PM

There is frequency-based word learning on Lernu here.

This (from memory so forgive me if I am wrong) was based on analysing the speech of Esperanto speakers.

I would be very interested in trying your program, Alkanadi. Is it online anywhere? Is it just for Esperanto? It could be useful for learning to read any e-text.

Tempodivalse, May 7, 2015, 4:40:08 PM

For anyone who's interested, here are the 100 most common words based on an analysis of Esperanto works in Project Gutenberg (original).

I modified this list to feature only basic forms, that is, nominatives, singulars, and infinitives (e.g., ŝi and ŝin are not counted separately). I think any good learning method should present these as soon as possible.

1 | la | the
2 | kaj | and
3 | de | of, from
4 | mi | I, me
5 | en | in, into, inside
6 | al | to [indirect object]
7 | ne | no, not
8 | li | he, him
9 | esti | to be
10 | ke | that [conjunction]
11 | vi | you
12 | por | for, in order to
13 | sed | but
14 | ŝi | she, her
15 | ĉi | [proximity]
16 | kun | with
17 | ni | us, we
18 | sur | upon, on top of
19 | kiu | who, whom, which X
20 | tiu | that [demonstrative pronoun]
21 | ili | they, them
22 | per | with, by means of
23 | ĝi | it
24 | kiel | how, in what way
25 | pli | more [comparative]
26 | el | from, out of
27 | da | of [for quantity]
28 | pri | about, concerning
29 | unu | one [cardinal numeral]
30 | diri | to say
31 | tiel | so, thus, in that way
32 | kiam | when
33 | tio | that [demonstrative pronoun]
34 | oni | one [indeterminate pronoun]
35 | ĉar | because
36 | jam | already
37 | nur | only, merely
38 | se | if
39 | si | -self [reflexive pronoun]
40 | aŭ | or
41 | plej | most [superlative comparison]
42 | nun | now
43 | tre | very
44 | povi | to be able to
45 | ĉu | [indicates yes-no question]
46 | post | after
47 | ĉiu | everyone, every X
48 | antaŭ | before [time], in front of [space]
49 | ankaŭ | also
50 | tie | there, over there [location]
51 | tute | entirely, completely
52 | eĉ | even [emphasis]
53 | lingvo | language
54 | ol | than [comparison]
55 | ankoraŭ | still, yet
56 | ĉe | at, around [location]
57 | havi | to have, to possess
58 | tamen | however
59 | ĝis | until
60 | pro | because of, on account of
61 | dum | while, during
62 | tiam | then, at that time
63 | jen | [points out an object]
64 | kio | what
65 | je | [generic preposition]
66 | du | two
67 | dio | god
68 | sinjoro | Mister, sir, gentleman
69 | ja | indeed [emphasis]
70 | mem | -self
71 | devi | must, to have to
72 | tuj | immediately
73 | ĉio | everything
74 | ĉiam | always
75 | kvazaŭ | as though, as if
76 | iom | some amount, a little
77 | granda | big
78 | laŭ | along, according to
79 | eble | possibly, maybe
80 | bona | good
81 | tial | for that reason
82 | kie | where [location]
83 | homo | person
84 | do | thus, therefore
85 | tempo | time
86 | vidi | to see
87 | respondi | to reply, to respond
88 | tri | three
89 | multe | much, many, a lot
90 | alia | other, another
91 | neniam | never
92 | scii | to know
93 | sen | without
94 | patro | father
95 | preskaŭ | almost
96 | vivo | life
97 | momento | moment
98 | maro | sea
99 | tia | that kind of X
100 | fari | to do
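The merging of inflected forms described before the list (counting ŝi and ŝin together, or diris under diri) could be approximated with a rough sketch like this one. It is not how the list above was produced. The function name `base_form`, the small `KEEP` set of words to leave alone, and the suffix rules are all assumptions, and a naive ending-stripper like this will get some words wrong (it does not touch forms such as tiuj or kiun, or imperatives in -u).

```python
from collections import Counter

PRONOUNS = {"mi", "vi", "li", "ŝi", "ĝi", "ni", "ili", "oni", "si"}
# Words that merely look inflected and must be left alone (incomplete list).
KEEP = {"kaj", "ĝis", "ĵus", "plus", "minus"}
VERB_ENDINGS = ("as", "is", "os", "us")

def base_form(word):
    """Very rough reduction of an Esperanto word to a dictionary-like form."""
    w = word.lower()
    if w in KEEP:
        return w
    # Accusative pronouns: ŝin -> ŝi, min -> mi
    if w.endswith("n") and w[:-1] in PRONOUNS:
        return w[:-1]
    # Accusative -n on nouns and adjectives: vortojn -> vortoj, katon -> kato
    if len(w) > 2 and w.endswith("n") and w[-2] in "oaj":
        w = w[:-1]
    # Plural -j on nouns and adjectives: vortoj -> vorto, grandaj -> granda
    if len(w) > 2 and w.endswith("j") and w[-2] in "oa":
        w = w[:-1]
    # Finite verb endings mapped to the infinitive: diris -> diri
    if len(w) > 3 and w.endswith(VERB_ENDINGS):
        w = w[:-2] + "i"
    return w

forms = ["ŝi", "ŝin", "diris", "diras", "vortojn", "kaj", "ĝis"]
merged = Counter(base_form(w) for w in forms)
print(merged.most_common())
```

A hand-checked word list or a proper morphological analyser would be more reliable; the sketch only shows why diris, diras and diros end up under one entry.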

1Guy1, May 7, 2015, 4:40:42 PM

I also wanted to add that the logic of learning the most frequent words is to cut down on dictionary use. I have the Reader's Greek and Reader's Hebrew Bible, which expect you to know a core vocabulary and then give you the rarer words as footnotes. This works really well.

Kirilo81, May 7, 2015, 6:12:36 PM

Why do Esperantists always tend to reinvent the wheel?
(BTW: Inhaltsverzeichnis = table of contents)
