
Kindle and Esperanto dictionaries

by chrisim101010, May 2, 2012

Messages: 23

Language: English

chrisim101010, June 9, 2012, 08:29:45

I played around with the software, and I can say with 100% confidence that I am more confused now than I was yesterday.
You are right: the inflections entered with 'esti' do come up in others such as 'devi', along with the explanations of all the verbs entered! I must be missing something.
The dictionary probably does have enough power; we just need to find out how to build it properly. I suspect this software is a bit too simple to handle such a large dictionary. Good for building books, though.

Edit: I just found this page with an Esperanto-English dictionary (and others). The author also has the script that was used to generate the dictionary available to download. The dictionary appears as good as the one in the Kindle store, but it still cannot handle the different verb endings.

xdzt:
The Creator software uses a powerful algorithm to build the inflection index which allows to dramatically reduce the size required for the index : inflections are not stored as entries in the index, but are deduced from a set of rules, which are automatically generated based on the inflected forms contained in the publication. This applies to any language.
I haven't played with the software yet, but I wonder how complex these generated rules are. For example, say you had one entry, esti, with inflections for -is, -as, and -os, and a second entry, klini, without explicit inflections. Will the generated rules automatically point to klini when you search for klinas?

xdzt, June 10, 2012, 07:53:07

So, I finally got a chance to play around with this some, and I think I've more or less got it all sussed. I've written a script to translate ESPDIC from text to a formatted dictionary HTML file, and inflections can be included in the process. I've tested it on a small section of ESPDIC and it works on my Kindle. The only question now is which inflections to include for which words. There are two extremes I can see (and a lot of space between them):

* Minimal inflections per word. For a verb (dictionary entry ends in -i), include is/os/as/us/u; for a noun, o/on/oj/ojn; for an adjective, a/an/aj/ajn. More involved constructions are left to the user to decipher.

* The other end of the spectrum is trying to include as many inflections as possible for every word. So, for example, abismo would have inflections not only for noun, but also adjective and verb and adverb, plus participles, with singular, plural, and accusative variations included when applicable. You could go a bridge farther with this, too, and include inflections for all the suffix combinations and prefixes, but that might be too much.
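As a minimal sketch of the first option, assuming the word class is guessed purely from the final letter of the headword (the table and function name below are illustrative, not taken from the actual script):

```python
# Endings per guessed word class, keyed by the headword's final letter.
# The lists follow the "minimal" option above; the lookup itself is an assumption.
MINIMAL_ENDINGS = {
    "i": ["is", "os", "as", "us", "u"],   # verb
    "o": ["o", "on", "oj", "ojn"],        # noun
    "a": ["a", "an", "aj", "ajn"],        # adjective
}

def minimal_inflections(headword):
    """Return the minimal inflected forms for a headword, or [] if unknown."""
    endings = MINIMAL_ENDINGS.get(headword[-1:])
    if endings is None:
        return []                          # e.g. particles such as "kaj"
    stem = headword[:-1]
    # Skip the form that is identical to the headword itself.
    return [stem + e for e in endings if stem + e != headword]

print(minimal_inflections("esti"))
print(minimal_inflections("abismo"))
```

A headword that ends in anything other than -i, -o, or -a gets no inflections at all under this scheme.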

Let me know what you think would be most useful in terms of included inflections for a given word in ESPDIC. Tomorrow I'll take a stab at cleaning up the formatting a bit and generating a full transcription of ESPDIC.

EDIT: I couldn't resist, so I went ahead and did a run of the entire ESPDIC using my script as-is. It's very inelegant at the moment: it indiscriminately adds a wide range of inflections to every word's entry (basically all noun/verb/adjective/adverb endings plus their combinations with participle endings). Because I currently just ignore the word form, this leads to some silly things, like the dictionary thinking kaintojn should point to kaj. I scanned through a few lines of Aliaj Tempoj and it worked pretty darn well, if I say so myself. Some cleanup of the formatting and a check for words that don't have an -o/i/a/e ending should be sufficient to make a very decent Kindle Esperanto dictionary. Since ESPDIC is Creative Commons, I'll post a copy of the .prc file somewhere when I've worked out a few of the rough patches. The dictionary file came out to just around 2 MB, which isn't terrible. I was a little worried when the HTML file I generated with my script was 300 MB!
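The generated HTML itself is not shown in the thread. A rough sketch of what one entry might look like, assuming the idx: dictionary markup described in Amazon's Kindle publishing guidelines and assuming ESPDIC lines have the form "headword : definition" (both are assumptions here, and kindle_entry and parse_line are hypothetical helper names):

```python
from html import escape

def parse_line(line):
    """Split a 'headword : definition' line (the separator is an assumption)."""
    headword, _, definition = line.partition(" : ")
    return headword.strip(), definition.strip()

def kindle_entry(headword, definition, inflected_forms):
    """Return one entry as idx-style markup with its inflected forms listed."""
    iforms = "".join(
        f'<idx:iform value="{escape(form)}"/>' for form in inflected_forms
    )
    infl = f"<idx:infl>{iforms}</idx:infl>" if iforms else ""
    return (
        '<idx:entry name="default" scriptable="yes" spell="yes">'
        f'<idx:orth value="{escape(headword)}">'
        f"<b>{escape(headword)}</b>{infl}</idx:orth>"
        f"<p>{escape(definition)}</p>"
        "</idx:entry>"
    )

headword, definition = parse_line("abismo : abyss")
entry = kindle_entry(headword, definition, ["abismon", "abismoj", "abismojn"])
print(entry)
```

Listing every inflected form explicitly for every headword, as here, would go some way toward explaining why the intermediate HTML is so much larger than the compiled dictionary.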

chrisim101010, June 10, 2012, 11:49:49

I believe the dictionary should recognize as many common words and forms as possible. The English dictionary recognizes everything; a complete Esperanto dictionary cannot possibly be larger, so the Kindle should be able to handle it.

At a later date, a dictionary with all the prefixes and suffixes would be good, and one with all the common constructions even better, but we should get a standard one going first.
It should be good to see the results.

xdzt, June 10, 2012, 19:03:33

OK, so I've cleaned up some of the rougher edges and it appears to be working pretty well. There are still a couple of oddities, but they seem pretty minor to me.

I've generated inflections for the basic endings o/j/n/is/as/os/us/u/a/e and their combinations, as well as their applicable combinations with it/at/ot/int/ant/ont. In the case where a participle form of the word has its own entry, the dictionary lookup may display a non-participle form (usually the adjective form, due to -a being first in the alphabet) instead. This could be overcome with a little work, but it seems like a pretty minor/rare problem to me.
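The combinations described above can be sketched roughly as follows. This assumes that "applicable combinations" means a participle suffix followed by a noun, adjective, or adverb ending, and that headwords without an -o/-i/-a/-e ending are skipped; the names are illustrative only:

```python
from itertools import product

VERB_ENDINGS = ["i", "is", "as", "os", "us", "u"]
# o, on, oj, ojn, a, an, aj, ajn
NOMINAL_ENDINGS = [v + j + n for v, j, n in product("oa", ["", "j"], ["", "n"])]
ADVERB_ENDINGS = ["e", "en"]
PARTICIPLES = ["it", "at", "ot", "int", "ant", "ont"]

def all_forms(headword):
    """Return the set of generated lookup forms for one headword."""
    if headword[-1:] not in ("o", "i", "a", "e"):
        return set()                      # e.g. "kaj" gets no inflections
    stem = headword[:-1]
    non_verbal = NOMINAL_ENDINGS + ADVERB_ENDINGS
    forms = {stem + e for e in VERB_ENDINGS + non_verbal}
    forms |= {stem + p + e for p, e in product(PARTICIPLES, non_verbal)}
    forms.discard(headword)               # the headword is not its own inflection
    return forms

forms = all_forms("serĉi")
print(len(forms), "serĉante" in forms)
```

Because the stem is treated the same way whatever the word class, every headword gets verb, noun, adjective, adverb, and participle forms alike.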

I've also generated inflections for the x-system variants. There are a couple of refinements possible here (similar issues to the above, where an inflection for an earlier alphabetical entry supersedes the word's actual entry), but it mostly works well. This is particularly nice for manually searching the dictionary, as the Kindle's keyboard doesn't have very extensive special-character support.
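For reference, the x-system writes each accented letter as its base letter followed by x. A small sketch of adding those variants to a set of forms follows; the helper names are hypothetical, and the NFC normalization step is a precaution added here, not something the thread says the script does:

```python
import unicodedata

X_SYSTEM = {
    "ĉ": "cx", "ĝ": "gx", "ĥ": "hx", "ĵ": "jx", "ŝ": "sx", "ŭ": "ux",
    "Ĉ": "Cx", "Ĝ": "Gx", "Ĥ": "Hx", "Ĵ": "Jx", "Ŝ": "Sx", "Ŭ": "Ux",
}
_TABLE = str.maketrans(X_SYSTEM)

def to_x_system(word):
    """Convert accented Esperanto letters to their x-system spelling."""
    # Normalize first so 's' + combining circumflex is treated like 'ŝ'.
    return unicodedata.normalize("NFC", word).translate(_TABLE)

def with_x_variants(forms):
    """Return the given forms plus an x-system spelling of each."""
    variants = set(forms)
    variants |= {to_x_system(form) for form in forms}
    return variants

print(to_x_system("ŝanĝi"))
print(sorted(with_x_variants(["serĉante"])))
```

Words without accented letters are returned unchanged, so they add nothing extra to the set.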

I've uploaded the dictionary file here for now. Let me know how it works out for you. When I'm pretty sure I've got most of the big issues ironed out, I might see about uploading the file somewhere more permanent along with the Python script I wrote.

It's also worth noting that, for me at least, though the Kindle is able to display Esperanto special characters in text and when the dictionary is open, the font of the small bubble with definitions that opens at the bottom appears to be unable to render the special characters. Words like "ŝanĝi" show up as " an i" on my Kindle in the little bubble, though they render perfectly in the dictionary and in a book.

If anyone can suggest a good plaintext Esperanto-Esperanto dictionary, I should also be able to adapt that to a Kindle dictionary with only minor modification of my script.

EDIT: I've used this some more, and the only thing that strongly needs fixing IMO is to make the participles redirect to the root verb rather than other words.
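One way to model that fix, purely as an assumption about how it might be approached: generate participle forms only for verb headwords, and let any form that has its own dictionary entry keep pointing at that entry. The in-memory index below only models which forms would be emitted for which headword; it is not how the Kindle tool stores its index:

```python
PARTICIPLES = ["it", "at", "ot", "int", "ant", "ont"]
PARTICIPLE_ENDINGS = ["a", "aj", "an", "ajn", "o", "oj", "on", "ojn", "e"]

def participle_forms(headword):
    """Participle forms, generated only for verb entries (ending in -i)."""
    if not headword.endswith("i"):
        return set()
    stem = headword[:-1]
    return {stem + p + e for p in PARTICIPLES for e in PARTICIPLE_ENDINGS}

def build_index(headwords):
    """Map each lookup form to one headword; real headwords always win."""
    index = {}
    for headword in headwords:
        for form in participle_forms(headword):
            index.setdefault(form, headword)
    for headword in headwords:
        index[headword] = headword        # own entry overrides a generated form
    return index

index = build_index(["ami", "amanto", "drato"])
print(index["amata"], index["amanto"], "dratata" in index)
```

Note that this rule alone would still treat pronouns ending in -i as verbs, so they would need separate handling.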

chrisim101010, June 11, 2012, 12:29:12

So far it is looking good. It is the first one I have tried that recognizes all verb forms. I had only played around with it for 10 minutes before writing this, but I see very little wrong with it. I shall try it out some more over the coming few days.
I do agree that pointing the participles to the verbs is the best idea. The alternative is to give the participles their own entries, but I suspect that would also require adding new definitions.
I found a couple of oddities:

drata goes to d-ro
Horloĝa not found
Serĉante not found

As for eo-eo dictionaries, there is the PIV. I am not sure if they will just give away their database, though. They might agree if the final dictionary were sold on Amazon, with the payments going to the PIV website or whoever owns the copyright. Either way, it would be worth sending an email about the proposal to see what they say.
The Reta Vortaro is also a possibility, although I read somewhere that it is based on the PIV.

xdzt, June 11, 2012, 14:20:29

chrisim101010:
drata goes to d-ro
Horloĝa not found
Serĉante not found
Thanks, these are really helpful. The drata > d-ro case is one of the participle problems (it's generating d-rata as an inflection for d-ro). The changes I have in mind should fix this one.

The other two are surprising, however. It occurs to me that both those words have special characters, so maybe I'm doing something inadvertent when I generate the x-system inflections.

chrisim101010, June 11, 2012, 15:58:49

xdzt:
The other two are surprising, however. It occurs to me that both those words have special characters, so maybe I'm doing something inadvertent when I generate the x-system inflections.
I just played around with the special characters and discovered the following:

ŝi = excessive and irrelevant answers
ĝin = not recognized
ŝatas = sati
ŝian = si
ŝajnas = not recognized

So there is definitely something strange happening with the special characters, especially in the examples that drop the hat and display a different word. That error gives the whole sentence a new, and sometimes strange, meaning.

xdzt, June 14, 2012, 14:29:44

I've worked on this problem a bit more and solved most of these issues. Only "ŝatas" is still a mystery; everything else I've either fixed or know how to fix. Amusingly, the problem with pronouns was that, since they end in -i, my program was treating them as verbs: it didn't give them the -ia/-in/-iajn endings as inflections, but constructions like ĝis or ŝus would be recognized. Hopefully I can sort out these last few problems so that everything works more or less as expected, and then I'll upload another version.
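The pronoun fix could look something like the sketch below: check a fixed list of pronouns before applying the "ends in -i, so it's a verb" rule. The pronoun list and the function name are assumptions for illustration, not the actual script:

```python
# Personal pronouns that end in -i but must not be inflected as verbs.
PRONOUNS = {"mi", "ci", "vi", "li", "ŝi", "ĝi", "ni", "ili", "oni", "si"}
PRONOUN_SUFFIXES = ["n", "a", "an", "aj", "ajn"]   # ĝin, ĝia, ĝian, ĝiaj, ĝiajn
VERB_ENDINGS = ["is", "as", "os", "us", "u"]

def inflect_i_word(headword):
    """Inflect a headword ending in -i as a pronoun or as a verb."""
    if headword in PRONOUNS:
        # Pronouns keep their final -i and take accusative/possessive endings.
        return [headword + suffix for suffix in PRONOUN_SUFFIXES]
    if headword.endswith("i"):
        # Verbs drop the infinitive -i before taking a tense/mood ending.
        return [headword[:-1] + ending for ending in VERB_ENDINGS]
    return []

print(inflect_i_word("ĝi"))
print(inflect_i_word("ŝati"))
```

With the pronoun check in place, ĝin and ŝian are generated while verb-style forms such as ŝus are not.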

pdenisowski, June 28, 2012, 15:22:32

Excellent work! This is exactly the kind of thing I hoped ESPDIC would be used for. If you get the dictionary to a "finished" state (as much as any dictionary is ever "finished"), I'd be happy to post it or link to it from the ESPDIC web page.

Amike,
Paul

cxau, April 27, 2014, 22:25:01

xdzt:
I've worked on this problem a bit more and solved most of these issues. Only "ŝatas" is still a mystery; everything else I've either fixed or know how to fix. Amusingly, the problem with pronouns was that, since they end in -i, my program was treating them as verbs: it didn't give them the -ia/-in/-iajn endings as inflections, but constructions like ĝis or ŝus would be recognized. Hopefully I can sort out these last few problems so that everything works more or less as expected, and then I'll upload another version.
Great thread with so much useful info! I also ran into a blog page that could be yours, at http://mywebsiteontheinternet.com/?p=9

Could you please share your latest version of this dict?

Koran dankon!
