Kindle and Esperanto dictionaries
by chrisim101010, 02 May 2012
Posts: 23
Language: English
chrisim101010 09 June 2012 08:29:45
You are right, the inflections entered for 'esti' do come up for other words such as 'devi', along with all the explanations of all the verbs entered! I must be missing something.
The dictionary probably does have enough power; we just need to find out how to build it properly. I suspect this software is a bit too simple to handle such a large dictionary, though it's good for building books.
Edit: I just found this page with an Esperanto-English dictionary (and others). He also has the script that was used to generate the dictionary available to download. The dictionary appears as good as the one in the Kindle store, but it still cannot handle the different verb endings.
xdzt: The Creator software uses a powerful algorithm to build the inflection index which allows to dramatically reduce the size required for the index: inflections are not stored as entries in the index, but are deduced from a set of rules, which are automatically generated based on the inflected forms contained in the publication. This applies to any language.
I haven't played with the software yet, but I wonder how complex these generated rules are. For example, if you had one entry for, say, esti, with inflections for -is, -as, and -os, and a second entry, klini, without explicit inflections, would the generated rules automatically point to klini when you search for klinas?
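To make the question concrete, here is a hypothetical illustration of suffix-rule deduction. The thread does not document how the Creator software actually builds its rules, so `derive_rules` and `lookup` are invented names showing only the general idea: turn (headword, inflected form) pairs into suffix rewrite rules, then apply those rules to words that had no explicit inflections.

```python
# Hypothetical illustration only: this is NOT the Creator software's
# algorithm, just one simple way rules could be deduced from examples.

def derive_rules(pairs):
    """Turn (headword, form) pairs into (form_suffix, headword_suffix) rules."""
    rules = set()
    for head, form in pairs:
        # Length of the prefix shared by the headword and the inflected form.
        n = 0
        while n < min(len(head), len(form)) and head[n] == form[n]:
            n += 1
        if form[n:]:
            rules.add((form[n:], head[n:]))
    return rules


def lookup(word, headwords, rules):
    """Return the headword a searched word resolves to, or None."""
    if word in headwords:
        return word
    # Try longer suffixes first so the result does not depend on set order.
    for form_suffix, head_suffix in sorted(rules, key=lambda r: (-len(r[0]), r)):
        if word.endswith(form_suffix):
            candidate = word[: len(word) - len(form_suffix)] + head_suffix
            if candidate in headwords:
                return candidate
    return None


rules = derive_rules([("esti", "estis"), ("esti", "estas"), ("esti", "estos")])
print(sorted(rules))                               # [('as', 'i'), ('os', 'i'), ('s', '')]
print(lookup("klinas", {"esti", "klini"}, rules))  # klini
```

In this toy version, rules learned only from esti already resolve klinas, klinis and klinos to klini. Whether the real Creator rules generalise the same way would have to be tested.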
xdzt 10 June 2012 07:53:07
* Minimal inflections per word. If it's a verb (dictionary entry ends in -i), include is/os/as/us/u; if it's a noun, o/on/oj/ojn; if it's an adjective, a/an/aj/ajn. More involved constructions would be left to the user to decipher.
* The other end of the spectrum is to include as many inflections as possible for every word. For example, abismo would have inflections not only for the noun, but also for the adjective, verb and adverb, plus participles, with singular, plural, and accusative variations included where applicable. You could go a step further and include inflections for all the suffix combinations and prefixes, too, but that might be too much.
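A minimal sketch of the first option, assuming (as the post does) that the part of speech can be guessed from the headword's last letter. The function name `minimal_inflections` is hypothetical and not taken from xdzt's script.

```python
# Sketch of the "minimal inflections" option: the part of speech is
# guessed from the headword's final letter. Words with any other ending
# get no generated forms in this sketch.
ENDINGS = {
    "i": ["is", "as", "os", "us", "u"],  # verbs
    "o": ["o", "on", "oj", "ojn"],       # nouns
    "a": ["a", "an", "aj", "ajn"],       # adjectives
}


def minimal_inflections(headword):
    """Return the inflected forms to list under one dictionary entry."""
    endings = ENDINGS.get(headword[-1:], [])
    stem = headword[:-1]
    # Leave out the form that is identical to the headword itself.
    return [stem + e for e in endings if stem + e != headword]


print(minimal_inflections("klini"))  # ['klinis', 'klinas', 'klinos', 'klinus', 'klinu']
print(minimal_inflections("kato"))   # ['katon', 'katoj', 'katojn']
```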
Let me know what you think would be most useful in terms of included inflections for a given word in ESPDIC. Tomorrow I'll take a stab at cleaning up the formatting a bit and generating a full transcription of ESPDIC.
EDIT: I couldn't resist, so I went ahead and ran the entire ESPDIC through my script as-is. It's very inelegant at the moment: it indiscriminately adds a wide range of inflections to every word's entry (basically all noun/verb/adjective/adverb endings plus their combinations with participle endings), and because I currently just ignore the word form, it leads to some silly things, like the dictionary thinking kaintojn should point to kaj. I scanned through a few lines of Aliaj Tempoj and it worked pretty darn well, if I say so myself. Some cleanup of the formatting and a check for words that don't have an -o/-i/-a/-e ending should be sufficient to make a very decent Kindle Esperanto dictionary. Since ESPDIC is Creative Commons, I'll post a copy of the .prc file somewhere when I've worked out a few of the rough patches. The dictionary file came out at around 2 MB, which isn't terrible; I was a little worried when the HTML file I generated with my script was 300 MB!
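The check for words without an -o/-i/-a/-e ending could look like the minimal sketch below. `inflectable_stem` is a hypothetical helper, not part of the actual script. Skipping a word like kaj is what prevents forms such as kaintojn ("ka" + "intojn") from being generated for it.

```python
# Sketch of the ending check: only derive a stem when the headword
# ends in one of the grammatical endings -o, -i, -a or -e.
GRAMMATICAL_ENDINGS = ("o", "i", "a", "e")


def inflectable_stem(headword):
    """Return the stem to inflect, or None if the word should be left alone."""
    if len(headword) > 1 and headword.endswith(GRAMMATICAL_ENDINGS):
        return headword[:-1]
    return None


for word in ["kato", "kanti", "kaj", "ĉar"]:
    print(word, inflectable_stem(word))
```

Short function words that happen to end in a vowel, such as la or ne, still pass this test, so a small stop-list would probably be needed alongside it.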
chrisim101010 10 June 2012 11:49:49
At a later date, a dictionary with all the prefixes and suffixes would be good, and one with all the common constructions even better, but we should get a standard one going first.
It will be good to see the results.
xdzt 10 June 2012 19:03:33
I've generated inflections for the basic endings o/j/n/is/as/os/us/u/a/e and their combinations, as well as their applicable combinations with it/at/ot/int/ant/ont. In the case where a participle form of the word has its own entry, the dictionary lookup may display a non-participle form (usually the adjective form, due to -a being first in the alphabet) instead. This could be overcome with a little work, but it seems like a pretty minor/rare problem to me.
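A rough sketch of generating those combinations for one headword, assuming one reading of "applicable combinations": a participle suffix followed by a noun, adjective or adverb ending. The ending lists are a reconstruction from the post and `all_inflections` is a hypothetical name, not xdzt's code.

```python
from itertools import product

# Assumed reading of the endings listed above: plain verb endings,
# noun/adjective/adverb endings with their -j/-n combinations, and
# every participle suffix followed by one of those nominal endings.
VERB_ENDINGS = ["i", "is", "as", "os", "us", "u"]
NOMINAL_ENDINGS = ["o", "oj", "on", "ojn", "a", "aj", "an", "ajn", "e"]
PARTICIPLES = ["it", "at", "ot", "int", "ant", "ont"]


def all_inflections(headword):
    """Generate every form of the word, ignoring its part of speech."""
    stem = headword[:-1]
    forms = {stem + e for e in VERB_ENDINGS + NOMINAL_ENDINGS}
    # Participle + ending, e.g. kantanta, kantitajn, kantinte.
    forms |= {stem + p + e for p, e in product(PARTICIPLES, NOMINAL_ENDINGS)}
    forms.discard(headword)  # the headword is already the entry itself
    return forms


forms = all_inflections("kanti")
print(len(forms))  # 68
```

Because the part of speech is ignored, a noun such as d-ro also gets participle forms (d-rata), which is the kind of collision reported later in the thread.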
I've also generated inflections for the x-system variants. A couple of refinements are possible here (similar to the issue above, where an inflection for an alphabetically earlier entry supersedes the word's actual entry), but it mostly works well. This is particularly nice for searching the dictionary manually, as the Kindle's keyboard offers only limited special-character input.
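A minimal sketch of adding x-system spellings (ĉ → cx, ŭ → ux, and so on) to a set of forms. The `unicodedata.normalize("NFC", ...)` step is a defensive assumption: it only matters if the source text stores a hat as a separate combining character, which the thread does not establish. `with_x_variants` is a hypothetical name.

```python
import unicodedata

# Map each hatted letter to its x-system spelling.
X_SYSTEM = str.maketrans({
    "ĉ": "cx", "ĝ": "gx", "ĥ": "hx", "ĵ": "jx", "ŝ": "sx", "ŭ": "ux",
    "Ĉ": "Cx", "Ĝ": "Gx", "Ĥ": "Hx", "Ĵ": "Jx", "Ŝ": "Sx", "Ŭ": "Ux",
})


def with_x_variants(forms):
    """Return the forms plus the x-system spelling of any that contain a hat."""
    result = set()
    for form in forms:
        # Compose combining accents first, so "s" + U+0302 becomes "ŝ".
        form = unicodedata.normalize("NFC", form)
        result.add(form)
        x_form = form.translate(X_SYSTEM)
        if x_form != form:
            result.add(x_form)
    return result


print(sorted(with_x_variants(["ŝanĝi", "ŝanĝas", "kato"])))
```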
I've uploaded the dictionary file here for now. Let me know how it works out for you. When I'm pretty sure I've got most of the big issues ironed out, I might see about uploading the file somewhere more permanent along with the python script I wrote.
It's also worth noting that, for me at least, although the Kindle can display Esperanto special characters in text and when the dictionary is open, the font used in the small definition bubble that opens at the bottom of the screen appears unable to render them. Words like "ŝanĝi" show up as " an i" in the little bubble on my Kindle, though they render perfectly in the dictionary and in a book.
If anyone can suggest a good plaintext Esperanto-Esperanto dictionary, I should also be able to adapt that to a Kindle dictionary with only minor modification of my script.
EDIT: I've used this some more, and the only thing that strongly needs fixing IMO is to make the participles redirect to the root verb rather than to other words.
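One way to make participles point at the root verb, sketched as a hypothetical form-to-headword index (not necessarily how xdzt's script or the Creator software handle it): participle forms are generated only for -i headwords, a verb's participle overrides an earlier claim on the same form, and a form that is itself a headword always opens its own entry. The headword list is a toy example.

```python
# Hypothetical index builder mapping each generated form to a headword.
NOMINAL_ENDINGS = ["o", "oj", "on", "ojn", "a", "aj", "an", "ajn", "e"]
PARTICIPLES = ["it", "at", "ot", "int", "ant", "ont"]


def build_index(headwords):
    headwords = set(headwords)
    index = {hw: hw for hw in headwords}  # a headword always opens itself
    for hw in sorted(headwords):          # process in alphabetical order
        stem = hw[:-1]
        for e in NOMINAL_ENDINGS:
            index.setdefault(stem + e, hw)  # first claim on a form wins
        if hw.endswith("i"):              # participles only for verbs
            for p in PARTICIPLES:
                for e in NOMINAL_ENDINGS:
                    form = stem + p + e
                    if form not in headwords:
                        index[form] = hw  # participles point to the verb
    return index


index = build_index(["d-ro", "helpanto", "helpi"])
print(index["helpanta"])  # helpi
print("d-rata" in index)  # False
```

Here helpanta first gets claimed by helpanto (alphabetically earlier) and is then redirected to helpi, while d-ro never produces d-rata because it is not a verb.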
chrisim101010 11 June 2012 12:29:12
I do agree that pointing the participles to the verbs is the best idea. The alternative is to give the participles their own entries, but I suspect that would also require adding new definitions.
I found a couple of oddities:
drata goes to d-ro
Horloĝa not found
Serĉante not found
As for Esperanto-Esperanto dictionaries, there is the PIV, though I'm not sure they would simply give away their database. They might agree if the finished dictionary were sold on Amazon, with the payments going to the PIV website or whoever owns the copyright. Either way, it would be worth emailing them about the proposal to see what they say.
The Reta Vortaro is also a possibility, although I read somewhere that it is based on the PIV.
xdzt 11 June 2012 14:20:29
chrisim101010: drata goes to d-ro
Thanks, these are really helpful. The drata > d-ro problem is one of the participle problems (it's generating d-rata as an inflection for d-ro). The changes I have in mind should fix this one.
Horloĝa not found
Serĉante not found
The other two are surprising, however. It occurs to me that both those words have special characters, so maybe I'm doing something inadvertent when I generate the x-system inflections.
chrisim101010 11 June 2012 15:58:49
xdzt: The other two are surprising, however. It occurs to me that both those words have special characters, so maybe I'm doing something inadvertent when I generate the x-system inflections.
I just played around with the special characters and discovered the following:
ŝi = excessive and irrelevant answers
ĝin = not recognized
ŝatas = sati
ŝian = si
ŝajnas = not recognized
So there is definitely something strange happening with the special characters, especially in the examples that drop the hat and display a different word. That error gives the whole sentence a new, and sometimes strange, meaning.
xdzt 14 June 2012 14:29:44
pdenisowski 28 June 2012 15:22:32
In friendship,
Paul
cxau 27 April 2014 22:25:01
xdzt: I've worked on this problem a bit more and solved most of these issues. Only with "ŝatas" is it still not obvious why it isn't working; everything else I've either fixed or know how to fix. Amusingly, the problem with pronouns is that, since they end in -i, my program was treating them as verbs and not giving them the -ia, -in, -iajn endings as inflections, while constructions like ĝis or ŝus were recognized. Hopefully I can sort out these last few problems so that everything works more or less as expected, and then I'll upload another version.
Great thread with so much useful info! I also came across a blog page that might be yours: http://mywebsiteontheinternet.com/?p=9
Could you please share your latest version of this dictionary?
Many thanks!
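The pronoun problem xdzt describes can be sketched as follows. Treating ĝi as a verb strips the -i and yields ĝ + is = ĝis and ŝ + us = ŝus, which are real words, while ĝin is never generated. This minimal sketch (hypothetical helper `inflections_for_i_word`, pronoun list written out by hand) gives pronouns the accusative and possessive endings instead.

```python
# Personal pronouns end in -i but are not verbs; give them accusative and
# possessive endings (ĝin, ĝia, ĝian, ĝiaj, ĝiajn) instead of verb endings.
PRONOUNS = {"mi", "vi", "li", "ŝi", "ĝi", "ni", "ili", "si", "oni"}
PRONOUN_ENDINGS = ["n", "a", "an", "aj", "ajn"]
VERB_ENDINGS = ["is", "as", "os", "us", "u"]


def inflections_for_i_word(headword):
    """Inflect a headword ending in -i as a pronoun or as a verb."""
    if headword in PRONOUNS:
        return [headword + e for e in PRONOUN_ENDINGS]
    stem = headword[:-1]
    return [stem + e for e in VERB_ENDINGS]


print(inflections_for_i_word("ĝi"))    # ['ĝin', 'ĝia', 'ĝian', 'ĝiaj', 'ĝiajn']
print(inflections_for_i_word("esti"))  # ['estis', 'estas', 'estos', 'estus', 'estu']
```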