
Building a computational grammar for Esperanto, will need linguistic assistance (questions)

by juliaH, 27 April 2012

Messages: 46

Language: English

juliaH, 27 April 2012, 04:54:57

Hi Everyone!

I am building a computational grammar resource for Esperanto within GF (Grammatical Framework), the multilingual grammar system from Chalmers University, http://www.grammaticalframework.org/. I am a newbie to Esperanto, so I will need all the linguistic assistance I can get. The system will be free for anyone to use. It can be used to get a syntactic analysis of a string of words (what grammar components it consists of and how they are put together), or to get a string representation (a phrase or sentence) of a syntactic structure. The system describes only the core grammar of the implemented languages, but it is continuously growing.
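
To make the two directions concrete (string to analysis, and analysis back to string), here is a toy sketch in Python. It is not GF code: the function names `parse_np` and `linearize_np` and the dictionary used as a "tree" are made up for illustration, and the sketch only covers noun phrases of the form (la) adjective* noun.

```python
# Toy illustration only, not GF. It relies on the regular Esperanto endings:
# -a (adjective), -o (noun), then optional -j (plural) and -n (accusative).

def analyse_word(word):
    """Return (category, lemma, number, case) for a noun or adjective form."""
    base, number, case = word, "sg", "nom"
    if base.endswith("n"):
        base, case = base[:-1], "acc"
    if base.endswith("j"):
        base, number = base[:-1], "pl"
    if base.endswith("o"):
        return ("N", base, number, case)
    if base.endswith("a"):
        return ("A", base, number, case)
    raise ValueError(f"cannot analyse {word!r}")

def parse_np(text):
    """String -> analysis (a small dictionary standing in for a syntax tree)."""
    words = text.split()
    tree = {"det": None, "adjectives": [], "noun": None,
            "number": "sg", "case": "nom"}
    if words and words[0] == "la":
        tree["det"] = "la"
        words = words[1:]
    for word in words:
        category, lemma, number, case = analyse_word(word)
        if category == "A":
            tree["adjectives"].append(lemma)
        else:
            tree["noun"], tree["number"], tree["case"] = lemma, number, case
    return tree

def linearize_np(tree):
    """Analysis -> string, putting the same endings on adjectives and noun."""
    suffix = ("j" if tree["number"] == "pl" else "") + \
             ("n" if tree["case"] == "acc" else "")
    words = [tree["det"]] if tree["det"] else []
    words += [adjective + suffix for adjective in tree["adjectives"]]
    words.append(tree["noun"] + suffix)
    return " ".join(words)

tree = parse_np("la bonaj pomoj")
print(tree["noun"], tree["number"])  # pomo pl
print(linearize_np(tree))            # la bonaj pomoj
```

A real grammar system does far more than this (ambiguity, verbs, clauses), but the round trip between a string and a structure is the same basic idea.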

I will post grammar questions in this thread, and examples in the thread "Building a computational grammar for Esperanto, need linguistic assistance (examples to proofread)".

Thank you in advance for your assistance,
Julia

juliaH, 27 April 2012, 05:31:14

Subject 1: Number for the nominal complements of quantitative adverbs
Subject 2: Case for quantitative adverbs as verb complements

Question: Is it correct that when using the adverb "multe" for expressing quantity:

1. Both "many apples" and "much apples" could be translated to the singular "multe da fruto" or plural "multe da frutoj"

2. "Multe" in these cases could never be in the object case *"multen da fruto(j)"?

Thanks!

Evildela, 27 April 2012, 09:45:29

Many apples = multaj pomoj or multe da pomoj
I have never really heard anyone say "much apples" before, but it would translate the same way.
You would never say multe da pomo; it defeats the purpose of this particular construction. However, there are instances where you can use the singular, for example multe da inko, because ink is a noun which is naturally plural, or multe da akvo, which again is naturally plural.

sudanglo, 27 April 2012, 10:14:24

In English, "many" is for countables and "much" for uncountables. Anyway, "much apples" is wrong.

Many apples may be rendered as Multe da pomoj, or Multaj pomoj.

What is a computational grammar, anyway, Julia? Is this a grammar for computers?

An adverb in the accusative has a special usage in Esperanto - showing direction. Venu hejmen - Come home.

As I can't see how multe would indicate a location, I would not expect to see multen.

juliaH, 27 April 2012, 11:03:22

If I wanted to say "many inks", as I might in a store with different brands of ink (even though the super correct form would probably be "many sorts of ink", since "ink" is a mass noun in English too), would that translate as "multaj inko", that is, with a modifying adjective that does not match the number suffix of the noun? Or would "inko" take the plural suffix in this special case? Or would it simply be an ungrammatical phrase?

In English, for example, some words never take the singular, such as "pants" or "scissors". Are there words like this in Esperanto too?

juliaH, 27 April 2012, 11:47:29

Hi Sudanglo,
A computational grammar is a formalised description of the grammar of some language (i.e. extremely precise, following some consistent manual or model of how to describe things), written as a computer program. In the case of my program, the intention is for the linguistic knowledge it encodes to serve as a knowledge base for other computer programs, such as grammar checkers, translation programs and programs for learning languages, in order to facilitate their construction, so that the programmers won't need to do everything from scratch. The system is multilingual: it relates many different languages to each other through a common structure, so you can also use it for building programs that deal with more than one language.
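
As a rough illustration of what "a common structure" shared by several languages can mean, here is a minimal Python sketch. It is not GF itself: the tree shape, the two tiny lexicons and the function names `linearize_eng` and `linearize_epo` are all invented for this example.

```python
# Hypothetical sketch: one language-neutral tree, two language-specific renderers.
# Assumed tree shape: ("NP", number, adjective_id, noun_id)

ADJECTIVES_ENG = {"red": "red"}
NOUNS_ENG = {"apple": ("apple", "apples")}   # (singular, plural)
ADJECTIVES_EPO = {"red": "ruĝa"}
NOUNS_EPO = {"apple": "pomo"}

def linearize_eng(tree):
    _, number, adjective, noun = tree
    singular, plural = NOUNS_ENG[noun]
    # English adjectives do not change with the number of the noun.
    return f"{ADJECTIVES_ENG[adjective]} {plural if number == 'pl' else singular}"

def linearize_epo(tree):
    _, number, adjective, noun = tree
    ending = "j" if number == "pl" else ""
    # Esperanto adjectives take the same plural ending as the noun.
    return f"{ADJECTIVES_EPO[adjective]}{ending} {NOUNS_EPO[noun]}{ending}"

tree = ("NP", "pl", "red", "apple")
print(linearize_eng(tree))  # red apples
print(linearize_epo(tree))  # ruĝaj pomoj
```

Translation then amounts to analysing a string into the shared tree in one language and rendering that tree in another.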

At present, my grammar doesn't come equipped with an ontology (an inventory of the concepts you can use your words to refer to, and how they relate to each other). Instead, it relies on the user of the program to relate words to the things they represent, and not to prompt the program for strings that are grammatically correct but semantically (i.e. with regard to meaning) weird, such as "such a beautiful beautiful day" (cf. "such a beautiful sunny day"). So, for example, it can give you all inflection forms of a noun (pomo, pomoj, pomon, pomojn), and you could build the system in such a way that it could make adverbs out of nouns (hejmo -> hejme) and put that adverb in the object case (hejme -> hejmen). Without an ontology, however, it wouldn't know that this is ok to do with homes but not with apples, so you could end up with "pome" and "pomen", which again are weird with regard to meaning. A human mind sees very easily that something is fishy with pome, because we just don't use language to describe the world in such a way, but a computer needs a rule for how to differentiate between the category of things that are places and things that are not. I expect that adding an ontology is a task for later.
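
The point about missing world knowledge can be sketched in a few lines of Python. This is only an illustration: the function names and the optional `places` set (a very crude stand-in for an ontology) are assumptions made up here, not part of GF.

```python
# Hypothetical sketch: purely morphological rules, plus an optional toy "ontology".

def noun_forms(noun):
    """All four forms of a regular noun ending in -o."""
    return [noun, noun + "j", noun + "n", noun + "jn"]

def directional_adverb(noun, places=None):
    """Derive noun -> adverb (-e) -> directional adverb (-en).

    If `places` is given, it acts as a tiny stand-in for an ontology and
    rejects nouns that are not known to denote places.
    """
    if places is not None and noun not in places:
        raise ValueError(f"{noun!r} is not known to be a place")
    return noun[:-1] + "en"

print(noun_forms("pomo"))           # ['pomo', 'pomoj', 'pomon', 'pomojn']
print(directional_adverb("hejmo"))  # hejmen
print(directional_adverb("pomo"))   # pomen (well-formed, but odd in meaning)
```

Calling `directional_adverb("pomo", places={"hejmo"})` raises an error instead, which is roughly the kind of check an ontology would make possible.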

Hope this helps to explain what a computational grammar is or could be.

sudanglo, 27 April 2012, 12:19:07

Ink can be countable and uncountable in English - so a lot of different inks is perfectly acceptable.

And it would not offend my lingvosento to say multaj inkoj de diversaj koloroj in Esperanto.

I cannot think of any case where X-aj Y-o would be correct. Adjectives agree in number with the noun.

Though possibly one might say ruĝa kaj blua inkoj to distinguish it from ruĝaj kaj bluaj inkoj. That is one red ink and one blue ink as opposed to several red inks and several blue inks.

Plural means plural in Esperanto.
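
The agreement rule described in this post could be checked mechanically along the following lines. This is a minimal sketch in Python with invented function names; it only compares one adjective with one noun, so it deliberately does not handle coordinated cases such as "ruĝa kaj blua inkoj".

```python
def number_and_case(word):
    """Return (is_plural, is_accusative), read off the endings -j and -n."""
    accusative = word.endswith("n")
    if accusative:
        word = word[:-1]
    return word.endswith("j"), accusative

def agrees(adjective, noun):
    """True if adjective and noun carry the same number and case endings."""
    return number_and_case(adjective) == number_and_case(noun)

print(agrees("multaj", "inkoj"))  # True
print(agrees("multaj", "inko"))   # False
```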

Neither trousers nor scissors are plural in Esperanto unless you are referring to several pairs of trousers or several pairs of scissors.

In general all these sorts of oddities of natural languages are absent from Esperanto. What would be the point of having them? It is a fundamental mistake to view Esperanto as just like a minority natural language.

The shared cultural value among the speakers is that any sort of higgledy-piggledy evolution so characteristic of the natural languages is resisted. There are perhaps some 'illogical' features hallowed by usage but not many.

erinja, 27 April 2012, 12:25:23

JuliaH, I apologise in advance if you already know everything I'm about to say, but are you aware of the extensive computational grammar work that has already been done on Esperanto? I can think of a couple of projects off the top of my head. One is an Esperanto grammar checker (Lingvohelpilo).

Maybe this isn't useful if your whole project is just to plug Esperanto into existing software and see what it does. But a lot of computational linguistics work has already been done on Esperanto, and if you want to benefit from any of that, you may want to read some of the literature about the previous work, and possibly contact some of the authors of work that may relate to your project.

Eckhard Bick and Ilona Koutny are a couple of names to get you started, if you haven't contacted them already. And if you haven't already done so, I suggest getting a copy of Modernaj Teknologioj por Esperanto, which has the conference proceedings of KAEST, a conference where a good number of presentations dealt with Esperanto as it relates to machine translation, computational linguistics, etc. I think the references accompanying each article should be of particular interest. The KAEST website has downloadable versions of KAEST presentations, but those are just PDF versions of PowerPoint presentations, which are not as detailed or explanatory as the scientific papers written on the topics.

I don't know if every paper in this field has an English version. Much of the work is published in Esperanto, which is an excellent chance to improve your Esperanto.

erinja, 27 April 2012, 12:29:04

Evildela: because ink is a noun which is naturally plural, or multe da akvo, again it's naturally plural.
I wouldn't call ink and water "naturally plural". They are considered to be a mass of something rather than a single item, so they are uncountable, versus countable.

You can certainly say inkoj (different inks), akvoj (different waters).

It's the same with flour: you would say "multe da faruno" because flour is a whole mass of something. It isn't a single piece that you can pick up and hold; it's a quasi-fluid mass of many tiny pieces, and therefore it's uncountable. If you meant that there were several different types of flour, however, you might indeed say "Estas multaj diversaj farunoj en tiu vendejo!"

That's why computational grammar isn't as easy as it looks at first glance, even with a relatively easy and straightforward language like Esperanto. "multaj farunoj" is wrong in many contexts, but in the right context, it's completely correct.
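
One way a grammar could cope with this is to treat the intended reading as an explicit input rather than a fixed property of the word. The Python sketch below is only an illustration of that idea; the function name and the `countable` flag are assumptions introduced here.

```python
def quantity_phrase(noun, countable):
    """Render 'much/many <noun>' for a regular noun ending in -o.

    countable=False: an undivided mass, e.g. "multe da faruno".
    countable=True:  separate items or kinds, e.g. "multaj farunoj".
    """
    if countable:
        return f"multaj {noun}j"
    return f"multe da {noun}"

print(quantity_phrase("faruno", countable=False))  # multe da faruno
print(quantity_phrase("faruno", countable=True))   # multaj farunoj
print(quantity_phrase("pomo", countable=True))     # multaj pomoj
```

The grammar itself cannot decide which reading is meant; that choice has to come from the context or from the user.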

juliaH, 27 April 2012, 17:08:46

erinja:are you aware of the extensive computational grammar work that has already been done on Esperanto?
Hi Erinja,
Thank you so much for your advice, it is very welcome indeed! I am very grateful for all assistance, recommendations and hints.

The main point of my GF work, however, is not primarily to implement a new grammar resource for Esperanto (I have seen how much amazing work you guys have already done), but to test some hypotheses about common structures in language and how they relate to programming multilingual grammars, as well as to test some specific approaches to encoding grammar structures with mathematical type theory, to see if it can capture language in beneficial ways. To do this, languages have to be implemented, and there is no full GF grammar for an artificial language yet (though some work has been done on Interlingua). The resulting computational grammar is just a nice and hopefully useful by-product. So really, what I am doing is comparing the implementation of an artificial language, which happens to be Esperanto, in GF with the implementation of natural languages, and also checking whether it is a suitable candidate as a role model for other language implementations. Does the common language structure hypothesised in GF work for Esperanto as well, or does this language have special requirements that the underlying structure, intended for natural languages, does not support? And if so, are those requirements due to the artificial origin of Esperanto or to something else? So far it seems to work pretty well, even though some grammar features, such as the lovely correlatives and the word derivation process, are causing some trouble because their regular structure is not so common in natural languages.

One thing I think this grammar will be good at, due to how it encodes grammatical relations, is catching incorrect results such as the following from the three Traduku computational grammars (T1 to T3) and Lingvohelpilo (L):

1)
The man who is there sleeps:
T1: *La viro kiu estas dormoj.
T2: La viro, kiu estas, tie dormas. (Strange punctuation, but ok otherwise I imagine.)
T3: *La viro kiu estas tie dormoj.
2)
He is over there:
T1: ?Li estas tien. (Is directional adverb ok here as an interpretation of "over there"?)
T2: *Li estas superi tie.
T3: *Li finas tie.
3)
L: La homo, kiuj estas tie dormas.
L: Li estas superantaj tie.
L: Donu al mi multe da pomo. (Just discussed.)
4)
Give me some apple!:
T1: Don mi iuj pomo.

Also, it gives basic translations to/from:

Amharic (partial), Arabic (partial), Bulgarian, Catalan, Danish, Dutch, English, Finnish, French, German, Hindi (fragments), Interlingua, Italian, Latin (fragments), Latvian, Nepali, Norwegian bokmål, Persian, Polish, Punjabi, Romanian, Russian, Spanish, Swedish, Thai, Turkish (fragments) and Urdu.

I hope it will be helpful to the Esperanto community.
