
The Common Voice project needs your voice in Esperanto - especially if you are female, under 18 or above 40

by ranulo, 2019-08-13

Messages: 3

Language: English

ranulo, 2019-08-13 16:45:31

The Common Voice project by Mozilla is building a large voice database in many languages, including Esperanto. This open database exists to make speech recognition better and available in more languages. In short, the project aims to put machine learning for speech recognition within everyone's reach, especially for smaller languages and for startups without much capital. This is the website in Esperanto:

https://voice.mozilla.org/eo

You can do two things there:
  • Donate your voice: you record short sentences in a language of your choice. It is very important that this project gets a diverse collection of recordings. Right now, only 10% of the audio in the Esperanto dataset comes from female donors, and the project is also looking for people under 18 and over 40.
  • Listen to the recordings and validate that they are correct. This part is great for training your listening skills in Esperanto, and you learn a lot about Esperanto accents from different countries.
Right now (Aug. 2019) there are 20 hours of recordings in Esperanto from 144 speakers, and 15 hours are already validated. The complete dataset is released under a free license (CC0), so anyone can use it without restriction for private or commercial machine learning projects.
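
If you download the dataset, a few lines of Python are enough to check how the recordings are distributed, for example by gender. This is only a minimal sketch: it assumes the clip metadata is a tab-separated file with one row per clip and a `gender` column, which you should verify against the release you actually download. The sample rows and the function name `share_by_column` are made up for illustration and are not real Common Voice data.

```python
import csv
import io
from collections import Counter

# Toy stand-in for a clip metadata file. These rows are invented for
# illustration; the column names are an assumption about the real file.
SAMPLE_TSV = (
    "client_id\tpath\tgender\tage\n"
    "a1\tclip1.mp3\tmale\tthirties\n"
    "a1\tclip2.mp3\tmale\tthirties\n"
    "b2\tclip3.mp3\tfemale\tteens\n"
    "c3\tclip4.mp3\t\t\n"
)


def share_by_column(tsv_file, column):
    """Return the fraction of clips for each value of `column`.

    Rows where the column is empty or missing are counted as "unspecified".
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    counts = Counter((row.get(column) or "unspecified") for row in reader)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}


shares = share_by_column(io.StringIO(SAMPLE_TSV), "gender")
for value, fraction in sorted(shares.items()):
    print(f"{value}: {fraction:.0%}")
```

To run it on a real file, you would pass an opened file instead of the `io.StringIO` sample, for example `open("validated.tsv", encoding="utf-8", newline="")`, where the file name is again an assumption. Note that this counts clips, not speakers, so one very active donor can shift the percentages noticeably.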

This could also help sites like Lernu or Duolingo. Many Duolingo courses support speech recognition, but the Esperanto tree doesn't. I assume this is because there is no Esperanto speech recognition software at all. The Common Voice project is the first project I know of that could really change this situation. Many startups and big companies will use this dataset simply because it is free. If they find a good dataset for Esperanto, there is a chance that at least a few of them will also train their systems on this data. (For example, Google already supports Esperanto in Google Translate.)

So in the end we might get Esperanto speech recognition in a few services. Plus, this dataset could also be very useful for speech synthesizers that use machine learning.

I find it a lot of fun to donate some time to this project every now and then. What do you think about it?

ranulo, 2019-08-14 08:42:38

The website itself is not completely translated into Esperanto yet. Anyone who wants to translate a few strings can do so on this website:
https://pontoon.mozilla.org/eo/common-voice/

bitcoin_support, 2019-11-07 12:35:52

Great work, keep it running.
