
Accented Character Encoding

by Harun, May 28, 2009

Posts: 6

Language: English

Harun, May 28, 2009 16:34:05

Hello, all.

I'm trying to study PDF books on LERNU with my screen reader. It can read Esperanto out loud using the X-system. However, I have problems with the accented letters. This happens whether I use my screen reader to read the file directly or copy the text into an editor.

For example, I'm trying to read the alphabet section of the Intro to Esperanto (http://en.lernu.net/dosiero.php?id=/komunaj/elsxut...).

The uppercase letter "CX" comes out as the copyright symbol, ©. The lowercase "cx" comes out as a paragraph marker, ¶. These letters seem to be displayed properly on the screen, so perhaps there is an encoding problem?

Does anyone know which encoding is used, and how to convert these characters?
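As a stopgap for text that has already been copied out of the PDF, the two substitutions reported above can be reversed with a small lookup table. This is only a sketch: the name `repair_copied_text` and the table are hypothetical, only the © and ¶ pairs come from this thread, and the other accented letters would have to be added after checking what they turn into. It writes X-system spellings, since that is what the screen reader is said to handle.

```python
# Hypothetical repair table. Only these two pairs are reported in the thread;
# the remaining letters (gx, hx, jx, sx, ux) are unknown and must be added
# after checking what they become when copied from the PDF.
MISMAPPED_TO_X_SYSTEM = {
    "\u00a9": "CX",  # copyright sign, appears where uppercase C-circumflex belongs
    "\u00b6": "cx",  # paragraph mark, appears where lowercase c-circumflex belongs
}


def repair_copied_text(text: str) -> str:
    """Replace the known mis-mapped symbols with their X-system spelling."""
    return text.translate(str.maketrans(MISMAPPED_TO_X_SYSTEM))


if __name__ == "__main__":
    # "\u00b6u" is how "cxu" looks after copying from an affected PDF.
    print(repair_copied_text("\u00b6u vi parolas Esperanton?"))
```

Note that a genuine copyright sign or paragraph mark in the text would be replaced as well, so this is best applied only to text from the affected PDFs.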

Thanks in advance for your help.

jchthys, May 28, 2009 18:09:21

Is your screen reader set to an ASCII encoding instead of Unicode?

Harun, May 28, 2009 20:22:12

My screen reader (NVDA) uses Unicode, as far as I can tell.

I've tried converting the ASCII text to Unicode, with no effect. The problem seems to be mostly with PDFs, like the ELIBRO books and the LERNU PDFs.
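One likely reason the conversion has no effect: the copied symbol is already a valid Unicode character, just not the intended one, so re-encoding leaves it unchanged. The sketch below illustrates this with Python's standard library; the variable names are only for illustration, and the assumption is that the clipboard really holds U+00A9 where Ĉ (U+0108) was meant.

```python
import unicodedata

pasted = "\u00a9"    # what arrives where uppercase C-circumflex was expected
intended = "\u0108"  # the letter the PDF shows on screen


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def reencode(text: str) -> str:
    """Round-trip text through UTF-8, as an encoding converter would."""
    return text.encode("utf-8").decode("utf-8")


if __name__ == "__main__":
    print(describe(pasted))            # the character that was copied
    print(describe(intended))          # the character that was meant
    print(reencode(pasted) == pasted)  # re-encoding does not change the character
```

Because both characters are legitimate Unicode, no encoding converter can tell that one should have been the other; the substitution has to be undone by an explicit mapping.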

The "Fratoj Grimm" content is read correctly.
http://en.lernu.net/biblioteko/rakontoj/gfabeloj/i...

jchthys, May 28, 2009 21:35:56

Yes, I noticed that when I tried to copy and paste text from the PDF of Gerda Malaperis!, all the letters with hats became different symbols. I think that the PDFs are not encoded properly.

tommjames, May 29, 2009 12:29:01

I get the same problem when copying and pasting text from Gerda Malaperis. In Adobe Reader, I notice the encoding of the fonts is set to "Ansi", which presumably is wrong. You can view the encoding in the document properties, on the Font tab.

One other PDF from lernu that does work correctly, though, is the Detala Gramatiko. In that file the encoding is set to "Built-in". I have no idea what that means, but perhaps it helps.

ceigered, May 30, 2009 5:45:10

tommjames: "I too get the problem when copying and pasting the text from Gerda Malaperis. In Adobe Reader, I notice the encoding of the fonts is set to "Ansi", which presumably is wrong. You can view the encoding in the document properties, Font tab.

One other PDF from lernu that works correctly though is the Detala Gramatiko. In that file the encoding is set to "Built-in". I have no idea what that means but perhaps it helps."

Perhaps the 'built-in' setting means that the encoding is defined at the application level (e.g. by the PDF reader) as opposed to the system level (e.g. encodings such as Unicode and ASCII).

I don't think 'application level' and 'system level' are the correct terminology, but nonetheless...

Nahor