Accented Character Encoding

by Harun, 28 May 2009

Posts: 6

Language: English

Harun, 28 May 2009 16:34:05

Hello, all.

I'm trying to study PDF books on LERNU with my screen reader. It can read Esperanto out loud, using the X-system. However, I have problems with the accented letters. This happens whether I use my screen reader to read the file directly or copy the text into an editor.

For example, I'm trying to read the alphabet section of the Intro to Esperanto (http://en.lernu.net/dosiero.php?id=/komunaj/elsxut...).

The uppercase letter "CX" is displayed as the copyright symbol, ©. The lowercase "cx" is a paragraph marker, ¶. These letters seem to be displayed properly on the screen, but perhaps there is an encoding problem?

Does anyone know which encoding is used, and how to convert these characters?

Thanks in advance for your help.

jchthys, 28 May 2009 18:09:21

Is your screen reader set to an ASCII encoding instead of Unicode?

Harun, 28 May 2009 20:22:12

My screen reader (NVDA) uses Unicode, as far as I can tell.

I've tried converting the ASCII text to Unicode, with no effect. The problem seems to be mostly with PDFs, like the ELIBRO books and LERNU PDFs.

The "Fratoj Grimm" content is read correctly.
http://en.lernu.net/biblioteko/rakontoj/gfabeloj/i...
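
One way to check why re-encoding has no effect is to inspect what was actually copied out of the PDF. The following is a minimal Python sketch, not a fix; the helper name `describe` is made up for illustration, and the sample string holds only the two symbols reported above.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]

# The two symbols reported for "CX" and "cx" in the copied text.
for row in describe("©¶"):
    print(row)
# ('©', 'U+00A9', 'COPYRIGHT SIGN')
# ('¶', 'U+00B6', 'PILCROW SIGN')
```

Both symbols are already valid Unicode characters (U+00A9 and U+00B6), whereas Ĉ and ĉ are U+0108 and U+0109. That suggests the wrong characters are produced when the text is extracted from the PDF, so converting the copied text between encodings afterwards cannot restore the Esperanto letters.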

jchthys, 28 May 2009 21:35:56

Yes, I noticed that when I tried to copy and paste text from the PDF of Gerda Malaperis! all the letters with hats became different symbols. I think that the PDFs are not encoded properly.
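
If the substitution is consistent, the copied text could in principle be repaired with a lookup table. The sketch below is a rough Python illustration under that assumption. Only the two pairs reported in this thread (© for Ĉ, ¶ for ĉ) are filled in; the symbols standing in for the other accented letters are not known here and would have to be added after checking the PDF. The names `SYMBOL_TO_LETTER`, `X_SYSTEM`, `repair` and `to_x_system` are hypothetical.

```python
# Pairs reported in this thread. The remaining accented letters
# (Ĝ, Ĥ, Ĵ, Ŝ, Ŭ and their lowercase forms) are deliberately left out
# because their substitute symbols are not known.
SYMBOL_TO_LETTER = {
    "©": "Ĉ",
    "¶": "ĉ",
}

# Standard X-system spellings, for a screen reader that expects them.
# Writing the capitals as "Cx" rather than "CX" is a choice made here.
X_SYSTEM = {
    "Ĉ": "Cx", "ĉ": "cx",
    "Ĝ": "Gx", "ĝ": "gx",
    "Ĥ": "Hx", "ĥ": "hx",
    "Ĵ": "Jx", "ĵ": "jx",
    "Ŝ": "Sx", "ŝ": "sx",
    "Ŭ": "Ux", "ŭ": "ux",
}

def repair(text, table=SYMBOL_TO_LETTER):
    """Replace the substitute symbols with the intended Esperanto letters."""
    return text.translate(str.maketrans(table))

def to_x_system(text):
    """Rewrite accented Esperanto letters in X-system notation."""
    return text.translate(str.maketrans(X_SYSTEM))

fixed = repair("©u vi ¶esas?")
print(fixed)               # Ĉu vi ĉesas?
print(to_x_system(fixed))  # Cxu vi cxesas?
```

Note that this would also replace any genuine © or ¶ in the text, so it is only suitable for files where those symbols never appear in their own right.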

tommjames, 29 May 2009 12:29:01

I too get the problem when copying and pasting the text from Gerda Malaperis. In Adobe Reader, I notice the encoding of the fonts is set to "Ansi", which presumably is wrong. You can view the encoding in the document properties, Font tab.

One other PDF from lernu, the Detala Gramatiko, does work correctly. In that file the encoding is set to "Built-in". I have no idea what that means, but perhaps it helps.

ceigered, 30 May 2009 05:45:10

tommjames wrote:
I too get the problem when copying and pasting the text from Gerda Malaperis. In Adobe Reader, I notice the encoding of the fonts is set to "Ansi", which presumably is wrong. You can view the encoding in the document properties, Font tab.

One other PDF from lernu that works correctly though is the Detala Gramatiko. In that file the encoding is set to "Built-in". I have no idea what that means but perhaps it helps.

Perhaps the 'built-in' setting means that the encoding is defined at the application level (e.g. in the PDF reader) as opposed to the system level (e.g. encodings such as Unicode and ASCII).

And I don't think 'application level' and 'system level' are correct terminology but nonetheless...
