
Accented Character Encoding

By Harun, 28 May 2009

Messages: 6

Language: English

Harun, 28 May 2009, 16:34:05

Hello, all.

I'm trying to study PDF books on LERNU with my screen reader. It can read Esperanto out loud, using the X-system. However, I have problems with the accented letters. This happens whether I use my screen reader to read the file or copy the text into an editor.

For example, I'm trying to read the alphabet section of the Intro to Esperanto (http://en.lernu.net/dosiero.php?id=/komunaj/elsxut...).

The uppercase letter "CX" is displayed as the copyright symbol, ©. The lowercase "cx" is a paragraph marker, ¶. These letters seem to be displayed properly on the screen, but perhaps there is an encoding problem?

Does anyone know which encoding is used, and how to convert these characters?

Thanks in advance for your help.
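
As a possible workaround for copied text, the substitution described above can be reversed with a lookup table. The sketch below is only an illustration: the table holds just the two pairs reported in this post (© for "CX" and ¶ for "cx"), the function names are made up for this example, and the symbols standing in for the other accented letters are not known here, so they would have to be checked in the PDF and added by hand. Note that any genuine © or ¶ in the text would be converted too.

```python
# Symbols seen in the copied PDF text, mapped to the intended letters.
# ASSUMPTION: only these two pairs are reported in the thread; the
# entries for the other accented letters are unknown and must be added
# after checking them in the PDF.
KNOWN_SUBSTITUTIONS = {
    "\u00a9": "\u0108",  # © -> Ĉ
    "\u00b6": "\u0109",  # ¶ -> ĉ
}

# Standard X-system digraphs for the Esperanto accented letters.
X_SYSTEM = {
    "Ĉ": "Cx", "ĉ": "cx",
    "Ĝ": "Gx", "ĝ": "gx",
    "Ĥ": "Hx", "ĥ": "hx",
    "Ĵ": "Jx", "ĵ": "jx",
    "Ŝ": "Sx", "ŝ": "sx",
    "Ŭ": "Ux", "ŭ": "ux",
}


def repair_text(text, table=KNOWN_SUBSTITUTIONS):
    """Replace the stand-in symbols with the intended accented letters."""
    return text.translate(str.maketrans(table))


def to_x_system(text):
    """Rewrite accented Esperanto letters as X-system digraphs."""
    return text.translate(str.maketrans(X_SYSTEM))


if __name__ == "__main__":
    pasted = "©u ¶i tio"
    print(to_x_system(repair_text(pasted)))
```

The two steps are kept separate so that the repaired Unicode text can be saved as it is, with the X-system conversion applied only when the screen reader needs it.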

jchthys, 28 May 2009, 18:09:21

Is your screen reader set to an ASCII encoding instead of Unicode?

Harun, 28 May 2009, 20:22:12

My screen reader (NVDA) uses Unicode, as far as I can tell.

I've tried converting the ASCII text to Unicode, with no effect. The problem seems to be mostly with PDFs, like the ELIBRO books and LERNU PDFs.
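
One way to check what the copied text really contains is to list the code point and Unicode name of each character. The sketch below uses a helper named describe, which is invented for this example. If the pasted text turns out to hold U+00A9 and U+00B6 rather than the accented letters, that would suggest the wrong characters are coming out of the PDF itself, and converting between encodings would not be expected to change them.

```python
import unicodedata


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


if __name__ == "__main__":
    # The two symbols reported in the thread, then the letters intended.
    for row in describe("\u00a9\u00b6\u0108\u0109"):
        print(row)

    # Re-encoding keeps the same character: a copyright sign stays a
    # copyright sign whichever Unicode encoding it passes through.
    assert "\u00a9".encode("utf-8").decode("utf-8") == "\u00a9"
    assert "\u00a9".encode("utf-16").decode("utf-16") == "\u00a9"
```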

The "Fratoj Grimm" content is read correctly.
http://en.lernu.net/biblioteko/rakontoj/gfabeloj/i...

jchthys, 28 May 2009, 21:35:56

Yes, I noticed that when I tried to copy and paste text from the PDF of Gerda Malaperis!, all the letters with hats became different symbols. I think that the PDFs are not encoded properly.

tommjames, 29 May 2009, 12:29:01

I too get the problem when copying and pasting the text from Gerda Malaperis. In Adobe Reader, I notice the encoding of the fonts is set to "Ansi", which presumably is wrong. You can view the encoding in the document properties, Font tab.

One other PDF from lernu that works correctly, though, is the Detala Gramatiko. In that file the encoding is set to "Built-in". I have no idea what that means, but perhaps it helps.

ceigered, 30 May 2009, 05:45:10

tommjames: I too get the problem when copying and pasting the text from Gerda Malaperis. In Adobe Reader, I notice the encoding of the fonts is set to "Ansi", which presumably is wrong. You can view the encoding in the document properties, Font tab.

One other PDF from lernu that works correctly, though, is the Detala Gramatiko. In that file the encoding is set to "Built-in". I have no idea what that means, but perhaps it helps.

Perhaps the 'built-in' setting means that the encoding is handled at the application level (e.g. by the PDF reader) as opposed to the system level (e.g. encodings such as Unicode and ASCII).

And I don't think 'application level' and 'system level' are the correct terminology, but nonetheless...
