Accented Character Encoding

By Harun, May 28, 2009

Posts: 6

Language: English

Harun, May 28, 2009, 16:34:05

Hello, all.

I'm trying to study PDF books on LERNU with my screen reader. It can read Esperanto out loud, using the X-system. However, I have problems with the accented letters. This happens whether I use my screen reader to read the file or copy the text into an editor.

For example, I'm trying to read the alphabet section of the Intro to Esperanto (http://en.lernu.net/dosiero.php?id=/komunaj/elsxut...).

The uppercase letter "CX" is displayed as the copyright symbol, ©. The lowercase "cx" is a paragraph marker, ¶. These letters seem to be displayed properly on the screen, but perhaps there is an encoding problem?

Does anyone know which encoding is used, and how to convert these characters?

Thanks in advance for your help.
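
A minimal Python sketch of one way the conversion could work on text copied out of the PDF. It assumes only the two substitutions reported above (© in place of Ĉ, ¶ in place of ĉ); the symbols that stand in for the other ten accented letters are not known from this thread and would have to be added by hand. The names `OBSERVED_FIXES`, `repair` and `to_x_system` are made up for illustration.

```python
# Substitutions reported in this thread. The other ten accented letters
# are NOT known; add their pairs after inspecting the copied text.
OBSERVED_FIXES = {
    "\u00a9": "\u0108",  # © was shown where Ĉ was expected
    "\u00b6": "\u0109",  # ¶ was shown where ĉ was expected
}

# Standard X-system spellings of the Esperanto accented letters
# (capitals are written "Cx" here; "CX" is also used in all-caps text).
X_SYSTEM = {
    "Ĉ": "Cx", "ĉ": "cx",
    "Ĝ": "Gx", "ĝ": "gx",
    "Ĥ": "Hx", "ĥ": "hx",
    "Ĵ": "Jx", "ĵ": "jx",
    "Ŝ": "Sx", "ŝ": "sx",
    "Ŭ": "Ux", "ŭ": "ux",
}


def repair(text):
    """Replace the wrongly extracted symbols with the intended letters."""
    return text.translate(str.maketrans(OBSERVED_FIXES))


def to_x_system(text):
    """Spell accented Esperanto letters in the X-system."""
    return text.translate(str.maketrans(X_SYSTEM))


print(to_x_system(repair("\u00b6u")))  # cxu
```

Note that a blind substitution like this would also change any genuine © or ¶ in the text, so it is only safe on passages known to be Esperanto.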

jchthys, May 28, 2009, 18:09:21

Is your screen reader set to an ASCII encoding instead of Unicode?

Harun, May 28, 2009, 20:22:12

My screen reader (NVDA) uses Unicode, as far as I can tell.

I've tried converting the ASCII text to Unicode, with no effect. The problem seems to be mostly with PDFs, such as the ELIBRO books and the LERNU PDFs.
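
One possible reason the conversion has no effect: if the copied text already contains the character © (U+00A9), saving it in a different encoding only changes how that character is stored as bytes, not which character it is. The short Python sketch below illustrates this; `reencode` is a hypothetical helper, and the sample string is just the two symbols mentioned in this thread.

```python
import unicodedata


def reencode(text, encoding="utf-8"):
    """Round-trip text through an encoding; the characters come back unchanged."""
    return text.encode(encoding).decode(encoding)


sample = "\u00a9\u00b6"  # the two symbols reported in this thread

for ch in reencode(sample):
    # Prints the code point and the official Unicode name of each character.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+00A9 COPYRIGHT SIGN
# U+00B6 PILCROW SIGN
```

The characters are still the copyright sign and the pilcrow afterwards, so a substitution step would be needed rather than a change of encoding.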

The "Fratoj Grimm" content is read correctly.
http://en.lernu.net/biblioteko/rakontoj/gfabeloj/i...

jchthys, May 28, 2009, 21:35:56

Yes, I noticed that when I tried to copy and paste text from the PDF of Gerda Malaperis! all the letters with hats became different symbols. I think that the PDFs are not encoded properly.

tommjames, May 29, 2009, 12:29:01

I too get the problem when copying and pasting the text from Gerda Malaperis. In Adobe Reader, I notice the encoding of the fonts is set to "Ansi", which presumably is wrong. You can view the encoding in the document properties, Font tab.

One other PDF from lernu that works correctly, though, is the Detala Gramatiko. In that file the encoding is set to "Built-in". I have no idea what that means, but perhaps it helps.

ceigered, May 30, 2009, 5:45:10

Perhaps the 'built-in' setting means that the encoding is handled at the application level (e.g. by the PDF reader) rather than at the system level (e.g. by encodings such as Unicode and ASCII).

I don't think 'application level' and 'system level' are the correct terms, but nonetheless...
