Bug 786 - Garbled non-ASCII text in X selection pasting
Status: CLOSED FIXED
: Sisyphus
: unstable
: all Linux
: P4 major
Reported: 2002-04-06 01:24 by
Modified: 2003-08-25 15:18


Description From 2002-04-06 01:24:23
In the ru_RU.KOI8-R locale, select any text containing Russian characters in a
GTK+ window, then paste it in an Emacs buffer.
The result contains the original Russian strings mixed with garbage that seems
to be special control sequences.
---

---
emacs-X11-21.1-alt13
gtk+-1.2.10-alt2
XFree86-4.2.0-alt2
------- Comment #1 From 2002-04-07 13:25:28 -------
Seeing this on:

emacs-X11-21.1-alt13
gtk+-1.2.10-alt3
XFree86-4.0-ipl8mdk
------- Comment #3 From 2002-04-07 19:35:01 -------
The cause of this is probably that GNU Emacs doesn't handle "extended
segments" as specified by X Compound Text (section 6 of ctext.ps from the
XFree86 docs). (As far as I understand, X Compound Text is a subset of ISO 2022,
and the special meaning of these "segments" is an extension specified only by X;
Emacs parses pure ISO 2022 and so pays no attention to that special meaning.)
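
For illustration, here is a minimal Python sketch of the extended-segment
layout as I read it in the Compound Text spec: ESC % / F M L, then the
encoding name, STX, and the payload, where the length of everything after
L is (M - 128) * 128 + (L - 128). The function names are hypothetical and
this is not how Emacs implements it.

```python
ESC, STX = 0x1B, 0x02
PREFIX = bytes([ESC, 0x25, 0x2F])  # ESC % / introduces an extended segment


def make_extended_segment(name: str, payload: bytes, octets: int = 1) -> bytes:
    """Build an extended segment (hypothetical helper, used only for the demo)."""
    if not 0 <= octets <= 4:
        raise ValueError("octets must be 0 (variable) or 1..4")
    body = name.encode("latin-1") + bytes([STX]) + payload
    if len(body) >= 128 * 128:
        raise ValueError("body too long for the two-byte length field")
    m, l = divmod(len(body), 128)
    # F is 0x30 + octets per character; M and L always have the high bit set.
    return PREFIX + bytes([0x30 + octets, 0x80 + m, 0x80 + l]) + body


def parse_extended_segment(data: bytes, pos: int = 0):
    """Return (encoding_name, payload, next_pos) for the segment starting at pos."""
    header = data[pos:pos + 6]
    if len(header) < 6 or header[:3] != PREFIX:
        raise ValueError("no extended segment at this position")
    # M and L carry 7 bits each of the body length.
    length = (header[4] & 0x7F) * 128 + (header[5] & 0x7F)
    end = pos + 6 + length
    body = data[pos + 6:end]
    if len(body) < length:
        raise ValueError("truncated extended segment")
    name, sep, payload = body.partition(bytes([STX]))
    if not sep:
        raise ValueError("missing STX after the encoding name")
    return name.decode("latin-1"), payload, end
```

A parser that knows only plain ISO 2022 has no rule for this header, which
would be consistent with the control-sequence garbage seen in the pasted text.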

You will not see this bug if you start a GTK+ application in the
ru_RU.ISO8859-5 locale and set-selection-coding-system to compound-text, because
then there is no need to use "extended segments" to transfer Cyrillic text.

This bug is not easy to fix well (or even to confirm as the problem described
above) if one is not an Emacs hacker. We should follow up with the Emacs
developers concerning this issue.

A workaround could be to request the selection as STRING instead of
COMPOUND_TEXT, which is preferred now (by signalling an error in an
x-get-selection wrapper function), and to treat the result as a koi8-r encoded
string (or whatever the common encoding is). Then, of course, multilingual
texts will not be transferred correctly.
------- Comment #5 From 2002-04-07 19:47:59 -------
According to http://www.pps.jussieu.fr/~jch/software/UTF8_STRING/
(linked from http://www.freedesktop.org/standards/),
there is also a new, third way to transfer selection data in addition to STRING
and COMPOUND_TEXT: UTF8_STRING. Ideally, Emacs should prefer it and otherwise
fall back to the cumbersome COMPOUND_TEXT (support for its X variant should
be fixed).
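
The preference order suggested here can be sketched as a small target
chooser. This is only an illustration of the idea, assuming the requestor
already has the list of targets the selection owner offers; the names
PREFERRED_TARGETS and choose_target are hypothetical and this is not how
Emacs negotiates targets.

```python
# Preference order suggested in the comment: UTF8_STRING first, then
# COMPOUND_TEXT, with STRING as the last resort.
PREFERRED_TARGETS = ("UTF8_STRING", "COMPOUND_TEXT", "STRING")


def choose_target(offered):
    """Return the most preferred target the owner offers, or None."""
    offered = set(offered)
    for target in PREFERRED_TARGETS:
        if target in offered:
            return target
    return None
```

With an owner that offers only STRING and COMPOUND_TEXT, this picks
COMPOUND_TEXT, so correct handling of extended segments still matters.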
------- Comment #7 From 2002-04-08 08:29:53 -------
The latest GNU release fixes the problem (!), we ought to build it.

http://www.gnu.org/software/emacs/NEWS.21.2:

 * Changes in Emacs 21.2

** Emacs now supports ICCCM Extended Segments in X selections.

------- Comment #9 From 2002-04-13 16:49:12 -------
It now works for koi8-r. I should test it with cp1251 and will then mark the
bug as resolved.
------- Comment #11 From 2002-04-13 18:39:19 -------
The worries were justified. Emacs doesn't process extended segments in
microsoft-cp1251 or koi8-u.
------- Comment #13 From 2002-04-14 18:16:47 -------
Selections with extended segments in koi8-r were already processed correctly in
emacs-21.2-alt1. (iso8859-5 is not sent in extended segments; it is part of
standard compound text, and so it worked.)

This bug is completely fixed in emacs-21.2-alt4, where support for koi8-u and
microsoft-cp1251 has been added. (This wasn't hard: two more entries were added
to a table.)
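
To illustrate why a table-driven decoder makes such a fix small, here is a
Python sketch. The table and function names are hypothetical and this is
not the actual Emacs table; only the three encoding names come from this
report, and the Python codec names on the right are my mapping.

```python
# Hypothetical table: extended-segment encoding name -> Python codec name.
EXTENDED_SEGMENT_CODECS = {
    "koi8-r": "koi8_r",
    "koi8-u": "koi8_u",            # one of the two added entries
    "microsoft-cp1251": "cp1251",  # the other added entry
}


def decode_extended_payload(name: str, payload: bytes) -> str:
    """Decode an extended-segment payload using the table above."""
    codec = EXTENDED_SEGMENT_CODECS.get(name.lower())
    if codec is None:
        # Unknown names are reported instead of being decoded by guesswork.
        raise LookupError(f"no codec registered for {name!r}")
    return payload.decode(codec)
```

Supporting another encoding in this sketch means adding one more line to
the dictionary; the decoding function itself stays unchanged.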

To work correctly with X selections, one should use the default value of
selection-coding-system: compound-text-with-extensions. Modify .emacs
accordingly. etcskel-2.0.2-alt1 conforms to this requirement. emacs-21.2-alt2
and later conflict with earlier etcskel versions to allow a consistent update.

(Note: some old XFree86-libs construct incorrect compound text with extensions,
so this fix/feature of Emacs cannot be tested on such systems, e.g. with
XFree86-libs-4.0-ipl8mdk.)