Bug 786

Summary: Garbled non-ASCII text in X selection pasting
Product: Sisyphus Reporter: Mikhail Zabaluev <mhz>
Component: emacs-X11    Assignee: Ivan Zakharyaschev <imz>
Status: CLOSED FIXED QA Contact:
Severity: major    
Priority: P4    
Version: unstable   
Hardware: all   
OS: Linux   

Description Mikhail Zabaluev 2002-04-06 01:24:23 MSD
In the ru_RU.KOI8-R locale, select any text containing Russian characters in a GTK+ window, then paste it into an Emacs buffer.
The result contains the original Russian strings mixed with garbage that appears to be special control sequences.
---
emacs-X11-21.1-alt13
gtk+-1.2.10-alt2
XFree86-4.2.0-alt2

Comment 1 imz 2002-04-07 13:25:28 MSD
Seeing this on:

emacs-X11-21.1-alt13
gtk+-1.2.10-alt3
XFree86-4.0-ipl8mdk
Comment 3 imz 2002-04-07 19:35:01 MSD
The cause of this is probably that GNU Emacs doesn't handle "extended segments" as specified by X Compound Text (section 6 in ctext.ps from the XFree86 docs). As far as I understand, X Compound Text is a subset of ISO 2022, and the meaning of these "segments" is an extension specified only by X. Emacs parses pure ISO 2022 and so pays no attention to the special meaning of these "segments".

You will not see this bug if you start a GTK+ application in ru_RU.ISO8859-5 and use set-selection-coding-system to select compound-text, because then there is no need to use "extended segments" to transfer Cyrillic text.

This bug is not easy to fix properly (or even to confirm that this is the actual cause) for someone who is not an Emacs hacker. We should follow up with the Emacs developers on this issue.

A workaround could be to request the selection as STRING instead of the currently preferred COMPOUND_TEXT (by signalling an error in the x-get-selection wrapper function) and to treat the result as a koi8-r encoded string (or whatever the common encoding is). Of course, multilingual text would then not be transferred correctly.
Comment 5 imz 2002-04-07 19:47:59 MSD
According to http://www.pps.jussieu.fr/~jch/software/UTF8_STRING/ (linked from http://www.freedesktop.org/standards/), there is also a third, newer way to transfer selection data besides STRING and COMPOUND_TEXT: UTF8_STRING. Ideally, Emacs should prefer it and otherwise fall back to the cumbersome COMPOUND_TEXT (whose support for the X-specific extensions should be fixed).
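The preference order suggested above could look roughly like the following Python sketch. It assumes the requestor already has the list of targets the selection owner reported for TARGETS; the function names and the tuple PREFERRED_TARGETS are hypothetical, not Emacs or Xlib API. Only UTF8_STRING and STRING are decoded here, since COMPOUND_TEXT needs a full ISO 2022 parser.

```python
# Sketch of the suggested target preference. The target names are standard
# X atoms; the preference tuple and function names are hypothetical.
PREFERRED_TARGETS = ("UTF8_STRING", "COMPOUND_TEXT", "STRING")


def choose_selection_target(offered):
    """Pick the best target from the list the selection owner reports for TARGETS."""
    offered = set(offered)
    for target in PREFERRED_TARGETS:
        if target in offered:
            return target
    return None  # no text target we know how to request


def decode_simple_target(target, data: bytes) -> str:
    """Decode the two targets that need no ISO 2022 parsing."""
    if target == "UTF8_STRING":
        return data.decode("utf-8")
    if target == "STRING":
        return data.decode("latin-1")  # ICCCM: STRING is ISO Latin-1
    raise NotImplementedError(f"{target} needs a Compound Text decoder")
```

With UTF8_STRING available, Cyrillic text arrives as plain UTF-8, so neither locale encodings nor extended segments are involved.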
Comment 7 imz 2002-04-08 08:29:53 MSD
The latest GNU release fixes the problem (!); we ought to build it.

http://www.gnu.org/software/emacs/NEWS.21.2:

 * Changes in Emacs 21.2

** Emacs now supports ICCCM Extended Segments in X selections.

Comment 9 imz 2002-04-13 16:49:12 MSD
It now works for koi8-r. I should test it with cp1251 before marking the bug as resolved.
Comment 11 imz 2002-04-13 18:39:19 MSD
The concern was justified: Emacs doesn't process extended segments in microsoft-cp1251 or koi8-u.
Comment 13 imz 2002-04-14 18:16:47 MSD
Selections with extended segments in koi8-r were already processed correctly in emacs-21.2-alt1. (iso8859-5 is not used in extended segments; it is part of standard compound text, so it already worked.)

This bug is completely fixed in emacs-21.2-alt4, where support for koi8-u and microsoft-cp1251 has been added. (This wasn't hard: it only required adding two more entries to a table.)

To work correctly with X selections, one should use the default value of selection-coding-system: compound-text-with-extensions. Modify .emacs accordingly. etcskel-2.0.2-alt1 conforms to this requirement. emacs-21.2-alt2 and later conflict with earlier etcskel versions to allow a consistent update.

(Note: some old XFree86-libs versions construct incorrect compound text with extensions, so this Emacs fix cannot be tested on such systems, e.g. with XFree86-libs-4.0-ipl8mdk.)
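Adding an encoding really is just one table entry, as the following Python sketch of the sending side suggests. It builds one extended segment (ESC % / 1 M L <name> STX <text>) from a table mapping X encoding names to codecs. The table and function names are hypothetical stand-ins for the table mentioned in the comment, and the exact spelling of encoding names on the wire (case in particular) is an assumption.

```python
# Hypothetical table: X encoding name -> Python codec. koi8-u and
# microsoft-cp1251 correspond to the two entries the comment says were added.
EXTENDED_SEGMENT_CODECS = {
    "koi8-r": "koi8_r",
    "koi8-u": "koi8_u",
    "microsoft-cp1251": "cp1251",
}


def encode_extended_segment(text: str, encoding_name: str) -> bytes:
    """Wrap text in a Compound Text extended segment: ESC % / 1 M L name STX text."""
    codec = EXTENDED_SEGMENT_CODECS[encoding_name]
    payload = encoding_name.encode("latin-1") + b"\x02" + text.encode(codec)
    length = len(payload)
    # M and L each range over 0x80..0xFF, so one segment holds < 128 * 128 bytes.
    if length >= 128 * 128:
        raise ValueError("segment too long; split it into several segments")
    m, l = 128 + length // 128, 128 + length % 128
    # F = '1': one octet per character, true for these single-byte codecs.
    return b"\x1b%/1" + bytes([m, l]) + payload


if __name__ == "__main__":
    print(encode_extended_segment("Привіт", "koi8-u"))
```

A decoder lacking a table entry for "koi8-u" or "microsoft-cp1251" cannot interpret such a segment, which matches the failure reported in Comment 11.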