Bug 10445 - UTF-8 support is next to nonexistent
Status: NEW
: Sisyphus
(All bugs in Sisyphus/coreutils)
: unstable
: all Linux
: P3 normal
: http://lists.altlinux.org/pipermail/d...
: 10446
Reported: 2006-12-18 12:58
Modified: 2008-04-24 15:24


Description From 2006-12-18 12:58:02
GNU coreutils are known not to really work with Unicode at this time (not even
text utilities such as tr and sed); there are limited attempts to remedy this at
Red Hat:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=120933

and there's a corresponding bug in Debian:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=139861

Something could probably be clean-roomed from the Heirloom Toolchest, which
claims to support UTF-8; at least its tr and sed do:
http://heirloom.sourceforge.net/tools.html

Regarding tr:
http://mail.nl.linux.org/linux-utf8/2003-08/msg00224.html

Some more (aging) patches are available here:
http://www.openi18n.org/subgroups/utildev/dli18npatch2.html
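
To illustrate the kind of breakage meant above, here is a minimal Python sketch of what a byte-oriented tr does to UTF-8 input. It is only a model, assuming tr maps each byte of SET1 to the byte at the same position in SET2; the helper names tr_bytes and tr_chars are hypothetical and are not part of coreutils or any patch linked here.

```python
def tr_bytes(data: bytes, set1: bytes, set2: bytes) -> bytes:
    """Translate byte by byte, the way a single-byte-only tr would (assumed model)."""
    return data.translate(bytes.maketrans(set1, set2))


def tr_chars(text: str, set1: str, set2: str) -> str:
    """Translate character by character, which is what a UTF-8 aware tr should do."""
    return text.translate(str.maketrans(set1, set2))


if __name__ == "__main__":
    text = "яр"  # UTF-8 bytes: d1 8f d1 80
    # Character-wise: only the first letter is changed.
    print(tr_chars(text, "я", "Я"))
    # Byte-wise: the lead byte d1 is remapped to d0 everywhere, so the
    # second letter is corrupted even though it was never in SET1.
    mangled = tr_bytes(text.encode("utf-8"), "я".encode("utf-8"), "Я".encode("utf-8"))
    print(mangled.decode("utf-8", errors="replace"))
```

The byte-wise variant damages the second letter because both letters share the lead byte d1, which is exactly the failure mode a multibyte-aware tr has to avoid.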
------- Comment #1 From 2006-12-26 11:25:33 -------
*** Bug 10520 has been marked as a duplicate of this bug. ***
------- Comment #2 From 2008-04-24 15:24:48 -------
Current state of things:
http://lists.gnu.org/archive/html/bug-coreutils/2008-04/msg00231.html

BTW, multibyte support in grep is awkward (grep runs several orders of
_magnitude_ slower in UTF-8 locales), so I have to disable this "support" in
scripts:
http://git.altlinux.org/people/ldv/packages/?p=hasher.git;a=commit;h=1.2.5-alt1-15-gd1764b6
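
The linked commit is not reproduced here. As a rough sketch of one common way to switch off multibyte handling for a single grep call, the Python snippet below runs grep with LC_ALL=C in its environment. Whether the commit does exactly this is an assumption, and the helper names c_locale_env and grep_c_locale are hypothetical.

```python
import os
import subprocess


def c_locale_env() -> dict:
    """Copy the current environment and force the C locale (assumed workaround)."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"  # LC_ALL overrides LANG and the other LC_* variables
    return env


def grep_c_locale(pattern: str, path: str) -> subprocess.CompletedProcess:
    """Run grep on one file with multibyte locale handling disabled."""
    return subprocess.run(
        ["grep", "-e", pattern, "--", path],
        env=c_locale_env(),
        capture_output=True,  # raw bytes; no decoding is attempted here
    )
```

Note that in the C locale grep matches bytes rather than characters, so this is only safe for patterns that do not depend on UTF-8 character semantics (for example plain ASCII strings).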