<?xml version="1.0" encoding="UTF-8" ?>

<bugzilla version="5.2"
          urlbase="https://bugzilla.altlinux.org/"
          
          maintainer="jenya@basealt.ru"
>

    <bug>
          <bug_id>10445</bug_id>
          
          <creation_ts>2006-12-18 12:58:02 +0300</creation_ts>
          <short_desc>UTF-8 support is next to nonexistent</short_desc>
          <delta_ts>2025-01-17 15:53:34 +0300</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>4</classification_id>
          <classification>Development</classification>
          <product>Sisyphus</product>
          <component>coreutils</component>
          <version>unstable</version>
          <rep_platform>all</rep_platform>
          <op_sys>Linux</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc>http://lists.altlinux.org/pipermail/devel/2006-October/037964.html</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          <blocked>10446</blocked>
          <everconfirmed>1</everconfirmed>
          <reporter name="Michael Shigorin">mike</reporter>
          <assigned_to name="placeholder@altlinux.org">placeholder</assigned_to>
          <cc>arseny</cc>
    
    <cc>glebfm</cc>
    
    <cc>ldv</cc>
    
    <cc>omarandemad</cc>
    
    <cc>placeholder</cc>
    
    <cc>vt</cc>
    
    <cc>zerg</cc>
          
          <qa_contact>qa-sisyphus</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>43415</commentid>
    <comment_count>0</comment_count>
    <who name="Michael Shigorin">mike</who>
    <bug_when>2006-12-18 12:58:02 +0300</bug_when>
    <thetext>It&apos;s known that GNU coreutils don&apos;t really work with Unicode at this time (even
utilities like tr and sed); there are limited attempts to remedy this at RH:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=120933

and there&apos;s a corresponding bug in Debian:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=139861

Perhaps something could be clean-room reimplemented from the Heirloom Toolchest,
which claims to support UTF-8; at least its tr and sed do:
http://heirloom.sourceforge.net/tools.html

Regarding tr:
http://mail.nl.linux.org/linux-utf8/2003-08/msg00224.html

Some more (aging) patches are available here:
http://www.openi18n.org/subgroups/utildev/dli18npatch2.html</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>43870</commentid>
    <comment_count>1</comment_count>
    <who name="Michael Shigorin">mike</who>
    <bug_when>2006-12-26 11:25:33 +0300</bug_when>
    <thetext>*** Bug 10520 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>68902</commentid>
    <comment_count>2</comment_count>
    <who name="Dmitry V. Levin">ldv</who>
    <bug_when>2008-04-24 15:24:48 +0400</bug_when>
    <thetext>current state of things:
http://lists.gnu.org/archive/html/bug-coreutils/2008-04/msg00231.html

BTW, multibyte support in grep is awkward (grep runs several _orders of magnitude_
slower in UTF-8), so I have to disable this &quot;support&quot; in scripts:
http://git.altlinux.org/people/ldv/packages/?p=hasher.git;a=commit;h=1.2.5-alt1-15-gd1764b6</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>183353</commentid>
    <comment_count>3</comment_count>
    <who name="Sergey V Turchin">zerg</who>
    <bug_when>2019-08-01 10:53:18 +0300</bug_when>
    <thetext>2 LDV: redmine#6375,6581

P.S.
This one can be deleted too.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>183354</commentid>
    <comment_count>4</comment_count>
    <who name="Sergey V Turchin">zerg</who>
    <bug_when>2019-08-01 10:54:50 +0300</bug_when>
    <thetext>Sorry. I did not notice that work there was already under way.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>257574</commentid>
    <comment_count>5</comment_count>
    <who name="Arseny Maslennikov">arseny</who>
    <bug_when>2025-01-17 15:53:34 +0300</bug_when>
    <thetext>More like &quot;UTF-8 support is next to omnipresent&quot;. Closing.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>