Bug 16127

Summary: broken UTF-8 handling while trimming field length
Product: Infrastructure Reporter: Michael Shigorin <mike>
Component: bugzilla.altlinux.org Assignee: Mikhail Gusarov <dottedmag>
Status: CLOSED FIXED QA Contact: Mikhail Gusarov <dottedmag>
Severity: enhancement    
Priority: P2 CC: vitaly.fedrushkov
Version: unspecified   
Hardware: all   
OS: Linux   
URL: https://bugzilla.altlinux.org/buglist.cgi?query_format=advanced&classification=Development&product=Sisyphus&component=udev&component_type=equals&bug_severity=critical&bug_severity=major&emailassigned_to1=1&emailassigned_to2=1&emailreporter2=1&emailqa_contact2=1&emailcc2=1&chfieldto=Now&cmdtype=doit&order=Reuse%20same%20sort%20as%20last%20time
Bug Depends on: 16711    
Bug Blocks:    

Description Michael Shigorin 2008-06-21 13:23:11 MSD
It seems that field shortening (used at least in buglists) is a bit naïve about multibyte characters and counts bytes rather than characters.  As a result, Cyrillic strings are cut too early (and Chinese text would keep even fewer characters):

udev depends on udev_static-addon instead of udev_static
не все правила отрабатывают пр�...

The second one also has its last character damaged, because the cut falls between the two bytes of that character.
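
To illustrate the difference, here is a minimal sketch in Python rather than Bugzilla's own code. The helper names `truncate_bytes` and `truncate_chars`, the sample string, and the limit of 9 are all assumptions made for the example; the real limit used in buglists is not stated in this report.

```python
# Hypothetical helpers for illustration only; not Bugzilla's actual code.

def truncate_bytes(text: str, limit: int) -> str:
    """Naive cut: counts bytes of the UTF-8 encoding, like substr() on a bytestring."""
    raw = text.encode("utf-8")
    if len(raw) <= limit:
        return text
    # A multibyte sequence split by the cut decodes to U+FFFD (the replacement character).
    return raw[:limit].decode("utf-8", errors="replace") + "..."


def truncate_chars(text: str, limit: int) -> str:
    """Cut by code points, so a character is never split in the middle."""
    if len(text) <= limit:
        return text
    return text[:limit] + "..."


sample = "привет мир"  # assumed sample: 10 characters, 19 bytes in UTF-8

print(truncate_bytes(sample, 9))  # keeps 4 letters plus a damaged one
print(truncate_chars(sample, 9))  # keeps 9 whole characters
```

With the same nominal limit, the byte-based version keeps roughly half as many Cyrillic letters and leaves a broken character at the end, which matches the symptom shown above.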

In a perfect world, there might be no need to cut things at all; but closer to reality, strings should preferably be cut on whitespace/punctuation boundaries.
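
A boundary-aware cut could look roughly like the sketch below. It is written in Python for illustration only; `truncate_at_boundary` and its parameters are hypothetical names, and treating every non-word character (`\W`) as a boundary is an assumption, not something taken from Bugzilla.

```python
import re

# Hypothetical helper for illustration only; not Bugzilla's actual code.
def truncate_at_boundary(text: str, limit: int, ellipsis: str = "...") -> str:
    """Cut to at most `limit` characters, preferring a whitespace/punctuation boundary."""
    if len(text) <= limit:
        return text
    head = text[:limit]
    if re.match(r"\W", text[limit]):
        # The cut already falls right before a non-word character.
        cut = limit
    else:
        # Otherwise back up to the last non-word character inside the kept part.
        last = None
        for match in re.finditer(r"\W", head):
            last = match
        # With no usable boundary, fall back to a hard cut at the limit.
        cut = last.start() if last and last.start() > 0 else limit
    return head[:cut].rstrip() + ellipsis


print(truncate_at_boundary("udev depends on udev_static-addon instead of udev_static", 20))
print(truncate_at_boundary("привет мир", 8))
```

Because Python's `str` patterns are Unicode-aware, Cyrillic letters count as word characters here, so the cut lands between words rather than inside one.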
Comment 1 Mikhail Gusarov 2008-07-02 23:54:38 MSD
Yes, Bugzilla does a simple substr() on bytestrings. D'oh.

I can invent a quick hack for our UTF-8 Bugzilla, but making it suitable for upstream means a lot of work (essentially converting all the internals from bytestrings to Unicode strings :)
Comment 2 Vitaly Fedrushkov 2008-12-02 12:26:20 MSK
https://bugzilla.mozilla.org/show_bug.cgi?id=363153 fixed in 3.2
Comment 3 Mikhail Gusarov 2009-01-22 06:51:52 MSK
Yep, fixed in 3.2.