Bug 16127

Summary:	broken UTF-8 handling while trimming field length
Product:	Infrastructure	Reporter:	Michael Shigorin <mike>
Component:	bugzilla.altlinux.org	Assignee:	Mikhail Gusarov <dottedmag>
Status:	CLOSED FIXED	QA Contact:	Mikhail Gusarov <dottedmag>
Severity:	enhancement
Priority:	P2	CC:	vitaly.fedrushkov
Version:	unspecified
Hardware:	all
OS:	Linux
URL:	https://bugzilla.altlinux.org/buglist.cgi?query_format=advanced&classification=Development&product=Sisyphus&component=udev&component_type=equals&bug_severity=critical&bug_severity=major&emailassigned_to1=1&emailassigned_to2=1&emailreporter2=1&emailqa_contact2=1&emailcc2=1&chfieldto=Now&cmdtype=doit&order=Reuse%20same%20sort%20as%20last%20time
Bug Depends on:	16711
Bug Blocks:

Description Michael Shigorin 2008-06-21 13:23:11 MSD

It seems the field shortening (used at least in buglists) is a bit naïve about multibyte characters and uses byte counts. This results in Cyrillic strings being cut too early (and Chinese text would keep even fewer characters):

udev depends on udev_static-addon instead of udev_static
не все правила отрабатывают пр�...

The second one also gets its last character damaged, because the cut falls in the middle of a two-byte sequence.
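
To illustrate the difference, here is a minimal Python sketch (not Bugzilla's actual code, which is Perl; the function names truncate_bytes and truncate_chars are made up for this example). It assumes the byte-oriented cut behaves like slicing the UTF-8 encoding of the string:

```python
def truncate_bytes(text: str, limit: int) -> str:
    """Byte-oriented cut: roughly what slicing a UTF-8 bytestring does."""
    raw = text.encode("utf-8")
    if len(raw) <= limit:
        return text
    # A multibyte sequence split by the cut decodes to U+FFFD here.
    return raw[:limit].decode("utf-8", errors="replace") + "..."


def truncate_chars(text: str, limit: int) -> str:
    """Character-oriented cut: counts code points, never splits one."""
    if len(text) <= limit:
        return text
    return text[:limit] + "..."


if __name__ == "__main__":
    sample = "не все правила"
    print(truncate_bytes(sample, 6))  # cut lands inside a two-byte character
    print(truncate_chars(sample, 6))  # cut lands between characters
```

With the same limit, the byte-oriented version keeps only about half as many Cyrillic characters and may leave a damaged one at the end, while the character-oriented version does neither.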

In a perfect world, there might be no need to cut things at all; but closer to reality, strings should preferably be cut on whitespace/punctuation boundaries.
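
A rough sketch of such boundary-aware shortening, again in Python and purely illustrative (truncate_at_boundary and the punctuation set are assumptions of this example, not anything Bugzilla provides):

```python
import re

# Characters treated as acceptable cut points (an arbitrary choice for this sketch).
BOUNDARY = re.compile(r"[\s.,;:!?-]")


def truncate_at_boundary(text: str, limit: int) -> str:
    """Cut to at most `limit` characters, backing up to a word boundary."""
    if len(text) <= limit:
        return text
    head = text[:limit]
    # If the cut lands mid-word, back up to the last boundary inside the head.
    if not text[limit].isspace():
        matches = list(BOUNDARY.finditer(head))
        if matches:
            head = head[:matches[-1].start()]
    return head.rstrip() + "..."


if __name__ == "__main__":
    print(truncate_at_boundary("не все правила отрабатывают", 10))
```

If no boundary is found within the limit, the sketch falls back to a plain character cut rather than returning an empty string.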

Comment 1 Mikhail Gusarov 2008-07-02 23:54:38 MSD

Yes, Bugzilla does a simple substr() on bytestrings. D'oh.

I can come up with a quick hack for our UTF-8 Bugzilla, but making it suitable for upstream means a lot of work (essentially converting all the internals from bytestrings to Unicode strings) :)

Comment 2 Vitaly Fedrushkov 2008-12-02 12:26:20 MSK

https://bugzilla.mozilla.org/show_bug.cgi?id=363153 fixed in 3.2

Comment 3 Mikhail Gusarov 2009-01-22 06:51:52 MSK

Yep, fixed in 3.2.