Bug 1379

Summary: GROUP value printed in Russian in other locales
Product: Sisyphus
Reporter: imz <vanyaz>
Component: rpm
Assignee: placeholder <placeholder>
Status: CLOSED FIXED
QA Contact: qa-sisyphus
Severity: normal
Priority: P5
CC: at, glebfm, imz, ldv, placeholder, vt
Version: unstable
Hardware: all
OS: Linux
Attachments:
Description Flags
0001379-test_i18n_rpm_language.sh none

Description imz 2002-10-08 23:24:04 MSD
I'm testing how rpm prints localized values, mainly in UTF-8 locales. The results differ between rpm-4.0.4-alt4 and
rpm-4.0.4-alt7, but neither is correct.
---
With rpm-4.0.4-alt7:

$ LANG=de_AT.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Коммуникации

date demonstrates that the specified locale is generally valid:

$ LANG=de_AT.UTF-8 date -d 'Jan 6 2002'
Son Jän  6 00:00:00 MSK 2002

With LANG='', LANG=en and LANG=de, rpm prints the English word (which I consider correct if no German translation is available). With LANG=en_US and LANG=de_DE, it prints the Russian word in KOI8-R (incorrect).

rpm-4.0.4-alt4 behaves the same, except that it doesn't convert the output string to UTF-8: the output is always KOI8-R.

---

Comment 1 Dmitry V. Levin 2002-10-09 18:20:03 MSD
It's because of the algorithm implemented in rpm.

lib/header.c:headerFindI18NString()
checks environment variables in this order:
LC_ALL, LC_MESSAGES, LANG.
Comment 3 imz 2002-10-09 20:30:07 MSD
What does "checks environment variables in this order" mean: do later variables override earlier ones, or vice versa?

This still doesn't explain why, by changing only LANG and leaving the rest of the environment the same, we can get either English (with the invalid (?) LANG='', LANG=en and LANG=de) or Russian (LANG=de_AT.UTF-8) Group names:

[imz@altair imz]$ LANG=de_AT.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Коммуникации
[imz@altair imz]$ LANG=de rpmquery -q minicom --qf=%{GROUP}\\n
Communications
[imz@altair imz]$
Comment 5 imz 2002-10-09 20:35:22 MSD
Here are the corresponding locale values: LC_MESSAGES is unset, LC_ALL is empty (also unset).

[imz@altair imz]$ locale
LANG=ru_RU.KOI8-R
LC_CTYPE="ru_RU.KOI8-R"
LC_NUMERIC="ru_RU.KOI8-R"
LC_TIME="ru_RU.KOI8-R"
LC_COLLATE="ru_RU.KOI8-R"
LC_MONETARY="ru_RU.KOI8-R"
LC_MESSAGES="ru_RU.KOI8-R"
LC_PAPER="ru_RU.KOI8-R"
LC_NAME="ru_RU.KOI8-R"
LC_ADDRESS="ru_RU.KOI8-R"
LC_TELEPHONE="ru_RU.KOI8-R"
LC_MEASUREMENT="ru_RU.KOI8-R"
LC_IDENTIFICATION="ru_RU.KOI8-R"
LC_ALL=
[imz@altair imz]$ LANG=de_AT.UTF-8 locale
LANG=de_AT.UTF-8
LC_CTYPE="de_AT.UTF-8"
LC_NUMERIC="de_AT.UTF-8"
LC_TIME="de_AT.UTF-8"
LC_COLLATE="de_AT.UTF-8"
LC_MONETARY="de_AT.UTF-8"
LC_MESSAGES="de_AT.UTF-8"
LC_PAPER="de_AT.UTF-8"
LC_NAME="de_AT.UTF-8"
LC_ADDRESS="de_AT.UTF-8"
LC_TELEPHONE="de_AT.UTF-8"
LC_MEASUREMENT="de_AT.UTF-8"
LC_IDENTIFICATION="de_AT.UTF-8"
LC_ALL=
[imz@altair imz]$ LANG=de locale
LANG=de
LC_CTYPE="de"
LC_NUMERIC="de"
LC_TIME="de"
LC_COLLATE="de"
LC_MONETARY="de"
LC_MESSAGES="de"
LC_PAPER="de"
LC_NAME="de"
LC_ADDRESS="de"
LC_TELEPHONE="de"
LC_MEASUREMENT="de"
LC_IDENTIFICATION="de"
LC_ALL=
[imz@altair imz]$
Comment 7 imz 2002-10-09 20:37:35 MSD
Several more strange examples:

[imz@altair imz]$ LANG=de_AT.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Коммуникации
[imz@altair imz]$ LANGUAGE=de LANG=de_AT.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Communications
[imz@altair imz]$ LANGUAGE=de_AT.UTF-8 LANG=de_AT.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Communications
[imz@altair imz]$ echo $LANGUAGE
ru_RU.KOI8-R
[imz@altair imz]$
Comment 9 imz 2002-10-09 20:46:15 MSD
If we assume that LANGUAGE (and neither LANG nor LC_*) is the variable that determines the language of the Group value, then this test remains unexplained:

[imz@altair imz]$ LANGUAGE=ru_RU.KOI8-R LANG=invalid rpmquery -q minicom --qf=%{GROUP}\\n
Communications
Comment 11 Dmitry V. Levin 2002-10-09 21:10:30 MSD
I plan to add LANGUAGE to the head of the list of variables rpm uses in that algorithm.
rpm-4.1 also checks LANGUAGE.

(The first non-empty variable from that list is used.)
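
The rule described here (the first non-empty variable in the list wins) can be sketched in shell. This is only an illustration of the lookup order, not rpm's actual code from lib/header.c; the function name first_locale_var is made up for the example, and LANGUAGE is placed at the head of the list as proposed above.

```shell
# Hypothetical helper, not rpm code: print NAME=VALUE for the first
# non-empty variable among LANGUAGE, LC_ALL, LC_MESSAGES, LANG.
# Returns 1 (and prints nothing) if all four are unset or empty.
first_locale_var() {
    for name in LANGUAGE LC_ALL LC_MESSAGES LANG; do
        # Indirect lookup: copy the value of the variable named by $name.
        eval "value=\${$name-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
            return 0
        fi
    done
    return 1
}
```

Under this model an empty variable is treated the same as an unset one, so an empty LC_ALL does not hide LC_MESSAGES or LANG.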
Comment 13 imz 2002-10-09 22:51:28 MSD
After reading the info page on "Using gettextized software", I think that rpm has to use two parallel lists to get different pieces of information: (LANGUAGE,) LC_ALL, LC_MESSAGES, LANG should be examined to get the language (perhaps a short name like "de"), and LANGUAGE, LC_ALL, LC_CTYPE, LANG to get the codeset to convert to. I'm not sure whether LANGUAGE should be included in the second list.

It seems that rpm does all this already (without LANGUAGE in the 2nd list):

[imz@altair imz]$ LANGUAGE=de LC_CTYPE=ru_RU.KOI8-R rpmquery -q minicom --qf=%{GROUP}\\n
Communications
[imz@altair imz]$ LANGUAGE=ru LC_CTYPE=ru_RU.KOI8-R rpmquery -q minicom --qf=%{GROUP}\\n
Коммуникации
[imz@altair imz]$ LANGUAGE=en LC_CTYPE=ru_RU.KOI8-R rpmquery -q minicom --qf=%{GROUP}\\n
Communications
[imz@altair imz]$ LANGUAGE=ru LC_CTYPE=de_DE.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Коммуникации
[imz@altair imz]$ LANGUAGE=ru LC_CTYPE=de rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Communications
[imz@altair imz]$ LANGUAGE=ru LC_CTYPE=invalid rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Communications
[imz@altair imz]$ LANGUAGE=de:ru LC_CTYPE=ru_RU.KOI8-R rpmquery -q minicom --qf=%{GROUP}\\n
Коммуникации
[imz@altair imz]$ LANGUAGE=de:ru_RU.UTF-8 LC_CTYPE=ru_RU.KOI8-R rpmquery -q minicom --qf=%{GROUP}\\n
Коммуникации
[imz@altair imz]$ LANGUAGE=de:ru_RU.UTF-8 LC_CTYPE=de_DE.ISO8859-1 rpmquery -q minicom --qf=%{GROUP}\\n
????????????

If LC_CTYPE is invalid, it outputs the English value.
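
The two-list model can be sketched roughly as follows. This is a reading of the observed behaviour, not rpm's implementation: the helper names are invented for the example, the locale name is assumed to have the usual language[_TERRITORY][.codeset][@modifier] shape, and the list of available translations passed to choose_translation is an assumption (the outputs above suggest minicom's GROUP has only an untranslated value and a Russian one).

```shell
# Hypothetical helpers, not rpm code.

# Language part of a locale name: everything before the first '_', '.' or '@'.
locale_language() {
    printf '%s\n' "${1%%[_.@]*}"
}

# Codeset part of a locale name: text after the first '.', without any
# '@modifier'. Prints an empty line if the name carries no codeset.
locale_codeset() {
    case $1 in
        *.*) rest=${1#*.}; printf '%s\n' "${rest%%@*}" ;;
        *)   printf '\n' ;;
    esac
}

# Walk a colon-separated priority list (as in LANGUAGE=de:ru) and print the
# first language found in the space-separated list of available
# translations; fall back to C (the untranslated value) if none matches.
choose_translation() {
    priority=$1
    available=$2
    old_ifs=$IFS
    IFS=:
    set -- $priority        # split the priority list on ':'
    IFS=$old_ifs
    for entry in "$@"; do
        lang=$(locale_language "$entry")
        case " $available " in
            *" $lang "*) printf '%s\n' "$lang"; return 0 ;;
        esac
    done
    printf 'C\n'
}
```

With these definitions, choose_translation de:ru_RU.UTF-8 'C ru' selects ru, and locale_codeset ru_RU.KOI8-R yields KOI8-R as the codeset to convert to. The sketch does not model what happens when the target codeset cannot represent the text (the ???????????? case) or when LC_CTYPE names an invalid locale.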

I do not see any important misbehaviour. It seems it was my error to report this bug, because I didn't understand quite well how all the locale variables work together.

This concerns GROUP. I do see misbehaviour in the recoding of SUMMARY. Summaries should be recoded just as the GROUPs (and the error messages) are, I think. A testing script is attached. I ran it twice:

cat /user/imz/test_i18n_rpm_language.sh | sh
cat /user/imz/test_i18n_rpm_language.sh | sed -e 's!LANGUAGE!LC_MESSAGES!g' | sh

The results are different.
Comment 15 Dmitry V. Levin 2004-06-27 02:23:50 MSD
Fixed in rpm-4.0.4-alt39