Bug 1379 - GROUP value printed in Russian in other locales
Status: CLOSED FIXED
Product: Sisyphus
Component: rpm
Version: unstable
Platform: all Linux
Importance: P5 normal
Reported: 2002-10-08 23:24
Modified: 2005-12-19 17:12


Attachments
0001379-test_i18n_rpm_language.sh (522 bytes, application/x-sh)
2002-10-09 22:53, imz

Description From 2002-10-08 23:24:04
I'm testing how rpm prints localized values, mainly in UTF-8 locales. The
results differ between rpm-4.0.4-alt4 and rpm-4.0.4-alt7, but neither is
correct.
---
With rpm-4.0.4-alt7:

$ LANG=de_AT.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Коммуникации

date demonstrates that the specified locale is generally valid:

$ LANG=de_AT.UTF-8 date -d 'Jan 6 2002'
Son Jц╓n  6 00:00:00 MSK 2002

With LANG='', LANG=en and LANG=de rpm prints the English word (which I
consider correct if no German translation is available). With LANG=en_US
and LANG=de_DE it prints the Russian in KOI8-R (incorrect).

rpm-4.0.4-alt4 behaves the same, except that it doesn't convert the output
string to UTF-8: the output is always KOI8-R.
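
The comparison above can be repeated in one pass with a small helper. This is only a sketch: try_langs is a hypothetical name, and the LANG values are the ones listed in this description.

```sh
# Run the given command once per LANG value and label each result.
# try_langs is a hypothetical helper; the LANG values come from the
# description above.
try_langs() {
    for l in '' en de en_US de_DE de_AT.UTF-8; do
        printf 'LANG=%s -> ' "$l"
        # env sets LANG only for this one invocation of the command.
        env LANG="$l" "$@"
    done
}
```

It could be called as try_langs rpmquery -q minicom --qf='%{GROUP}\n'. On a KOI8-R terminal the output for UTF-8 locales would still need to go through iconv, as in the commands above.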

---
------- Comment #1 From 2002-10-09 18:20:03 -------
It's because of the algorithm implemented in rpm.

lib/header.c:headerFindI18NString()
checks environment variables in this order:
LC_ALL, LC_MESSAGES, LANG.
------- Comment #3 From 2002-10-09 20:30:07 -------
What does "checks environment variables in this order" mean: do later
variables override earlier ones, or vice versa?

This still doesn't explain why, when only LANG changes and the rest of the
environment stays the same, we can get either English (with the invalid (?)
LANG='', LANG=en and LANG=de) or Russian (LANG=de_AT.UTF-8) Group names:

[imz@altair imz]$ LANG=de_AT.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Коммуникации
[imz@altair imz]$ LANG=de rpmquery -q minicom --qf=%{GROUP}\\n
Communications
[imz@altair imz]$
------- Comment #5 From 2002-10-09 20:35:22 -------
Here are the corresponding locale values: LC_MESSAGES is unset, LC_ALL is empty
(also unset).

[imz@altair imz]$ locale
LANG=ru_RU.KOI8-R
LC_CTYPE="ru_RU.KOI8-R"
LC_NUMERIC="ru_RU.KOI8-R"
LC_TIME="ru_RU.KOI8-R"
LC_COLLATE="ru_RU.KOI8-R"
LC_MONETARY="ru_RU.KOI8-R"
LC_MESSAGES="ru_RU.KOI8-R"
LC_PAPER="ru_RU.KOI8-R"
LC_NAME="ru_RU.KOI8-R"
LC_ADDRESS="ru_RU.KOI8-R"
LC_TELEPHONE="ru_RU.KOI8-R"
LC_MEASUREMENT="ru_RU.KOI8-R"
LC_IDENTIFICATION="ru_RU.KOI8-R"
LC_ALL=
[imz@altair imz]$ LANG=de_AT.UTF-8 locale
LANG=de_AT.UTF-8
LC_CTYPE="de_AT.UTF-8"
LC_NUMERIC="de_AT.UTF-8"
LC_TIME="de_AT.UTF-8"
LC_COLLATE="de_AT.UTF-8"
LC_MONETARY="de_AT.UTF-8"
LC_MESSAGES="de_AT.UTF-8"
LC_PAPER="de_AT.UTF-8"
LC_NAME="de_AT.UTF-8"
LC_ADDRESS="de_AT.UTF-8"
LC_TELEPHONE="de_AT.UTF-8"
LC_MEASUREMENT="de_AT.UTF-8"
LC_IDENTIFICATION="de_AT.UTF-8"
LC_ALL=
[imz@altair imz]$ LANG=de locale
LANG=de
LC_CTYPE="de"
LC_NUMERIC="de"
LC_TIME="de"
LC_COLLATE="de"
LC_MONETARY="de"
LC_MESSAGES="de"
LC_PAPER="de"
LC_NAME="de"
LC_ADDRESS="de"
LC_TELEPHONE="de"
LC_MEASUREMENT="de"
LC_IDENTIFICATION="de"
LC_ALL=
[imz@altair imz]$
------- Comment #7 From 2002-10-09 20:37:35 -------
Several more strange examples:

[imz@altair imz]$ LANG=de_AT.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Коммуникации
[imz@altair imz]$ LANGUAGE=de LANG=de_AT.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Communications
[imz@altair imz]$ LANGUAGE=de_AT.UTF-8 LANG=de_AT.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Communications
[imz@altair imz]$ echo $LANGUAGE
ru_RU.KOI8-R
[imz@altair imz]$
------- Comment #9 From 2002-10-09 20:46:15 -------
If we assume that LANGUAGE is the variable that specifies the language of Group
value (and neither LANG nor LC_*), then this test remains unexplained:

[imz@altair imz]$ LANGUAGE=ru_RU.KOI8-R LANG=invalid rpmquery -q minicom --qf=%{GROUP}\\n
Communications
------- Comment #11 From 2002-10-09 21:10:30 -------
I plan to add LANGUAGE to the head of the list of variables rpm uses in
that algorithm.
rpm-4.1 also checks LANGUAGE.

(The first non-empty variable from that list is used.)
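
The "first non-empty variable wins" rule can be sketched in shell as follows. This illustrates the rule as stated in this comment, not the actual code in lib/header.c; first_nonempty is a hypothetical helper name, and the order LANGUAGE, LC_ALL, LC_MESSAGES, LANG assumes LANGUAGE has been added at the head of the list.

```sh
# Print the first non-empty argument; return 1 if every argument is empty.
# first_nonempty is a hypothetical helper used only for illustration.
first_nonempty() {
    for value in "$@"; do
        if [ -n "$value" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1
}

# Assumed lookup order: LANGUAGE, LC_ALL, LC_MESSAGES, LANG.
first_nonempty "${LANGUAGE-}" "${LC_ALL-}" "${LC_MESSAGES-}" "${LANG-}" \
    || echo "(no locale variable set)"
```

Under this rule an earlier variable is never overridden by a later one; a later variable is consulted only when all earlier ones are unset or empty.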
------- Comment #13 From 2002-10-09 22:51:28 -------
After reading the info page on "Using gettextized software", I think that
two parallel lists have to be used by rpm to get different information:
(LANGUAGE,) LC_ALL, LC_MESSAGES, LANG should be examined to get the language
(perhaps a short name like "de"), and LANGUAGE, LC_ALL, LC_CTYPE, LANG for
the codeset to convert to. I'm not sure whether LANGUAGE should be included
in the second list.

It seems that rpm does all this already (without LANGUAGE in the 2nd list):

[imz@altair imz]$ LANGUAGE=de LC_CTYPE=ru_RU.KOI8-R rpmquery -q minicom --qf=%{GROUP}\\n
Communications
[imz@altair imz]$ LANGUAGE=ru LC_CTYPE=ru_RU.KOI8-R rpmquery -q minicom --qf=%{GROUP}\\n
Коммуникации
[imz@altair imz]$ LANGUAGE=en LC_CTYPE=ru_RU.KOI8-R rpmquery -q minicom --qf=%{GROUP}\\n
Communications
[imz@altair imz]$ LANGUAGE=ru LC_CTYPE=de_DE.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Коммуникации
[imz@altair imz]$ LANGUAGE=ru LC_CTYPE=de rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Communications
[imz@altair imz]$ LANGUAGE=ru LC_CTYPE=invalid rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Communications
[imz@altair imz]$ LANGUAGE=de:ru LC_CTYPE=ru_RU.KOI8-R rpmquery -q minicom --qf=%{GROUP}\\n
Коммуникации
[imz@altair imz]$ LANGUAGE=de:ru_RU.UTF-8 LC_CTYPE=ru_RU.KOI8-R rpmquery -q minicom --qf=%{GROUP}\\n
Коммуникации
[imz@altair imz]$ LANGUAGE=de:ru_RU.UTF-8 LC_CTYPE=de_DE.ISO8859-1 rpmquery -q minicom --qf=%{GROUP}\\n
????????????

If LC_CTYPE is invalid, it outputs the English value.
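
The two parallel lookups proposed in this comment can be sketched as follows. This is an illustration under assumptions, not rpm's implementation: locale_lang and locale_codeset are hypothetical helpers, only the first entry of a colon-separated LANGUAGE value is examined (the transcript above shows rpm falling through to later entries), and LANGUAGE is left out of the codeset list.

```sh
# Language part of a locale name: "de_AT.UTF-8" gives "de".
# Hypothetical helper; looks only at the first entry of a "de:ru" style list.
locale_lang() {
    name=${1%%:*}          # keep the first entry of a colon-separated list
    name=${name%%[.@]*}    # drop ".codeset" and "@modifier"
    printf '%s\n' "${name%%_*}"   # drop "_TERRITORY"
}

# Codeset part of a locale name: "ru_RU.KOI8-R" gives "KOI8-R".
# Prints an empty line when the name carries no codeset (e.g. "de").
locale_codeset() {
    case $1 in
        *.*) cs=${1#*.}; printf '%s\n' "${cs%%@*}" ;;
        *)   printf '\n' ;;
    esac
}

# Assumed list 1 (language): LANGUAGE, LC_ALL, LC_MESSAGES, LANG.
lang=$(locale_lang "${LANGUAGE:-${LC_ALL:-${LC_MESSAGES:-${LANG:-}}}}")
# Assumed list 2 (codeset): LC_ALL, LC_CTYPE, LANG.
codeset=$(locale_codeset "${LC_ALL:-${LC_CTYPE:-${LANG:-}}}")
printf 'language=%s codeset=%s\n' "$lang" "$codeset"
```

With LANGUAGE=ru and LC_CTYPE=de_DE.UTF-8 (and LC_ALL unset) this yields language "ru" and codeset "UTF-8", which matches the fourth query in the transcript: Russian text, encoded as UTF-8.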

I do not see any important misbehaviour. It seems it was my error to report
this bug, because I didn't understand quite well how all the locale
variables work together.

This concerns GROUP. I do see misbehaviour in recoding SUMMARY: summaries
should be recoded just as the GROUPs are (and the error messages), I think.
A testing script is attached. I ran it twice:

cat /user/imz/test_i18n_rpm_language.sh | sh
cat /user/imz/test_i18n_rpm_language.sh | sed -e 's!LANGUAGE!LC_MESSAGES!g' | sh

The results are different.
------- Comment #15 From 2004-06-27 02:23:50 -------
Fixed in rpm-4.0.4-alt39