I'm testing how rpm prints localized values, mainly in UTF-8 locales. The results differ between rpm-4.0.4-alt4 and rpm-4.0.4-alt7, but neither is correct.
---
With rpm-4.0.4-alt7:

$ LANG=de_AT.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Коммуникации

("Коммуникации" is the Russian for "Communications".) date demonstrates that in general the specified locale is valid:

$ LANG=de_AT.UTF-8 date -d 'Jan 6 2002'
Son Jän 6 00:00:00 MSK 2002

With LANG='', LANG=en and LANG=de, rpm prints the English word (which I consider correct if no German translation is available). With LANG=en_US and LANG=de_DE, it prints the Russian word in KOI8-R (incorrect).

rpm-4.0.4-alt4 behaves the same, except that it doesn't convert the output string to UTF-8: the output is always KOI8-R.
---
It's because of the algorithm implemented in rpm: lib/header.c:headerFindI18NString() checks the environment variables in this order: LC_ALL, LC_MESSAGES, LANG.
What does "checks environment variables in this order" mean: do the later ones override the earlier ones, or vice versa? This still doesn't explain why, changing only LANG and leaving the rest of the environment the same, we get either English Group names (with the invalid (?) LANG='', LANG=en and LANG=de) or Russian ones (LANG=de_AT.UTF-8):

[imz@altair imz]$ LANG=de_AT.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Коммуникации
[imz@altair imz]$ LANG=de rpmquery -q minicom --qf=%{GROUP}\\n
Communications
Here are the corresponding locale values. LC_MESSAGES is unset, and LC_ALL is empty (so effectively unset as well):

[imz@altair imz]$ locale
LANG=ru_RU.KOI8-R
LC_CTYPE="ru_RU.KOI8-R"
LC_NUMERIC="ru_RU.KOI8-R"
LC_TIME="ru_RU.KOI8-R"
LC_COLLATE="ru_RU.KOI8-R"
LC_MONETARY="ru_RU.KOI8-R"
LC_MESSAGES="ru_RU.KOI8-R"
LC_PAPER="ru_RU.KOI8-R"
LC_NAME="ru_RU.KOI8-R"
LC_ADDRESS="ru_RU.KOI8-R"
LC_TELEPHONE="ru_RU.KOI8-R"
LC_MEASUREMENT="ru_RU.KOI8-R"
LC_IDENTIFICATION="ru_RU.KOI8-R"
LC_ALL=
[imz@altair imz]$ LANG=de_AT.UTF-8 locale
LANG=de_AT.UTF-8
LC_CTYPE="de_AT.UTF-8"
LC_NUMERIC="de_AT.UTF-8"
LC_TIME="de_AT.UTF-8"
LC_COLLATE="de_AT.UTF-8"
LC_MONETARY="de_AT.UTF-8"
LC_MESSAGES="de_AT.UTF-8"
LC_PAPER="de_AT.UTF-8"
LC_NAME="de_AT.UTF-8"
LC_ADDRESS="de_AT.UTF-8"
LC_TELEPHONE="de_AT.UTF-8"
LC_MEASUREMENT="de_AT.UTF-8"
LC_IDENTIFICATION="de_AT.UTF-8"
LC_ALL=
[imz@altair imz]$ LANG=de locale
LANG=de
LC_CTYPE="de"
LC_NUMERIC="de"
LC_TIME="de"
LC_COLLATE="de"
LC_MONETARY="de"
LC_MESSAGES="de"
LC_PAPER="de"
LC_NAME="de"
LC_ADDRESS="de"
LC_TELEPHONE="de"
LC_MEASUREMENT="de"
LC_IDENTIFICATION="de"
LC_ALL=
Several more strange examples:

[imz@altair imz]$ LANG=de_AT.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Коммуникации
[imz@altair imz]$ LANGUAGE=de LANG=de_AT.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Communications
[imz@altair imz]$ LANGUAGE=de_AT.UTF-8 LANG=de_AT.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Communications
[imz@altair imz]$ echo $LANGUAGE
ru_RU.KOI8-R
If we assume that LANGUAGE is the variable that specifies the language of the Group value (rather than LANG or LC_*), then this test remains unexplained:

[imz@altair imz]$ LANGUAGE=ru_RU.KOI8-R LANG=invalid rpmquery -q minicom --qf=%{GROUP}\\n
Communications
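One hypothesis consistent with the transcripts above: GNU gettext documents that LANGUAGE is ignored when the message locale is "C", and setlocale() leaves the C locale in place when asked for a locale that isn't installed. Under that rule, LANG=de, LANG=invalid and LANG='' all fall back to C, so LANGUAGE=ru_RU.KOI8-R is ignored and the English string is printed, while the installed de_AT.UTF-8 lets LANGUAGE take effect. Whether rpm's GROUP lookup really follows this rule is an assumption. The sh sketch below models it; the function names, the set of installed locales, and the simplification that only LANG is resolved are all hypothetical.

```shell
#!/bin/sh
# Hypothetical model of GNU gettext's documented rule (not rpm's code):
# LANGUAGE is honoured only when the LC_MESSAGES locale is not "C".

# resolved_locale WANTED INSTALLED...
# Prints WANTED if it is in the INSTALLED list, otherwise "C", mimicking
# setlocale() keeping the C locale when asked for an uninstalled locale.
# Simplification: only a single locale name (e.g. from LANG) is modelled.
resolved_locale() {
    rl_wanted=$1; shift
    for rl_name in "$@"; do
        if [ "$rl_name" = "$rl_wanted" ]; then
            echo "$rl_wanted"
            return 0
        fi
    done
    echo C
}

# message_language MESSAGES_LOCALE LANGUAGE_VALUE
# Prints the language (list) used to look up translations, or "C",
# meaning "use the untranslated (English) string".
message_language() {
    case $1 in
        C|POSIX) echo C; return 0 ;;   # LANGUAGE is ignored in the C locale
    esac
    if [ -n "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

# Hypothetical set of installed locales; "de" and "invalid" are assumed absent.
installed="ru_RU.KOI8-R de_AT.UTF-8"
LANGUAGE_VALUE=ru_RU.KOI8-R
for lang in de_AT.UTF-8 de invalid; do
    echo "LANG=$lang -> $(message_language "$(resolved_locale "$lang" $installed)" "$LANGUAGE_VALUE")"
done
# Prints:
# LANG=de_AT.UTF-8 -> ru_RU.KOI8-R
# LANG=de -> C
# LANG=invalid -> C
```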
I've planned to add LANGUAGE to the head of the list of variables rpm uses in that algorithm; rpm-4.1 also checks LANGUAGE. The first nonempty variable from that list is used.
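"The first nonempty variable wins" means earlier names in the list take priority: a nonempty LANGUAGE overrides LC_ALL, which overrides LC_MESSAGES, which overrides LANG, and an empty variable counts the same as an unset one. A minimal sh sketch of that rule follows; the helper name first_nonempty is hypothetical, and this is not rpm's C implementation.

```shell
#!/bin/sh
# Hypothetical sketch of "the first nonempty variable from the list is used".
# Not rpm's code. The variable names passed in must be trusted identifiers,
# because they are expanded via eval.

# first_nonempty VAR...
# Prints the value of the first named variable that is set and non-empty.
# Returns 1 and prints nothing if every variable is unset or empty.
first_nonempty() {
    for fn_var in "$@"; do
        eval "fn_val=\${$fn_var-}"
        if [ -n "$fn_val" ]; then
            echo "$fn_val"
            return 0
        fi
    done
    return 1
}

# Environment like the session above: LC_ALL and LC_MESSAGES empty,
# LANGUAGE=ru_RU.KOI8-R from the login environment. Subshells keep the
# temporary assignments from leaking into the rest of the script.
( LANGUAGE=ru_RU.KOI8-R LC_ALL= LC_MESSAGES= LANG=de_AT.UTF-8 \
    first_nonempty LANGUAGE LC_ALL LC_MESSAGES LANG )   # ru_RU.KOI8-R
( LANGUAGE= LC_ALL= LC_MESSAGES= LANG=de_AT.UTF-8 \
    first_nonempty LANGUAGE LC_ALL LC_MESSAGES LANG )   # de_AT.UTF-8
```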
After reading the info page "Using gettextized software", I think rpm has to use two parallel lists to get two different pieces of information: (LANGUAGE,) LC_ALL, LC_MESSAGES, LANG should be examined to get the language (perhaps a short name like "de"), and LANGUAGE, LC_ALL, LC_CTYPE, LANG to get the codeset to convert to. I'm not sure whether LANGUAGE should be included in the second list. It seems that rpm already does all this (without LANGUAGE in the second list):

[imz@altair imz]$ LANGUAGE=de LC_CTYPE=ru_RU.KOI8-R rpmquery -q minicom --qf=%{GROUP}\\n
Communications
[imz@altair imz]$ LANGUAGE=ru LC_CTYPE=ru_RU.KOI8-R rpmquery -q minicom --qf=%{GROUP}\\n
Коммуникации
[imz@altair imz]$ LANGUAGE=en LC_CTYPE=ru_RU.KOI8-R rpmquery -q minicom --qf=%{GROUP}\\n
Communications
[imz@altair imz]$ LANGUAGE=ru LC_CTYPE=de_DE.UTF-8 rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Коммуникации
[imz@altair imz]$ LANGUAGE=ru LC_CTYPE=de rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Communications
[imz@altair imz]$ LANGUAGE=ru LC_CTYPE=invalid rpmquery -q minicom --qf=%{GROUP}\\n | iconv -f utf-8 -t koi8-r
Communications
[imz@altair imz]$ LANGUAGE=de:ru LC_CTYPE=ru_RU.KOI8-R rpmquery -q minicom --qf=%{GROUP}\\n
Коммуникации
[imz@altair imz]$ LANGUAGE=de:ru_RU.UTF-8 LC_CTYPE=ru_RU.KOI8-R rpmquery -q minicom --qf=%{GROUP}\\n
Коммуникации
[imz@altair imz]$ LANGUAGE=de:ru_RU.UTF-8 LC_CTYPE=de_DE.ISO8859-1 rpmquery -q minicom --qf=%{GROUP}\\n
????????????

If LC_CTYPE is invalid, it outputs the English value.
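The two lookups described above can be sketched in sh. match_language walks a colon-separated language list such as "de:ru_RU.UTF-8" and returns the first entry for which a translation exists, trying the full name, then the name without ".codeset", then without "_COUNTRY"; codeset_of takes the codeset from a locale name. The function names, this fallback order, and the list of available translations are assumptions for illustration, loosely modelled on gettext's behaviour rather than taken from rpm. A real program would get the codeset from nl_langinfo(CODESET) after setlocale() instead of parsing the name. Assuming a header that carries only a Russian translation, "de:ru" and "de:ru_RU.UTF-8" both resolve to ru, which matches the Коммуникации output above.

```shell
#!/bin/sh
# Hypothetical sketch (not rpm's code) of picking a translation from a
# colon-separated language list and extracting a codeset from a locale name.

# match_language LANGLIST AVAILABLE...
# For each LANGLIST entry in order, try the full name, then the name without
# ".codeset", then without "_COUNTRY". Print the first AVAILABLE translation
# that matches, or "C" (untranslated) if none does.
match_language() {
    ml_list=$1; shift
    ml_oldifs=$IFS
    IFS=:                            # split LANGLIST on colons
    for ml_entry in $ml_list; do
        IFS=$ml_oldifs               # restore IFS for the loop body
        ml_nocs=${ml_entry%%.*}      # drop ".codeset"
        ml_lang=${ml_nocs%%_*}       # drop "_COUNTRY"
        for ml_try in "$ml_entry" "$ml_nocs" "$ml_lang"; do
            for ml_avail in "$@"; do
                if [ -n "$ml_try" ] && [ "$ml_try" = "$ml_avail" ]; then
                    echo "$ml_avail"
                    return 0
                fi
            done
        done
    done
    IFS=$ml_oldifs
    echo C
}

# codeset_of LOCALE
# Prints the codeset part of a locale name ("de_DE.UTF-8" -> "UTF-8"),
# or an empty line if the name has no ".codeset" part.
codeset_of() {
    case $1 in
        *.*) cs=${1#*.}; echo "${cs%%@*}" ;;   # also drop any "@modifier"
        *)   echo "" ;;
    esac
}

match_language de:ru_RU.UTF-8 ru   # ru
match_language de ru               # C (no German translation)
codeset_of ru_RU.KOI8-R            # KOI8-R
```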
I do not see any important misbehaviour here. It seems it was my mistake to report this bug, because I didn't understand quite well how all the locale variables work together. That concerns GROUP.

I do see misbehaviour in the recoding of SUMMARY: I think it should be recoded just as GROUP is (and as the error messages are). A test script is attached. I ran it twice:

cat /user/imz/test_i18n_rpm_language.sh | sh
cat /user/imz/test_i18n_rpm_language.sh | sed -e 's!LANGUAGE!LC_MESSAGES!g' | sh

The results are different.
Fixed in rpm-4.0.4-alt39