Bug 17300

Summary: can't open a file if its name contains strange chars
Product: Branch 4.0
Reporter: Ivan Zakharyaschev <imz>
Component: openoffice.org
Assignee: Valery Inozemtsev <shrek>
Status: CLOSED WONTFIX
QA Contact: Q.A. 4.0 <qa-4.0>
Severity: normal
Priority: P2
CC: dottedmag, vvk
Version: 4.0
Hardware: all
OS: Linux
Attachments: encoding.tar (flags: none)

Description Ivan Zakharyaschev 2008-09-23 17:31:08 MSD
Created attachment 2945
encoding.tar

openoffice.org-2.3.1.1-alt4.M40.1 in Lite 4.0.3

I have a file with a strange name (I got it via rsync from a system with a different locale). But whatever a file's name is, a program must open it if the name is given as an argument. This is not the case with OOo:

$ echo *
óÐÉÓÏË.doc
$ ooffice óÐÉÓÏË.doc

An error message appears: "/home/imz/bugreports/encoding/??????.doc does not exist." (The question marks are not ordinary question marks but question marks in black diamonds, i.e. the Unicode replacement character U+FFFD.)
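Assuming the terminal rendered each raw byte of the name as the Latin-1 character with the same code (an assumption that fits the single-byte glyphs in the 'echo *' output), the bytes on disk can be recovered and tested. A minimal Python sketch:

```python
# Sketch: recover the raw bytes of the file name, ASSUMING the terminal
# displayed each byte as the Latin-1 character with the same code.
shown = "óÐÉÓÏË"
raw = shown.encode("latin-1")          # back to the bytes on disk
print(raw.hex(" "))                    # f3 d0 c9 d3 cf cb

# Decoding as UTF-8 (the user's locale) fails; with errors="replace" every
# one of these bytes becomes U+FFFD, the "question mark in a black diamond".
print(raw.decode("utf-8", errors="replace") == "\ufffd" * 6)   # True

# Decoding as KOI8-R, a single-byte Cyrillic encoding, gives a Russian word.
print(raw.decode("koi8_r"))            # Список
```

If that assumption holds, the six bytes are not valid UTF-8 (matching the six replacement characters in the error message) and read as "Список" ("List") in KOI8-R, consistent with a file copied from a system using a KOI8-R locale.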

The environment:

$ locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
$ 

Proof that well-behaved programs can open this file:

$ file óÐÉÓÏË.doc 
óÐÉÓÏË.doc: Microsoft Office Document
$ 

(How to reproduce: if you don't know how to create a file with such a name, try the attached .tar.)
Comment 1 Mikhail Gusarov 2008-09-23 18:01:24 MSD
It looks like the output of 'echo *' is a feature of your terminal emulator, which can display invalid UTF-8 sequences (you can check this with 'echo * | iconv -f UTF-8 -t UTF-8').
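The same check can be done without relying on the terminal at all. A minimal Python sketch, roughly equivalent to piping the names through iconv (the function names are hypothetical): list the raw names in a directory and report those that are not valid UTF-8.

```python
import os


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name decodes as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def non_utf8_names(directory: bytes = b".") -> list[bytes]:
    """Return entries of `directory` whose names are not valid UTF-8.

    Passing a bytes path makes os.listdir return raw bytes, so nothing is
    decoded or replaced on the way.
    """
    return [n for n in os.listdir(directory) if not is_valid_utf8(n)]


if __name__ == "__main__":
    for name in non_utf8_names():
        # repr() prints the raw bytes instead of letting the terminal guess.
        print(repr(name))
```

Run in the directory from the report, this would be expected to print the single non-UTF-8 name as a bytes literal.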

The ability to use non-UTF-8 names in a UTF-8 locale should be considered a feature, and refusing to open such files should not be treated as a bug.
Comment 2 Valery Inozemtsev 2008-09-23 19:42:02 MSD
no comments
Comment 3 Ivan Zakharyaschev 2008-09-23 20:47:10 MSD
(In reply to comment #1)
> Looks like sort-of output of 'echo *' is a feature of your terminal emulator which can interpret invalid UTF-8 characters (you can check it
> by echo * | iconv -f UTF-8 -t UTF-8).

I don't care how it is displayed, but it's a real path, and it points to an existing file. If it were invalid, the filesystem should have refused to create it.

"file" can open it, "abiword" can open it:

$ abiword * -t txt -o a.txt; cat a.txt
a
$ 

but OOo can't. Why? Because OOo's file-opening or option-parsing code is broken.
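The reporter's point is that, to the kernel, a file name is just a byte string; a program fails only if it forces the name through a lossy conversion. A minimal Python sketch of the difference (not OOo's code, which is not shown here; the script name open_args.py is hypothetical, and the name bytes assume the KOI8-R spelling of "Список.doc"):

```python
import os
import sys

# Assumed raw name on disk: KOI8-R bytes for "Список.doc".
raw = b"\xf3\xd0\xc9\xd3\xcf\xcb.doc"

# Lossy: replacing undecodable bytes yields a different name, which is
# exactly the "??????.doc does not exist" symptom.
lossy = raw.decode("utf-8", errors="replace")
print(lossy.encode("utf-8") == raw)    # False

# Lossless (POSIX): os.fsdecode uses the "surrogateescape" error handler,
# so undecodable bytes are kept as surrogates and restored by os.fsencode.
safe = os.fsdecode(raw)
print(os.fsencode(safe) == raw)        # True

if __name__ == "__main__":
    # Hypothetical usage: python3 open_args.py *
    # On POSIX, sys.argv is decoded the same way, so open() gets the real bytes.
    for arg in sys.argv[1:]:
        with open(arg, "rb") as f:
            print(repr(os.fsencode(arg)), len(f.read()), "bytes")
```

The design choice is to never replace what cannot be decoded: keep the name as bytes, or use an escape scheme that round-trips, and decode only for display.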