Summary: | Java source files misdetected as Perl | |
---|---|---|---
Product: | Sisyphus | Reporter: | Mikhail Zabaluev <mhz>
Component: | file | Assignee: | placeholder <placeholder>
Status: | CLOSED FIXED | QA Contact: | qa-sisyphus
Severity: | normal | |
Priority: | P5 | CC: | at, glebfm, ldv, placeholder, vt
Version: | unstable | |
Hardware: | all | |
OS: | Linux | |
Description
Mikhail Zabaluev
2003-03-10 13:08:18 MSK
Could you attach an example, please?

Created attachment 356
A file from the Jakarta log4j project
file recognizes it as "Perl5 module source text".
When you comment this "Perl5 module source text" rule out, file will misdetect Perl package files as "ASCII Java program text".

There should be a more elaborate heuristic. Components of a Java package name are delimited with dots. Since widely available Java package names should be globally unique by convention (e.g. org.altlinux.oursoftware.ourpackage), single-component package names are unlikely. In Perl, package names are delimited with '::' (or "'", but that's obscure), and single-component package names are common. I haven't had time to master the syntax of magic files, so I'll put it down in regex parlance. Here's a pattern to match Java files:

package[[:space:]]+[A-Za-z][A-Za-z0-9]*\.[A-Za-z]

If that doesn't match, the following matches Perl modules:

package[[:space:]]+[A-Za-z]

The matcher in libmagic has a string length limit (32), so your regex is too long. This one line seems to be enough:

0 regex \^package[\ \ ]+[A-Za-z][^.;]*; Perl5 module source text

Please create an empty magic_file, add this line to it and test with "file -m magic_file".

Tested with the pattern as suggested. The regex doesn't mismatch Java files. But .pm files that do contain "package" are all detected as "ASCII English text" or "ASCII C++ program text". Tested on XML-LibXML-1.56.

OK, fixed in file-4.07-alt3.
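
For anyone who wants to experiment with the two-step heuristic described in this report (try the Java pattern first, fall back to the Perl pattern), here is a minimal Python sketch. It is not the libmagic rule and not the fix that went into file-4.07-alt3. The function name `classify_package_source` and the return labels are hypothetical, and `[[:space:]]` is written as `\s` because Python's `re` module does not support POSIX bracket classes.

```python
import re

# Java pattern from the report: the first package-name component is
# followed by a dot and another letter (e.g. "package org.apache...").
JAVA_PACKAGE = re.compile(r"package\s+[A-Za-z][A-Za-z0-9]*\.[A-Za-z]")

# Perl fallback from the report: any "package <name>" statement.
PERL_PACKAGE = re.compile(r"package\s+[A-Za-z]")


def classify_package_source(text):
    """Return "java", "perl", or "unknown" (labels are illustrative only)."""
    if JAVA_PACKAGE.search(text):
        return "java"
    if PERL_PACKAGE.search(text):
        return "perl"
    return "unknown"


if __name__ == "__main__":
    for sample in ("package org.apache.log4j;",
                   "package XML::LibXML;",
                   "package Foo;"):
        print(sample, "->", classify_package_source(sample))
```

The order of the checks matters: the Perl pattern also matches every Java package statement, so it is only usable as a fallback after the Java pattern has failed. Like the patterns in the report, the sketch does not anchor the match to the start of a line, so the word "package" inside a comment or string can still trigger it.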