ALT Linux Bugzilla – #2348
Java source files misdetected as Perl
Last modified: 2005-07-13 15:45:20
The vast majority of Java source files contain the keyword "package".
This word is treated as an indication of a Perl package.
I believe the following line in the magic file is to blame:
0 string package Perl5 module source text
Could you attach an example, please?
Created an attachment (id=356)
A file from the Jakarta log4j project
file recognizes it as "Perl5 module source text".
If you comment out this "Perl5 module source text" rule,
file will misdetect Perl package files as "ASCII Java program text".
There should be a more elaborate heuristic.
Components of a Java package name are delimited with dots. Since
widely-available Java package names should be global by convention (e.g.
org.altlinux.oursoftware.ourpackage), single-component package names are
unlikely. In Perl, package names are delimited with '::' (or "'", but that's
obscure), and single-component package names are common.
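To make the heuristic concrete, here is a minimal sketch in Python. The pattern names and the guess_language helper are hypothetical and written only for illustration; they are not libmagic syntax, and they assume the package statement appears at the start of a line.

```python
import re

# Hypothetical patterns illustrating the heuristic described above.
# They are not magic-file syntax and not taken from the file(1) sources.

# Java: a dotted package name with at least two components.
JAVA_PACKAGE = re.compile(
    r"^\s*package\s+[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+\s*;", re.MULTILINE
)

# Perl: components separated by "::" (or the obscure "'");
# a single-component name is allowed.
PERL_PACKAGE = re.compile(
    r"^\s*package\s+[A-Za-z_]\w*(?:(?:::|')[A-Za-z_]\w*)*\s*;", re.MULTILINE
)


def guess_language(source: str) -> str:
    """Return "java", "perl", or "unknown" based on the package statement."""
    # Try the Java pattern first, then fall back to the Perl one.
    if JAVA_PACKAGE.search(source):
        return "java"
    if PERL_PACKAGE.search(source):
        return "perl"
    return "unknown"
```

A Java file with a single-component package name would still be classified as Perl by this sketch, which is the trade-off the heuristic accepts.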
I haven't had time to master the syntax of magic files, so I'll put it down as regexes.
Here's a pattern to match Java files:
If that doesn't match, the following matches Perl modules:
The matcher in libmagic has a string length limit (32), so your regex is too long.
This one line seems to be enough:
0 regex \^package[\ \ ]+[A-Za-z][^.;]*; Perl5 module source text
Please create an empty magic_file, add this line to it, and test with "file -m magic_file".
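For a quick check outside of file itself, the rule can be approximated in Python. This is only a sketch under assumptions: the bracket expression is taken to mean "space or tab", the pattern is anchored at the very start of the input to mirror offset 0, and libmagic's regex engine may behave differently from Python's re. The names PERL_RULE and looks_like_perl_module are hypothetical.

```python
import re

# Approximation of the one-line magic rule in Python's re syntax.
# Assumptions: the bracket expression means "space or tab", and the match
# is anchored at the start of the input, like a magic entry at offset 0.
PERL_RULE = re.compile(r"^package[ \t]+[A-Za-z][^.;]*;")


def looks_like_perl_module(text: str) -> bool:
    """True if the text starts with a package statement that has no dot
    before the terminating semicolon."""
    return PERL_RULE.match(text) is not None
```

In this approximation a dotted Java package name is rejected because a '.' appears before the ';', while input that begins with anything other than the package statement (for example a comment line) is not matched at all.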
Tested with the pattern as suggested.
With this regex, Java files are no longer misdetected.
But .pm files that do contain "package" are all detected as "ASCII English text"
or "ASCII C++ program text". Tested on XML-LibXML-1.56.
Ok, fixed in file-4.07-alt3.