The vast majority of Java source files contain the keyword "package". file treats this word as an indication of a Perl package.
I believe the following line in the magic file is to blame:
0 string package Perl5 module source text
Could you attach an example, please?
Created attachment 356
A file from the Jakarta log4j project
file recognizes it as "Perl5 module source text".
If you comment out this "Perl5 module source text" rule,
file will misdetect Perl package files as "ASCII Java program text".
There should be a more elaborate heuristic.
Components of a Java package name are delimited with dots. Since
widely-available Java package names should be global by convention (e.g.
org.altlinux.oursoftware.ourpackage), single-component package names are
unlikely. In Perl, package names are delimited with '::' (or "'", but that's
obscure), and single-component package names are common.
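A rough sketch of the heuristic described above, written in Python rather than magic-file syntax. The function name guess_language and the exact pattern are illustrative assumptions, not anything libmagic provides: a package name containing a dot is taken to be Java, and anything else (single-component, '::' or "'" delimited) is taken to be Perl.

```python
import re

# Hypothetical helper: the name and the pattern are assumptions made for
# illustration only. It finds the first "package NAME;" statement in the text.
PACKAGE_RE = re.compile(
    r"^[ \t]*package[ \t]+([A-Za-z_][\w.:']*)[ \t]*;",
    re.MULTILINE,
)

def guess_language(source):
    """Return "java", "perl", or None when no package statement is found."""
    match = PACKAGE_RE.search(source)
    if match is None:
        return None
    name = match.group(1)
    # Java package names are dot-delimited and rarely single-component.
    if "." in name:
        return "java"
    # Perl names use '::' (or the obscure "'") and are often single-component.
    return "perl"

if __name__ == "__main__":
    for line in ("package org.apache.log4j;", "package XML::LibXML;", "package Foo;"):
        print(line, "->", guess_language(line))
```

A single-component Java package would be misclassified as Perl by this sketch, which is the trade-off the convention about global package names is meant to justify.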
I haven't had time to master the syntax of magic files, so I'll put it down as regexes.
Here's a pattern to match Java files:
If that doesn't match, the following matches Perl modules:
The matcher in libmagic has a string length limit (32), so your regex is too long.
This one line seems to be enough:
0 regex \^package[\ \ ]+[A-Za-z][^.;]*; Perl5 module source text
Please create an empty magic_file, add this line to it, and test with "file -m magic_file".
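For a quick sanity check of the pattern itself, outside of libmagic, something like the following Python sketch can be used. It assumes the bracket expression in the rule is meant to match a space or a tab, and it uses Python's re module, so it only shows what the expression is intended to accept; libmagic's own matcher may behave differently. The helper name looks_like_perl_package is made up for this example.

```python
import re

# Assumption: the bracket expression in the magic rule stands for
# "space or tab". The leading "\^" of the magic file is written as "^" here.
PERL_PACKAGE_RE = re.compile(r"^package[ \t]+[A-Za-z][^.;]*;")

def looks_like_perl_package(line):
    """True if the line matches the proposed Perl5 module pattern."""
    return PERL_PACKAGE_RE.match(line) is not None

if __name__ == "__main__":
    samples = (
        "package org.apache.log4j;",  # dotted name, typical for Java
        "package XML::LibXML;",       # '::'-delimited name, typical for Perl
        "package Foo;",               # single-component name
    )
    for line in samples:
        print(line, "->", looks_like_perl_package(line))
```

The pattern rejects any name that contains a dot before the semicolon, so dotted Java package declarations are skipped, while single-component names are always classified as Perl.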
Tested with the pattern as suggested.
The regex does not misdetect Java files.
But .pm files that do contain "package" are all detected as "ASCII English text"
or "ASCII C++ program text". Tested on XML-LibXML-1.56.
Ok, fixed in file-4.07-alt3.