Bug 2348 - Java source files misdetected as Perl
Summary: Java source files misdetected as Perl
Alias: None
Product: Sisyphus
Classification: Development
Component: file
Version: unstable
Hardware: all Linux
Importance: P5 normal
Assignee: placeholder@altlinux.org
QA Contact: qa-sisyphus
Reported: 2003-03-10 13:08 MSK by Mikhail Zabaluev
Modified: 2005-07-13 15:45 MSD


A file from the Jakarta log4j project (8.81 KB, text/plain)
2004-03-07 14:18 MSK, Mikhail Zabaluev

Description Mikhail Zabaluev 2003-03-10 13:08:18 MSK
The vast majority of Java source files contain the keyword "package". file treats this word as an indication of a Perl package.


I believe the following line in the magic file is to blame:

0       string          package         Perl5 module source text

Comment 1 Dmitry V. Levin 2004-03-01 19:16:45 MSK
Could you attach an example, please?
Comment 2 Mikhail Zabaluev 2004-03-07 14:18:58 MSK
Created attachment 356
A file from the Jakarta log4j project

file recognizes it as "Perl5 module source text".
Comment 3 Dmitry V. Levin 2004-03-08 13:06:46 MSK
If you comment out this "Perl5 module source text" rule,
file will misdetect Perl package files as "ASCII Java program text".
Comment 4 Mikhail Zabaluev 2004-03-08 14:30:42 MSK
There should be a more elaborate heuristic.

Components of a Java package name are delimited with dots. Since
widely-available Java package names should be global by convention (e.g.
org.altlinux.oursoftware.ourpackage), single-component package names are
unlikely. In Perl, package names are delimited with ':' (or "'", but that's
obscure), and single-component package names are common.
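
As a rough illustration of the distinction described above (not the patterns originally posted in this comment), here is a minimal Python sketch. The regexes and the guess_language helper are hypothetical names made up for this example: a package declaration with at least one dot is treated as Java, and otherwise a declaration using '::' or "'" separators, or a single component, is treated as Perl.

```python
import re

# Hypothetical patterns for illustration only; these are not the
# regexes from the original comment.

# Java: "package" followed by a dotted name with two or more components.
JAVA_PACKAGE = re.compile(
    r"^\s*package\s+[A-Za-z_]\w*(?:\s*\.\s*[A-Za-z_]\w*)+\s*;",
    re.MULTILINE,
)

# Perl: "package" followed by a name whose components are joined by
# "::" or the obscure "'" separator; a single component is allowed.
PERL_PACKAGE = re.compile(
    r"^\s*package\s+[A-Za-z_]\w*(?:(?:::|')[A-Za-z_]\w*)*\s*;",
    re.MULTILINE,
)


def guess_language(source: str) -> str:
    """Guess 'java', 'perl', or 'unknown' from a package declaration."""
    if JAVA_PACKAGE.search(source):
        return "java"
    if PERL_PACKAGE.search(source):
        return "perl"
    return "unknown"
```

Under this sketch a Java file with a single-component package name (e.g. "package foo;") would be reported as Perl, which is the trade-off the reasoning above accepts.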

I haven't had time to master the syntax of magic files, so I'll put it down as regexes.

Here's a pattern to match Java files:


If that doesn't match, the following matches Perl modules:

Comment 5 Dmitry V. Levin 2004-03-09 01:13:28 MSK
The matcher in libmagic has a string length limit (32), so your regex is too long.
This one line seems to be enough:
0	regex	\^package[\ \	]+[A-Za-z][^.;]*;		Perl5 module source text 
Please create an empty magic_file, add this line to it, and test with "file -m magic_file".
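
For a quick sanity check of the pattern's logic outside of file, a rough Python equivalent can be tried on sample declarations. This is only a sketch: it assumes the backslashes in the magic line are magic-file escaping for '^', space and tab, it uses Python's re module rather than libmagic's matcher, and the names SUGGESTED_RULE and matches_perl_rule are made up for the example. A match here therefore says nothing certain about what file itself will report.

```python
import re

# Assumed Python rendering of the magic-file regex above:
# "package" at the start of a line, blanks, a letter, then any run of
# characters containing neither "." nor ";" up to the first ";".
SUGGESTED_RULE = re.compile(r"^package[ \t]+[A-Za-z][^.;]*;", re.MULTILINE)


def matches_perl_rule(source: str) -> bool:
    """Return True if the sample would match the Perl5 rule in Python's re."""
    return SUGGESTED_RULE.search(source) is not None


if __name__ == "__main__":
    samples = [
        "package org.apache.log4j;\n",  # dotted Java package name
        "package XML::LibXML;\n",       # Perl package name
        "package Foo;\n",               # single-component name
    ]
    for text in samples:
        print(repr(text), matches_perl_rule(text))
```

The dot exclusion in [^.;]* is what keeps dotted Java package names from matching; a single-component Java package name would still match the rule.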
Comment 6 Mikhail Zabaluev 2004-03-09 11:03:50 MSK
Tested with the pattern as suggested.
The regex doesn't misdetect Java files.
But .pm files that do contain "package" are all detected as "ASCII English text"
or "ASCII C++ program text". Tested on XML-LibXML-1.56.
Comment 7 Dmitry V. Levin 2004-03-09 14:46:59 MSK
Ok, fixed in file-4.07-alt3.