Bug 2348 - Java source files misdetected as Perl
Summary: Java source files misdetected as Perl
Status: CLOSED FIXED
Alias: None
Product: Sisyphus
Classification: Development
Component: file
Version: unstable
Hardware: all Linux
Importance: P5 normal
Assignee: placeholder@altlinux.org
QA Contact: qa-sisyphus
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-03-10 13:08 MSK by Mikhail Zabaluev
Modified: 2005-07-13 15:45 MSD

See Also:


Attachments
A file from the Jakarta log4j project (8.81 KB, text/plain)
2004-03-07 14:18 MSK, Mikhail Zabaluev

Description Mikhail Zabaluev 2003-03-10 13:08:18 MSK
The vast majority of Java source files contain the keyword "package". file treats this word as an indication of a Perl package.

I believe the following line in the magic file is to blame:

0       string          package         Perl5 module source text


Comment 1 Dmitry V. Levin 2004-03-01 19:16:45 MSK
Could you attach an example, please?
Comment 2 Mikhail Zabaluev 2004-03-07 14:18:58 MSK
Created attachment 356
A file from the Jakarta log4j project

file recognizes it as "Perl5 module source text".
Comment 3 Dmitry V. Levin 2004-03-08 13:06:46 MSK
If you comment out this "Perl5 module source text" rule,
file will misdetect Perl package files as "ASCII Java program text".
Comment 4 Mikhail Zabaluev 2004-03-08 14:30:42 MSK
There should be a more elaborate heuristic.

Components of a Java package name are delimited with dots. Since
widely-available Java package names should be global by convention (e.g.
org.altlinux.oursoftware.ourpackage), single-component package names are
unlikely. In Perl, package names are delimited with '::' (or "'", but that's
obscure), and single-component package names are common.

I haven't had time to master the syntax of magic files, so I'll put it down in regex
parlance.

Here's a pattern to match Java files:

package[[:space:]]+[A-Za-z][A-Za-z0-9]*\.[A-Za-z]

If that doesn't match, the following matches Perl modules:

package[[:space:]]+[A-Za-z]
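
The two-step check above can be sketched in Python. This is only an illustration of the heuristic, not magic-file syntax: the function name classify is hypothetical, and [[:space:]] is written as \s because Python's re module has no POSIX character classes.

```python
import re

# Patterns from the comment above, translated to Python regex syntax.
JAVA_PACKAGE = re.compile(r"package\s+[A-Za-z][A-Za-z0-9]*\.[A-Za-z]")
PERL_PACKAGE = re.compile(r"package\s+[A-Za-z]")


def classify(source: str) -> str:
    """Hypothetical helper: try the Java pattern first, then fall back to Perl."""
    if JAVA_PACKAGE.search(source):
        return "java"   # dotted package name, e.g. org.altlinux.oursoftware
    if PERL_PACKAGE.search(source):
        return "perl"   # 'package' followed by a name without a dot
    return "unknown"


if __name__ == "__main__":
    print(classify("package org.altlinux.oursoftware.ourpackage;\n"))
    print(classify("package XML::LibXML;\n"))
```

The order of the checks matters: the Perl pattern also matches every Java package declaration, so it is only meaningful after the Java pattern has failed.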
Comment 5 Dmitry V. Levin 2004-03-09 01:13:28 MSK
The matcher in libmagic has a string limit (32), so your regex is too long.
 
This one line seems to be enough: 
0	regex	\^package[\ \	]+[A-Za-z][^.;]*;		Perl5 module source text 
 
Please create an empty magic_file, add this line to it, and test with "file -m magic_file".
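
For readers who want to see what this single pattern accepts and rejects, here is a rough Python approximation. It assumes the magic-file escapes \^, \ (space) and \ (tab) reduce to a plain anchor, space and tab, and it uses Python's re engine, so it says nothing about how libmagic itself evaluates the rule. The name looks_like_perl_package is hypothetical.

```python
import re

# Assumed plain-regex reading of the magic rule:
#   'package' at the start, spaces or tabs, a letter,
#   then any run of characters other than '.' and ';', then ';'.
PERL_RULE = re.compile(r"^package[ \t]+[A-Za-z][^.;]*;")


def looks_like_perl_package(text: str) -> bool:
    """Hypothetical helper: apply the rule at offset 0 of the text."""
    return PERL_RULE.match(text) is not None


if __name__ == "__main__":
    # A dot before the first ';' (a Java-style dotted name) blocks the match.
    print(looks_like_perl_package("package org.apache.log4j;\n"))
    # A '::'-delimited or single-component name is accepted.
    print(looks_like_perl_package("package XML::LibXML;\n"))
```

The [^.;]* part is what separates the two languages here: it refuses to cross a dot, so a dotted Java package name never reaches the closing semicolon. A single-component Java package name would still be accepted as Perl.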
Comment 6 Mikhail Zabaluev 2004-03-09 11:03:50 MSK
Tested with the pattern as suggested.
The regex doesn't misdetect Java files.
But .pm files that do contain "package" are all detected as "ASCII English text"
or "ASCII C++ program text". Tested on XML-LibXML-1.56.
Comment 7 Dmitry V. Levin 2004-03-09 14:46:59 MSK
Ok, fixed in file-4.07-alt3.