Bug 2348

Summary: Java source files misdetected as Perl
Product: Sisyphus Reporter: Mikhail Zabaluev <mhz>
Component: file Assignee: placeholder <placeholder>
Status: CLOSED FIXED QA Contact: qa-sisyphus
Severity: normal    
Priority: P5 CC: at, glebfm, ldv, placeholder, vt
Version: unstable   
Hardware: all   
OS: Linux   
Attachments:
Description Flags
A file from the Jakarta log4j project none

Description Mikhail Zabaluev 2003-03-10 13:08:18 MSK
The vast majority of Java source files contain the keyword "package". This word is treated as an indication of a Perl package.

I believe the following line in the magic file is to blame:

0       string          package         Perl5 module source text


Comment 1 Dmitry V. Levin 2004-03-01 19:16:45 MSK
Could you attach an example, please?
Comment 2 Mikhail Zabaluev 2004-03-07 14:18:58 MSK
Created attachment 356
A file from the Jakarta log4j project

file recognizes it as "Perl5 module source text".
Comment 3 Dmitry V. Levin 2004-03-08 13:06:46 MSK
If you comment out this "Perl5 module source text" rule,
file will misdetect Perl package files as "ASCII Java program text".
Comment 4 Mikhail Zabaluev 2004-03-08 14:30:42 MSK
There should be a more elaborate heuristic.

Components of a Java package name are delimited with dots. Since
widely-available Java package names should be global by convention (e.g.
org.altlinux.oursoftware.ourpackage), single-component package names are
unlikely. In Perl, package names are delimited with '::' (or "'", but that's
obscure), and single-component package names are common.

I haven't had time to master the syntax of magic files, so I'll put it in regex
parlance.

Here's a pattern to match Java files:

package[[:space:]]+[A-Za-z][A-Za-z0-9]*\.[A-Za-z]

If that doesn't match, the following matches Perl modules:

package[[:space:]]+[A-Za-z]
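
The two-step check above can be sketched in Python as follows. This is only an
illustration of the heuristic, not magic-file syntax: the function name
guess_language is hypothetical, [[:space:]] is written as \s because Python's
re module has no POSIX classes, and anchoring to the start of a line is an
added assumption that the patterns above do not state.

```python
import re

# Java: a package name with at least one dot-separated component.
JAVA_PACKAGE = re.compile(r"^package\s+[A-Za-z][A-Za-z0-9]*\.[A-Za-z]", re.MULTILINE)
# Perl: any package statement that did not match the Java pattern.
PERL_PACKAGE = re.compile(r"^package\s+[A-Za-z]", re.MULTILINE)


def guess_language(source):
    """Return "java", "perl", or None for a chunk of source text."""
    # Order matters: the Perl pattern also matches Java package lines,
    # so the more specific Java pattern has to be tried first.
    if JAVA_PACKAGE.search(source):
        return "java"
    if PERL_PACKAGE.search(source):
        return "perl"
    return None
```

For example, guess_language("package org.apache.log4j;") takes the Java branch,
while guess_language("package XML::LibXML;") falls through to the Perl one.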
Comment 5 Dmitry V. Levin 2004-03-09 01:13:28 MSK
The matcher in libmagic has a string length limit (32), so your regex is too long.
 
This one line seems to be enough: 
0	regex	\^package[\ \	]+[A-Za-z][^.;]*;		Perl5 module source text 
 
Please create an empty magic_file, add this line to it, and test with "file -m magic_file".
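
To show what the single-line pattern is meant to accept, here is a rough
Python rendering of it. It is a sketch under assumptions: the magic-file
escapes (\^, "\ " and the escaped tab) are translated by hand into plain regex
syntax, the helper name looks_like_perl_module is hypothetical, and the result
says nothing about how libmagic itself evaluates the rule.

```python
import re

# Hand translation of the magic rule: "package", blanks, a letter, then
# anything except '.' or ';' up to the terminating ';'.
PERL_PACKAGE_LINE = re.compile(r"^package[ \t]+[A-Za-z][^.;]*;")


def looks_like_perl_module(text):
    """True if the text starts with a package statement that has no dot in its name."""
    # re.match only tries the beginning of the text, mirroring offset 0 in the rule.
    return PERL_PACKAGE_LINE.match(text) is not None
```

A dotted Java name such as org.apache.log4j is rejected because the pattern
reaches a '.' before it finds the ';', whereas XML::LibXML contains no dot and
is accepted.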
Comment 6 Mikhail Zabaluev 2004-03-09 11:03:50 MSK
Tested with the pattern as suggested.
The regex no longer misdetects Java files.
But .pm files that do contain "package" are all detected as "ASCII English text"
or "ASCII C++ program text". Tested on XML-LibXML-1.56.
Comment 7 Dmitry V. Levin 2004-03-09 14:46:59 MSK
Ok, fixed in file-4.07-alt3.