Lines 30-36
|
30 |
@copying |
30 |
@copying |
31 |
This manual is for @command{grep}, a pattern matching engine. |
31 |
This manual is for @command{grep}, a pattern matching engine. |
32 |
|
32 |
|
33 |
Copyright @copyright{} 1999--2002, 2005, 2008--2022 Free Software Foundation, |
33 |
Copyright @copyright{} 1999--2002, 2005, 2008--2023 Free Software Foundation, |
34 |
Inc. |
34 |
Inc. |
35 |
|
35 |
|
36 |
@quotation |
36 |
@quotation |
Lines 202-207
|
202 |
Obtain patterns from @var{file}, one per line. |
202 |
Obtain patterns from @var{file}, one per line. |
203 |
If this option is used multiple times or is combined with the |
203 |
If this option is used multiple times or is combined with the |
204 |
@option{-e} (@option{--regexp}) option, search for all patterns given. |
204 |
@option{-e} (@option{--regexp}) option, search for all patterns given. |
|
|
205 |
When @var{file} is @samp{-}, read patterns from standard input. |
205 |
The empty file contains zero patterns, and therefore matches nothing. |
206 |
The empty file contains zero patterns, and therefore matches nothing. |
206 |
(@option{-f} is specified by POSIX.) |
207 |
(@option{-f} is specified by POSIX.) |
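The @option{-f} rules above can be tried with a short script. This is only a sketch, assuming GNU @command{grep} and a POSIX shell; the sample words and the temporary file created by @command{mktemp} are made up for illustration.

```shell
# Assumes GNU grep; the sample data below is hypothetical.
input=$(mktemp)
printf 'apple\nbanana\ncherry\n' > "$input"

# Read one pattern from standard input (-f -) and combine it with -e:
# lines matching either pattern are selected.
both=$(printf 'apple\n' | grep -f - -e cherry "$input")
printf '%s\n' "$both"

# An empty pattern file contains zero patterns, so nothing matches
# and grep exits with status 1.
grep -f /dev/null "$input"
empty_status=$?
echo "exit status with an empty pattern file: $empty_status"

rm -f "$input"
```

The first command selects @samp{apple} and @samp{cherry}; the second prints nothing.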
207 |
|
208 |
|
Lines 223-229
|
223 |
it yields ``S''. Another example: the lowercase German letter ``ß'' |
224 |
it yields ``S''. Another example: the lowercase German letter ``ß'' |
224 |
(U+00DF, LATIN SMALL LETTER SHARP S) is normally capitalized as the |
225 |
(U+00DF, LATIN SMALL LETTER SHARP S) is normally capitalized as the |
225 |
two-character string ``SS'' but it does not match ``SS'', and it might |
226 |
two-character string ``SS'' but it does not match ``SS'', and it might |
226 |
not match the uppercase letter ``ẞ'' (U+1E9E, LATIN CAPITAL LETTER |
227 |
not match the uppercase letter |
|
|
228 |
@c texinfo version 2023-03-04.12 complains about the following, saying |
229 |
@c "Character missing, sorry: LONG S." For now, omit it if tex. |
230 |
@ifnottex |
231 |
``ẞ'' |
232 |
@end ifnottex |
233 |
(U+1E9E, LATIN CAPITAL LETTER |
227 |
SHARP S) even though lowercasing the latter yields the former. |
234 |
SHARP S) even though lowercasing the latter yields the former. |
228 |
|
235 |
|
229 |
@option{-y} is an obsolete synonym that is provided for compatibility. |
236 |
@option{-y} is an obsolete synonym that is provided for compatibility. |
Lines 265-272
|
265 |
regular expression with @samp{\<} and @samp{\>}. For example, although |
272 |
regular expression with @samp{\<} and @samp{\>}. For example, although |
266 |
@samp{grep -w @@} matches a line containing only @samp{@@}, @samp{grep |
273 |
@samp{grep -w @@} matches a line containing only @samp{@@}, @samp{grep |
267 |
'\<@@\>'} cannot match any line because @samp{@@} is not a |
274 |
'\<@@\>'} cannot match any line because @samp{@@} is not a |
268 |
word constituent. @xref{The Backslash Character and Special |
275 |
word constituent. @xref{Special Backslash Expressions}. |
269 |
Expressions}. |
|
|
270 |
|
276 |
|
271 |
@item -x |
277 |
@item -x |
272 |
@itemx --line-regexp |
278 |
@itemx --line-regexp |
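The difference between @option{-w} and the @samp{\<}/@samp{\>} anchors described for the @samp{@@} example can be checked directly. The following is a minimal sketch, assuming GNU @command{grep}; the variable names are illustrative only.

```shell
# Assumes GNU grep. -w accepts a match bounded by line edges or
# non-word characters, so a line containing only "@" is selected.
with_w=$(printf '@\n' | grep -c -w '@')

# \< and \> each need a word constituent beside them, and "@" is not
# one, so this pattern cannot match any line.
with_anchors=$(printf '@\n' | grep -c '\<@\>')

echo "-w selected $with_w line(s); the anchored pattern selected $with_anchors"
```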
Lines 301-307
|
301 |
@opindex --color |
307 |
@opindex --color |
302 |
@opindex --colour |
308 |
@opindex --colour |
303 |
@cindex highlight, color, colour |
309 |
@cindex highlight, color, colour |
304 |
Surround the matched (non-empty) strings, matching lines, context lines, |
310 |
Surround matched non-empty strings, matching lines, context lines, |
305 |
file names, line numbers, byte offsets, and separators (for fields and |
311 |
file names, line numbers, byte offsets, and separators (for fields and |
306 |
groups of context lines) with escape sequences to display them in color |
312 |
groups of context lines) with escape sequences to display them in color |
307 |
on the terminal. |
313 |
on the terminal. |
Lines 309-319
|
309 |
and default to @samp{ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36} |
315 |
and default to @samp{ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36} |
310 |
for bold red matched text, magenta file names, green line numbers, |
316 |
for bold red matched text, magenta file names, green line numbers, |
311 |
green byte offsets, cyan separators, and default terminal colors otherwise. |
317 |
green byte offsets, cyan separators, and default terminal colors otherwise. |
312 |
The deprecated environment variable @env{GREP_COLOR} is still supported, |
318 |
@xref{Environment Variables}. |
313 |
but its setting does not have priority; |
319 |
|
314 |
it defaults to @samp{01;31} (bold red) |
320 |
@var{WHEN} is @samp{always} to use colors, @samp{never} to not use |
315 |
which only covers the color for matched text. |
321 |
colors, or @samp{auto} to use colors if standard output is associated |
316 |
@var{WHEN} is @samp{never}, @samp{always}, or @samp{auto}. |
322 |
with a terminal device and the @env{TERM} environment variable's value |
|
|
323 |
suggests that the terminal supports colors. |
324 |
Plain @option{--color} is treated like @option{--color=auto}; |
325 |
if no @option{--color} option is given, the default is @option{--color=never}. |
317 |
|
326 |
|
318 |
@item -L |
327 |
@item -L |
319 |
@itemx --files-without-match |
328 |
@itemx --files-without-match |
Lines 341-346
|
341 |
@opindex --max-count |
350 |
@opindex --max-count |
342 |
@cindex max-count |
351 |
@cindex max-count |
343 |
Stop after the first @var{num} selected lines. |
352 |
Stop after the first @var{num} selected lines. |
|
|
353 |
If @var{num} is zero, @command{grep} stops right away without reading input. |
354 |
A @var{num} of @minus{}1 is treated as infinity and @command{grep} |
355 |
does not stop; this is the default. |
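As a rough illustration of the @option{-m} behavior just described, the sketch below assumes GNU @command{grep}. It exercises only a positive @var{num} and a @var{num} of zero, since older releases may not accept @minus{}1.

```shell
# Assumes GNU grep; the input lines are made up for illustration.
# Stop after the first two selected lines.
first_two=$(printf 'a1\na2\na3\n' | grep -m 2 a)
printf '%s\n' "$first_two"

# With a count of zero, grep stops right away: it prints nothing and,
# since no line was selected, exits with status 1.
printf 'a1\n' | grep -m 0 a
zero_status=$?
echo "exit status with -m 0: $zero_status"
```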
356 |
|
344 |
If the input is standard input from a regular file, |
357 |
If the input is standard input from a regular file, |
345 |
and @var{num} selected lines are output, |
358 |
and @var{num} selected lines are output, |
346 |
@command{grep} ensures that the standard input is positioned |
359 |
@command{grep} ensures that the standard input is positioned |
Lines 381-387
|
381 |
@opindex -o |
394 |
@opindex -o |
382 |
@opindex --only-matching |
395 |
@opindex --only-matching |
383 |
@cindex only matching |
396 |
@cindex only matching |
384 |
Print only the matched (non-empty) parts of matching lines, |
397 |
Print only the matched non-empty parts of matching lines, |
385 |
with each such part on a separate output line. |
398 |
with each such part on a separate output line. |
386 |
Output lines use the same delimiters as input, and delimiters are null |
399 |
Output lines use the same delimiters as input, and delimiters are null |
387 |
bytes if @option{-z} (@option{--null-data}) is also used (@pxref{Other |
400 |
bytes if @option{-z} (@option{--null-data}) is also used (@pxref{Other |
Lines 813-820
|
813 |
@node Environment Variables |
826 |
@node Environment Variables |
814 |
@section Environment Variables |
827 |
@section Environment Variables |
815 |
|
828 |
|
816 |
The behavior of @command{grep} is affected |
829 |
The behavior of @command{grep} is affected by several environment |
817 |
by the following environment variables. |
830 |
variables, the most important of which control the locale, which |
|
|
831 |
specifies how @command{grep} interprets characters in its patterns and |
832 |
data. |
818 |
|
833 |
|
819 |
@vindex LANGUAGE @r{environment variable} |
834 |
@vindex LANGUAGE @r{environment variable} |
820 |
@vindex LC_ALL @r{environment variable} |
835 |
@vindex LC_ALL @r{environment variable} |
Lines 826-833
|
826 |
in that order. |
841 |
in that order. |
827 |
The first of these variables that is set specifies the locale. |
842 |
The first of these variables that is set specifies the locale. |
828 |
For example, if @env{LC_ALL} is not set, |
843 |
For example, if @env{LC_ALL} is not set, |
829 |
but @env{LC_COLLATE} is set to @samp{pt_BR}, |
844 |
but @env{LC_COLLATE} is set to @samp{pt_BR.UTF-8}, |
830 |
then the Brazilian Portuguese locale is used |
845 |
then a Brazilian Portuguese locale is used |
831 |
for the @env{LC_COLLATE} category. |
846 |
for the @env{LC_COLLATE} category. |
832 |
As a special case for @env{LC_MESSAGES} only, the environment variable |
847 |
As a special case for @env{LC_MESSAGES} only, the environment variable |
833 |
@env{LANGUAGE} can contain a colon-separated list of languages that |
848 |
@env{LANGUAGE} can contain a colon-separated list of languages that |
Lines 839-845
|
839 |
with national language support (NLS). |
854 |
with national language support (NLS). |
840 |
The shell command @code{locale -a} lists locales that are currently available. |
855 |
The shell command @code{locale -a} lists locales that are currently available. |
841 |
|
856 |
|
842 |
Many of the environment variables in the following list let you |
857 |
@cindex environment variables |
|
|
858 |
The following environment variables affect the behavior of @command{grep}. |
859 |
|
860 |
@table @env |
861 |
|
862 |
@item GREP_COLOR |
863 |
@vindex GREP_COLOR @r{environment variable} |
864 |
@cindex highlight markers |
865 |
This obsolescent variable interacts with @env{GREP_COLORS} |
866 |
confusingly, and @command{grep} warns if it is set and is not |
867 |
overridden by @env{GREP_COLORS}. Instead of |
868 |
@samp{GREP_COLOR='@var{color}'}, you can use |
869 |
@samp{GREP_COLORS='mt=@var{color}'}. |
870 |
|
871 |
@item GREP_COLORS |
872 |
@vindex GREP_COLORS @r{environment variable} |
873 |
@cindex highlight markers |
874 |
This variable controls how the @option{--color} option highlights output. |
875 |
Its value is a colon-separated list of @code{terminfo} capabilities |
876 |
that defaults to @samp{ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36} |
877 |
with the @samp{rv} and @samp{ne} boolean capabilities omitted (i.e., false). |
878 |
The two-letter capability names |
879 |
refer to terminal ``capabilities,'' the ability |
880 |
of a terminal to highlight text, or change its color, and so on. |
881 |
These capabilities are stored in an online database and accessed by |
882 |
the @code{terminfo} library. |
883 |
Non-empty capability values |
843 |
control highlighting using |
884 |
control highlighting using |
844 |
Select Graphic Rendition (SGR) |
885 |
Select Graphic Rendition (SGR) |
845 |
commands interpreted by the terminal or terminal emulator. |
886 |
commands interpreted by the terminal or terminal emulator. |
Lines 867-904
|
867 |
and @samp{48;5;0} to @samp{48;5;255} |
908 |
and @samp{48;5;0} to @samp{48;5;255} |
868 |
for 88-color and 256-color modes background colors. |
909 |
for 88-color and 256-color modes background colors. |
869 |
|
910 |
|
870 |
The two-letter names used in the @env{GREP_COLORS} environment variable |
|
|
871 |
(and some of the others) refer to terminal ``capabilities,'' the ability |
872 |
of a terminal to highlight text, or change its color, and so on. |
873 |
These capabilities are stored in an online database and accessed by |
874 |
the @code{terminfo} library. |
875 |
|
876 |
@cindex environment variables |
877 |
|
878 |
@table @env |
879 |
|
880 |
@item GREP_COLOR |
881 |
@vindex GREP_COLOR @r{environment variable} |
882 |
@cindex highlight markers |
883 |
This variable specifies the color used to highlight matched (non-empty) text. |
884 |
It is deprecated in favor of @env{GREP_COLORS}, but still supported. |
885 |
The @samp{mt}, @samp{ms}, and @samp{mc} capabilities of @env{GREP_COLORS} |
886 |
have priority over it. |
887 |
It can only specify the color used to highlight |
888 |
the matching non-empty text in any matching line |
889 |
(a selected line when the @option{-v} command-line option is omitted, |
890 |
or a context line when @option{-v} is specified). |
891 |
The default is @samp{01;31}, |
892 |
which means a bold red foreground text on the terminal's default background. |
893 |
|
894 |
@item GREP_COLORS |
895 |
@vindex GREP_COLORS @r{environment variable} |
896 |
@cindex highlight markers |
897 |
This variable specifies the colors and other attributes |
898 |
used to highlight various parts of the output. |
899 |
Its value is a colon-separated list of @code{terminfo} capabilities |
900 |
that defaults to @samp{ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36} |
901 |
with the @samp{rv} and @samp{ne} boolean capabilities omitted (i.e., false). |
902 |
Supported capabilities are as follows. |
911 |
Supported capabilities are as follows. |
903 |
|
912 |
|
904 |
@table @code |
913 |
@table @code |
Lines 1009-1015
|
1009 |
@cindex national language support |
1018 |
@cindex national language support |
1010 |
@cindex NLS |
1019 |
@cindex NLS |
1011 |
These variables specify the locale for the @env{LC_COLLATE} category, |
1020 |
These variables specify the locale for the @env{LC_COLLATE} category, |
1012 |
which might affect how range expressions like @samp{[a-z]} are |
1021 |
which might affect how range expressions like @samp{a-z} are |
1013 |
interpreted. |
1022 |
interpreted. |
1014 |
|
1023 |
|
1015 |
@item LC_ALL |
1024 |
@item LC_ALL |
Lines 1052-1071
|
1052 |
by default, |
1061 |
by default, |
1053 |
such options are permuted to the front of the operand list |
1062 |
such options are permuted to the front of the operand list |
1054 |
and are treated as options. |
1063 |
and are treated as options. |
1055 |
Also, @env{POSIXLY_CORRECT} disables special handling of an |
|
|
1056 |
invalid bracket expression. @xref{invalid-bracket-expr}. |
1057 |
|
1064 |
|
1058 |
@item _@var{N}_GNU_nonoption_argv_flags_ |
1065 |
@item TERM |
1059 |
@vindex _@var{N}_GNU_nonoption_argv_flags_ @r{environment variable} |
1066 |
@vindex TERM @r{environment variable} |
1060 |
(Here @code{@var{N}} is @command{grep}'s numeric process ID.) |
1067 |
This variable specifies the output terminal type, which can affect |
1061 |
If the @var{i}th character of this environment variable's value is @samp{1}, |
1068 |
what the @option{--color} option does. @xref{General Output Control}. |
1062 |
do not consider the @var{i}th operand of @command{grep} to be an option, |
|
|
1063 |
even if it appears to be one. |
1064 |
A shell can put this variable in the environment for each command it runs, |
1065 |
specifying which operands are the results of file name wildcard expansion |
1066 |
and therefore should not be treated as options. |
1067 |
This behavior is available only with the GNU C library, |
1068 |
and only when @env{POSIXLY_CORRECT} is not set. |
1069 |
|
1069 |
|
1070 |
@end table |
1070 |
@end table |
1071 |
|
1071 |
|
Lines 1148-1153
|
1148 |
@samp{grep@ -P} may warn of unimplemented features. |
1148 |
@samp{grep@ -P} may warn of unimplemented features. |
1149 |
@xref{Other Options}. |
1149 |
@xref{Other Options}. |
1150 |
|
1150 |
|
|
|
1151 |
For documentation, refer to @url{https://www.pcre.org/}, with these caveats: |
1152 |
@itemize |
1153 |
@item |
1154 |
@samp{\d} matches only the ten ASCII digits |
1155 |
(and @samp{\D} matches the complement), regardless of locale. |
1156 |
Use @samp{\p@{Nd@}} to also match non-ASCII digits. |
1157 |
(The behavior of @samp{\d} and @samp{\D} is unspecified after |
1158 |
in-regexp directives like @samp{(?aD)}.) |
1159 |
|
1160 |
@item |
1161 |
Although PCRE tracks the syntax and semantics of Perl's regular |
1162 |
expressions, the match is not always exact. For example, Perl |
1163 |
evolves and a Perl installation may predate or postdate the PCRE2 |
1164 |
installation on the same host, or their Unicode versions may differ, |
1165 |
or Perl and PCRE2 may disagree about an obscure construct. |
1166 |
|
1167 |
@item |
1168 |
By default, @command{grep} applies each regexp to a line at a time, |
1169 |
so the @samp{(?s)} directive (making @samp{.} match line breaks) |
1170 |
is generally ineffective. |
1171 |
However, with @option{-z} (@option{--null-data}) it can work: |
1172 |
@example |
1173 |
$ printf 'a\nb\n' |grep -zP '(?s)a.b' |
1174 |
a |
1175 |
b |
1176 |
@end example |
1177 |
But beware: with the @option{-z} (@option{--null-data}) option and a file
1178 |
containing no NUL byte, @command{grep} must read the entire file into memory
1179 |
before processing any of it. |
1180 |
Thus, it will exhaust memory and fail for some large files. |
1181 |
@end itemize |
1182 |
|
1151 |
@end table |
1183 |
@end table |
1152 |
|
1184 |
|
1153 |
|
1185 |
|
Lines 1162-1183
|
1162 |
three different versions of regular expression syntax: |
1194 |
three different versions of regular expression syntax: |
1163 |
basic (BRE), extended (ERE), and Perl-compatible (PCRE). |
1195 |
basic (BRE), extended (ERE), and Perl-compatible (PCRE). |
1164 |
In GNU @command{grep}, |
1196 |
In GNU @command{grep}, |
1165 |
there is no difference in available functionality between the basic and |
1197 |
basic and extended regular expressions are merely different notations |
1166 |
extended syntaxes. |
1198 |
for the same pattern-matching functionality. |
1167 |
In other implementations, basic regular expressions are less powerful. |
1199 |
In other implementations, basic regular expressions are ordinarily |
|
|
1200 |
less powerful than extended, though occasionally it is the other way around. |
1168 |
The following description applies to extended regular expressions; |
1201 |
The following description applies to extended regular expressions; |
1169 |
differences for basic regular expressions are summarized afterwards. |
1202 |
differences for basic regular expressions are summarized afterwards. |
1170 |
Perl-compatible regular expressions give additional functionality, and |
1203 |
Perl-compatible regular expressions have different functionality, and |
1171 |
are documented in the @i{pcre2syntax}(3) and @i{pcre2pattern}(3) manual |
1204 |
are documented in the @i{pcre2syntax}(3) and @i{pcre2pattern}(3) manual |
1172 |
pages, but work only if PCRE is available in the system. |
1205 |
pages, but work only if PCRE is available in the system. |
1173 |
|
1206 |
|
1174 |
@menu |
1207 |
@menu |
1175 |
* Fundamental Structure:: |
1208 |
* Fundamental Structure:: |
1176 |
* Character Classes and Bracket Expressions:: |
1209 |
* Character Classes and Bracket Expressions:: |
1177 |
* The Backslash Character and Special Expressions:: |
1210 |
* Special Backslash Expressions:: |
1178 |
* Anchoring:: |
1211 |
* Anchoring:: |
1179 |
* Back-references and Subexpressions:: |
1212 |
* Back-references and Subexpressions:: |
1180 |
* Basic vs Extended:: |
1213 |
* Basic vs Extended:: |
|
|
1214 |
* Problematic Expressions:: |
1181 |
* Character Encoding:: |
1215 |
* Character Encoding:: |
1182 |
* Matching Non-ASCII:: |
1216 |
* Matching Non-ASCII:: |
1183 |
@end menu |
1217 |
@end menu |
Lines 1257-1265
|
1257 |
matches any string formed by concatenating two substrings |
1291 |
matches any string formed by concatenating two substrings |
1258 |
that respectively match the concatenated expressions. |
1292 |
that respectively match the concatenated expressions. |
1259 |
|
1293 |
|
1260 |
Two regular expressions may be joined by the infix operator @samp{|}; |
1294 |
@cindex alternatives in regular expressions |
1261 |
the resulting regular expression |
1295 |
Two regular expressions may be joined by the infix operator @samp{|}. |
1262 |
matches any string matching either alternate expression. |
1296 |
The resulting regular expression matches any string matching either of |
|
|
1297 |
the two expressions, which are called @dfn{alternatives}. |
1263 |
|
1298 |
|
1264 |
Repetition takes precedence over concatenation, |
1299 |
Repetition takes precedence over concatenation, |
1265 |
which in turn takes precedence over alternation. |
1300 |
which in turn takes precedence over alternation. |
Lines 1267-1272
|
1267 |
to override these precedence rules and form a subexpression. |
1302 |
to override these precedence rules and form a subexpression. |
1268 |
An unmatched @samp{)} matches just itself. |
1303 |
An unmatched @samp{)} matches just itself. |
1269 |
|
1304 |
|
|
|
1305 |
Not every character string is a valid regular expression. |
1306 |
@xref{Problematic Expressions}. |
1307 |
|
1270 |
@node Character Classes and Bracket Expressions |
1308 |
@node Character Classes and Bracket Expressions |
1271 |
@section Character Classes and Bracket Expressions |
1309 |
@section Character Classes and Bracket Expressions |
1272 |
|
1310 |
|
Lines 1294-1300
|
1294 |
In other locales, the sorting sequence is not specified, and |
1332 |
In other locales, the sorting sequence is not specified, and |
1295 |
@samp{[a-d]} might be equivalent to @samp{[abcd]} or to |
1333 |
@samp{[a-d]} might be equivalent to @samp{[abcd]} or to |
1296 |
@samp{[aBbCcDd]}, or it might fail to match any character, or the set of |
1334 |
@samp{[aBbCcDd]}, or it might fail to match any character, or the set of |
1297 |
characters that it matches might even be erratic. |
1335 |
characters that it matches might be erratic, or it might be invalid. |
1298 |
To obtain the traditional interpretation |
1336 |
To obtain the traditional interpretation |
1299 |
of bracket expressions, you can use the @samp{C} locale by setting the |
1337 |
of bracket expressions, you can use the @samp{C} locale by setting the |
1300 |
@env{LC_ALL} environment variable to the value @samp{C}. |
1338 |
@env{LC_ALL} environment variable to the value @samp{C}. |
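For instance, the traditional interpretation can be requested for a single command by setting @env{LC_ALL} only in that command's environment. This is a small sketch, assuming a POSIX shell; the input letters are arbitrary.

```shell
# In the C locale, [a-d] means exactly the characters a, b, c, and d,
# so the uppercase letters and "e" are not selected.
selected=$(printf 'a\nB\nc\nD\ne\n' | LC_ALL=C grep '[a-d]')
printf '%s\n' "$selected"
```

Setting the variable on the command line this way leaves the locale of the rest of the script unchanged.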
Lines 1397-1408
|
1397 |
part of the symbolic names, and must be included in addition to |
1435 |
part of the symbolic names, and must be included in addition to |
1398 |
the brackets delimiting the bracket expression. |
1436 |
the brackets delimiting the bracket expression. |
1399 |
|
1437 |
|
1400 |
@anchor{invalid-bracket-expr} |
|
|
1401 |
If you mistakenly omit the outer brackets, and search for say, @samp{[:upper:]}, |
1438 |
If you mistakenly omit the outer brackets, and search for say, @samp{[:upper:]}, |
1402 |
GNU @command{grep} prints a diagnostic and exits with status 2, on |
1439 |
GNU @command{grep} prints a diagnostic and exits with status 2, on |
1403 |
the assumption that you did not intend to search for the nominally |
1440 |
the assumption that you did not intend to search for the |
1404 |
equivalent regular expression: @samp{[:epru]}. |
1441 |
regular expression @samp{[:epru]}. |
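The diagnostic for a class name that lacks its outer brackets can be observed as follows. The sketch assumes GNU @command{grep} with @env{POSIXLY_CORRECT} unset; other implementations might accept the first pattern.

```shell
# Assumes GNU grep. The outer brackets are missing, so grep rejects
# the pattern and exits with status 2 (the diagnostic is discarded).
printf 'A\n' | grep '[:upper:]' 2>/dev/null
bad_status=$?

# With the outer brackets the class works as intended.
good=$(printf 'A\nb\n' | grep '[[:upper:]]')

echo "status for [:upper:]: $bad_status; [[:upper:]] selected: $good"
```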
1405 |
Set the @env{POSIXLY_CORRECT} environment variable to disable this feature. |
|
|
1406 |
|
1442 |
|
1407 |
Special characters lose their special meaning inside bracket expressions. |
1443 |
Special characters lose their special meaning inside bracket expressions. |
1408 |
|
1444 |
|
Lines 1433-1439
|
1433 |
|
1469 |
|
1434 |
@item - |
1470 |
@item - |
1435 |
represents the range if it's not first or last in a list or the ending point |
1471 |
represents the range if it's not first or last in a list or the ending point |
1436 |
of a range. |
1472 |
of a range. To make the @samp{-} a list item, it is best to put it last. |
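As a brief sketch of the advice about @samp{-}, the following uses @option{-o} to show which characters a list ending in @samp{-} matches; the input string is made up for illustration.

```shell
# The trailing "-" is a list item rather than a range operator, so
# [b-] matches either "b" or "-".
found=$(printf 'a-b\n' | grep -o '[b-]')
printf '%s\n' "$found"
```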
1437 |
|
1473 |
|
1438 |
@item ^ |
1474 |
@item ^ |
1439 |
represents the characters not in the list. |
1475 |
represents the characters not in the list. |
Lines 1442-1449
|
1442 |
|
1478 |
|
1443 |
@end table |
1479 |
@end table |
1444 |
|
1480 |
|
1445 |
@node The Backslash Character and Special Expressions |
1481 |
@node Special Backslash Expressions |
1446 |
@section The Backslash Character and Special Expressions |
1482 |
@section Special Backslash Expressions |
1447 |
@cindex backslash |
1483 |
@cindex backslash |
1448 |
|
1484 |
|
1449 |
The @samp{\} character followed by a special character is a regular |
1485 |
The @samp{\} character followed by a special character is a regular |
Lines 1478-1488
|
1478 |
@item \S |
1514 |
@item \S |
1479 |
Match non-whitespace; it is a synonym for @samp{[^[:space:]]}.
1515 |
Match non-whitespace; it is a synonym for @samp{[^[:space:]]}.
1480 |
|
1516 |
|
|
|
1517 |
@item \] |
1518 |
Match @samp{]}. |
1519 |
|
1520 |
@item \@} |
1521 |
Match @samp{@}}. |
1522 |
|
1481 |
@end table |
1523 |
@end table |
1482 |
|
1524 |
|
1483 |
For example, @samp{\brat\b} matches the separate word @samp{rat}, |
1525 |
For example, @samp{\brat\b} matches the separate word @samp{rat}, |
1484 |
@samp{\Brat\B} matches @samp{crate} but not @samp{furry rat}. |
1526 |
@samp{\Brat\B} matches @samp{crate} but not @samp{furry rat}. |
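The @samp{\b} and @samp{\B} examples can be reproduced with a short script. This is a sketch, assuming GNU @command{grep}, because these backslash expressions are not portable.

```shell
# Assumes GNU grep. \b matches at the edge of a word, so only the
# separate word "rat" is found.
word=$(printf 'furry rat\ncrate\n' | grep '\brat\b')

# \B matches where there is no word edge, so "rat" must be strictly
# inside a longer word.
inner=$(printf 'furry rat\ncrate\n' | grep '\Brat\B')

printf '%s\n' "$word" "$inner"
```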
1485 |
|
1527 |
|
|
|
1528 |
The behavior of @command{grep} is unspecified if an unescaped backslash
1529 |
is not followed by a special character, a nonzero digit, or a |
1530 |
character in the above list. Although @command{grep} might issue a |
1531 |
diagnostic and/or give the backslash an interpretation now, its |
1532 |
behavior may change if the syntax of regular expressions is extended |
1533 |
in future versions. |
1534 |
|
1486 |
@node Anchoring |
1535 |
@node Anchoring |
1487 |
@section Anchoring |
1536 |
@section Anchoring |
1488 |
@cindex anchoring |
1537 |
@cindex anchoring |
Lines 1518-1565
|
1518 |
@section Basic vs Extended Regular Expressions |
1567 |
@section Basic vs Extended Regular Expressions |
1519 |
@cindex basic regular expressions |
1568 |
@cindex basic regular expressions |
1520 |
|
1569 |
|
1521 |
In basic regular expressions the characters @samp{?}, @samp{+}, |
1570 |
Basic regular expressions differ from extended regular expressions |
|
|
1571 |
in the following ways: |
1572 |
|
1573 |
@itemize |
1574 |
@item |
1575 |
The characters @samp{?}, @samp{+}, |
1522 |
@samp{@{}, @samp{|}, @samp{(}, and @samp{)} lose their special meaning; |
1576 |
@samp{@{}, @samp{|}, @samp{(}, and @samp{)} lose their special meaning; |
1523 |
instead use the backslashed versions @samp{\?}, @samp{\+}, @samp{\@{}, |
1577 |
instead use the backslashed versions @samp{\?}, @samp{\+}, @samp{\@{}, |
1524 |
@samp{\|}, @samp{\(}, and @samp{\)}. Also, a backslash is needed |
1578 |
@samp{\|}, @samp{\(}, and @samp{\)}. Also, a backslash is needed |
1525 |
before an interval expression's closing @samp{@}}, and an unmatched |
1579 |
before an interval expression's closing @samp{@}}. |
1526 |
@code{\)} is invalid. |
1580 |
|
|
|
1581 |
@item |
1582 |
An unmatched @samp{\)} is invalid. |
1583 |
|
1584 |
@item |
1585 |
If an unescaped @samp{^} appears neither first, nor directly after |
1586 |
@samp{\(} or @samp{\|}, it is treated like an ordinary character and |
1587 |
is not an anchor. |
1588 |
|
1589 |
@item |
1590 |
If an unescaped @samp{$} appears neither last, nor directly before |
1591 |
@samp{\|} or @samp{\)}, it is treated like an ordinary character and |
1592 |
is not an anchor. |
1593 |
|
1594 |
@item |
1595 |
If an unescaped @samp{*} appears first, or appears directly after |
1596 |
@samp{\(} or @samp{\|} or anchoring @samp{^}, it is treated like an |
1597 |
ordinary character and is not a repetition operator. |
1598 |
@end itemize |
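Two of the differences listed above can be seen by running the same pattern under both syntaxes. The following is a minimal sketch, assuming GNU @command{grep}; the sample lines are hypothetical.

```shell
# Assumes GNU grep. The sample lines are made up for illustration.
sample=$(printf 'a+b\naab\n*x\n')

# In a basic regular expression, + is an ordinary character, so this
# selects only the literal text "a+b".
bre_plus=$(printf '%s\n' "$sample" | grep 'a+b')

# In an extended regular expression, + is a repetition operator, so
# this selects "aab" instead.
ere_plus=$(printf '%s\n' "$sample" | grep -E 'a+b')

# A leading * in a basic regular expression is an ordinary character.
bre_star=$(printf '%s\n' "$sample" | grep '*x')

printf '%s\n' "$bre_plus" "$ere_plus" "$bre_star"
```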
1599 |
|
1600 |
@node Problematic Expressions |
1601 |
@section Problematic Regular Expressions |
1527 |
|
1602 |
|
1528 |
Portable scripts should avoid the following constructs, as |
1603 |
@cindex invalid regular expressions |
1529 |
POSIX says they produce unspecified results: |
1604 |
@cindex unspecified behavior in regular expressions |
|
|
1605 |
Some strings are @dfn{invalid regular expressions} and cause |
1606 |
@command{grep} to issue a diagnostic and fail. For example, @samp{xy\1} |
1607 |
is invalid because there is no parenthesized subexpression for the |
1608 |
back-reference @samp{\1} to refer to. |
1609 |
|
1610 |
Also, some regular expressions have @dfn{unspecified behavior} and |
1611 |
should be avoided even if @command{grep} does not currently diagnose |
1612 |
them. For example, @samp{xy\0} has unspecified behavior because |
1613 |
@samp{0} is not a special character and @samp{\0} is not a special |
1614 |
backslash expression (@pxref{Special Backslash Expressions}). |
1615 |
Unspecified behavior can be particularly problematic because the set |
1616 |
of matched strings might be only partially specified, or not be |
1617 |
specified at all, or the expression might even be invalid. |
1618 |
|
1619 |
The following regular expression constructs are invalid on all |
1620 |
platforms conforming to POSIX, so portable scripts can assume that |
1621 |
@command{grep} rejects these constructs: |
1530 |
|
1622 |
|
1531 |
@itemize @bullet |
1623 |
@itemize @bullet |
1532 |
@item |
1624 |
@item |
1533 |
Extended regular expressions that use back-references. |
1625 |
A basic regular expression containing a back-reference @samp{\@var{n}} |
|
|
1626 |
preceded by fewer than @var{n} closing parentheses. For example, |
1627 |
@samp{\(a\)\2} is invalid. |
1628 |
|
1534 |
@item |
1629 |
@item |
1535 |
Basic regular expressions that use @samp{\?}, @samp{\+}, or @samp{\|}. |
1630 |
A bracket expression containing @samp{[:} that does not start a |
|
|
1631 |
character class; and similarly for @samp{[=} and @samp{[.}. For |
1632 |
example, @samp{[a[:b]} and @samp{[a[:ouch:]b]} are invalid. |
1633 |
@end itemize |
1634 |
|
1635 |
GNU @command{grep} treats the following constructs as invalid. |
1636 |
However, other @command{grep} implementations might allow them, so |
1637 |
portable scripts should not rely on their being invalid: |
1638 |
|
1639 |
@itemize @bullet |
1536 |
@item |
1640 |
@item |
1537 |
Empty parenthesized regular expressions like @samp{()}. |
1641 |
Unescaped @samp{\} at the end of a regular expression. |
|
|
1642 |
|
1538 |
@item |
1643 |
@item |
1539 |
Empty alternatives (as in, e.g., @samp{a|}).
1644 |
Unescaped @samp{[} that does not start a bracket expression. |
|
|
1645 |
|
1540 |
@item |
1646 |
@item |
1541 |
Repetition operators that immediately follow empty expressions, |
1647 |
A @samp{\@{} in a basic regular expression that does not start an |
1542 |
unescaped @samp{$}, or other repetition operators. |
1648 |
interval expression. |
|
|
1649 |
|
1543 |
@item |
1650 |
@item |
1544 |
Interval expressions containing repetition counts greater than 255. |
1651 |
A basic regular expression with unbalanced @samp{\(} or @samp{\)}, |
|
|
1652 |
or an extended regular expression with unbalanced @samp{(}. |
1653 |
|
1545 |
@item |
1654 |
@item |
1546 |
A backslash escaping an ordinary character (e.g., @samp{\S}), |
1655 |
In the POSIX locale, a range expression like @samp{z-a} that |
1547 |
unless it is a back-reference. |
1656 |
represents zero elements. A non-GNU @command{grep} might treat it as |
|
|
1657 |
a valid range that never matches. |
1658 |
|
1548 |
@item |
1659 |
@item |
1549 |
An unescaped @samp{[} that is not part of a bracket expression. |
1660 |
An interval expression with a repetition count greater than 32767. |
|
|
1661 |
(The portable POSIX limit is 255, and even interval expressions with |
1662 |
smaller counts can be impractically slow on all known implementations.) |
1663 |
|
1550 |
@item |
1664 |
@item |
1551 |
In extended regular expressions, an unescaped @samp{@{} that is not |
1665 |
A bracket expression that contains at least three elements, the first |
1552 |
part of an interval expression. |
1666 |
and last of which are both @samp{:}, or both @samp{.}, or both |
|
|
1667 |
@samp{=}. For example, a non-GNU @command{grep} might treat |
1668 |
@samp{[:alpha:]} like @samp{[[:alpha:]]}, or like @samp{[:ahlp]}. |
1553 |
@end itemize |
1669 |
@end itemize |
1554 |
|
1670 |
|
1555 |
@cindex interval expressions |
1671 |
The following constructs have well-defined behavior in GNU |
1556 |
GNU @samp{grep@ -E} treats @samp{@{} as special |
1672 |
@command{grep}. However, they have unspecified behavior elsewhere, so |
1557 |
only if it begins a valid interval expression. |
1673 |
portable scripts should avoid them: |
1558 |
For example, the command |
1674 |
|
1559 |
@samp{grep@ -E@ '@{1'} searches for the two-character string @samp{@{1} |
1675 |
@itemize @bullet |
1560 |
instead of reporting a syntax error in the regular expression. |
1676 |
@item |
1561 |
POSIX allows this behavior as an extension, but portable scripts |
1677 |
Special backslash expressions like @samp{\b}, @samp{\<}, and @samp{\]}. |
1562 |
should avoid it. |
1678 |
@xref{Special Backslash Expressions}. |
|
|
1679 |
|
1680 |
@item |
1681 |
A basic regular expression that uses @samp{\?}, @samp{\+}, or @samp{\|}. |
1682 |
|
1683 |
@item |
1684 |
An extended regular expression that uses back-references. |
1685 |
|
1686 |
@item |
1687 |
An empty regular expression, subexpression, or alternative. For |
1688 |
example, @samp{(a|bc|)} is not portable; a portable equivalent is |
1689 |
@samp{(a|bc)?}. |
1690 |
|
1691 |
@item |
1692 |
In a basic regular expression, an anchoring @samp{^} that appears |
1693 |
directly after @samp{\(}, or an anchoring @samp{$} that appears |
1694 |
directly before @samp{\)}. |
1695 |
|
1696 |
@item |
1697 |
In a basic regular expression, a repetition operator that |
1698 |
directly follows another repetition operator. |
1699 |
|
1700 |
@item |
1701 |
In an extended regular expression, unescaped @samp{@{} |
1702 |
that does not begin a valid interval expression. |
1703 |
GNU @command{grep} treats the @samp{@{} as an ordinary character. |
1704 |
|
1705 |
@item |
1706 |
A null character or an encoding error in either pattern or input data. |
1707 |
@xref{Character Encoding}. |
1708 |
|
1709 |
@item |
1710 |
An input file that ends in a non-newline character, |
1711 |
where GNU @command{grep} silently supplies a newline. |
1712 |
@end itemize |
1713 |
|
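As a minimal sketch of how a script might sidestep two of the GNU-specific constructs listed above, the following rewrites a basic regular expression that would use @samp{\?} as an interval expression, and uses the portable @samp{(a|bc)?} in place of an empty alternative. The sample words and the variable names are illustrative assumptions, not part of the manual.

```shell
# Illustrative input, one candidate string per line (assumed sample data).
words() { printf '%s\n' 'color' 'colour' 'colr'; }

# GNU-only spelling in a basic regular expression:  grep 'colou\?r'
# Portable spelling: an interval expression makes the "u" optional.
portable_bre=$(words | grep 'colou\{0,1\}r')

# An empty alternative such as (a|bc|) is not portable in an extended
# regular expression; (a|bc)? expresses the same optional group.
portable_ere=$(printf '%s\n' 'x' 'xa' 'xbc' 'xq' | grep -E '^x(a|bc)?$')

printf '%s\n' "$portable_bre"
printf '%s\n' "$portable_ere"
```

Both commands use only constructs that POSIX specifies, so they should behave the same way with GNU @command{grep} and with other implementations.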
1714 |
The following constructs have unspecified behavior, in both GNU |
1715 |
and other @command{grep} implementations. Scripts should avoid |
1716 |
them whenever possible. |
1717 |
|
1718 |
@itemize |
1719 |
@item |
1720 |
A backslash escaping an ordinary character, unless it is a |
1721 |
back-reference like @samp{\1} or a special backslash expression like |
1722 |
@samp{\<} or @samp{\b}. @xref{Special Backslash Expressions}. For |
1723 |
example, @samp{\x} has unspecified behavior now, and a future version |
1724 |
of @command{grep} might specify @samp{\x} to have a new behavior. |
1725 |
|
1726 |
@item |
1727 |
A repetition operator that appears directly after an anchor, or at the |
1728 |
start of a complete regular expression, parenthesized subexpression, |
1729 |
or alternative. For example, @samp{+|^*(+a|?-b)} has unspecified |
1730 |
behavior, whereas @samp{\+|^\*(\+a|\?-b)} is portable. |
1731 |
|
1732 |
@item |
1733 |
A range expression outside the POSIX locale. For example, in some |
1734 |
locales @samp{[a-z]} might match some characters that are not |
1735 |
lowercase letters, or might not match some lowercase letters, or might |
1736 |
be invalid. With GNU @command{grep} it is not documented whether |
1737 |
these range expressions use native code points, or use the collating |
1738 |
sequence specified by the @env{LC_COLLATE} category, or have some |
1739 |
other interpretation. Outside the POSIX locale, it is portable to use |
1740 |
@samp{[[:lower:]]} to match a lower-case letter, or |
1741 |
@samp{[abcdefghijklmnopqrstuvwxyz]} to match an ASCII lower-case |
1742 |
letter. |
1743 |
|
1744 |
@end itemize |
1563 |
|
1745 |
|
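The portable alternatives named in the list above can be sketched as follows. This is only an illustration under stated assumptions: the input is plain ASCII, and the variable names are hypothetical.

```shell
# Illustrative ASCII input, one character per line (assumed sample data).
chars() { printf '%s\n' 'a' 'B' 'z' '1'; }

# Use a character class rather than a range expression like [a-z],
# whose meaning is unspecified outside the POSIX locale.
lower=$(chars | grep '^[[:lower:]]$')

# Spell out the letters when only ASCII lowercase letters are wanted.
ascii_lower=$(chars | grep '^[abcdefghijklmnopqrstuvwxyz]$')

# Escape a repetition operator that directly follows an anchor, so that
# it stands for a literal "+" instead of having unspecified behavior.
literal_plus=$(printf '%s\n' '+1' '1+' | grep -E '^\+')

printf '%s\n' "$lower" "$ascii_lower" "$literal_plus"
```

For ASCII input the first two commands select the same lines; they can differ once the input contains non-ASCII lowercase letters.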
1564 |
@node Character Encoding |
1746 |
@node Character Encoding |
1565 |
@section Character Encoding |
1747 |
@section Character Encoding |
Lines 1865-1871
|
1865 |
|
2047 |
|
1866 |
To match empty lines, use the pattern @samp{^$}. To match blank |
2048 |
To match empty lines, use the pattern @samp{^$}. To match blank |
1867 |
lines, use the pattern @samp{^[[:blank:]]*$}. To match no lines at |
2049 |
lines, use the pattern @samp{^[[:blank:]]*$}. To match no lines at |
1868 |
all, use the command @samp{grep -f /dev/null}. |
2050 |
all, use an extended regular expression like @samp{a^} or @samp{$a}. |
|
|
2051 |
To match every line, a portable script should use a pattern like |
2052 |
@samp{^} instead of the empty pattern, as POSIX does not specify the |
2053 |
behavior of the empty pattern. |
1869 |
|
2054 |
|
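The patterns in the answer above can be tried on a small sample. The following is a sketch only; the four-line sample input and the variable names are assumptions made for illustration.

```shell
# Four sample lines: text, an empty line, a line holding only a space
# and a tab, and text again (assumed sample data).
sample() { printf 'first\n\n \t\nlast\n'; }

# Count empty lines.
empty=$(sample | grep -c '^$')

# Count blank lines (empty, or containing only spaces and tabs).
blank=$(sample | grep -c '^[[:blank:]]*$')

# Match no lines at all; grep exits with status 1 when nothing matches,
# so "|| true" keeps a script running under "set -e".
none=$(sample | grep -c -E 'a^' || true)

# Match every line with a pattern whose behavior POSIX specifies,
# instead of relying on the empty pattern.
every=$(sample | grep -c '^')

echo "empty=$empty blank=$blank none=$none every=$every"
```

Using @option{-c} here keeps the comparison compact: each command reports only how many lines its pattern selected.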
1870 |
@item |
2055 |
@item |
1871 |
How can I search in both standard input and in files? |
2056 |
How can I search in both standard input and in files? |
Lines 1947-1958
|
1947 |
that were the counterparts of the modern @samp{grep -E} and @samp{grep -F}. |
2132 |
that were the counterparts of the modern @samp{grep -E} and @samp{grep -F}. |
1948 |
Although breaking up @command{grep} into three programs was perhaps |
2133 |
Although breaking up @command{grep} into three programs was perhaps |
1949 |
useful on the small computers of the 1970s, @command{egrep} and |
2134 |
useful on the small computers of the 1970s, @command{egrep} and |
1950 |
@command{fgrep} were not standardized by POSIX and are no longer needed. |
2135 |
@command{fgrep} were deemed obsolescent by POSIX in 1992, |
1951 |
In the current GNU implementation, @command{egrep} and @command{fgrep} |
2136 |
removed from POSIX in 2001, deprecated by GNU Grep 2.5.3 in 2007, |
1952 |
issue a warning and then act like their modern counterparts; |
2137 |
and changed to issue obsolescence warnings by GNU Grep 3.8 in 2022; |
1953 |
eventually, they are planned to be removed entirely. |
2138 |
eventually, they are planned to be removed entirely. |
1954 |
|
2139 |
|
1955 |
If you prefer the old names, you can use use your own substitutes, |
2140 |
If you prefer the old names, you can use your own substitutes, |
1956 |
such as a shell script named @command{egrep} with the following |
2141 |
such as a shell script named @command{egrep} with the following |
1957 |
contents: |
2142 |
contents: |
1958 |
|
2143 |
|