9.3 Metacharacters
The following characters have special meaning in search patterns:
. | Match any single character except newline.
* | Match any number (or none) of the single character that immediately precedes it. The preceding character can also be a regular expression; for example, since . (dot) means any character, .* matches any number of any character except newline.
^ | Match the beginning of the line or string.
$ | Match the end of the line or string.
[ ] | Match any one of the enclosed characters. A hyphen (-) indicates a range of consecutive characters. A circumflex (^) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or close bracket (]) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list.
[^ ] | Match any one character except the enclosed characters.
\{n,m\} | Match a range of occurrences of the single character that immediately precedes it. The preceding character can also be a regular expression. \{n\} matches exactly n occurrences, \{n,\} matches at least n occurrences, and \{n,m\} matches any number of occurrences between n and m.
{n,m} | Like \{n,m\}. Available in egrep by default and in gawk with the -Wre-interval option.
\ | Turn off the special meaning of the character that follows.
\( \) | Save the matched text enclosed between \( and \) in a special holding space. Up to nine patterns can be saved on a single line. They can be "replayed" in the same pattern or within substitutions by the escape sequences \1 to \9.
\n | Reuse the matched text stored in the nth \( \).
\< | Match the beginning of a word.
\> | Match the end of a word.
+ | Match one or more instances of the preceding regular expression.
? | Match zero or one instance of the preceding regular expression.
| | Match either the regular expression specified before or the one specified after.
( ) | In egrep and gawk, group regular expressions.
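As a quick way to try the search metacharacters listed above, the following sketch runs a few patterns through grep (basic syntax) and grep -E (the extended syntax used by egrep). The word list and the `match` helper are made up for illustration, and the word-boundary line at the end assumes a grep that supports \< and \>, such as GNU grep.

```shell
# Hypothetical helper: run grep with the given arguments over a fixed
# sample word list and join the matching lines with commas.
match() {
    printf '%s\n' cat coat cot c.t caat boot | grep "$@" | paste -s -d , -
}

match '^c.t$'          # .      any single character:       cat,cot,c.t
match '^c\.t$'         # \      makes the dot literal:      c.t
match '^ca*t$'         # *      zero or more "a":           cat,caat
match '^c[ao]t$'       # [ ]    one of the listed chars:    cat,cot
match '^c[^ao]t$'      # [^ ]   one char not in the list:   c.t
match '^ca\{2\}t$'     # \{n\}  exactly n occurrences:      caat
match '\(o\)\1'        # \( \) and \1, a doubled "o":       boot
match -E '^c(a|o)+t$'  # ( ), | and + (extended syntax):    cat,coat,cot,caat
match -E '^co?t$'      # ?      zero or one "o":            cot
match -E '^c[ao]{2}t$' # {n}    interval, extended syntax:  coat,caat

# \< and \> anchor the match to word boundaries (assumes GNU grep or a
# compatible implementation); only the first line contains the word "cat".
printf '%s\n' 'the cat sat' 'concatenate' | grep -c '\<cat\>'
```

Anchoring each pattern with ^ and $ keeps the results easy to predict, because the whole line has to match rather than any substring of it.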
Many utilities support POSIX character
lists, which are useful for matching non-ASCII characters in
languages other than English. These lists are recognized only within
[ ] ranges. A typical use would be
[[:lower:]], which in English is the
same as [a-z].
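To illustrate the nesting, here is a minimal sketch using grep. The sample words and the `words` helper are hypothetical, and LC_ALL=C is set so that the comparison with [a-z] holds regardless of the local language settings.

```shell
# Hypothetical sample input, one item per line.
words() { printf '%s\n' apple Banana cherry 42; }

# The class name [:lower:] goes inside the outer [ ] of the list.
posix_count=$(words | LC_ALL=C grep -c '^[[:lower:]]*$')

# In the C locale the explicit range selects the same lines.
range_count=$(words | LC_ALL=C grep -c '^[a-z]*$')

# Several classes can share one pair of outer brackets.
mixed_count=$(words | LC_ALL=C grep -c '^[[:lower:][:digit:]]*$')

printf '%s\n' "$posix_count" "$range_count" "$mixed_count"
```

The first two counts agree (apple and cherry), while the third also picks up the all-digit line.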
The following table lists POSIX character lists:
[:alnum:] | Alphanumeric characters
[:alpha:] | Alphabetic characters, uppercase and lowercase
[:blank:] | Printable whitespace: spaces and tabs but not control characters
[:cntrl:] | Control characters, such as ^A through ^Z
[:digit:] | Decimal digits
[:graph:] | Printable characters, excluding whitespace
[:lower:] | Lowercase alphabetic characters
[:print:] | Printable characters, including whitespace but not control characters
[:punct:] | Punctuation, a subclass of printable characters
[:space:] | Whitespace, including spaces, tabs, and some control characters
[:upper:] | Uppercase alphabetic characters
[:xdigit:] | Hexadecimal digits
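A few of the character lists above can be exercised with sed substitutions. This is only a sketch: the sample line and the `apply` helper are invented for the example, and the C locale is forced so that the results do not depend on the local language settings.

```shell
# Hypothetical sample line.
sample='Order 66: shipped to Ada (id 0x1F)'

# Hypothetical helper: apply one sed expression to the sample line.
apply() { printf '%s\n' "$sample" | LC_ALL=C sed "$1"; }

digits=$(apply 's/[^[:digit:]]//g')    # delete everything but decimal digits
uppers=$(apply 's/[^[:upper:]]//g')    # delete everything but uppercase letters
nopunct=$(apply 's/[[:punct:]]//g')    # delete punctuation
nospace=$(apply 's/[[:space:]]//g')    # delete whitespace

printf '%s\n' "$digits" "$uppers" "$nopunct" "$nospace"
```

With the sample shown, the four results are 6601, OAF, "Order 66 shipped to Ada id 0x1F", and "Order66:shippedtoAda(id0x1F)".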
The following characters have special meaning in replacement patterns:
\ | Turn off the special meaning of the character that follows.
\n | Restore the nth pattern previously saved by \( and \). n is a number from 1 to 9, matching the patterns searched sequentially from left to right.
& | Reuse the text matched by the search pattern as part of the replacement pattern.
~ | Reuse the previous replacement pattern in the current replacement pattern.
\e | End replacement pattern started by \L or \U.
\E | End replacement pattern started by \L or \U.
\l | Convert first character of replacement pattern to lowercase.
\L | Convert replacement pattern to lowercase.
\u | Convert first character of replacement pattern to uppercase.
\U | Convert replacement pattern to uppercase.
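The replacement metacharacters can be tried with sed's s command, as in the sketch below. The input strings are invented for illustration. The &, \ and \n forms are standard sed. The case-conversion line assumes GNU sed, since other sed implementations may not understand \U, \E and \u. The ~ form belongs to ex and vi and is not shown.

```shell
# & stands for the text matched by the search pattern.
amp=$(printf '%s\n' 'hello world' | sed 's/world/[&]/')

# \ turns off the special meaning of &, leaving a literal ampersand.
lit=$(printf '%s\n' 'hello world' | sed 's/world/\&/')

# \1 and \2 restore the text saved by the first and second \( \) pair.
swap=$(printf '%s\n' 'Lovelace, Ada' | sed 's/\(.*\), \(.*\)/\2 \1/')

# Assumption: GNU sed. \U uppercases what follows until \E,
# and \u uppercases only the next character.
caps=$(printf '%s\n' 'hello world' | sed 's/\(hello\) \(world\)/\U\1\E \u\2/')

printf '%s\n' "$amp" "$lit" "$swap" "$caps"
```

The first three lines of output are "hello [world]", "hello &" and "Ada Lovelace". With GNU sed the fourth is "HELLO World".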