
9.3 Metacharacters

The following characters have special meaning in search patterns:

Character   Action
.           Match any single character except newline.
*           Match any number (or none) of the single character that immediately precedes it. The preceding character can also be a regular expression (e.g., since . (dot) means any character, .* means match any number of any character, except newlines).
^           Match the beginning of the line or string.
$           Match the end of the line or string.
[ ]         Match any one of the enclosed characters. A hyphen (-) indicates a range of consecutive characters. A circumflex (^) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or close bracket (]) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list.
[^ ]        Match any one character except the enclosed characters.
\{n,m\}     Match a range of occurrences of the single character that immediately precedes it. The preceding character can also be a regular expression. \{n\} matches exactly n occurrences, \{n,\} matches at least n occurrences, and \{n,m\} matches any number of occurrences between n and m.
{n,m}       Like \{n,m\}. Available in egrep (grep -E) by default and in gawk with the --re-interval (-W re-interval) option.
\           Turn off the special meaning of the character that follows.
\( \)       Save the matched text enclosed between \( and \) in a special holding space. Up to nine patterns can be saved on a single line. They can be "replayed" in the same pattern or within substitutions by the escape sequences \1 to \9.
\n          Reuse matched text stored in nth \( \).
\<          Match the beginning of a word.
\>          Match the end of a word.
+           Match one or more instances of preceding regular expression.
?           Match zero or one instance of preceding regular expression.
|           Match the regular expression specified before or after.
( )         In egrep and gawk, group regular expressions.
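
The search metacharacters listed above can be tried out with a short shell sketch. It assumes a grep that accepts \< and \> (GNU grep does) and uses grep -E as the modern spelling of egrep. The sample words and the match helper are invented for illustration and are not part of any tool.

```shell
# Hypothetical sample input: one candidate string per line.
words='cat
cot
coat
ct
aa
abab
the cat sat
concatenate'

# Illustrative helper: run grep over the sample lines.
# -x requires the pattern to match the whole line, which keeps results easy to read.
match() { printf '%s\n' "$words" | grep "$@"; }

# Basic syntax (plain grep).
dot=$(match -x 'c.t')            # .         exactly one character between c and t
star=$(match -x 'co*t')          # *         zero or more o's
anchored=$(match '^c.*t$')       # ^ $       starts with c and ends with t
bracket=$(match -x 'c[ao]t')     # [ ]       a or o
negated=$(match -x 'c[^a]t')     # [^ ]      any one character except a
interval=$(match -x 'a\{2\}')    # \{n\}     exactly two a's
backref=$(match -x '\(ab\)\1')   # \( \) \1  saved text must repeat
word=$(match '\<cat\>')          # \< \>     cat as a whole word (assumes GNU grep)

# Extended syntax (grep -E).
plus=$(match -E -x 'c(a|o)+t')   # ( ) | +   one or more of a or o
optional=$(match -E -x 'co?t')   # ?         zero or one o
braces=$(match -E -x 'a{2}')     # {n}       interval written without backslashes

# Show one result: whole-word matches skip "concatenate".
printf '%s\n' "$word"
```

Printing the other variables the same way shows which sample lines each pattern accepts. Note that the braces example needs -E: in plain grep the unescaped braces are ordinary characters.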

Many utilities support POSIX character lists, which are useful for matching non-ASCII characters in languages other than English. These lists are recognized only inside a [ ] bracket expression, so the list name carries its own brackets within the outer pair. A typical use is [[:lower:]], which in the C (POSIX) locale is the same as [a-z].

The following table lists POSIX character lists:

Notation      Matches
[:alnum:]     Alphanumeric characters
[:alpha:]     Alphabetic characters, uppercase and lowercase
[:blank:]     Printable whitespace: spaces and tabs but not control characters
[:cntrl:]     Control characters, such as ^A through ^Z
[:digit:]     Decimal digits
[:graph:]     Printable characters, excluding whitespace
[:lower:]     Lowercase alphabetic characters
[:print:]     Printable characters, including whitespace but not control characters
[:punct:]     Punctuation, a subclass of printable characters
[:space:]     Whitespace, including spaces, tabs, and some control characters
[:upper:]     Uppercase alphabetic characters
[:xdigit:]    Hexadecimal digits
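
A few of these lists in use, as a minimal sketch with grep and sed. The inputs are made up, and the C locale is forced as an assumption so that results stay limited to ASCII; under other locales the same lists may match additional characters.

```shell
# Assumption: pin the C locale so the lists cover plain ASCII only.
LC_ALL=C
export LC_ALL

# [[:lower:]] and [a-z] select the same line in this locale.
lower=$(printf 'abc\nABC\n123\n' | grep -E -x '[[:lower:]]+')
az=$(printf 'abc\nABC\n123\n' | grep -E -x '[a-z]+')

# A list can share its brackets with ordinary members, here an underscore.
ident=$(printf 'id_42\nid 42\n' | grep -E -x '[[:alnum:]_]+')

# Lines made up entirely of hexadecimal digits.
hex=$(printf 'ff00\nxyz\n' | grep -E -x '[[:xdigit:]]+')

# Replace each punctuation character with a space.
nopunct=$(printf 'a,b;c\n' | sed 's/[[:punct:]]/ /g')

# Collapse each run of spaces and tabs into a single space.
squeezed=$(printf 'a \t b\n' | sed 's/[[:blank:]][[:blank:]]*/ /g')

printf '%s\n' "$lower" "$ident" "$hex" "$nopunct" "$squeezed"
```

In each pattern the list name sits inside an outer pair of brackets, which is why the notation appears doubled.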

The following characters have special meaning in replacement patterns:

Character   Action
\           Turn off the special meaning of the character that follows.
\n          Restore the nth pattern previously saved by \( and \). n is a number from 1 to 9, counting the saved patterns from left to right.
&           Reuse the text matched by the search pattern as part of the replacement pattern.
~           Reuse the previous replacement pattern in the current replacement pattern.
\e          End replacement pattern started by \L or \U.
\E          End replacement pattern started by \L or \U.
\l          Convert first character of replacement pattern to lowercase.
\L          Convert replacement pattern to lowercase.
\u          Convert first character of replacement pattern to uppercase.
\U          Convert replacement pattern to uppercase.
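
Several of the replacement characters can be sketched with sed substitutions. The inputs are invented for illustration. The \n, & and \ forms are standard sed; the case-conversion escapes are shown under the assumption that GNU sed is in use, since other sed implementations may not accept them.

```shell
# \1 \2: replay saved text in a different order.
swapped=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\2 \1/')

# &: reuse the matched text inside the replacement.
wrapped=$(printf 'hello world\n' | sed 's/hello/[&]/')

# \&: a backslash turns off the special meaning of &.
literal=$(printf 'salt and pepper\n' | sed 's/and/\&/')

# Assumption: GNU sed. \U uppercases until \E; \u uppercases one character.
cased=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\U\1\E \u\2/')

# Assumption: GNU sed. \L lowercases the rest; \l lowercases one character.
lowered=$(printf 'HELLO WORLD\n' | sed 's/.*/\L&/')
first=$(printf 'HELLO\n' | sed 's/HELLO/\l&/')

printf '%s\n' "$swapped" "$wrapped" "$literal" "$cased" "$lowered" "$first"
```

With GNU sed this prints world hello, [hello] world, salt & pepper, HELLO World, hello world and hELLO, one per line. The ~ character is left out of the sketch because sed treats it as an ordinary character in a replacement; it applies to editors such as ex and vi.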
