1.7 Regular Expressions
JavaScript supports regular expressions for pattern matching with the
same syntax as the Perl programming language. JavaScript 1.2 supports
Perl 4 regular expressions, and JavaScript 1.5 adds support for some
of the additional features of Perl 5 regular expressions. A regular
expression is specified literally in a JavaScript program as a
sequence of characters within slash (/)
characters, optionally followed by one or more of the modifier
characters g (global search), i
(case-insensitive search), and m (multi-line mode;
a JavaScript 1.5 feature). In addition to this literal syntax,
RegExp objects can be created with the
RegExp( ) constructor, which accepts the pattern
and modifier characters as string arguments, without the slash
characters.
A full explanation of regular expression syntax is beyond the scope
of this book, but the tables in the following subsections offer brief
syntax summaries.
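As a minimal sketch of the literal and constructor forms described above, the following creates the same pattern both ways. The variable names and the sample string are illustrative assumptions, not part of the original text. Note that a backslash must be doubled when the pattern is written as a string.

```javascript
// Literal syntax: pattern between slashes, followed by modifier characters.
var literal = /\d+/gi;

// Constructor syntax: pattern and modifiers are strings, without the slashes.
// Inside a string literal the backslash itself must be escaped.
var constructed = new RegExp("\\d+", "gi");

// Both objects describe the same pattern text.
var sameSource = literal.source === constructed.source;  // true

// With the g modifier, match() returns every match in the string.
var found = "room 101, floor 7".match(literal);          // ["101", "7"]
```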
1.7.1 Literal characters
Letters, numbers, and most other characters are literals in a regular
expression: they simply match themselves. As we'll
see in the sections that follow, however, there are a number of
punctuation characters and escape sequences (beginning with
\) that have special meanings. The simplest of
these escape sequences provide alternative ways of representing
literal characters:
\n, \r, \t | Match literal newline, carriage return, tab
\\, \/, \*, \+, \?, etc. | Match a punctuation character literally, ignoring or escaping its special meaning
\xnn | The character with hexadecimal encoding nn.
\uxxxx | The Unicode character with hexadecimal encoding xxxx.
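The escape sequences in this table can be tried out with a short sketch such as the following. The sample strings and variable names are assumptions chosen for illustration only.

```javascript
// \t matches a literal tab character.
var hasTab = /\t/.test("a\tb");                      // true

// Escaped punctuation loses its special meaning and matches itself.
var sum = /1\+1=2\?/.test("1+1=2?");                 // true
var escapedDot = /a\.b/.test("axb");                 // false: \. matches only "."
var plainDot = /a.b/.test("axb");                    // true: . matches any character

// \xnn and \uxxxx name a character by its hexadecimal code.
var hexA = /\x41/.test("A");                         // true: 0x41 is "A"
var copyrightSign = /\u00A9/.test("\u00A9 2001");    // true: U+00A9 is the copyright sign
```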
1.7.2 Character classes
Regular expression syntax uses square brackets to represent character
sets or classes in a pattern. In addition, escape sequences define
certain commonly used character classes, as shown in the following
table.
[...] | Match any one character between brackets
[^...] | Match any one character not between brackets
. | Match any character other than newline
\w, \W | Match any word/non-word character
\s, \S | Match any whitespace/non-whitespace
\d, \D | Match any digit/non-digit
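The character classes above might be exercised as in this brief sketch. The input strings and variable names are illustrative assumptions.

```javascript
// [...] matches any one of the listed characters; count the vowels.
var vowelCount = "regular expression".match(/[aeiou]/g).length;  // 7

// [^...] matches any one character that is not listed.
var hasNonVowel = /[^aeiou]/.test("aei");                        // false

// . matches any character except a newline.
var dotNewline = /a.c/.test("a\nc");                             // false

// \w matches word characters (letters, digits, underscore).
var wordRuns = "x_1-y".match(/\w+/g);                            // ["x_1", "y"]

// \s matches whitespace; split on runs of spaces and tabs.
var fields = "one  two\tthree".split(/\s+/);                     // ["one", "two", "three"]

// \D matches non-digits; removing them keeps only the digits.
var digitsOnly = "a1b22c".replace(/\D/g, "");                    // "122"
```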
1.7.3 Repetition
The following table shows regular expression syntax that controls the
number of times that a match may be repeated.
? | Optional term; match zero or one time
+ | Match previous term one or more times
* | Match previous term zero or more times
{n} | Match previous term exactly n times
{n,} | Match previous term n or more times
{n,m} | Match at least n but no more than m times
In JavaScript 1.5, any of the repetition characters may be followed
by a question mark to make them non-greedy, which means they match as
few repetitions as possible while still allowing the complete pattern
to match.
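The difference between greedy and non-greedy repetition is easiest to see side by side. The following is a small sketch with made-up input strings and variable names.

```javascript
// ?, + and * control how often the previous term may repeat.
var optionalU = /colou?r/.test("color");          // true: the "u" is optional
var zeroOrMore = /go*gle/.test("ggle");           // true: * allows zero "o" characters
var oneOrMore = /go+gle/.test("ggle");            // false: + requires at least one "o"

// {n,m} is greedy by default and takes as many repetitions as it can.
var greedyDigits = "123456".match(/\d{2,4}/)[0];  // "1234"

// A trailing ? makes the repetition non-greedy: as few as possible.
var lazyDigits = "123456".match(/\d{2,4}?/)[0];   // "12"

// Greedy .+ runs to the last ">"; non-greedy .+? stops at the first one.
var tag = "<b>bold</b>";
var greedyTag = tag.match(/<.+>/)[0];             // "<b>bold</b>"
var lazyTag = tag.match(/<.+?>/)[0];              // "<b>"
```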
1.7.4 Grouping and alternation
Regular expressions use parentheses to group subexpressions, just as
mathematical expressions do. Parentheses are useful, for example, to
allow a repetition character to be applied to an entire
subexpression. They are also useful with the |
character, which is used to separate alternatives. Parenthesized
groups have a special behavior: when a pattern match is found, the
text that matches each group is saved and can be referred to by group
number. The following table summarizes this syntax.
a|b | Match either a or b
(sub) | Group subexpression sub into a single term, and remember the text that it matched
(?:sub) | Group subexpression sub but do not number the group or remember the text it matches (JS 1.5)
\n | Match exactly the same characters that were matched by group number n
$n | In replacement strings, substitute the text that matched the nth subexpression
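Grouping, alternation, and group references can be illustrated with a short sketch like the one below. The date, name, and other sample strings are assumptions made for the example.

```javascript
// | separates alternatives.
var pet = /cat|dog/.test("hotdog");                       // true

// Each parenthesized group is remembered and numbered from 1.
var parts = "2001-11-05".match(/(\d{4})-(\d\d)-(\d\d)/);
var year = parts[1];                                      // "2001"
var month = parts[2];                                     // "11"

// (?:...) groups without remembering, so no numbered entry is added.
var capturing = "abab".match(/(ab)+/).length;             // 2: whole match plus group 1
var nonCapturing = "abab".match(/(?:ab)+/).length;        // 1: whole match only

// \1 inside the pattern matches the same text that group 1 matched.
var doubledLetter = /(\w)\1/.test("book");                // true: "oo"

// $1 and $2 in a replacement string insert the text of groups 1 and 2.
var swapped = "Flanagan, David".replace(/(\w+), (\w+)/, "$2 $1");  // "David Flanagan"
```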
1.7.5 Anchoring match position
An anchor in a regular expression matches a
position in a string (such as the beginning or the end of the string)
without matching any of the characters of a string. It can be used to
restrict (or anchor) a match to a specific position within a string.
^, $ | Require match at beginning/end of a string, or in multiline mode, beginning/end of a line
\b, \B | Require match at a word boundary/non-boundary
(?=p) | Look-ahead assertion: require that the following characters match the pattern p, but do not include them in the match. (JS 1.5)
(?!p) | Negative look-ahead assertion: require that the following characters do not match the pattern p. (JS 1.5)
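The anchors summarized above can be tried with a sketch such as this one. The sample strings and variable names are illustrative assumptions.

```javascript
// ^ and $ anchor a match to the start and end of the string.
var startsWithJava = /^Java/.test("JavaScript");           // true

// Without m, ^ and $ see only the ends of the whole string.
var wholeString = "one\ntwo".match(/^\w+$/g);              // null

// With m, ^ and $ also match at the start and end of each line.
var eachLine = "one\ntwo".match(/^\w+$/gm);                // ["one", "two"]

// \b requires a word boundary; \B requires that there is none.
var standalone = /\bcat\b/.test("concatenate");            // false
var embedded = /\Bcat\B/.test("concatenate");              // true

// (?=p) requires p to follow but leaves it out of the match.
var prefix = "JavaScript".match(/Java(?=Script)/)[0];      // "Java"

// (?!p) requires that p does not follow.
var notScript = /Java(?!Script)/.test("JavaScript");       // false
var beans = /Java(?!Script)/.test("Java Beans");           // true
```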