
1.7 Regular Expressions

JavaScript supports regular expressions for pattern matching with the same syntax as the Perl programming language. JavaScript 1.2 supports Perl 4 regular expressions, and JavaScript 1.5 adds support for some of the additional features of Perl 5 regular expressions. A regular expression is specified literally in a JavaScript program as a sequence of characters within slash (/) characters, optionally followed by one or more of the modifier characters g (global search), i (case-insensitive search), and m (multi-line mode; a JavaScript 1.5 feature). In addition to this literal syntax, RegExp objects can be created with the RegExp( ) constructor, which accepts the pattern and modifier characters as string arguments, without the slash characters.
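As a minimal sketch of the two creation styles described above, the following builds the same pattern once as a literal and once with the constructor. The sample strings and variable names are made up for illustration. Note that inside a string argument a backslash must be doubled, because the string literal consumes one level of escaping before the pattern is compiled.

```javascript
// Literal syntax: pattern between slashes, followed by modifier characters.
var literal = /java(script)?/gi;

// Constructor syntax: pattern and modifiers as strings, without the slashes.
var constructed = new RegExp("java(script)?", "gi");

var text = "Java and JavaScript are different languages.";

// With the g modifier, String.match() returns every match in the string.
var literalMatches = text.match(literal);         // ["Java", "JavaScript"]
var constructedMatches = text.match(constructed); // ["Java", "JavaScript"]

// In a string argument, write "\\d" so that the pattern receives \d.
var digits = new RegExp("\\d+");
var firstNumber = "Order 66 shipped".match(digits)[0]; // "66"
```

The literal form is usually easier to read; the constructor is useful when the pattern text is only known at runtime.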

A full explanation of regular expression syntax is beyond the scope of this book, but the tables in the following subsections offer brief syntax summaries.

1.7.1 Literal characters

Letters, numbers, and most other characters are literals in a regular expression: they simply match themselves. As we'll see in the sections that follow, however, there are a number of punctuation characters and escape sequences (beginning with \) that have special meanings. The simplest of these escape sequences provide alternative ways of representing literal characters:

Character                   Meaning
\n, \r, \t                  Match literal newline, carriage return, tab
\\, \/, \*, \+, \?, etc.    Match a punctuation character literally, ignoring or escaping its special meaning
\xnn                        The character with hexadecimal encoding nn
\uxxxx                      The Unicode character with hexadecimal encoding xxxx
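The escapes in this table can be tried out with RegExp.test(), as in the short sketch below. The test strings are arbitrary examples chosen for illustration.

```javascript
// \t matches a literal tab character.
var hasTab = /\t/.test("name\tvalue");            // true

// An escaped + matches a plus sign; unescaped, + is a repetition character.
var escapedPlus = /1\+1/.test("1+1");             // true
var unescapedPlus = /1+1/.test("1+1");            // false: this pattern needs "11"

// A slash must be escaped inside a regular expression literal.
var hasSlash = /\//.test("a/b");                  // true

// \x41 is the character with hexadecimal code 41, the letter A.
var hexA = /\x41/.test("A");                      // true

// \u00e9 is the Unicode character U+00E9 (e with acute accent).
var unicodeE = /caf\u00e9/.test("caf\u00e9");     // true
```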

1.7.2 Character classes

Regular expression syntax uses square brackets to represent character sets or classes in a pattern. In addition, escape sequences define certain commonly used character classes, as shown in the following table.

Character    Meaning
[...]        Match any one character between brackets
[^...]       Match any one character not between brackets
.            Match any character other than newline
\w, \W       Match any word/non-word character
\s, \S       Match any whitespace/non-whitespace
\d, \D       Match any digit/non-digit
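A brief sketch of these classes in use follows. The input strings and variable names are invented for illustration only.

```javascript
// [...] matches any one of the listed characters.
var vowels = "regular expression".match(/[aeiou]/g).join(""); // "euaeeio"

// [^...] matches any one character that is not listed.
var digitsOnly = "tel: 555-1234".replace(/[^0-9]/g, "");       // "5551234"

// . matches any character except a newline.
var dotMatchesX = /a.c/.test("axc");                           // true
var dotMatchesNewline = /a.c/.test("a\nc");                    // false

// \w matches word characters: letters, digits, and underscore.
var words = "x_1 y-2".match(/\w+/g);                           // ["x_1", "y", "2"]

// \s matches whitespace; \D matches anything that is not a digit.
var hasSpace = /\s/.test("a b");                               // true
var stripped = "a1b22c".replace(/\D/g, "");                    // "122"
```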

1.7.3 Repetition

The following table shows regular expression syntax that controls the number of times that a match may be repeated.

Character    Meaning
?            Optional term; match zero or one time
+            Match previous term one or more times
*            Match previous term zero or more times
{n}          Match previous term exactly n times
{n,}         Match previous term n or more times
{n,m}        Match at least n but no more than m times

In JavaScript 1.5, any of the repetition characters may be followed by a question mark to make them non-greedy, which means they match as few repetitions as possible while still allowing the complete pattern to match.
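The difference between greedy and non-greedy repetition is easiest to see side by side. The following is a small sketch using a made-up HTML fragment and a few other arbitrary strings; it is meant as an illustration, not a recommended way to parse HTML.

```javascript
var html = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ takes as much as it can, so the match runs to the last >.
var greedy = html.match(/<.+>/)[0];   // "<b>bold</b> and <i>italic</i>"

// Non-greedy: .+? takes as little as it can, so the match stops at the first >.
var lazy = html.match(/<.+?>/)[0];    // "<b>"

// ? makes the previous term optional.
var american = /colou?r/.test("color");   // true
var british = /colou?r/.test("colour");   // true

// {n}, {n,} and {n,m} bound the number of repetitions.
var phone = /\d{3}-\d{4}/.test("555-1234");    // true
var atMostThree = "aaaa".match(/a{2,3}/)[0];   // "aaa"
var twoOrMore = "aaaa".match(/a{2,}/)[0];      // "aaaa"
```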

1.7.4 Grouping and alternation

Regular expressions use parentheses to group subexpressions, just as mathematical expressions do. Parentheses are useful, for example, to allow a repetition character to be applied to an entire subexpression. They are also useful with the | character, which is used to separate alternatives. Parenthesized groups have a special behavior: when a pattern match is found, the text that matches each group is saved and can be referred to by group number. The following table summarizes this syntax.

Character    Meaning
a | b        Match either a or b
(sub)        Group subexpression sub into a single term, and remember the text that it matched
(?:sub)      Group subexpression sub but do not number the group or remember the text it matches (JS 1.5)
\n           Match exactly the same characters that were matched by group number n (n is a digit, as in \1)
$n           In replacement strings, substitute the text that matched the nth subexpression (as in $1)
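The sketch below walks through each row of this table with small, made-up inputs. It assumes the standard RegExp.exec() and String.replace() methods; the names and sample strings are illustrative only.

```javascript
// Parentheses let a repetition character apply to a whole subexpression.
var repeated = /(ab)+/.exec("ababab")[0];            // "ababab"

// | separates alternatives inside a group.
var plural = /(cat|dog)s/.test("dogs");              // true

// Each parenthesized group is saved and numbered from 1, left to right.
var range = /(\d+)-(\d+)/.exec("range 10-20");
var low = range[1];                                  // "10"
var high = range[2];                                 // "20"

// (?:...) groups without numbering, so (\w+) is group 1 here.
var name = /(?:Mr|Ms)\. (\w+)/.exec("Ms. Lovelace")[1]; // "Lovelace"

// \1 must match the same text that group 1 matched (the same quote character).
var quoted = /(['"])[^'"]*\1/;
var balanced = quoted.test("'single'");              // true
var mismatched = quoted.test("'mismatched\"");       // false

// $1 and $2 in a replacement string insert the text of groups 1 and 2.
var swapped = "Doe, John".replace(/(\w+), (\w+)/, "$2 $1"); // "John Doe"
```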

1.7.5 Anchoring match position

An anchor in a regular expression matches a position in a string (such as the beginning or the end of the string) without matching any of the characters of a string. It can be used to restrict (or anchor) a match to a specific position within a string.

Character    Meaning
^, $         Require match at beginning/end of a string, or in multiline mode, beginning/end of a line
\b, \B       Require match at a word boundary/non-boundary
(?=p)        Look-ahead assertion: require that the following characters match the pattern p, but do not include them in the match. (JS 1.5)
(?!p)        Negative look-ahead assertion: require that the following characters do not match the pattern p. (JS 1.5)
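To illustrate that anchors match positions rather than characters, here is a short sketch with arbitrary sample strings. The variable names are invented for this example.

```javascript
// ^ and $ anchor a match to the start and end of the string.
var startsWithJava = /^Java/.test("JavaScript");     // true
var notAtStart = /^Java/.test("I like Java");        // false
var endsWithScript = /Script$/.test("JavaScript");   // true

// With the m modifier, ^ and $ also match at the start and end of each line.
var lines = "one\ntwo\nthree";
var perLine = lines.match(/^\w+$/gm);                // ["one", "two", "three"]
var wholeString = lines.match(/^\w+$/g);             // null

// \b requires a word boundary; \B requires that there is none.
var wholeWord = /\bjava\b/i.test("Java is fun");     // true
var partOfWord = /\bjava\b/i.test("JavaScript");     // false
var insideWord = /\Bscript/i.test("JavaScript");     // true

// (?=p) requires that p follows, but p is not part of the matched text.
var lookahead = "JavaScript".match(/Java(?=Script)/)[0]; // "Java"

// (?!p) requires that p does not follow.
var negativeFails = /Java(?!Script)/.test("JavaScript"); // false
var negativePasses = /Java(?!Script)/.test("Java Beans"); // true
```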
