1.10 JavaScript
JavaScript
introduced Perl-like regular expression support with Version 1.2.
This reference covers Version 1.5 as defined by the ECMA standard.
Supporting implementations include Microsoft Internet Explorer 5.5+
and Netscape Navigator 6+. JavaScript uses a Traditional NFA match
engine. For an explanation of the rules behind an NFA engine, see
Section 1.2.
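One visible consequence of a Traditional NFA engine is that alternation is ordered: the first alternative that allows an overall match wins, even when a later alternative would match more text. A minimal sketch of that behavior (the variable names are illustrative only):

```javascript
// Alternatives are tried left to right; the engine stops at the first
// one that lets the whole pattern succeed.
var firstWins = "ab".match(/a|ab/)[0];  // "a"
var reordered = "ab".match(/ab|a/)[0];  // "ab"

console.log(firstWins, reordered);
```

Listing the longer alternative first is the usual way to make it preferred.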
1.10.1 Supported Metacharacters
JavaScript supports the metacharacters and metasequences listed in
Table 1-41 through Table 1-45. For expanded definitions of each metacharacter,
see Section 1.2.1.
Table 1-41. Character representations
\0 | Null character, \x00.
\b | Backspace, \x08, supported only in character class.
\n | Newline, \x0A.
\r | Carriage return, \x0D.
\f | Form feed, \x0C.
\t | Horizontal tab, \x09.
\v | Vertical tab, \x0B.
\xhh | Character specified by a two-digit hexadecimal code.
\uhhhh | Character specified by a four-digit hexadecimal code.
\cchar | Named control character.
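The following sketch checks a few of these character representations against the equivalent string escapes. The object name checks and its keys are illustrative only.

```javascript
// Each test compares a regex escape with the character it denotes.
var checks = {
  hex: /\x41/.test("A"),                  // \xhh, two hex digits
  unicode: /\u0041/.test("A"),            // \uhhhh, four hex digits
  control: /\cJ/.test("\n"),              // \cJ is Control-J, the line feed
  nul: /\0/.test("\0"),                   // null character
  verticalTab: /\v/.test("\x0B"),         // vertical tab
  backspaceInClass: /[\b]/.test("\x08"),  // inside a class, \b is backspace
  boundaryOutside: /\b/.test("\x08")      // outside a class, \b is a word boundary
};

console.log(checks);
```

Only boundaryOutside is false: the subject contains no word characters, so it has no word boundary for \b to match.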
Table 1-42. Character classes and class-like constructs
[...] | A single character listed or contained within a listed range.
[^...] | A single character not listed and not contained within a listed range.
. | Any character except a line terminator, [^\x0A\x0D\u2028\u2029].
\w | Word character, [a-zA-Z0-9_].
\W | Non-word character, [^a-zA-Z0-9_].
\d | Digit character, [0-9].
\D | Non-digit character, [^0-9].
\s | Whitespace character.
\S | Non-whitespace character.
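A short sketch of how these classes behave in practice. The sample strings and variable names are made up for illustration.

```javascript
// The dot does not match a line terminator; [\s\S] is a common workaround.
var dotSkipsNewline = /a.b/.test("a\nb");       // false
var anyCharacter = /a[\s\S]b/.test("a\nb");     // true

// \w covers only [a-zA-Z0-9_], so an accented letter is not a word character.
var asciiOnly = /^\w+$/.test("caf\u00e9");      // false

// A negated class removes everything that is not a digit.
var digitsOnly = "a1b2".replace(/[^0-9]/g, ""); // "12"

// Ranges can be combined inside one class.
var isHex = /^[a-fA-F0-9]+$/.test("BEEF42");    // true

console.log(dotSkipsNewline, anyCharacter, asciiOnly, digitsOnly, isHex);
```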
Table 1-43. Anchors and other zero-width tests
^ | Start of string, or after any newline if in multiline match mode, /m.
$ | End of search string, or before any newline if in multiline match mode, /m.
\b | Word boundary.
\B | Not-word-boundary.
(?=...) | Positive lookahead.
(?!...) | Negative lookahead.
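The zero-width tests can be sketched as follows. The sample text and variable names are hypothetical.

```javascript
var text = "first line\nsecond line";

// Without /m, ^ and $ match only at the ends of the whole string.
var startsNoM = text.match(/^\w+/g);          // ["first"]
var startsWithM = text.match(/^\w+/gm);       // ["first", "second"]
var endsNoM = text.match(/line$/g).length;    // 1
var endsWithM = text.match(/line$/gm).length; // 2

// \b keeps "cat" from matching inside "concat".
var wholeWords = "cat concat cat".match(/\bcat\b/g).length; // 2

// Lookahead tests what follows without consuming it.
var beforeUsd = "100 USD, 200 EUR".match(/\d+(?= USD)/)[0];    // "100"
var notUsd = "100 USD, 200 EUR".match(/\b\d+\b(?! USD)/)[0];   // "200"

console.log(startsNoM, startsWithM, endsNoM, endsWithM, wholeWords, beforeUsd, notUsd);
```

The \b anchors in the negative lookahead example stop the engine from backtracking into a partial number such as "10".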
Table 1-44. Mode modifiers
m | ^ and $ match next to embedded line terminators.
i | Case-insensitive match.
Table 1-45. Grouping, capturing, conditional, and control
(...) | Group subpattern and capture submatch into \1, \2,... and $1, $2,....
\n | In a regular expression, contains text matched by the nth capture group.
$n | In a replacement string, contains text matched by the nth capture group.
(?:...) | Group subpattern, but do not capture submatch.
...|... | Try subpatterns in alternation.
* | Match 0 or more times.
+ | Match 1 or more times.
? | Match 1 or 0 times.
{n} | Match exactly n times.
{n,} | Match at least n times.
{x,y} | Match at least x times but no more than y times.
*? | Match 0 or more times, but as few times as possible.
+? | Match 1 or more times, but as few times as possible.
?? | Match 0 or 1 times, but as few times as possible.
{n,}? | Match at least n times, but as few times as possible.
{x,y}? | Match at least x times, no more than y times, and as few times as possible.
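The difference between greedy and lazy quantifiers, and between capturing and non-capturing groups, is easiest to see side by side. The following is a sketch with invented sample strings and variable names.

```javascript
var html = "<b>bold</b> and <i>italic</i>";

// Greedy .+ runs to the last ">"; lazy .+? stops at the first one.
var greedy = html.match(/<.+>/)[0];   // the whole string
var lazy = html.match(/<.+?>/)[0];    // "<b>"

// \1 inside the pattern refers back to the first capture group.
var doubled = /\b(\w+) \1\b/.exec("it was the the best");

// $1 and $2 in the replacement string refer to the capture groups.
var swapped = "Doe, John".replace(/(\w+), (\w+)/, "$2 $1");

// (?:...) groups without capturing, so "c" is capture group 1.
var nonCapturing = /(?:ab)+(c)/.exec("ababc");

// {2,4} bounds the repetition count.
var tooLong = /^\d{2,4}$/.test("12345");

console.log(greedy, lazy, doubled, swapped, nonCapturing, tooLong);
```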
1.10.2 Pattern-Matching Methods and Objects
JavaScript provides convenient pattern-matching methods in
String objects, as well as a
RegExp object for more complex pattern matching.
JavaScript strings use the backslash for escapes, and therefore any
escapes destined for the regular expression engine should be double
escaped (e.g., "\\w" instead of
"\w"). You can also use the regular expression
literal syntax,
/pattern/img.
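The double-escaping rule can be checked by comparing the source of a literal with that of a pattern built from a string. This is a small sketch; the variable names are illustrative.

```javascript
// A regex literal needs single backslashes.
var fromLiteral = /\w+\.\d+/;

// A string needs doubled backslashes to deliver the same pattern.
var fromString = new RegExp("\\w+\\.\\d+");

// With single backslashes the string parser drops them first, so the
// regex engine receives the pattern w+.d+ instead.
var underEscaped = new RegExp("\w+\.\d+");

console.log(fromLiteral.source === fromString.source); // true
console.log(underEscaped.source);                      // w+.d+
console.log(fromString.test("v.42"), underEscaped.test("v.42"));
```

The under-escaped pattern fails on "v.42" because it now looks for literal w and d characters.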
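Strings support four convenience methods for pattern matching. Each
method takes a pattern argument, which may
be either a RegExp object or a string. The search( ) and match( )
methods compile a string argument into a regular expression, while
replace( ) and split( ) treat a string argument as literal text.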
Methods
- search( pattern)
-
Match pattern against the string returning
either the character position of the start of the first matching
substring or -1.
- replace( pattern, replacement)
-
The replace( ) method searches the string for a
match of pattern and replaces the matched
substring with replacement. If
pattern has global mode set, then all
matches of pattern are replaced. The
replacement string may have
$n constructs that are
replaced with the matched text of the nth
capture group in pattern.
- match( pattern)
-
Match pattern against the string, returning
either an array or null if there is no match. Element 0 of the array
contains the full match. Additional elements contain submatches from
capture groups. In global (g) mode, the array
contains all matches of pattern with no
capture group submatches.
- split( pattern, limit)
-
Return an array of strings broken around
pattern. If
limit is specified, the array contains at most the
first limit substrings broken around
pattern. If
pattern contains capture groups, captured
substrings are returned as elements after each split substring.
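The four String methods can be sketched together on one sample string. The data and variable names below are invented for illustration.

```javascript
var s = "2004-03-15 and 2005-11-02";

// search( ) returns an index, or -1 when nothing matches.
var found = s.search(/\d{4}/);        // 0
var missing = s.search(/[a-z]{4}/);   // -1

// replace( ) changes only the first match unless g is set.
var firstOnly = s.replace(/-/, "/");
var reordered = s.replace(/(\d{4})-(\d\d)-(\d\d)/g, "$2/$3/$1");

// match( ) returns captures without g, and all full matches with g.
var single = s.match(/(\d{4})-(\d\d)/);   // ["2004-03", "2004", "03"]
var every = s.match(/\d{4}/g);            // ["2004", "2005"]

// split( ) accepts a limit and includes captured separators.
var parts = "a, b,c".split(/\s*,\s*/);        // ["a", "b", "c"]
var limited = "a, b,c".split(/\s*,\s*/, 2);   // ["a", "b"]
var withSeparators = "a1b2c".split(/(\d)/);   // ["a", "1", "b", "2", "c"]

console.log(found, missing, firstOnly, reordered);
console.log(single, every, parts, limited, withSeparators);
```

Because search( ) can legitimately return 0, its result should be compared with -1 rather than used directly as a condition.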
The RegExp object models a regular expression and contains methods for pattern matching.
Constructor
- new RegExp( pattern, attributes)
- / pattern/attributes
-
RegExp objects can be created with either the
RegExp( ) constructor or a special literal syntax
/.../. The parameter
pattern is a required regular expression
pattern, and the parameter attributes is
an optional string containing any of the mode modifiers
g, i, or m.
The parameter pattern can also be a
RegExp object, in which case the
attributes parameter must be
omitted.
The constructor can throw two exceptions.
SyntaxError is thrown if
pattern is malformed or if
attributes contains invalid mode
modifiers. Under the standard covered here, TypeError is thrown if
pattern is a RegExp
object and the attributes parameter is
supplied.
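A minimal sketch of the constructor and its SyntaxError cases. The patterns and variable names are made up for illustration, and the invalid modifier "z" is simply an arbitrary unsupported letter.

```javascript
// Constructor form with mode modifiers passed as a string.
var color = new RegExp("colou?r", "gi");

// A malformed pattern raises SyntaxError.
var badPattern = null;
try {
  new RegExp("(unclosed");
} catch (e) {
  badPattern = e;
}

// An invalid mode modifier also raises SyntaxError.
var badModifier = null;
try {
  new RegExp("a", "z");
} catch (e) {
  badModifier = e;
}

console.log(color.test("COLOR"), badPattern.name, badModifier.name);
```

Wrapping the constructor in try/catch is mainly useful when the pattern comes from user input; a literal with the same errors fails when the script is parsed.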
Instance properties
- global
-
Boolean, if RegExp has g
attribute.
- ignoreCase
-
Boolean, if RegExp has i
attribute.
- lastIndex
-
The character position at which the next match attempt starts. In
global mode, this is the position just after the last match.
- multiline
-
Boolean, if RegExp has m
attribute.
- source
-
The text pattern used to create this object.
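The instance properties can be read directly from any RegExp object. A short sketch follows; the pattern and the props object are illustrative only.

```javascript
var re = /(\d+)-(\d+)/gi;

// Snapshot the properties before any matching is done.
var props = {
  global: re.global,
  ignoreCase: re.ignoreCase,
  multiline: re.multiline,
  source: re.source,
  lastIndexBefore: re.lastIndex
};

// In global mode a successful match advances lastIndex past the match.
re.test("10-20 30-40");
props.lastIndexAfter = re.lastIndex;

console.log(props);
```

Here the first match, "10-20", occupies positions 0 through 4, so lastIndex moves from 0 to 5.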
Methods
- exec( text)
-
Search text and return an array of strings
if the search succeeds and null if it fails.
Element 0 of the array contains the substring matched by the entire
regular expression. Additional elements correspond to capture groups.
If the global flag (g) is set, then
lastIndex is set to the character position after
the match or zero if there was no match. Successive exec(
) or test( ) calls will start at
lastIndex. Note that lastIndex
is a property of the regular expression, not the string being
searched. You must reset lastIndex manually if you
are using a RegExp object in global mode to search
multiple strings.
- test( text)
-
Return true if the RegExp
object matches text. The test(
) method behaves in the same way as exec(
) when used in global mode: successive calls start at
lastIndex even if used on different strings.
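The lastIndex behavior described above supports a common loop idiom and also causes a common mistake. The following sketch shows both, using invented sample data and variable names.

```javascript
// Loop idiom: call exec( ) until it returns null.
var pairRegex = /(\w+)=(\d+)/g;
var input = "a=1, b=22, c=333";
var pairs = [];
var m;
while ((m = pairRegex.exec(input)) !== null) {
  // m.index is where the match starts; lastIndex is where it ends.
  pairs.push(m[1] + ":" + m[2] + "@" + m.index + "-" + pairRegex.lastIndex);
}

// Pitfall: a global RegExp reused on a new string resumes at lastIndex.
var finder = /cat/g;
var first = finder.test("a cat here");  // true, lastIndex is now 5
var second = finder.test("cat");        // false, search began at 5

// Resetting lastIndex before switching strings avoids the problem.
finder.test("a cat here");
finder.lastIndex = 0;
var third = finder.test("cat");         // true

console.log(pairs, first, second, third);
```

The loop ends cleanly because a failed match resets lastIndex to zero.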
1.10.3 Examples
Example 1-28. Simple match
//Match Spider-Man, Spiderman, SPIDER-MAN, etc.
var dailybugle = "Spider-Man Menaces City!";
//search( ) finds a match anywhere in the string and returns -1 on failure
var regex = /spider[- ]?man/i;
if (dailybugle.search(regex) != -1) {
  //do something
}
Example 1-29. Match and capture group
//Match dates formatted like MM/DD/YYYY, MM-DD-YY,...
var date = "12/30/1969";
var p =
  new RegExp("(\\d\\d)[-/](\\d\\d)[-/](\\d\\d(?:\\d\\d)?)");
var result = p.exec(date);
if (result != null) {
  var month = result[1];
  var day = result[2];
  var year = result[3];
}
Example 1-30. Simple substitution
//Convert <br> to <br /> for XHTML compliance
var text = "Hello world. <br>";
var pattern = /<br>/ig;
//replace( ) returns a new string, so keep the result
text = text.replace(pattern, "<br />");
Example 1-31. Harder substitution
//urlify - turn URLs into HTML links
var text = "Check the website, http://www.oreilly.com/catalog/repr.";
var regex = new RegExp(
    "\\b"                                 // start at word boundary
  + "("                                   // capture to $1
  + "(https?|telnet|gopher|file|wais|ftp):"
                                          // resource and colon
  + "[\\w/\\#~:.?+=&%@!\\-]+?"            // one or more valid chars,
                                          // taking as few as possible
  + ")"
  + "(?="                                 // lookahead
  + "[.:?\\-]*"                           // for possible punctuation
  + "(?:[^\\w/\\#~:.?+=&%@!\\-]"          // invalid character
  + "|$)"                                 // or end of string
  + ")",
  "g");                                   // replace every URL, not just the first
//replace( ) treats a string pattern as literal text, so pass a RegExp
text = text.replace(regex, "<a href=\"$1\">$1</a>");
1.10.4 Other Resources
JavaScript: The Definitive Guide, by David
Flanagan (O'Reilly), is a reference for all
JavaScript, including regular expressions.