
1.10 JavaScript

JavaScript introduced Perl-like regular expression support with Version 1.2. This reference covers Version 1.5 as defined by the ECMA standard. Supporting implementations include Microsoft Internet Explorer 5.5+ and Netscape Navigator 6+. JavaScript uses a Traditional NFA match engine. For an explanation of the rules behind an NFA engine, see Section 1.2.

1.10.1 Supported Metacharacters

JavaScript supports the metacharacters and metasequences listed in Table 1-41 through Table 1-45. For expanded definitions of each metacharacter, see Section 1.2.1.

Table 1-41. Character representations

Sequence   Meaning
\0         Null character, \x00.
\b         Backspace, \x08, supported only in character class.
\n         Newline, \x0A.
\r         Carriage return, \x0D.
\f         Form feed, \x0C.
\t         Horizontal tab, \x09.
\v         Vertical tab, \x0B.
\xhh       Character specified by a two-digit hexadecimal code.
\uhhhh     Character specified by a four-digit hexadecimal code.
\cchar     Named control character.
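
As a brief illustration of Table 1-41, the sketch below checks a few of these sequences against the characters they name. The variable names are illustrative only and are not part of any API.

```javascript
// Each escape should match the character it names.
var hexMatches     = /\x41/.test("A");       // \x41 is "A"
var unicodeMatches = /\u0041/.test("A");     // \u0041 is also "A"
var controlMatches = /\cJ/.test("\n");       // \cJ is Control-J, the newline
var vtabMatches    = /\v/.test("\x0B");      // vertical tab

// Inside a character class \b is a backspace; outside it is a word boundary.
var backspaceInClass = /[\b]/.test("\x08");  // true
var boundaryOutside  = /\b/.test("\x08");    // false: no word character present
```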

Table 1-42. Character classes and class-like constructs

Class      Meaning
[...]      A single character listed or contained within a listed range.
[^...]     A single character not listed and not contained within a listed range.
.          Any character except a line terminator, [^\x0A\x0D\u2028\u2029].
\w         Word character, [a-zA-Z0-9_].
\W         Non-word character, [^a-zA-Z0-9_].
\d         Digit character, [0-9].
\D         Non-digit character, [^0-9].
\s         Whitespace character.
\S         Non-whitespace character.

Table 1-43. Anchors and other zero-width tests

Sequence   Meaning
^          Start of string, or after any newline if in multiline match mode, /m.
$          End of search string, or before any newline if in multiline match mode, /m.
\b         Word boundary.
\B         Not-word-boundary.
(?=...)    Positive lookahead.
(?!...)    Negative lookahead.
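
The following sketch shows how the anchors and lookahead constructs in Table 1-43 behave. The sample strings and variable names are made up for illustration.

```javascript
var lines = "first line\nsecond line";

// Without /m, ^ matches only at the very start of the string.
var startOnly = lines.match(/^\w+/g);          // ["first"]

// With /m, ^ also matches after each embedded newline.
var everyLine = lines.match(/^\w+/gm);         // ["first", "second"]

// Lookahead tests what follows without consuming it.
var price  = "100 USD, 200 EUR";
var usd    = price.match(/\d+(?= USD)/g);      // ["100"]
var notUsd = price.match(/\b\d+\b(?! USD)/g);  // ["200"]
```

The \b anchors in the last pattern keep the engine from backtracking into a shorter run of digits just to satisfy the negative lookahead.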

Table 1-44. Mode modifiers

Modifier   Meaning
m          ^ and $ match next to embedded line terminators.
i          Case-insensitive match.
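
A minimal sketch of the two modifiers in Table 1-44, using an invented two-line string. The g attribute appears here only so that match( ) returns every match.

```javascript
var headline = "Spider-Man\nspider-man";

// i makes letter case irrelevant.
var caseSensitive   = /spider/.test("SPIDER");   // false
var caseInsensitive = /spider/i.test("SPIDER");  // true

// With m, $ matches before the embedded newline as well as at the end.
var lineEnds = headline.match(/man$/gim);        // ["Man", "man"]
```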

Table 1-45. Grouping, capturing, conditional, and control

Sequence   Meaning
(...)      Group subpattern and capture submatch into \1,\2,... and $1, $2,....
\n         In a regular expression, contains text matched by the nth capture group.
$n         In a replacement string, contains text matched by the nth capture group.
(?:...)    Group subpattern, but do not capture submatch.
...|...    Try subpatterns in alternation.
*          Match 0 or more times.
+          Match 1 or more times.
?          Match 0 or 1 times.
{n}        Match exactly n times.
{n,}       Match at least n times.
{x,y}      Match at least x times but no more than y times.
*?         Match 0 or more times, but as few times as possible.
+?         Match 1 or more times, but as few times as possible.
??         Match 0 or 1 times, but as few times as possible.
{n,}?      Match at least n times, but as few times as possible.
{x,y}?     Match at least x times, no more than y times, and as few times as possible.
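
To make the difference between greedy and lazy quantifiers concrete, and to show a backreference and a non-capturing group, here is a short sketch. The sample strings are invented for illustration.

```javascript
var html = "<b>bold</b> and <i>italic</i>";

// Greedy: .* runs to the last ">" it can reach.
var greedy = html.match(/<.*>/)[0];   // the whole string
// Lazy: .*? stops at the first ">" that lets the match succeed.
var lazy = html.match(/<.*?>/)[0];    // "<b>"

// \1 inside the pattern refers back to what group 1 matched.
var doubled = /\b(\w+) \1\b/.exec("it was the the best");
// doubled[0] is "the the", doubled[1] is "the"

// (?:...) groups without capturing, so only one submatch is returned.
var parts = /(?:Mr|Ms)\. (\w+)/.exec("Ms. Marvel");
// parts[1] is "Marvel"
```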

1.10.2 Pattern-Matching Methods and Objects

JavaScript provides convenient pattern-matching methods in String objects, as well as a RegExp object for more complex pattern matching. JavaScript strings use the backslash for escapes, and therefore any escapes destined for the regular expression engine should be double escaped (e.g., "\\w" instead of "\w"). You can also use the regular expression literal syntax, /pattern/img.
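
The double-escaping rule can be sketched as follows. The pattern and test string are arbitrary examples.

```javascript
// Literal syntax: backslashes go straight to the regex engine.
var literal = /\w+\s\d+/;

// String syntax: each backslash is doubled to survive string parsing.
var fromString = new RegExp("\\w+\\s\\d+");

// Both describe the same pattern.
var sameSource = literal.source === fromString.source;  // true
var matches    = fromString.test("item 42");            // true

// With single backslashes the string parser drops them, leaving "w+sd+".
var broken        = new RegExp("\w+\s\d+");
var brokenMatches = broken.test("item 42");             // false
```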

String

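Strings support four convenience methods for pattern matching. Each method takes a pattern argument, which may be either a RegExp object or a string. The search( ) and match( ) methods convert a string argument to a RegExp; replace( ) and split( ) treat a string argument as literal text rather than as a regular expression pattern.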

Methods

search( pattern)

Match pattern against the string, returning the character position of the start of the first matching substring, or -1 if there is no match.

replace( pattern, replacement)

The replace( ) method searches the string for a match of pattern and replaces the matched substring with replacement. If pattern has global mode set, then all matches of pattern are replaced. The replacement string may have $n constructs that are replaced with the matched text of the nth capture group in pattern.

match( pattern)

Match pattern against the string, returning either an array or null. Element 0 of the array contains the full match. Additional elements contain submatches from capture groups. In global (g) mode, the array contains all matches of pattern with no capture group submatches.

split( pattern, limit)

Return an array of strings broken around pattern. If limit is specified, the array contains at most the first limit substrings broken around pattern. If pattern contains capture groups, captured substrings are returned as elements after each split substring.
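
The four String methods can be sketched together as follows. The date strings and variable names are illustrative.

```javascript
var text = "2001-01-15 and 2002-02-20";

// search(): index of the first match, or -1.
var firstAt = text.search(/\d{4}/);          // 0
var missing = text.search(/[a-z]{4}/);       // -1

// replace(): $1, $2, $3 refer to the capture groups.
var swapped = text.replace(/(\d{4})-(\d\d)-(\d\d)/g, "$2/$3/$1");
// "01/15/2001 and 02/20/2002"

// match(): without g, full match plus submatches; with g, all full matches.
var single = text.match(/(\d{4})-(\d\d)/);   // ["2001-01", "2001", "01"]
var all    = text.match(/\d{4}/g);           // ["2001", "2002"]
var none   = text.match(/xyz/);              // null

// split(): capture groups add the separators to the result.
var plain    = "a1b2c".split(/\d/);          // ["a", "b", "c"]
var withSeps = "a1b2c".split(/(\d)/);        // ["a", "1", "b", "2", "c"]
var limited  = "a1b2c".split(/\d/, 2);       // ["a", "b"]
```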

RegExp

Models a regular expression and contains methods for pattern matching.

Constructor

new RegExp( pattern, attributes)
/ pattern/attributes

RegExp objects can be created with either the RegExp( ) constructor or a special literal syntax /.../. The parameter pattern is a required regular expression pattern, and the parameter attributes is an optional string containing any of the mode modifiers g, i, or m. Under the edition of the ECMA standard covered here, the parameter pattern can also be a RegExp object, but the attributes parameter must then be omitted; the new object takes its pattern and attributes from the existing one.

The constructor can throw two exceptions. SyntaxError is thrown if pattern is malformed or if attributes contains invalid mode modifiers. TypeError is thrown if pattern is a RegExp object and the attributes parameter is supplied.
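
A small sketch of the two construction forms and of the SyntaxError case. The patterns are arbitrary, and the invalid attribute "z" is just a stand-in for any unsupported modifier.

```javascript
// The two forms below build equivalent objects.
var fromLiteral     = /spider[- ]?man/gi;
var fromConstructor = new RegExp("spider[- ]?man", "gi");

// A malformed pattern raises SyntaxError.
var badPatternError = null;
try {
  new RegExp("(unclosed");
} catch (e) {
  badPatternError = e;
}

// So does an unknown attribute.
var badFlagError = null;
try {
  new RegExp("abc", "z");
} catch (e) {
  badFlagError = e;
}
```

Because the constructor parses its pattern at run time, wrapping it in try/catch is mainly useful when the pattern comes from user input.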

Instance properties

global

Boolean; true if the RegExp has the g attribute.

ignoreCase

Boolean; true if the RegExp has the i attribute.

lastIndex

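The character position at which the next match attempt starts. In global mode, exec( ) and test( ) set it to the position just after the last match.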

multiline

Boolean; true if the RegExp has the m attribute.

source

The text pattern used to create this object.
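
The instance properties can be inspected directly, as in this sketch. The pattern is the one used in the examples section; the variable names are illustrative.

```javascript
var re = new RegExp("spider[- ]?man", "gi");

var isGlobal     = re.global;      // true
var isIgnoreCase = re.ignoreCase;  // true
var isMultiline  = re.multiline;   // false
var patternText  = re.source;      // "spider[- ]?man"

var startsAt = re.lastIndex;       // 0 until a global match advances it
re.test("Spider-Man");
var afterMatch = re.lastIndex;     // 10, just past the matched text
```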

Methods

exec( text)

Search text and return an array of strings if the search succeeds and null if it fails. Element 0 of the array contains the substring matched by the entire regular expression. Additional elements correspond to capture groups.

If the global flag (g) is set, then lastIndex is set to the character position after the match or zero if there was no match. Successive exec( ) or test( ) calls will start at lastIndex. Note that lastIndex is a property of the regular expression, not the string being searched. You must reset lastIndex manually if you are using a RegExp object in global mode to search multiple strings.

test( text)

Return true if the RegExp object matches text. The test( ) method behaves in the same way as exec( ) when used in global mode: successive calls start at lastIndex even if used on different strings.
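
The interaction between global mode and lastIndex can be sketched as follows: first a loop that collects every match with exec( ), then the pitfall of reusing one global RegExp on a second string. The strings are invented for illustration.

```javascript
var digits = /\d+/g;
var input  = "10 apples, 20 pears, 30 plums";

// Collect every match by calling exec() until it returns null.
var found = [];
var m;
while ((m = digits.exec(input)) !== null) {
  found.push(m[0] + "@" + m.index);
}
// found is ["10@0", "20@11", "30@21"]

// Pitfall: lastIndex carries over to a different string.
var word = /cat/g;
var firstCall = word.test("cat one");   // true, lastIndex becomes 3
var staleCall = word.test("cat two");   // false: the search began at index 3

// Resetting lastIndex before switching strings avoids the problem.
word.test("cat one");                   // true, lastIndex is 3 again
word.lastIndex = 0;                     // manual reset
var freshCall = word.test("cat two");   // true
```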

1.10.3 Examples

Example 1-28. Simple match
//Match Spider-Man, Spiderman, SPIDER-MAN, etc.
    var dailybugle = "Spider-Man Menaces City!";

    //regex can match anywhere in the string
    var regex = /spider[- ]?man/i;

    //search( ) returns -1 on failure; a match at position 0 is falsy
    if (dailybugle.search(regex) != -1) {
      //do something
    }
Example 1-29. Match and capture group
//Match dates formatted like MM/DD/YYYY, MM-DD-YY,...
    var date = "12/30/1969";
    var p = 
      new RegExp("(\\d\\d)[-/](\\d\\d)[-/](\\d\\d(?:\\d\\d)?)");

    var result = p.exec(date);
    if (result != null) {
      var month = result[1];
      var day   = result[2];
      var year  = result[3];
    }
Example 1-30. Simple substitution
//Convert <br> to <br /> for XHTML compliance
    var text = "Hello world. <br>";
    
    var pattern = /<br>/ig;

    //replace( ) returns a new string; the original is unchanged
    text = text.replace(pattern, "<br />");
Example 1-31. Harder substitution
//urlify - turn URLs into HTML links
   var text = "Check the website, http://www.oreilly.com/catalog/repr.";
   //build a RegExp object; a plain string pattern would be matched literally
   var regex = new RegExp(
        "\\b"                       // start at word boundary
     +  "("                         // capture to $1
     +  "(https?|telnet|gopher|file|wais|ftp):"
                                    // resource and colon
     +  "[\\w/\\#~:.?+=&%@!\\-]+?"  // one or more valid chars
                                    // take little as possible
     +  ")"
     +  "(?="                       // lookahead
     +  "[.:?\\-]*"                 // for possible punct
     +  "(?:[^\\w/\\#~:.?+=&%@!\\-]"// invalid character
     +  "|$)"                       // or end of string
     +  ")",
        "g");                       // replace every URL in the text

    text = text.replace(regex, "<a href=\"$1\">$1</a>");

1.10.4 Other Resources

  • JavaScript: The Definitive Guide, by David Flanagan (O'Reilly), is a reference for all JavaScript, including regular expressions.
