
Appendix B. Regular Expressions

The following tables summarize the regular expression grammar and syntax supported by the regular expression classes in System.Text.RegularExpressions. Each of the modifiers and qualifiers in the tables can substantially change the behavior of the matching and searching patterns. For further information on regular expressions, we recommend the definitive Mastering Regular Expressions by Jeffrey E. F. Friedl (O'Reilly).

All the syntax described in Table B-1 through Table B-10 should match the Perl5 syntax, with specific exceptions noted.

Table B-1. Character escapes

Escape code sequence   Meaning                      Hexadecimal equivalent
\a                     Bell                         \u0007
\b                     Backspace                    \u0008
\t                     Tab                          \u0009
\r                     Carriage return              \u000D
\v                     Vertical tab                 \u000B
\f                     Form feed                    \u000C
\n                     Newline                      \u000A
\e                     Escape                       \u001B
\040                   ASCII character as octal
\x20                   ASCII character as hex
\cC                    ASCII control character
\u0020                 Unicode character as hex
\non-escape            A nonescape character

Special case: within a regular expression, \b means word boundary, except in a [] set, in which \b means the backspace character.
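
The two meanings of \b can be seen side by side in a short C# sketch. The class and method names below are hypothetical, chosen only for illustration; the sketch assumes nothing beyond the Regex class in System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class WordBoundaryDemo
{
    // Outside a [] set, \b is a word boundary, so only standalone "cat" is counted.
    public static int CountWholeWord(string input)
    {
        return Regex.Matches(input, @"\bcat\b").Count;
    }

    // Inside a [] set, \b is the backspace character (\u0008).
    public static bool ContainsBackspace(string input)
    {
        return Regex.IsMatch(input, @"[\b]");
    }

    public static void Main()
    {
        Console.WriteLine(CountWholeWord("cat concat cat."));  // 2
        // In a regular C# string literal, "\b" is itself a backspace character.
        Console.WriteLine(ContainsBackspace("ab\bcd"));        // True
        Console.WriteLine(ContainsBackspace("abcd"));          // False
    }
}
```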

Table B-2. Substitutions

Expression        Meaning
$group-number     Substitutes last substring matched by group-number
${group-name}     Substitutes last substring matched by (?<group-name>)
$$                Substitutes a literal "$"
$&                Substitutes a copy of the entire match
$`                Substitutes the text of the input string preceding the match
$'                Substitutes the text of the input string following the match
$+                Substitutes the last captured group
$_                Substitutes the entire input string

Substitutions are specified only within a replacement pattern.
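
As a rough illustration, the following C# sketch passes a few of these substitutions to Regex.Replace as the replacement pattern. The class name, method names, and sample inputs are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and inputs are illustrative only.
public static class SubstitutionDemo
{
    // $1 and $2 refer to the numbered groups of the match.
    public static string SwapNames(string input)
    {
        return Regex.Replace(input, @"(\w+) (\w+)", "$2, $1");
    }

    // ${name} refers to a group captured with (?<name>...).
    public static string ReformatDate(string input)
    {
        return Regex.Replace(input, @"(?<m>\d+)/(?<d>\d+)/(?<y>\d+)", "${y}-${m}-${d}");
    }

    // $$ emits a literal "$"; $& emits the entire match.
    public static string AddDollarSign(string input)
    {
        return Regex.Replace(input, @"\d+", "$$$&");
    }

    // $` is the input text before the match; $' is the input text after it.
    public static string ShowContext(string input)
    {
        return Regex.Replace(input, "b", @"[$`|$']");
    }

    public static void Main()
    {
        Console.WriteLine(SwapNames("John Smith"));     // Smith, John
        Console.WriteLine(ReformatDate("12/31/2001"));  // 2001-12-31
        Console.WriteLine(AddDollarSign("cost 25"));    // cost $25
        Console.WriteLine(ShowContext("abc"));          // a[a|c]c
    }
}
```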

Table B-3. Character sets

Expression         Meaning
.                  Matches any character except \n
[characterlist]    Matches a single character in the list
[^characterlist]   Matches a single character not in the list
[char0-char1]      Matches a single character in a range
\w                 Matches a word character; same as [a-zA-Z_0-9] for ASCII input
\W                 Matches a nonword character
\s                 Matches a space character; same as [ \n\r\t\v\f] for ASCII input
\S                 Matches a nonspace character
\d                 Matches a decimal digit; same as [0-9] for ASCII input
\D                 Matches a nondigit

The bracketed equivalents for \w, \s, and \d are exact only under RegexOptions.ECMAScript; by default, these classes also match the corresponding Unicode characters.

Table B-4. Positioning assertions

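Expression   Meaning
^            Beginning of string; with the m option, beginning of any line
$            End of string (or before a final newline); with the m option, end of any line
\A           Beginning of string
\Z           End of string, or before a newline at the end of the string
\z           Exactly the end of string
\G           Where search started
\b           On a word boundary
\B           Not on a word boundary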
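
The difference between the line-oriented and string-oriented anchors is easiest to see on input that contains newlines. The following C# sketch uses hypothetical class and method names and assumes "\n" line endings.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class AnchorDemo
{
    // Counts words that sit at a position matched by ^ under the given options.
    public static int CountLineStarts(string input, RegexOptions options)
    {
        return Regex.Matches(input, @"^\w+", options).Count;
    }

    // \Z also matches just before a newline that ends the string.
    public static bool EndsWithUpperZ(string input)
    {
        return Regex.IsMatch(input, @"end\Z");
    }

    // \z matches only at the very end of the string.
    public static bool EndsWithLowerZ(string input)
    {
        return Regex.IsMatch(input, @"end\z");
    }

    public static void Main()
    {
        string text = "one\ntwo\nthree";
        Console.WriteLine(CountLineStarts(text, RegexOptions.None));       // 1
        Console.WriteLine(CountLineStarts(text, RegexOptions.Multiline));  // 3
        Console.WriteLine(EndsWithUpperZ("the end\n"));                    // True
        Console.WriteLine(EndsWithLowerZ("the end\n"));                    // False
    }
}
```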

Table B-5. Quantifiers

Quantifier   Meaning
*            0 or more matches
+            1 or more matches
?            0 or 1 matches
{n}          Exactly n matches
{n,}         At least n matches
{n,m}        At least n, but no more than m matches
*?           Lazy *, finds first match that has minimum repeats
+?           Lazy +, minimum repeats, but at least 1
??           Lazy ?, zero or minimum repeats
{n}?         Lazy {n}, exactly n matches
{n,}?        Lazy {n,}, minimum repeats, but at least n
{n,m}?       Lazy {n,m}, minimum repeats, but at least n, and no more than m
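
A minimal C# sketch of the difference between the greedy and lazy forms follows. The class name, helper method, and sample strings are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class QuantifierDemo
{
    // Returns the text of the first match of pattern in input ("" if none).
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        string html = "<b>bold</b> and <i>italic</i>";

        // Greedy + takes as many characters as possible, up to the last ">".
        Console.WriteLine(FirstMatch(html, "<.+>"));   // <b>bold</b> and <i>italic</i>

        // Lazy +? stops at the first ">" that lets the match succeed.
        Console.WriteLine(FirstMatch(html, "<.+?>"));  // <b>

        // The same contrast with a bounded quantifier.
        Console.WriteLine(FirstMatch("aaaa", "a{2,3}"));   // aaa
        Console.WriteLine(FirstMatch("aaaa", "a{2,3}?"));  // aa
    }
}
```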

Table B-6. Grouping constructs

Syntax              Meaning
( )                 Capture matched substring
(?<name>)           Capture matched substring into group name[A]
(?<number>)         Capture matched substring into group number[A]
(?<name1-name2>)    Undefine name2 and store interval and current group into name1; if name2 is undefined, matching backtracks; name1 is optional[A]
(?: )               Noncapturing group
(?imnsx-imnsx: )    Apply or disable matching options
(?= )               Continue matching only if subexpression matches on right[C]
(?! )               Continue matching only if subexpression doesn't match on right[C]
(?<= )              Continue matching only if subexpression matches on left[B][C]
(?<! )              Continue matching only if subexpression doesn't match on left[B][C]
(?> )               Subexpression is matched once, but isn't backtracked

[A] Single quotes may be used instead of angle brackets, for example (?'name').

[B] This construct doesn't backtrack; this is to remain compatible with Perl5.

[C] Zero-width assertion; does not consume any characters.

The named capturing group syntax follows a suggestion made by Jeffrey Friedl in Mastering Regular Expressions (O'Reilly). All other grouping constructs use the Perl5 syntax.
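
A few of the grouping constructs can be sketched in C# as follows. The class name, method names, and sample inputs are hypothetical; the first pattern deliberately mixes the angle-bracket and single-quote forms of a named group.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class GroupingDemo
{
    // Named groups: (?<key>...) and the equivalent single-quote form (?'value'...).
    public static string GetGroup(string input, string groupName)
    {
        Match m = Regex.Match(input, @"(?<key>\w+)=(?'value'\w+)");
        return m.Success ? m.Groups[groupName].Value : null;
    }

    // (?: ) groups without capturing, so the (c) group is still number 1.
    public static string FirstCapture(string input)
    {
        return Regex.Match(input, @"(?:ab)+(c)").Groups[1].Value;
    }

    // (?= ) is a zero-width lookahead: " dollars" must follow but is not consumed.
    public static string AmountBeforeDollars(string input)
    {
        return Regex.Match(input, @"\d+(?= dollars)").Value;
    }

    public static void Main()
    {
        Console.WriteLine(GetGroup("color=red", "key"));            // color
        Console.WriteLine(GetGroup("color=red", "value"));          // red
        Console.WriteLine(FirstCapture("ababc"));                   // c
        Console.WriteLine(AmountBeforeDollars("pay 100 dollars"));  // 100
    }
}
```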

Table B-7. Back references

Parameter syntax   Meaning
\count             Back reference to the capture group numbered count
\k<name>           Named back reference
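
A common use of back references is finding a word that is immediately repeated. The C# sketch below shows both the numbered and the named form; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class BackReferenceDemo
{
    // \1 must match the same text that group 1 captured.
    public static string FindDoubledWord(string input)
    {
        return Regex.Match(input, @"\b(\w+) \1\b").Groups[1].Value;
    }

    // \k<word> does the same using a named group.
    public static string FindDoubledWordNamed(string input)
    {
        return Regex.Match(input, @"\b(?<word>\w+) \k<word>\b").Groups["word"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(FindDoubledWord("this is is a test"));       // is
        Console.WriteLine(FindDoubledWordNamed("this is is a test"));  // is
    }
}
```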

Table B-8. Alternation

Expression syntax       Meaning
|                       Logical OR
(?(expression)yes|no)   Matches yes if expression matches, else no; the no is optional
(?(name)yes|no)         Matches yes if named string has a match, else no; the no is optional
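
The (?(name)yes|no) form can be sketched in C# with a pattern that accepts a number either bare or wrapped in parentheses: the closing parenthesis is required only if the opening one was captured. The class name, group name, and inputs are hypothetical, and the optional no branch is omitted.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class ConditionalDemo
{
    // (?<open>\()?  optionally captures "(" into the group named open.
    // (?(open)\))   requires ")" only when the open group has a match.
    private const string Pattern = @"^(?<open>\()?\d+(?(open)\))$";

    public static bool IsWellFormed(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    public static void Main()
    {
        Console.WriteLine(IsWellFormed("123"));    // True
        Console.WriteLine(IsWellFormed("(123)"));  // True
        Console.WriteLine(IsWellFormed("(123"));   // False
        Console.WriteLine(IsWellFormed("123)"));   // False
    }
}
```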

Table B-9. Miscellaneous constructs

Expression syntax    Meaning
(?imnsx-imnsx)       Set or disable options in midpattern
(?# )                Inline comment
# [to end of line]   X-mode comment (requires x option or IgnorePatternWhitespace)

Table B-10. Regular expression options

Option   RegexOptions value        Meaning
i        IgnoreCase                Case-insensitive match
m        Multiline                 Multiline mode; changes ^ and $ so they match the beginning and end of any line
n        ExplicitCapture           Capture only explicitly named or numbered groups
(none)   Compiled                  Compile to MSIL
s        Singleline                Single-line mode; changes the meaning of "." so it matches every character
x        IgnorePatternWhitespace   Eliminates unescaped whitespace from the pattern
(none)   RightToLeft               Search from right to left; can't be specified in midpattern
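
Options can be supplied either as RegexOptions values or inline in the pattern. The C# sketch below shows both routes, plus an x-mode pattern with # comments; the class name, method names, and the phone-number format are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class OptionsDemo
{
    // Option passed as a RegexOptions value.
    public static bool IgnoreCaseViaEnum(string input)
    {
        return Regex.IsMatch(input, "hello", RegexOptions.IgnoreCase);
    }

    // The same option set inline with (?i).
    public static bool IgnoreCaseInline(string input)
    {
        return Regex.IsMatch(input, "(?i)hello");
    }

    // "." skips \n unless the Singleline option is set.
    public static bool DotMatches(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, "a.b", options);
    }

    // With IgnorePatternWhitespace, unescaped whitespace is dropped
    // and # starts a comment that runs to the end of the line.
    public static bool IsShortPhone(string input)
    {
        string pattern = @"
            ^\d{3}   # exchange
            -        # separator
            \d{4}$   # line number
        ";
        return Regex.IsMatch(input, pattern, RegexOptions.IgnorePatternWhitespace);
    }

    public static void Main()
    {
        Console.WriteLine(IgnoreCaseViaEnum("HELLO"));                   // True
        Console.WriteLine(IgnoreCaseInline("HELLO"));                    // True
        Console.WriteLine(DotMatches("a\nb", RegexOptions.None));        // False
        Console.WriteLine(DotMatches("a\nb", RegexOptions.Singleline));  // True
        Console.WriteLine(IsShortPhone("555-1234"));                     // True
    }
}
```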
