1.5 .NET and C#
Microsoft's .NET Framework
provides a consistent and powerful set of regular expression classes
for all .NET implementations. The following sections list the .NET
regular expression syntax, the core .NET classes, and C# examples.
Microsoft's .NET uses a Traditional NFA match
engine. For an explanation of the rules behind a Traditional NFA
engine, see Section 1.2.
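One visible consequence of a Traditional NFA engine is that alternation is ordered: the first alternative that leads to an overall match wins, even when a later alternative would match more text. The following is a minimal sketch of that behavior; the class and method names are made up for illustration and are not part of .NET.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class NfaDemo
{
    // Returns the text of the first match of pattern in input.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }
}
```

Calling NfaDemo.FirstMatch("nfa not", "nfa|nfa not") returns "nfa", because the first alternative already succeeds. Reversing the alternatives to "nfa not|nfa" returns "nfa not".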
1.5.1 Supported Metacharacters
.NET supports the metacharacters and metasequences listed in
Table 1-15 through Table 1-20.
For expanded definitions of each metacharacter, see Section 1.2.1.
Table 1-15. Character representations
Sequence | Meaning
\a | Alert (bell), x07.
\b | Backspace, x08, supported only in character class.
\e | ESC character, x1B.
\n | Newline, x0A.
\r | Carriage return, x0D.
\f | Form feed, x0C.
\t | Horizontal tab, x09.
\v | Vertical tab, x0B.
\0octal | Character specified by a two-digit octal code.
\xhex | Character specified by a two-digit hexadecimal code.
\uhex | Character specified by a four-digit hexadecimal code.
\cchar | Named control character.
Table 1-16. Character classes and class-like constructs
Class | Meaning
[...] | A single character listed or contained within a listed range.
[^...] | A single character not listed and not contained within a listed range.
. | Any character, except a line terminator (unless single-line mode, s).
\w | Word character, [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}] or [a-zA-Z_0-9] in ECMAScript mode.
\W | Non-word character, [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}] or [^a-zA-Z_0-9] in ECMAScript mode.
\d | Digit, \p{Nd} or [0-9] in ECMAScript mode.
\D | Non-digit, \P{Nd} or [^0-9] in ECMAScript mode.
\s | Whitespace character, [ \f\n\r\t\v\x85\p{Z}] or [ \f\n\r\t\v] in ECMAScript mode.
\S | Non-whitespace character, [^ \f\n\r\t\v\x85\p{Z}] or [^ \f\n\r\t\v] in ECMAScript mode.
\p{prop} | Character contained by given Unicode block or property.
\P{prop} | Character not contained by given Unicode block or property.
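The difference between the default Unicode-aware classes and their ECMAScript-mode equivalents can be checked directly. This is a small sketch, assuming a hypothetical helper class named ClassDemo; only Regex.IsMatch and RegexOptions come from .NET.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class ClassDemo
{
    // True if input consists of exactly one character matched by \d
    // under the given options.
    public static bool IsDigit(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, @"^\d$", options);
    }
}
```

With RegexOptions.None, ClassDemo.IsDigit("\u0663", ...) is true, because U+0663 (ARABIC-INDIC DIGIT THREE) belongs to \p{Nd}. With RegexOptions.ECMAScript it is false, since \d is then limited to [0-9].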
Table 1-17. Anchors and other zero-width tests
Sequence | Meaning
^ | Start of string, or after any newline if in Multiline mode.
\A | Beginning of string, in all match modes.
$ | End of string, or before any newline if in Multiline mode.
\Z | End of string but before any final line terminator, in all match modes.
\z | End of string, in all match modes.
\b | Boundary between a \w character and a \W character.
\B | Not-word-boundary.
\G | End of the previous match.
(?=...) | Positive lookahead.
(?!...) | Negative lookahead.
(?<=...) | Positive lookbehind.
(?<!...) | Negative lookbehind.
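The distinction between \Z and \z, and the use of a lookbehind, are easiest to see on concrete input. The sketch below uses a hypothetical AnchorDemo class; the patterns rely only on the anchors listed in the table.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class AnchorDemo
{
    // \Z allows one trailing newline after the match.
    public static bool EndsBeforeFinalNewline(string input)
    {
        return Regex.IsMatch(input, @"abc\Z");
    }

    // \z requires the match to end at the very end of the string.
    public static bool EndsAtVeryEnd(string input)
    {
        return Regex.IsMatch(input, @"abc\z");
    }

    // Positive lookbehind: digits preceded by a dollar sign,
    // without including the dollar sign in the match.
    public static string DollarAmount(string input)
    {
        return Regex.Match(input, @"(?<=\$)\d+").Value;
    }
}
```

For the input "abc\n", EndsBeforeFinalNewline is true and EndsAtVeryEnd is false. DollarAmount("price: $42") returns "42".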
Table 1-18. Comments and mode modifiers
Option or sequence | Mode character | Meaning
Singleline | s | Dot (.) matches any character, including a line terminator.
Multiline | m | ^ and $ match next to embedded line terminators.
IgnorePatternWhitespace | x | Ignore whitespace and allow embedded comments starting with #.
IgnoreCase | i | Case-insensitive match based on characters in the current culture.
CultureInvariant | | Culture-insensitive match.
ExplicitCapture | n | Allow named capture groups, but treat unnamed parentheses as non-capturing groups.
Compiled | | Compile regular expression.
RightToLeft | | Search from right to left, starting to the left of the start position.
ECMAScript | | Enable ECMAScript-compliant behavior; can be combined only with IgnoreCase, Multiline, and Compiled.
(?imnsx-imnsx) | | Turn match flags on or off for rest of pattern.
(?imnsx-imnsx:...) | | Turn match flags on or off for the rest of the subexpression.
(?#...) | | Treat substring as a comment.
#... | | Treat rest of line as a comment in x mode.
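Inline mode modifiers can apply to the rest of a pattern or to a single subexpression. The following sketch contrasts the two forms; ModeDemo and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class ModeDemo
{
    // (?i) turns on case-insensitive matching for the rest of the pattern.
    public static bool WholePatternIgnoreCase(string input)
    {
        return Regex.IsMatch(input, "^(?i)spider-man$");
    }

    // (?i:...) limits case-insensitive matching to the enclosed
    // subexpression; "-man" must still match exactly.
    public static bool PrefixOnlyIgnoreCase(string input)
    {
        return Regex.IsMatch(input, "^(?i:spider)-man$");
    }
}
```

WholePatternIgnoreCase("SPIDER-MAN") is true. PrefixOnlyIgnoreCase("SPIDER-man") is true, but PrefixOnlyIgnoreCase("SPIDER-MAN") is false.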
Table 1-19. Grouping, capturing, conditional, and control
Sequence | Meaning
(...) | Grouping. Submatches fill \1, \2, ... and $1, $2, ....
\n | In a regular expression, match what was matched by the nth earlier submatch.
$n | In a replacement string, contains the nth earlier submatch.
(?<name>...) | Captures matched substring into group, name.
(?:...) | Grouping-only parentheses, no capturing.
(?>...) | Disallow backtracking for subpattern.
...|... | Alternation; match one or the other.
* | Match 0 or more times.
+ | Match 1 or more times.
? | Match 1 or 0 times.
{n} | Match exactly n times.
{n,} | Match at least n times.
{x,y} | Match at least x times, but no more than y times.
*? | Match 0 or more times, but as few times as possible.
+? | Match 1 or more times, but as few times as possible.
?? | Match 0 or 1 times, but as few times as possible.
{n,}? | Match at least n times, but as few times as possible.
{x,y}? | Match at least x times, no more than y times, but as few times as possible.
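Greedy and lazy quantifiers, and backreferences, behave as follows on small inputs. This is an illustrative sketch; QuantifierDemo and its methods are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class QuantifierDemo
{
    // Greedy +: takes as much text as possible.
    public static string Greedy(string input)
    {
        return Regex.Match(input, "<.+>").Value;
    }

    // Lazy +?: takes as little text as possible.
    public static string Lazy(string input)
    {
        return Regex.Match(input, "<.+?>").Value;
    }

    // \1 matches the same text that group 1 captured,
    // so this finds a word that is immediately repeated.
    public static string DoubledWord(string input)
    {
        Match m = Regex.Match(input, @"\b(\w+)\s+\1\b");
        return m.Success ? m.Groups[1].Value : "";
    }
}
```

For "<b>bold</b>", Greedy returns the whole string and Lazy returns "<b>". DoubledWord("Paris in the the spring") returns "the".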
Table 1-20. Replacement sequences
Sequence | Meaning
$1, $2, ... | Captured submatches.
${name} | Matched text of a named capture group.
$` | Text before match.
$& | Text of match.
$' | Text after match.
$+ | Last parenthesized match.
$_ | Copy of original input string.
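Replacement sequences are expanded by Regex.Replace. A brief sketch, assuming a hypothetical ReplaceDemo class, shows ${name} and $& in use.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class ReplaceDemo
{
    // Reorder MM/DD/YYYY into YYYY-MM-DD using named groups.
    public static string ToIsoDate(string input)
    {
        return Regex.Replace(input,
            @"(?<m>\d\d)/(?<d>\d\d)/(?<y>\d{4})", "${y}-${m}-${d}");
    }

    // Wrap every match in square brackets; $& is the matched text.
    public static string Bracket(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "[$&]");
    }
}
```

ReplaceDemo.ToIsoDate("12/30/1969") returns "1969-12-30", and ReplaceDemo.Bracket("abc", "b") returns "a[b]c".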
1.5.2 Regular Expression Classes and Interfaces
.NET defines its regular expression support in the
System.Text.RegularExpressions namespace. The
Regex( ) constructor handles regular expression
creation, and the rest of the Regex methods
handle pattern matching. The Group and
Match classes contain information about each
match.
C#'s verbatim string syntax, @"",
allows you to define regular expression patterns without having to
escape embedded backslashes.
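To illustrate the point about backslashes, the sketch below writes the same pattern both with and without the @ prefix. LiteralDemo is a hypothetical name used only for this example.

```csharp
// Hypothetical helper class, for illustration only.
public static class LiteralDemo
{
    // With the @ prefix, backslashes are taken literally.
    // (A double quote inside such a string is written as "".)
    public const string Verbatim = @"\d+\.\d+";

    // Without the prefix, every backslash must be doubled.
    public const string Escaped = "\\d+\\.\\d+";

    // Both literals produce the same pattern text.
    public static bool SamePattern()
    {
        return Verbatim == Escaped;
    }
}
```

LiteralDemo.SamePattern() returns true, and either constant can be passed to Regex.IsMatch to match input such as "3.14".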
Regex
This class handles the creation of regular expressions and pattern
matching. Several static methods allow for pattern matching without
creating a Regex object.
Methods
- public Regex(string pattern)
- public Regex(string pattern, RegexOptions options)
-
Return a regular expression object based on
pattern and with the optional mode
modifiers, options.
- public static void CompileToAssembly(RegexCompilationInfo[ ] regexinfos, System.Reflection.AssemblyName assemblyname)
- public static void CompileToAssembly(RegexCompilationInfo[ ] regexinfos, System.Reflection.AssemblyName assemblyname, System.Reflection.Emit.CustomAttributeBuilder[ ] attributes)
- public static void CompileToAssembly(RegexCompilationInfo[ ] regexinfos, System.Reflection.AssemblyName assemblyname, System.Reflection.Emit.CustomAttributeBuilder[ ] attributes, string resourceFile)
-
Compile one or more Regex objects to an assembly.
The regexinfos array describes the regular
expressions to include. The assembly filename is
assemblyname. The array
attributes defines attributes for the
assembly. resourceFile is the name of a
Win32 resource file to include in the assembly.
- public static string Escape(string str)
-
Return a string with all regular expression metacharacters, pound
characters (#), and whitespace escaped.
- public static bool IsMatch(string input, string pattern)
- public static bool IsMatch(string input, string pattern, RegexOptions options)
- public bool IsMatch(string input)
- public bool IsMatch(string input, int startat)
-
Return the success of a single match against the input string
input. Static versions of this method
require the regular expression pattern.
The options parameter allows for optional
mode modifiers (OR'd together). The
startat parameter defines a starting
position in input to start matching.
- public static Match Match(string input, string pattern)
- public static Match Match(string input, string pattern, RegexOptions options)
- public Match Match(string input)
- public Match Match(string input, int startat)
- public Match Match(string input, int startat, int length)
-
Perform a single match against the input string
input and return information about the
match in a Match object. Static versions of this
method require the regular expression
pattern. The
options parameter allows for optional mode
modifiers (OR'd together). The
startat and
length parameters define a starting
position and the number of characters after the starting position to
perform the match.
- public static MatchCollection Matches(string input, string pattern)
- public static MatchCollection Matches(string input, string pattern, RegexOptions options)
- public MatchCollection Matches(string input)
- public MatchCollection Matches(string input, int startat)
-
Find all matches in the input string
input, and return information about the
matches in a MatchCollection object. Static
versions of this method require the regular expression
pattern. The
options parameter allows for optional mode
modifiers (OR'd together). The
startat parameter defines a starting
position in input to perform the match.
- public static string Replace(string input, string pattern, MatchEvaluator evaluator)
- public static string Replace(string input, string pattern, MatchEvaluator evaluator, RegexOptions options)
- public static string Replace(string input, string pattern, string replacement)
- public static string Replace(string input, string pattern, string replacement, RegexOptions options)
- public string Replace(string input, MatchEvaluator evaluator)
- public string Replace(string input, MatchEvaluator evaluator, int count)
- public string Replace(string input, MatchEvaluator evaluator, int count, int startat)
- public string Replace(string input, string replacement)
- public string Replace(string input, string replacement, int count)
- public string Replace(string input, string replacement, int count, int startat)
-
Return a string in which each match in
input is replaced with either the
evaluation of the replacement string or a
call to a MatchEvaluator object. The string
replacement can contain backreferences to
captured text with the
$n or
${name}
syntax.
The options parameter allows for optional
mode modifiers (OR'd together). The
count parameter limits the number of
replacements. The startat parameter
defines a starting position in input to
start the replacement.
- public static string[ ] Split(string input, string pattern)
- public static string[ ] Split(string input, string pattern, RegexOptions options)
- public string[ ] Split(string input)
- public string[ ] Split(string input, int count)
- public string[ ] Split(string input, int count, int startat)
-
Return an array of strings broken around matches of the regex
pattern. If specified, no more than count
strings are returned. You can specify a starting position in
input with
startat.
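Several of the Regex methods listed above can be exercised together. The following is a sketch under stated assumptions: RegexMethodsDemo and its method names are hypothetical, while Escape, IsMatch, Matches, Replace, and Split are the Regex members described in the list.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class RegexMethodsDemo
{
    // Escape: make arbitrary text safe to use as a literal pattern.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }

    // Matches: count all non-overlapping runs of digits.
    public static int CountNumbers(string input)
    {
        return Regex.Matches(input, @"\d+").Count;
    }

    // Replace with a MatchEvaluator: the delegate computes
    // the replacement text for each match.
    public static string DoubleNumbers(string input)
    {
        return Regex.Replace(input, @"\d+",
            m => (int.Parse(m.Value) * 2).ToString());
    }

    // Instance Split with a count: at most count strings are returned.
    public static string[] SplitOnCommas(string input, int count)
    {
        Regex r = new Regex(@"\s*,\s*");
        return r.Split(input, count);
    }
}
```

For example, ContainsLiteral("cost: 111", "1+1") is false because the + is escaped, DoubleNumbers("3 apples and 10 pears") returns "6 apples and 20 pears", and SplitOnCommas("a, b ,c,d", 2) returns the two strings "a" and "b ,c,d".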
Match
Properties
- public bool Success
-
Indicates whether the match was successful.
- public string Value
-
Text of the match.
- public int Length
-
Number of characters in the matched text.
- public int Index
-
Zero-based character index of the start of the match.
- public GroupCollection Groups
-
A GroupCollection object where
Groups[0].Value contains the text of the entire
match, and each additional Groups element contains
the text matched by a capture group.
Methods
- public Match NextMatch( )
-
Return a Match object for the next match of the
regex in the input string.
- public virtual string Result(string result)
-
Return result with special replacement
sequences replaced by values from the previous match.
- public static Match Synchronized(Match inner)
-
Return a Match object identical to
inner, except also safe for multithreaded
use.
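The Match properties and methods above combine naturally in a loop. This sketch walks all matches with NextMatch( ) and expands a replacement string with Result( ); MatchDemo and its method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class MatchDemo
{
    // Walk every match with NextMatch() and collect "index:value" entries.
    public static string Describe(string input, string pattern)
    {
        List<string> parts = new List<string>();
        for (Match m = Regex.Match(input, pattern); m.Success; m = m.NextMatch())
        {
            parts.Add(m.Index + ":" + m.Value);
        }
        return string.Join(",", parts);
    }

    // Result() expands replacement sequences against a single match
    // and returns only the expanded text, not the whole input.
    public static string SwapPair(string input)
    {
        Match m = Regex.Match(input, @"(\w+)=(\w+)");
        return m.Success ? m.Result("$2=$1") : input;
    }
}
```

MatchDemo.Describe("a1 b22 c333", @"\d+") returns "1:1,4:22,8:333", and MatchDemo.SwapPair("key=value") returns "value=key".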
Group
Properties
- public bool Success
-
True if the group participated in the match.
- public string Value
-
Text captured by this group.
- public int Length
-
Number of characters captured by this group.
- public int Index
-
Zero-based character index of the start of the text captured by this
group.
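The Success property of a group is most useful when the group is optional, since the overall match can succeed without the group taking part. A minimal sketch, with GroupDemo as a hypothetical name and a deliberately simplified phone-number pattern:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class GroupDemo
{
    // Simplified pattern: optional "(ddd) " area code, then ddd-dddd.
    private const string Phone = @"^(?:\((\d{3})\)\s*)?\d{3}-\d{4}$";

    // True only if the optional area-code group participated in the match.
    public static bool HasAreaCode(string phone)
    {
        Match m = Regex.Match(phone, Phone);
        return m.Success && m.Groups[1].Success;
    }

    // Text captured by group 1, or an empty string if it did not participate.
    public static string AreaCode(string phone)
    {
        return Regex.Match(phone, Phone).Groups[1].Value;
    }
}
```

GroupDemo.HasAreaCode("(555) 123-4567") is true and AreaCode returns "555". For "123-4567" the overall match still succeeds, but HasAreaCode is false.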
1.5.3 Unicode Support
.NET provides built-in support for Unicode 3.1, including full
support in the \w, \d, and
\s sequences. The range of characters matched can
be limited to ASCII characters by turning on
ECMAScript mode. Case-insensitive matching is
limited to the characters of the current language defined in
Thread.CurrentCulture, unless the
CultureInvariant option is set.
.NET supports the standard Unicode properties (see Table 1-2) and blocks. Only the short form of property
names is supported. Block names require the Is
prefix and must use the simple name form, without spaces or
underscores.
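Property and block names are used with \p{...} as described above. The sketch below tests one short-form property (Lu) and one block name (IsGreek); UnicodeDemo and its methods are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class UnicodeDemo
{
    // Short-form property name: Lu = uppercase letter.
    public static bool IsUppercaseLetter(string s)
    {
        return Regex.IsMatch(s, @"^\p{Lu}$");
    }

    // Block name with the required Is prefix.
    public static bool IsGreek(string s)
    {
        return Regex.IsMatch(s, @"^\p{IsGreek}$");
    }
}
```

For "\u03A9" (GREEK CAPITAL LETTER OMEGA), both methods return true. For "a", IsUppercaseLetter is false, and for "A", IsGreek is false.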
1.5.4 Examples
Example 1-9. Simple match
//Match Spider-Man, Spiderman, SPIDER-MAN, etc.
namespace Regex_PocketRef
{
    using System.Text.RegularExpressions;
    class SimpleMatchTest
    {
        static void Main( )
        {
            string dailybugle = "Spider-Man Menaces City!";
            string regex = "spider[- ]?man";
            if (Regex.IsMatch(dailybugle, regex, RegexOptions.IgnoreCase)) {
                //do something
            }
        }
    }
}
Example 1-10. Match and capture group
//Match dates formatted like MM/DD/YYYY, MM-DD-YY,...
using System.Text.RegularExpressions;
class MatchTest
{
    static void Main( )
    {
        string date = "12/30/1969";
        Regex r =
            new Regex( @"(\d\d)[-/](\d\d)[-/](\d\d(?:\d\d)?)" );
        Match m = r.Match(date);
        if (m.Success) {
            string month = m.Groups[1].Value;
            string day = m.Groups[2].Value;
            string year = m.Groups[3].Value;
        }
    }
}
Example 1-11. Simple substitution
//Convert <br> to <br /> for XHTML compliance
using System.Text.RegularExpressions;
class SimpleSubstitutionTest
{
    static void Main( )
    {
        string text = "Hello world. <br>";
        string regex = "<br>";
        string replacement = "<br />";
        string result =
            Regex.Replace(text, regex, replacement, RegexOptions.IgnoreCase);
    }
}
Example 1-12. Harder substitution
//urlify - turn URLs into HTML links
using System.Text.RegularExpressions;
public class Urlify
{
    static void Main( )
    {
        string text = "Check the website, http://www.oreilly.com/catalog/repr.";
        string regex =
            @"\b                       # start at word boundary
              (                        # capture to $1
              (https?|telnet|gopher|file|wais|ftp) :
                                       # resource and colon
              [\w/#~:.?+=&%@!\-] +?    # one or more valid
                                       # characters
                                       # but take as little as
                                       # possible
              )
              (?=                      # lookahead
              [.:?\-] *                # for possible
                                       # punctuation
              (?: [^\w/#~:.?+=&%@!\-]  # invalid character
              | $ )                    # or end of string
              )";
        Regex r = new Regex(regex, RegexOptions.IgnoreCase
            | RegexOptions.IgnorePatternWhitespace);
        string result = r.Replace(text, "<a href=\"$1\">$1</a>");
    }
}
1.5.5 Other Resources