only for RuBoard - do not distribute or recompile |
The FCL includes support for regular expression matching and replacement. The expressions are based on Perl 5 regular expressions, including lazy quantifiers (e.g., ??, *?, +?, and {n,m}?), positive and negative lookahead, and conditional evaluation.
The types mentioned in this section all exist in the System.Text.RegularExpressions namespace.
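As a rough illustration of the lazy quantifiers and lookahead assertions mentioned above, the following sketch compares a greedy pattern with its lazy counterpart and then applies a positive and a negative lookahead. The class name QuantifierDemo, the helper FirstMatch, and the sample strings are hypothetical and chosen only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

class QuantifierDemo
{
    // Hypothetical helper: returns the text of the first match, or "" if none.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "";
    }

    static void Main()
    {
        string html = "<b>bold</b>";

        // Greedy: .+ consumes as much as possible before the final '>'.
        Console.WriteLine(FirstMatch(html, "<.+>"));    // <b>bold</b>

        // Lazy: .+? stops at the first '>' that lets the match succeed.
        Console.WriteLine(FirstMatch(html, "<.+?>"));   // <b>

        // Positive lookahead: match a word only if ".txt" follows,
        // without including ".txt" in the match.
        Console.WriteLine(FirstMatch("notes.txt", @"\w+(?=\.txt)"));   // notes

        // Negative lookahead: match a whole number not followed by '%'.
        Console.WriteLine(FirstMatch("50% of 75", @"\b\d+\b(?!%)"));   // 75
    }
}
```

The word boundaries in the last pattern matter: without them, the engine could backtrack and match only the first digit of 50, because that digit is not directly followed by a percent sign.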
The Regex class is the heart of the FCL regular expression support. Used both as an object instance and a static type, the Regex class represents an immutable, compiled instance of a regular expression that can be applied to a string via a matching process.
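The two ways of using Regex might look like the following minimal sketch. It assumes a simple digit-matching pattern; the class RegexUsageDemo and its methods FirstNumber and MaskNumbers are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class RegexUsageDemo
{
    // Instance form: construct the Regex once and reuse it for many inputs.
    static readonly Regex Digits = new Regex(@"\d+");

    public static string FirstNumber(string input)
    {
        Match m = Digits.Match(input);
        return m.Success ? m.Value : "";
    }

    // Static form: the pattern is supplied on each call.
    public static string MaskNumbers(string input)
    {
        return Regex.Replace(input, @"\d+", "#");
    }

    static void Main()
    {
        Console.WriteLine(FirstNumber("order 42, item 7"));   // 42
        Console.WriteLine(MaskNumbers("order 42, item 7"));   // order #, item #
    }
}
```

The instance form is usually the better fit when the same pattern is applied many times, since the Regex object is immutable and can be shared; the static form is convenient for one-off matches.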
Internally, the regular expression is stored either as a sequence of internal regular expression bytecodes that are interpreted at match time or as compiled MSIL opcodes that are JIT-compiled by the CLR at runtime. This lets you trade slower regular expression startup and higher memory utilization for higher raw match performance at runtime.
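The choice between the two internal representations is made through the RegexOptions value passed to the Regex constructor: RegexOptions.Compiled requests the MSIL form, and the default is the interpreted form. The sketch below, with the hypothetical class CompiledDemo and method CountWords, shows that the option changes only how the expression is executed, not what it matches.

```csharp
using System;
using System.Text.RegularExpressions;

class CompiledDemo
{
    // Hypothetical helper: counts word-like tokens using the given options.
    public static int CountWords(string input, RegexOptions options)
    {
        Regex r = new Regex(@"\w+", options);
        return r.Matches(input).Count;
    }

    static void Main()
    {
        string text = "the quick brown fox";

        // Interpreted form: cheaper to construct.
        Console.WriteLine(CountWords(text, RegexOptions.None));       // 4

        // Compiled form: costlier to construct, intended for patterns
        // that are matched many times.
        Console.WriteLine(CountWords(text, RegexOptions.Compiled));   // 4
    }
}
```

Whether the compiled form pays off depends on how often the expression is reused, so it is worth measuring before committing to it.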
For more information on the regular expression options, supported character escapes, substitution patterns, character sets, positioning assertions, quantifiers, grouping constructs, backreferences, and alternation, see Appendix B.
The Match class represents the result of applying a regular expression to a string, looking for the first successful match. The MatchCollection class contains a collection of Match instances that represent the result of applying a regular expression to a string repeatedly, with each search resuming where the previous match ended, until the first unsuccessful match occurs.
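The difference between a single Match and a MatchCollection can be sketched as follows. The class MatchDemo and its methods FirstOnly and All are hypothetical names, and the pattern is chosen only to produce several hits in a short string.

```csharp
using System;
using System.Text.RegularExpressions;

class MatchDemo
{
    // Returns only the first successful match, or "" if there is none.
    public static string FirstOnly(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "";
    }

    // Returns every match, in the order found.
    public static string[] All(string input, string pattern)
    {
        MatchCollection mc = Regex.Matches(input, pattern);
        string[] results = new string[mc.Count];
        for (int i = 0; i < mc.Count; i++)
            results[i] = mc[i].Value;
        return results;
    }

    static void Main()
    {
        string text = "abracadabra";
        Console.WriteLine(FirstOnly(text, "a[bc]"));               // ab
        Console.WriteLine(String.Join(",", All(text, "a[bc]")));   // ab,ac,ab
    }
}
```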
The Group class represents the results from a single grouping expression. From this class, it is possible to drill down to the individual subexpression matches with the Captures property.
The CaptureCollection class contains a collection of Capture instances, each representing the results of a single subexpression match.
Combining these classes, you can create the following example:
    /*
     * Sample showing multiple groups
     * and groups with multiple captures
     */
    using System;
    using System.Text.RegularExpressions;

    class Test {
      static void Main( ) {
        string text = "abracadabra1abracadabra2abracadabra3";
        string pat = @"
          (      # start the first group
            abra # match the literal 'abra'
            (    # start the second (inner) group
            cad  # match the literal 'cad'
            )?   # end the second (optional) group
          )      # end the first group
          +      # match one or more occurrences
          ";
        Console.WriteLine("Original text = [{0}]", text);

        // Create the Regex. IgnorePatternWhitespace permits
        // whitespace and comments.
        Regex r = new Regex(pat, RegexOptions.IgnorePatternWhitespace);
        int[] gnums = r.GetGroupNumbers( ); // get the list of group numbers
        Match m = r.Match(text);            // get first match
        while (m.Success) {
          Console.WriteLine("Match found:");
          // start at group 1
          for (int i = 1; i < gnums.Length; i++) {
            Group g = m.Groups[gnums[i]];   // get the group for this match
            Console.WriteLine("\tGroup{0}=[{1}]", gnums[i], g);
            CaptureCollection cc = g.Captures; // get caps for this group
            for (int j = 0; j < cc.Count; j++) {
              Capture c = cc[j];
              Console.WriteLine("\t\tCapture{0}=[{1}] Index={2} Length={3}",
                                j, c, c.Index, c.Length);
            }
          }
          m = m.NextMatch( );               // get next match
        } // end while
      }
    }
The preceding example produces the following output:
    Original text = [abracadabra1abracadabra2abracadabra3]
    Match found:
            Group1=[abra]
                    Capture0=[abracad] Index=0 Length=7
                    Capture1=[abra] Index=7 Length=4
            Group2=[cad]
                    Capture0=[cad] Index=4 Length=3
    Match found:
            Group1=[abra]
                    Capture0=[abracad] Index=12 Length=7
                    Capture1=[abra] Index=19 Length=4
            Group2=[cad]
                    Capture0=[cad] Index=16 Length=3
    Match found:
            Group1=[abra]
                    Capture0=[abracad] Index=24 Length=7
                    Capture1=[abra] Index=31 Length=4
            Group2=[cad]
                    Capture0=[cad] Index=28 Length=3