10.1 Strings

C# treats strings as first-class types that are flexible, powerful, and easy to use. Each string object is an immutable sequence of Unicode characters. In other words, methods that appear to change the string actually return a modified copy; the original string remains intact.

When you declare a C# string using the string keyword, you are in fact declaring the object to be of the type System.String, one of the built-in types provided by the .NET Framework Class Library. A C# string type is a System.String type, and we will use the names interchangeably throughout the chapter.

The declaration of the System.String class is:

    public sealed class String : IComparable, ICloneable, IConvertible, IEnumerable

This declaration reveals that the class is sealed, meaning that it is not possible to derive from the string class. The class also implements four system interfaces (IComparable, ICloneable, IConvertible, and IEnumerable) that dictate functionality that System.String shares with other classes in the .NET Framework.

As seen in Chapter 9, the IComparable interface is implemented by types whose values can be ordered. Strings, for example, can be alphabetized; any given string can be compared with another string to determine which should come first in an ordered list. IComparable classes implement the CompareTo( ) method. IEnumerable, also discussed in Chapter 9, lets you use the foreach construct to enumerate a string as a collection of chars.

ICloneable objects can create new instances with the same value as the original instance, and ICloneable classes implement the Clone( ) method. For a string, however, Clone( ) returns a reference to the same instance rather than a new copy; this is safe because strings are immutable.

IConvertible classes provide methods, such as ToInt32( ), ToDouble( ), and ToDecimal( ), that facilitate conversion to other primitive types.
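The points above (immutability and the four interfaces) can be illustrated with a minimal sketch. The class name StringBasicsSketch and the helper Spell are hypothetical names chosen for this illustration; they are not part of the chapter's examples.

```csharp
using System;
using System.Globalization;

public static class StringBasicsSketch
{
    // Hypothetical helper: joins the characters of s with dashes,
    // enumerating the string with foreach (IEnumerable).
    public static string Spell(string s)
    {
        string result = "";
        foreach (char c in s)
        {
            if (result.Length > 0) result += "-";
            result += c;
        }
        return result;
    }

    public static void Main()
    {
        // Immutability: ToUpper returns a new string; original is untouched.
        string original = "hello";
        string upper = original.ToUpper();
        Console.WriteLine(original);   // hello
        Console.WriteLine(upper);      // HELLO

        // IEnumerable: a string can be walked one char at a time.
        Console.WriteLine(Spell("abc"));   // a-b-c

        // IComparable: a negative result means "apple" sorts first.
        Console.WriteLine("apple".CompareTo("banana") < 0);   // True

        // ICloneable: for strings, Clone returns the same instance.
        object clone = original.Clone();
        Console.WriteLine(object.ReferenceEquals(clone, original));   // True

        // IConvertible: the conversion methods are explicit interface
        // members, so a cast is needed to call them directly.
        int n = ((IConvertible)"42").ToInt32(CultureInfo.InvariantCulture);
        Console.WriteLine(n);   // 42
    }
}
```

In everyday code the static Convert class or int.Parse( ) is the more usual route for conversions; the cast to IConvertible is shown here only to make the interface visible.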
10.1.1 Creating Strings

The most common way to create a string is to assign a quoted string of characters, known as a string literal, to a user-defined variable of type string:

    string newString = "This is a string literal";

Quoted strings can include escape characters, such as "\n" or "\t", which begin with a backslash character (\) and are used to indicate where line breaks or tabs are to appear. Because the backslash is itself used in some syntaxes, such as Windows directory paths, a literal backslash in a quoted string must be preceded by another backslash.

Strings can also be created using verbatim string literals, which start with the (@) symbol. This tells the compiler that the string should be used verbatim, even if it spans multiple lines or includes escape characters. In a verbatim string literal, backslashes and the characters that follow them are simply considered additional characters of the string. Thus, the following two definitions are equivalent:

    string literalOne = "\\\\MySystem\\MyDirectory\\ProgrammingC#.cs";
    string verbatimLiteralOne = @"\\MySystem\MyDirectory\ProgrammingC#.cs";

In the first line, a nonverbatim string literal is used, and so each backslash character (\) must be escaped; that is, it must be preceded by a second backslash character. In the second line, a verbatim literal string is used, so the extra backslash is not needed. A second example illustrates multiline verbatim strings:

    string literalTwo = "Line One\nLine Two";
    string verbatimLiteralTwo = @"Line One
    Line Two";

Again, these declarations are interchangeable, provided the verbatim literal's line break begins in the first column and the source file ends its lines with \n; a verbatim literal keeps whatever whitespace and line-ending characters the source file contains. Which one you use is a matter of convenience and personal style.

10.1.2 The ToString( ) Method

Another common way to create a string is to call the ToString( ) method on an object and assign the result to a string variable. All the built-in types override this method to simplify the task of converting a value (often a numeric value) to a string representation of that value.
In the following example, the ToString( ) method of an integer type is called to store its value in a string:

    int myInteger = 5;
    string integerString = myInteger.ToString( );

The call to myInteger.ToString( ) returns a String object, which is then assigned to integerString.

The .NET String class provides a wealth of overloaded constructors that support a variety of techniques for assigning string values to string types. Some of these constructors enable you to create a string by passing in a character array or character pointer. The constructors that take a character array are CLS-compliant. The constructors that take a character pointer are not CLS-compliant and can be called only from "unsafe" code.

10.1.3 Manipulating Strings

The string class provides a host of methods for comparing, searching, and manipulating strings, as shown in Table 10-1.
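Before turning to the manipulation methods, the string-creation techniques from the preceding sections (escaped and verbatim literals, ToString( ), and the character-array constructor) can be checked with a short sketch. The class name StringCreationSketch and the helper FromChars are hypothetical names used only for illustration.

```csharp
using System;

public static class StringCreationSketch
{
    // Hypothetical helper wrapping the char[] constructor of String.
    public static string FromChars(char[] letters)
    {
        return new string(letters);
    }

    public static void Main()
    {
        // An escaped literal and a verbatim literal with the same content.
        string escaped = "\\\\MySystem\\MyDirectory\\ProgrammingC#.cs";
        string verbatim = @"\\MySystem\MyDirectory\ProgrammingC#.cs";
        Console.WriteLine(escaped == verbatim);   // True

        // ToString on a built-in numeric type.
        int myInteger = 5;
        string integerString = myInteger.ToString();
        Console.WriteLine(integerString);         // 5

        // Constructor taking a character array.
        char[] letters = { 'a', 'b', 'c' };
        Console.WriteLine(FromChars(letters));    // abc

        // Constructor taking a character and a repeat count.
        Console.WriteLine(new string('-', 3));    // ---
    }
}
```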
Example 10-1 illustrates the use of some of these methods, including Compare( ), Concat( ) (and the overloaded + operator), Copy( ) (and the = operator), Insert( ), EndsWith( ), and IndexOf( ).

Example 10-1. Working with strings

    namespace Programming_CSharp
    {
        using System;

        public class StringTester
        {
            static void Main( )
            {
                // create some strings to work with
                string s1 = "abcd";
                string s2 = "ABCD";
                string s3 = @"Liberty Associates, Inc.
                     provides custom .NET development,
                     on-site Training and Consulting";

                int result;  // hold the results of comparisons

                // compare two strings, case sensitive
                result = string.Compare(s1, s2);
                Console.WriteLine(
                    "compare s1: {0}, s2: {1}, result: {2}\n",
                    s1, s2, result);

                // overloaded compare, takes boolean "ignore case"
                // (true = ignore case)
                result = string.Compare(s1, s2, true);
                Console.WriteLine("compare insensitive\n");
                Console.WriteLine("s4: {0}, s2: {1}, result: {2}\n",
                    s1, s2, result);

                // concatenation method
                string s6 = string.Concat(s1, s2);
                Console.WriteLine(
                    "s6 concatenated from s1 and s2: {0}", s6);

                // use the overloaded operator
                string s7 = s1 + s2;
                Console.WriteLine(
                    "s7 concatenated from s1 + s2: {0}", s7);

                // the string copy method
                string s8 = string.Copy(s7);
                Console.WriteLine(
                    "s8 copied from s7: {0}", s8);

                // assignment: s9 now refers to the same string as s8
                string s9 = s8;
                Console.WriteLine("s9 = s8: {0}", s9);

                // three ways to compare.
                Console.WriteLine(
                    "\nDoes s9.Equals(s8)?: {0}",
                    s9.Equals(s8));
                Console.WriteLine(
                    "Does Equals(s9,s8)?: {0}",
                    string.Equals(s9, s8));
                Console.WriteLine(
                    "Does s9==s8?: {0}", s9 == s8);

                // Two useful properties: the index and the length
                Console.WriteLine(
                    "\nString s9 is {0} characters long.",
                    s9.Length);
                Console.WriteLine(
                    "The 5th character is {1}\n",
                    s9.Length, s9[4]);

                // test whether a string ends with a set of characters
                Console.WriteLine("s3:{0}\nEnds with Training?: {1}\n",
                    s3,
                    s3.EndsWith("Training"));
                Console.WriteLine(
                    "Ends with Consulting?: {0}",
                    s3.EndsWith("Consulting"));

                // return the index of the substring
                Console.WriteLine(
                    "\nThe first occurrence of Training ");
                Console.WriteLine("in s3 is {0}\n",
                    s3.IndexOf("Training"));

                // insert the word excellent before "Training";
                // 101 is the offset reported by IndexOf above and
                // depends on the whitespace inside the verbatim literal
                string s10 = s3.Insert(101, "excellent ");
                Console.WriteLine("s10: {0}\n", s10);

                // you can combine the two as follows:
                string s11 = s3.Insert(s3.IndexOf("Training"),
                    "excellent ");
                Console.WriteLine("s11: {0}\n", s11);
            }
        }
    }

Output:

    compare s1: abcd, s2: ABCD, result: -1
    compare insensitive
    s4: abcd, s2: ABCD, result: 0
    s6 concatenated from s1 and s2: abcdABCD
    s7 concatenated from s1 + s2: abcdABCD
    s8 copied from s7: abcdABCD
    s9 = s8: abcdABCD
    Does s9.Equals(s8)?: True
    Does Equals(s9,s8)?: True
    Does s9==s8?: True
    String s9 is 8 characters long.
    The 5th character is A
    s3:Liberty Associates, Inc.
         provides custom .NET development,
         on-site Training and Consulting
    Ends with Training?: False
    Ends with Consulting?: True
    The first occurrence of Training
    in s3 is 101
    s10: Liberty Associates, Inc.
         provides custom .NET development,
         on-site excellent Training and Consulting
    s11: Liberty Associates, Inc.
         provides custom .NET development,
         on-site excellent Training and Consulting

Example 10-1 begins by declaring three strings:

    string s1 = "abcd";
    string s2 = "ABCD";
    string s3 = @"Liberty Associates, Inc.
         provides custom .NET development,
         on-site Training and Consulting";

The first two are string literals, and the third is a verbatim string literal. We begin by comparing s1 to s2. The Compare( ) method is a public static method of string, and it is overloaded.
The first overloaded version takes two strings and compares them:

    // compare two strings, case sensitive
    result = string.Compare(s1, s2);
    Console.WriteLine("compare s1: {0}, s2: {1}, result: {2}\n",
        s1, s2, result);

This is a case-sensitive comparison. It returns a negative integer if the first string sorts before the second, zero if the two strings are equal, and a positive integer if the first string sorts after the second.
In this case, the output properly indicates that s1 is "less than" s2. Compare( ) performs a culture-sensitive comparison, and in the sort order it typically uses, a lowercase letter comes before its uppercase counterpart (even though lowercase letters have the larger code values in ASCII and Unicode):

    compare s1: abcd, s2: ABCD, result: -1

The second comparison uses an overloaded version of Compare( ) that takes a third, Boolean parameter, whose value determines whether case should be ignored in the comparison. If the value of this "ignore case" parameter is true, the comparison is made without regard to case, as in the following:

    result = string.Compare(s1, s2, true);
    Console.WriteLine("compare insensitive\n");
    Console.WriteLine("s4: {0}, s2: {1}, result: {2}\n", s1, s2, result);
This time the case is ignored and the result is 0, indicating that the two strings are identical (without regard to case):

    compare insensitive
    s4: abcd, s2: ABCD, result: 0

Example 10-1 then concatenates some strings. There are a couple of ways to accomplish this. You can use the Concat( ) method, which is a static public method of string:

    string s6 = string.Concat(s1, s2);

or you can simply use the overloaded concatenation (+) operator:

    string s7 = s1 + s2;

In both cases, the output reflects that the concatenation was successful:

    s6 concatenated from s1 and s2: abcdABCD
    s7 concatenated from s1 + s2: abcdABCD

Similarly, there are two ways to obtain a second variable with the same string value. First, you can use the static Copy( ) method, which creates a new string object with the same characters:

    string s8 = string.Copy(s7);

Otherwise, for convenience, you might instead use the assignment operator (=). Assignment does not make a copy; it makes s9 refer to the same string object as s8. Because strings are immutable, the difference is rarely visible to your code:

    string s9 = s8;

Once again, the output reflects that each approach has worked:

    s8 copied from s7: abcdABCD
    s9 = s8: abcdABCD

The .NET String class provides three ways to test for the equality of two strings. First, you can use the overloaded Equals( ) method and ask s9 directly whether s8 is of equal value:

    Console.WriteLine("\nDoes s9.Equals(s8)?: {0}", s9.Equals(s8));

A second technique is to pass both strings to String's static method Equals( ):

    Console.WriteLine("Does Equals(s9,s8)?: {0}", string.Equals(s9, s8));

A final method is to use the overloaded equality operator (==) of String:

    Console.WriteLine("Does s9==s8?: {0}", s9 == s8);

In each of these cases, the returned result is a Boolean value, as shown in the output:

    Does s9.Equals(s8)?: True
    Does Equals(s9,s8)?: True
    Does s9==s8?: True

The equality operator is the most natural when you have two string objects. However, some languages, such as VB.NET, do not support operator overloading, so when you overload == on your own types, be sure to override the Equals( ) instance method as well.
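The comparison and copying behavior discussed above can be probed with a minimal sketch. It assumes only standard String members; the class name StringCompareSketch and the helper OrdinalSign are hypothetical. The sign of the culture-sensitive comparison depends on the runtime's culture and globalization settings, so no expected value is given for that line.

```csharp
using System;

public static class StringCompareSketch
{
    // Hypothetical helper: sign (-1, 0, or 1) of an ordinal comparison,
    // which compares raw character code values.
    public static int OrdinalSign(string a, string b)
    {
        return Math.Sign(string.CompareOrdinal(a, b));
    }

    public static void Main()
    {
        string s1 = "abcd";
        string s2 = "ABCD";

        // Culture-sensitive comparison: the sign depends on the
        // sort rules in effect, so it is printed without a claim.
        Console.WriteLine(Math.Sign(string.Compare(s1, s2)));

        // Ordinal comparison: 'a' (97) is greater than 'A' (65).
        Console.WriteLine(OrdinalSign(s1, s2));            // 1

        // Ignoring case, the two strings compare as equal.
        Console.WriteLine(string.Compare(s1, s2, true));   // 0

        // Assignment shares the object; building a new string does not.
        string s8 = s1 + s2;
        string s9 = s8;
        string copy = new string(s8.ToCharArray());
        Console.WriteLine(object.ReferenceEquals(s9, s8));    // True
        Console.WriteLine(object.ReferenceEquals(copy, s8));  // False

        // Value equality holds in both cases.
        Console.WriteLine(s9 == s8);     // True
        Console.WriteLine(copy == s8);   // True
    }
}
```

The sketch builds its copy with the char[] constructor rather than Copy( ); either produces a distinct object with the same characters.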
The next several lines in Example 10-1 use the index operator ([]) to find a particular character within a string, and use the Length property to return the length of the entire string:

    Console.WriteLine("\nString s9 is {0} characters long.", s9.Length);
    Console.WriteLine("The 5th character is {1}\n", s9.Length, s9[4]);

Here's the output:

    String s9 is 8 characters long.
    The 5th character is A

The EndsWith( ) method asks a string whether a substring is found at the end of the string. Thus, you might first ask s3 if it ends with Training (which it does not) and then if it ends with Consulting (which it does):

    // test whether a string ends with a set of characters
    Console.WriteLine("s3:{0}\nEnds with Training?: {1}\n",
        s3, s3.EndsWith("Training"));
    Console.WriteLine("Ends with Consulting?: {0}",
        s3.EndsWith("Consulting"));

The output reflects that the first test fails and the second succeeds:

    s3:Liberty Associates, Inc. provides custom .NET development, on-site Training and Consulting
    Ends with Training?: False
    Ends with Consulting?: True

The IndexOf( ) method locates a substring within our string, and the Insert( ) method inserts a new substring into a copy of the original string. The following code locates the first occurrence of Training in s3:

    Console.WriteLine("\nThe first occurrence of Training ");
    Console.WriteLine("in s3 is {0}\n", s3.IndexOf("Training"));

The output indicates that the offset is 101:

    The first occurrence of Training
    in s3 is 101

You can then use that value to insert the word excellent, followed by a space, into that string. Actually, the insertion is into a copy of the string returned by the Insert( ) method and assigned to s10:

    string s10 = s3.Insert(101, "excellent ");
    Console.WriteLine("s10: {0}\n", s10);

Here's the output:

    s10: Liberty Associates, Inc.
    provides custom .NET development, on-site excellent Training and Consulting

Finally, you can combine these operations into a single statement that does not depend on a hard-coded offset:

    string s11 = s3.Insert(s3.IndexOf("Training"), "excellent ");
    Console.WriteLine("s11: {0}\n", s11);

with the identical output:

    s11: Liberty Associates, Inc. provides custom .NET development, on-site excellent Training and Consulting

10.1.4 Finding Substrings

The String type provides an overloaded Substring( ) method for extracting substrings from within strings. Both versions take an index indicating where to begin the extraction, and one of the two versions takes a second argument giving the number of characters to extract. The Substring( ) method is illustrated in Example 10-2.

Example 10-2. Using the Substring( ) method

    namespace Programming_CSharp
    {
        using System;
        using System.Text;

        public class StringTester
        {
            static void Main( )
            {
                // create some strings to work with
                string s1 = "One Two Three Four";

                int ix;

                // get the index of the last space
                ix = s1.LastIndexOf(" ");

                // get the last word.
                string s2 = s1.Substring(ix+1);

                // set s1 to the substring starting at 0
                // with length ix (everything before the last space)
                // thus s1 has "One Two Three"
                s1 = s1.Substring(0, ix);

                // find the last space in s1 (after "Two")
                ix = s1.LastIndexOf(" ");

                // set s3 to the substring starting at
                // ix, the space after "Two", plus one more
                // thus s3 = "Three"
                string s3 = s1.Substring(ix+1);

                // reset s1 to the substring starting at 0
                // with length ix, thus the string "One Two"
                s1 = s1.Substring(0, ix);

                // reset ix to the space between
                // "One" and "Two"
                ix = s1.LastIndexOf(" ");

                // set s4 to the substring starting one
                // space after ix, thus the substring "Two"
                string s4 = s1.Substring(ix+1);

                // reset s1 to the substring starting at 0
                // with length ix, thus "One"
                s1 = s1.Substring(0, ix);

                // set ix to the last space, but there is
                // none so ix now = -1
                ix = s1.LastIndexOf(" ");

                // set s5 to the substring at one past
                // the last space.
                // there was no last space
                // so this sets s5 to the substring starting
                // at zero
                string s5 = s1.Substring(ix+1);

                Console.WriteLine("s2: {0}\ns3: {1}", s2, s3);
                Console.WriteLine("s4: {0}\ns5: {1}\n", s4, s5);
                Console.WriteLine("s1: {0}\n", s1);
            }
        }
    }

Output:

    s2: Four
    s3: Three
    s4: Two
    s5: One
    s1: One

Example 10-2 is not an elegant solution to the problem of extracting words from a string, but it is a good first approximation, and it illustrates a useful technique. The example begins by creating a string, s1:

    string s1 = "One Two Three Four";

Then ix is assigned the index of the last space in the string:

    ix = s1.LastIndexOf(" ");

Then the substring that begins one character later is assigned to the new string, s2:

    string s2 = s1.Substring(ix+1);

This extracts from ix+1 to the end of the line, assigning to s2 the value Four.

The next step is to remove the word Four from s1. You can do this by assigning to s1 the substring of s1 that begins at 0 and is ix characters long, that is, everything up to but not including the space at ix:

    s1 = s1.Substring(0, ix);

Reassign ix to the last (remaining) space, which points you to the beginning of the word Three, which we then extract into string s3. Continue like this until s4 and s5 are populated. Finally, print the results:

    s2: Four
    s3: Three
    s4: Two
    s5: One
    s1: One

This isn't elegant, but it works and it illustrates the use of Substring( ). This is not unlike using pointer arithmetic in C++, but without the pointers and unsafe code.

10.1.5 Splitting Strings

A more effective solution to the problem illustrated in Example 10-2 is to use the Split( ) method of String, whose job is to parse a string into substrings. To use Split( ), pass in an array of delimiters (characters that will indicate a split in the words), and the method returns an array of substrings. Example 10-3 illustrates.

Example 10-3. Using the Split( ) method

    namespace Programming_CSharp
    {
        using System;
        using System.Text;

        public class StringTester
        {
            static void Main( )
            {
                // create some strings to work with
                string s1 = "One,Two,Three Liberty Associates, Inc.";

                // constants for the space and comma characters
                const char Space = ' ';
                const char Comma = ',';

                // array of delimiters to split the sentence with
                char[] delimiters = new char[]
                {
                    Space,
                    Comma
                };

                string output = "";
                int ctr = 1;

                // split the string and then iterate over the
                // resulting array of strings
                foreach (string subString in s1.Split(delimiters))
                {
                    output += ctr++;
                    output += ": ";
                    output += subString;
                    output += "\n";
                }
                Console.WriteLine(output);
            }
        }
    }

Output:

    1: One
    2: Two
    3: Three
    4: Liberty
    5: Associates
    6:
    7: Inc.

The sixth entry is empty because the comma and the space after Associates are adjacent delimiters, so Split( ) returns an empty string between them.

You start by creating a string to parse:

    string s1 = "One,Two,Three Liberty Associates, Inc.";

The delimiters are set to the space and comma characters. You then call Split( ) on this string, and pass the results to the foreach loop:

    foreach (string subString in s1.Split(delimiters))

Start by initializing output to an empty string and then build up the output string in four steps. Concatenate the value of ctr. Next add the colon, then the substring returned by Split( ), then the newline. With each concatenation, a new copy of the string is made, and all four steps are repeated for each substring found by Split( ). This repeated copying of strings is terribly inefficient.

The problem is that the string type is not designed for this kind of operation. What you want is to create a new string by appending a formatted string each time through the loop. The class you need is StringBuilder.

10.1.6 Manipulating Dynamic Strings

The System.Text.StringBuilder class is used for creating and modifying strings. Semantically, it is the encapsulation of a constructor for a String. The important members of StringBuilder are summarized in Table 10-2.
Unlike String, StringBuilder is mutable; when you modify a StringBuilder, you modify the actual string, not a copy. Example 10-4 replaces the String object in Example 10-3 with a StringBuilder object.

Example 10-4. Using a StringBuilder

    namespace Programming_CSharp
    {
        using System;
        using System.Text;

        public class StringTester
        {
            static void Main( )
            {
                // create some strings to work with
                string s1 = "One,Two,Three Liberty Associates, Inc.";

                // constants for the space and comma characters
                const char Space = ' ';
                const char Comma = ',';

                // array of delimiters to split the sentence with
                char[] delimiters = new char[]
                {
                    Space,
                    Comma
                };

                // use a StringBuilder class to build the
                // output string
                StringBuilder output = new StringBuilder( );
                int ctr = 1;

                // split the string and then iterate over the
                // resulting array of strings
                foreach (string subString in s1.Split(delimiters))
                {
                    // AppendFormat appends a formatted string
                    output.AppendFormat("{0}: {1}\n", ctr++, subString);
                }
                Console.WriteLine(output);
            }
        }
    }

Only the last part of the program is modified. Rather than using the concatenation operator to modify the string, use the AppendFormat( ) method of StringBuilder to append new, formatted strings as you create them. This is much easier and far more efficient. The output is identical:

    1: One
    2: Two
    3: Three
    4: Liberty
    5: Associates
    6:
    7: Inc.
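The empty sixth entry in the output above comes from the adjacent comma and space after Associates. As a possible refinement, and assuming a runtime that provides the Split( ) overload taking a StringSplitOptions argument (it is not used in the chapter's examples), the empty entries can be dropped while still building the result with a StringBuilder. The class name SplitSketch and the helper NumberWords are hypothetical names for this sketch.

```csharp
using System;
using System.Text;

public static class SplitSketch
{
    // Hypothetical helper: splits text on spaces and commas, skips
    // empty entries, and returns the numbered words, one per line.
    public static string NumberWords(string text)
    {
        char[] delimiters = { ' ', ',' };

        // RemoveEmptyEntries drops the empty strings that Split would
        // otherwise return between adjacent delimiters.
        string[] words = text.Split(
            delimiters, StringSplitOptions.RemoveEmptyEntries);

        // A single StringBuilder is appended to in place, avoiding
        // a new string copy on every concatenation.
        StringBuilder output = new StringBuilder();
        int ctr = 1;
        foreach (string word in words)
        {
            output.AppendFormat("{0}: {1}\n", ctr++, word);
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Prints six numbered lines, One through Inc., with no empty entry.
        Console.Write(NumberWords("One,Two,Three Liberty Associates, Inc."));
    }
}
```

Whether to drop empty entries is a design choice: keeping them preserves positional information (useful for fixed-column data), while removing them suits free-form text such as the sentence used here.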