1.20 String Handling

There is no basic type for strings in C. A string is simply a sequence of characters ending with the string terminator, stored in a char array. A string is represented by a char pointer that points to the first character in the string.

The customary functions for manipulating strings are declared in string.h. Those functions that modify a string return a pointer to the modified string. The functions used to search for a character or a substring return a pointer to the occurrence found, or a null pointer if the search was unsuccessful.

char *strcat ( char *s1 , const char *s2  );

Appends the string s2 to the end of s1. The first character copied from s2 replaces the string terminator character of s1.

char *strchr ( const char *s , int c  );

Locates the first occurrence of the character c in the string s.

int strcmp ( const char *s1 , const char *s2  );

Compares the strings s1 and s2, and returns a value that is greater than, equal to, or less than 0 to indicate whether s1 is greater than, equal to, or less than s2. A string is greater than another if the first character code in it which differs from the corresponding character code in the other string is greater than that character code.
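Because strcmp() reports an ordering and not just equality, it can serve as the basis of a comparison callback for qsort(). The following is a minimal sketch; the names compare_strings() and sort_strings() are hypothetical and chosen only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort(): each argument points to an array
   element, that is, to a char pointer. */
static int compare_strings(const void *a, const void *b)
{
    const char *s1 = *(const char *const *)a;
    const char *s2 = *(const char *const *)b;
    return strcmp(s1, s2);
}

/* Hypothetical helper: sorts an array of n strings in ascending order
   of their character codes. */
void sort_strings(const char **list, size_t n)
{
    qsort(list, n, sizeof list[0], compare_strings);
}
```

Only the sign of the strcmp() return value should be relied upon; the magnitude differs between implementations.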

int strcoll ( const char *s1 , const char *s2  );

Transforms an internal copy of the strings s1 and s2 using the function strxfrm(), then compares them using strcmp() and returns the result.

char *strcpy ( char *s1 , const char *s2  );

Copies s2 to the char array referenced by s1. This array must be large enough to contain s2 including its string terminator character '\0'.
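Neither strcpy() nor strcat() checks the size of the destination array, so the caller has to. One possible way to do this is sketched below; join_strings() is a hypothetical helper, not a standard function.

```c
#include <string.h>

/* Hypothetical helper: writes first followed by second to dest, an array
   of size chars. Returns dest, or NULL if the result, including the
   string terminator, would not fit. */
char *join_strings(char *dest, size_t size,
                   const char *first, const char *second)
{
    size_t len1 = strlen(first);
    size_t len2 = strlen(second);

    /* Required: len1 + len2 + 1 <= size, tested without overflow. */
    if (len1 >= size || len2 >= size - len1)
        return NULL;

    strcpy(dest, first);    /* copies first, including '\0' */
    strcat(dest, second);   /* overwrites that '\0' with second */
    return dest;
}
```

For example, with char buf[12], the call join_strings(buf, sizeof buf, "Hello, ", "C") succeeds, while joining "Hello, " and "world" is rejected because the result needs 13 bytes.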

size_t strcspn ( const char *s1 , const char *s2  );

Determines the length of the maximum initial substring of s1 that contains none of the characters found in s2.
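A common use of strcspn() is to cut off a line at its first line-ending character. The sketch below assumes a modifiable string; the name chomp() is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: truncates line at the first '\r' or '\n'.
   If neither occurs, strcspn() returns the length of the whole string,
   and the existing terminator is simply overwritten with itself. */
void chomp(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}
```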

size_t strlen ( const char *s  );

Returns the length of the string addressed by s. The length of the string is the number of characters it contains, excluding the string terminator character '\0'.

char *strncat ( char *s1 , const char *s2 , size_t n  );

Appends at most the first n characters of s2, followed by a string terminator character, to s1.

int strncmp ( const char *s1 , const char *s2 , size_t n  );

Compares the first n characters of the strings s1 and s2. The return value is the same as for strcmp().

char *strncpy ( char *s1 , const char *s2 , size_t n  );

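Copies at most n characters of s2 to the char array s1. If s2 is shorter than n characters, the remainder of the n characters in s1 is filled with '\0'. Otherwise, no string terminator character '\0' is appended.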
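Because strncpy() does not guarantee a terminated result, the terminator is usually written explicitly. A minimal sketch, using the hypothetical helper name copy_truncated():

```c
#include <string.h>

/* Hypothetical helper: copies src to dest, an array of size chars,
   truncating if necessary. The result is always terminated. */
char *copy_truncated(char *dest, size_t size, const char *src)
{
    if (size == 0)
        return dest;                /* no room even for '\0' */
    strncpy(dest, src, size - 1);   /* leaves room for the terminator */
    dest[size - 1] = '\0';
    return dest;
}
```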

char *strpbrk ( const char *s1 , const char *s2  );

Locates the first occurrence in s1 of any of the characters contained in s2.

char *strrchr ( const char *s , int c  );

Locates the last occurrence of the character c in the string s. The string terminator character '\0' is included in the search.
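As an illustration, strrchr() can be used to find the file name part of a path. The sketch assumes '/' as the directory separator; file_name() is a hypothetical helper.

```c
#include <string.h>

/* Hypothetical helper: returns a pointer to the part of path that
   follows the last '/', or path itself if it contains no '/'. */
const char *file_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}
```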

size_t strspn ( const char *s1 , const char *s2  );

Determines the length of the maximum initial substring of s1 that consists only of characters contained in s2.
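One way to use strspn() is to test whether a string consists entirely of characters from a given set. The following sketch checks for decimal digits; is_all_digits() is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper: returns nonzero if s is not empty and contains
   only decimal digits. If every character is a digit, the initial
   substring found by strspn() ends at the string terminator. */
int is_all_digits(const char *s)
{
    return *s != '\0' && s[strspn(s, "0123456789")] == '\0';
}
```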

char *strstr ( const char *s1 , const char *s2  );

Locates the first occurrence of s2 (without the terminating '\0') in s1.
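Since strstr() returns a pointer into s1, the search can be resumed from just behind each match. The sketch below counts non-overlapping occurrences; count_occurrences() is a hypothetical helper.

```c
#include <string.h>

/* Hypothetical helper: counts the non-overlapping occurrences of word
   in text. An empty word is treated as having no occurrences. */
size_t count_occurrences(const char *text, const char *word)
{
    size_t count = 0;
    size_t len = strlen(word);

    if (len == 0)
        return 0;
    for (const char *p = text; (p = strstr(p, word)) != NULL; p += len)
        ++count;
    return count;
}
```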

char *strtok ( char *s1 , const char *s2  );

Breaks the string in s1 into the substrings ("tokens") delimited by any of the characters contained in s2.
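In practice strtok() is called repeatedly: the first call receives the string, and each subsequent call passes a null pointer as s1 to continue with the same string. Each call returns a pointer to the next token, or a null pointer when no tokens remain. The function overwrites delimiters in the string with '\0', so the string must be modifiable. A minimal sketch, with the hypothetical helper split_tokens():

```c
#include <string.h>

/* Hypothetical helper: splits line in place and stores pointers to at
   most max tokens in tokens[]. Returns the number of tokens stored. */
size_t split_tokens(char *line, const char *delims,
                    char *tokens[], size_t max)
{
    size_t n = 0;

    for (char *t = strtok(line, delims);
         t != NULL && n < max;
         t = strtok(NULL, delims))      /* NULL: continue with same string */
        tokens[n++] = t;
    return n;
}
```

Applied to the array char line[] = "red, green;blue" with the delimiters ",; ", this yields the three tokens "red", "green", and "blue".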

size_t strxfrm ( char *s1 , const char *s2 , size_t n  );

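Performs a locale-specific transformation of s2 and copies the result to the char array with length n that is referenced by s1. The transformation is such that comparing two transformed strings using strcmp() yields the same result as comparing the original strings using strcoll().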

Similar functions for wide-character strings, declared in the header file wchar.h(*), have names beginning with wcs in place of str.

1.20.1 Conversion Between Strings and Numbers

A variety of functions are declared in the header file stdlib.h to obtain numerical interpretations of the initial digit characters in a string. The resulting number is the return value of the function.

int atoi ( const char *s  );  

Interprets the contents of the string s as a number with type int. The analogous functions atol(), atoll()(*), and atof() are used to convert a string into a number with type long, long long(*), or double.

double strtod ( const char *s , char **pptr  );

Serves a similar purpose to that of atof(), but takes the address of a char pointer as a second argument. If pptr is not NULL, the char pointer that it references is set to point to the first character in the string s (after any leading whitespace) that is not part of the substring representing a floating-point number.
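The pointer stored through pptr makes it possible to convert several numbers from one string in a loop. If no conversion can be performed, the pointer is set to s itself, which serves as the loop's end condition. The following sketch assumes the default "C" locale, in which the decimal point is '.'; sum_numbers() is a hypothetical helper.

```c
#include <stdlib.h>

/* Hypothetical helper: adds up the whitespace-separated numbers at the
   beginning of s, stopping at the first text that is not a number. */
double sum_numbers(const char *s)
{
    double sum = 0.0;
    char *end;

    for (;;) {
        double value = strtod(s, &end);
        if (end == s)       /* nothing converted: stop */
            break;
        sum += value;
        s = end;            /* continue behind the converted numeral */
    }
    return sum;
}
```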

The corresponding functions for conversion to the types float and long double are strtof()(*) and strtold()(*).

long strtol ( const char *s , char **pptr , int base  );

Converts a string to a number with type long. The third parameter is the base of the numeral string, and may be an integer between 2 and 36, or 0. If base is 0, the string s is interpreted as a numeral in base 8, 16, or 10, depending on whether it begins with 0, 0x, or one of the digits 1 to 9.

The analogous functions for converting a string to unsigned long, long long(*), or unsigned long long(*) are strtoul(), strtoll()(*), and strtoull()(*).
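Unlike atoi(), strtol() allows the caller to detect input errors: the pointer stored through pptr shows where the conversion stopped, and errno is set to ERANGE if the value does not fit in a long. One possible wrapper is sketched below; parse_int() is a hypothetical helper.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts the entire string s to an int.
   Returns 1 on success; returns 0 if s contains no numeral, has
   trailing characters, or represents a value out of range for int. */
int parse_int(const char *s, int base, int *result)
{
    char *end;

    errno = 0;
    long value = strtol(s, &end, base);

    if (end == s || *end != '\0')
        return 0;           /* no digits, or extra characters */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;           /* out of range */
    *result = (int)value;
    return 1;
}
```

With base 0, the string "0x1F" is converted to 31 and "017" to 15.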

The header file inttypes.h(*) also declares the functions strtoimax() and strtoumax(), which convert the initial digits in a string to an integer of type intmax_t or uintmax_t.

Similar functions for wide-character strings are declared in the header file wchar.h(*). Their names begin with wcs in place of str.

The following function from the printf family is used to convert numeric values into a formatted numeral string:

int sprintf ( char *s , const char *format , ... /*a1 ,...,an */ );

Copies the format string format to the char array referenced by s, with the conversion specifications replaced using the values in the argument list a1,...,an.

Numerical values can also be read from a string based on a format string:

int sscanf ( const char *s , const char *format , ... /*a1 ,...,an */ );

Reads and converts data from s, and copies the resulting values to the locations addressed by the argument list a1,...,an.
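The return value of sscanf() is the number of input items successfully converted and assigned, which makes it easy to check whether a string has the expected form. A minimal sketch that reads a date in the form year-month-day; parse_date() is a hypothetical helper.

```c
#include <stdio.h>

/* Hypothetical helper: reads three integers separated by '-' from s.
   Returns nonzero only if all three values were converted. */
int parse_date(const char *s, int *year, int *month, int *day)
{
    return sscanf(s, "%d-%d-%d", year, month, day) == 3;
}
```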

The functions vsprintf() and vsscanf() are similar to sprintf() and sscanf(), but with the variable argument list replaced by an object of type va_list that has been initialized using the va_start macro (see Section 1.11.4 earlier in this book). The functions snprintf() and vsnprintf() take an additional parameter n of type size_t after s, and write a maximum of n characters, including the string terminator character, to the array referenced by s. They return the number of characters that would have been written if n had been sufficiently large, not counting the string terminator character. A return value greater than or equal to n thus indicates that the output was truncated.
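In C99, the value returned by snprintf() is the length that the complete output would have had, so comparing it with the array size reveals truncation. The following sketch illustrates the check; format_setting() is a hypothetical helper.

```c
#include <stdio.h>

/* Hypothetical helper: writes "name=value" to buf, an array of size
   chars. Returns 1 if the complete text fit, or 0 if it was truncated
   or an encoding error occurred. */
int format_setting(char *buf, size_t size, const char *name, int value)
{
    int needed = snprintf(buf, size, "%s=%d", name, value);
    return needed >= 0 && (size_t)needed < size;
}
```

With an array of 8 chars, formatting the name "width" and the value 1024 leaves the truncated string "width=1" in the array, and the helper returns 0.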

The corresponding formatted string input/output functions for wide-character strings are declared in wchar.h(*). Their names begin with sw (for "string, wide") in place of the initial s (for "string") in the names of the functions described above for char strings, as in swprintf().

1.20.2 Multibyte Character Conversion

A multibyte character may occupy more than one byte in memory. The maximum number of bytes that can be used to represent a multibyte character is the value of the macro MB_CUR_MAX, which is defined in stdlib.h. Its value is dependent on the current locale. In the default locale "C", MB_CUR_MAX has the value 1.

Every multibyte character corresponds to exactly one character of type wchar_t. The functions for multibyte character conversion are declared in the header file stdlib.h.

int mblen ( const char *s , size_t max  );

Determines the length in bytes of the multibyte character pointed to by s. At most max bytes are examined. Because no multibyte character is longer than MB_CUR_MAX bytes, a value of max greater than MB_CUR_MAX is never needed.
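As an illustration, mblen() can be used to count the characters, as opposed to the bytes, in a multibyte string. The sketch below works in the current locale, so in the default locale "C" the result equals strlen(). The name count_chars() is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: returns the number of multibyte characters in s
   according to the current locale, or (size_t)-1 if s contains an
   invalid multibyte sequence. */
size_t count_chars(const char *s)
{
    size_t count = 0;

    mblen(NULL, 0);                     /* reset any shift state */
    while (*s != '\0') {
        int len = mblen(s, MB_CUR_MAX); /* bytes in the next character */
        if (len <= 0)
            return (size_t)-1;          /* invalid sequence */
        s += len;
        ++count;
    }
    return count;
}
```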

int wctomb ( char *s , wchar_t wc  );

Converts the wide character wc into the multibyte representation, and writes the corresponding multibyte character in the array addressed by s.

size_t wcstombs ( char *s , const wchar_t *p , size_t n  );

Converts the wide characters referenced by p, up to and including the terminating null wide character, into multibyte characters, and copies the results to the char array addressed by s. No more than n bytes are written to the array.

int mbtowc ( wchar_t *p , const char *s , size_t max  );

Determines the wide character code corresponding to the multibyte character in s, whose maximum length is specified by max, and copies the result to the wchar_t variable referenced by p.

size_t mbstowcs ( wchar_t *p , const char *s , size_t n  );

Converts the multibyte characters of s, up to and including the string terminator character, into wide characters and copies the result to the array addressed by p. No more than n wide characters are written to the array.
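A multibyte string never contains more characters than bytes, so an array of strlen(s) + 1 wide characters is always large enough for the result of mbstowcs(). The following sketch relies on this to allocate the destination; to_wide() is a hypothetical helper, and the conversion depends on the current locale.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated wide-character copy of
   the multibyte string s, or NULL if s contains an invalid sequence or
   memory is exhausted. The caller must free() the result. */
wchar_t *to_wide(const char *s)
{
    size_t max = strlen(s) + 1;     /* upper bound, including terminator */
    wchar_t *w = malloc(max * sizeof *w);

    if (w == NULL)
        return NULL;
    if (mbstowcs(w, s, max) == (size_t)-1) {
        free(w);                    /* invalid multibyte sequence */
        return NULL;
    }
    return w;
}
```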

Similar functions with an additional r in their names (for restartable) are also declared in wchar.h(*). The restartable functions have an additional parameter, a pointer to the type mbstate_t, that must point to an object describing the current wide/multibyte character conversion state. Furthermore, the function mbsinit()(*) can be used to test whether the current conversion state is an initial conversion state.