
Built-in String Handling Functions

Lending the power of regular expressions to some simple data-handling operations is a bit like giving a Kalashnikov to a small fish. It's simply overkill. To prevent ourselves from getting carried away and throwing away potential speed, we'll summarize the more useful of Perl's built-in string handling functions in Table C-1. Repeat after us:

We're only allowed to use regular expressions if the built-in functions won't hack it.

In this table, the Perl function is shown in lowercase (e.g., index) and its replaceable parameters in uppercase (e.g., STRING). Optional parameters are shown in square brackets. As with most things in Perl, many of the functions in Table C-1 fall back on $_ if no EXPRESSION value is supplied.
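As a minimal sketch of the $_ default, the loop below calls lc, ucfirst, and length without any argument, so each one works on the current loop value held in $_. The strings and the @report array are our own illustrative choices.

```perl
use strict;
use warnings;

# Hypothetical sample data; inside the loop each string is held in $_.
my @report;
for ('toad hall', 'RIVER BANK') {
    my $name = ucfirst lc;    # lc reads $_, ucfirst works on lc's result
    my $len  = length;        # length also reads $_
    push @report, "$name has $len characters";
}
print "$_\n" for @report;
```

This prints "Toad hall has 9 characters" followed by "River bank has 10 characters". Note that index, rindex, and join always need their arguments spelled out.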

Table C-1. Built-in Perl string-handling functions

Function

Description

index STRING, SUBSTRING [,OFFSET]

Returns the position of the first SUBSTRING in STRING, where the first position is zero. If OFFSET is given, it tells index how many characters to skip before searching:

index('Toad of Toad Hall', 'Toad') gives 0

index('Toad of Toad Hall', 'Toad', 1) gives 8

(-1 is returned if no match is found.)

join EXPRESSION, LIST

Joins a LIST of strings into a single string, each separated by EXPRESSION (which can be an empty string, ""):

join ":", "Badger", "Ratty", "Mole" gives Badger:Ratty:Mole

lc EXPRESSION

Lowercases EXPRESSION:

lc "The Stoats took the Hall" gives the stoats took the hall

lcfirst EXPRESSION

Lowercases the first letter of EXPRESSION:

lcfirst "MyBeautifulMind" gives myBeautifulMind

length EXPRESSION

Gives the length of EXPRESSION, in characters:

length "Washerwoman" gives 11

reverse EXPRESSION

When used in a scalar context and with a single scalar, this reverses the characters of EXPRESSION:

reverse "Poop poop, said Toad" gives daoT dias ,poop pooP

(In a list context, reverse instead returns its arguments in reverse order, which is how it is most often used on arrays and other listy type things. Applied to a hash, it swaps the keys and values.)

rindex STRING, SUBSTRING [,POSITION]

Similar to index, this returns the position of the rightmost SUBSTRING in STRING. The optional POSITION is the rightmost position at which a match is allowed to begin:

rindex "Toad of Toad Hall", "Toad" gives 8

rindex "Toad of Toad Hall", "Toad", 7 gives 0

(-1 is returned if no match is found.)

split /PATTERN/ [,EXPRESSION [,LIMIT]]

This function is the black sheep of the built-in string handling world, because it rather naughtily uses regular expressions to process the /PATTERN/ match, to split EXPRESSION strings into lists. After we've covered regular expressions proper, we'll come back to split, one of the most useful of the Perl munge operators.

sprintf FORMAT, LIST

Returns a formatted string, following the ubiquitous printf conventions of the C programming language. The main Perl sprintf formatters are described in Table C-2. This is highly useful for reports.

substr EXPRESSION, OFFSET [,LENGTH [,REPLACEMENT]]

Extracts a substring out of EXPRESSION, starting at OFFSET, where the first position is zero:

substr "Messing about in boats", 8 gives about in boats

If OFFSET is negative, the count starts from the right-hand side of the string:

substr "Messing about in boats", -8 gives in boats

If LENGTH is omitted, everything to the end of the string is returned. Otherwise, LENGTH determines the length of the string returned:

substr "Messing about in boats", 8, 5 gives about

If LENGTH is negative, it says how many characters are left off the end of the string:

substr "Messing about in boats", 8, -5 gives about in (followed by a trailing space)

The optional REPLACEMENT overwrites the selected substring within EXPRESSION, which must therefore be a variable. The substr call itself still returns the original substring:

$stoat1 = "Messing about in boats";
$stoat2 = substr $stoat1, 0, 16, "Wonderful";
print $stoat1, "\n";
print $stoat2, "\n";

This produces:

Wonderful boats
Messing about in

An alternative to using REPLACEMENT is to use substr on the left-hand side of an assignment operation:

$stoat = "Messing about in boats";
substr ($stoat, 0, 16) = "Wonderful";
print $stoat, "\n";

This produces:

Wonderful boats

uc EXPRESSION

Uppercases EXPRESSION:

uc "canal barge" gives CANAL BARGE

ucfirst EXPRESSION

Uppercases the first character of EXPRESSION:

ucfirst "railway engine" gives Railway engine
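To show how far the functions in Table C-1 can take us before a regular expression is really needed, here is a short sketch that combines several of them. The 'home=Toad Hall' setting string and all of the variable names are our own inventions for illustration. Only split, as confessed above, brings a pattern along.

```perl
use strict;
use warnings;

my $line = 'Toad of Toad Hall';

# Collect every position of 'Toad' by restarting index just past each hit.
my @positions;
my $pos = index $line, 'Toad';
while ($pos != -1) {
    push @positions, $pos;
    $pos = index $line, 'Toad', $pos + 1;
}

# Break a hypothetical "name=value" setting at its first '=' sign.
my $setting = 'home=Toad Hall';
my $eq      = index $setting, '=';
my $name    = substr $setting, 0, $eq;
my $value   = substr $setting, $eq + 1;

# reverse only reverses characters when it is called in a scalar context.
my $backwards = reverse 'Mole';    # scalar context: characters reversed
my @unchanged = reverse 'Mole';    # list context: a one-item list, as it was

# split and join working as a pair to swap one separator for another.
my $path = join '/', split /:/, 'Badger:Ratty:Mole';

print "@positions\n$name\n$value\n$backwards\n@unchanged\n$path\n";
```

This produces:

0 8
home
Toad Hall
eloM
Mole
Badger/Ratty/Mole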

Table C-2. Perl formats for sprintf

Formatter   Description
---------   -----------
%c          A character with the given number
%s          A string
%d          A signed integer, in decimal
%u          An unsigned integer, in decimal
%o          An unsigned integer, in octal
%x          An unsigned integer, in hexadecimal
%e          A floating-point number, in scientific notation (e.g., 1.00e+09 for 1 billion)
%f          A floating-point number, in fixed decimal notation
%g          A floating-point number, in either %e or %f notation
%X          Like %x, but using upper-case letters
%E          Like %e, but using an upper-case "E" (e.g., 1.00E+09)
%G          Like %g, but with an upper-case "E" (if applicable)
%b          An unsigned integer, in binary
%p          A pointer (outputs the Perl value's address in hexadecimal)
%n          This is a special formatter which stores the number of characters output so far into the next variable in the parameter list
%%          An ordinary percent sign
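A few of these formatters in action may help. The sketch below uses sample values of our own choosing, and it also leans on the standard printf width and precision modifiers (as in %-8s, %5d, and %.2f), which are not listed in Table C-2.

```perl
use strict;
use warnings;

my @lines = (
    # Left-justify a string in 8 columns, right-justify an integer in 5.
    sprintf('%-8s|%5d|', 'Ratty', 42),
    # The same kind of small integers as a character, octal, hex, and binary.
    sprintf('%c %o %x %X %b', 65, 8, 255, 255, 5),
    # Fixed decimal notation to two places, then the shorter %g form.
    sprintf('%.2f %g', 3.14159, 0.5),
    # A literal percent sign after a number.
    sprintf('%d%%', 75),
);
print "$_\n" for @lines;
```

This produces:

Ratty   |   42|
A 10 ff FF 101
3.14 0.5
75%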
