Built-in String Handling Functions
Lending the power of regular expressions to some simple data-handling
operations is a bit like giving a Kalashnikov to a small fish. It's
simply overkill. To prevent ourselves from getting carried away and
throwing away potential speed, we'll summarize the more useful of
Perl's built-in string handling functions in Table C-1. Repeat after
us:
We're only allowed to use regular expressions if the built-in
functions won't hack it.
In this table, the Perl function is shown in lowercase (e.g., index)
and its replaceable parameters in uppercase (e.g., STRING). As with
most things in Perl, many of the functions in Table C-1 use $_ as the
default EXPRESSION if no EXPRESSION value is supplied.
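As a minimal sketch of the $_ default, the loop below calls uc and length with no argument, so each operates on the current loop value held in $_. The list of names and the @report array are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data for illustration only.
my @names = ('Badger', 'Ratty', 'Mole');

my @report;
for (@names) {
    # With no argument, uc and length both operate on $_.
    my $upper = uc;
    my $chars = length;
    push @report, "$upper has $chars characters";
}
print "$_\n" for @report;
# BADGER has 6 characters
# RATTY has 5 characters
# MOLE has 4 characters
```

Not every function in the table behaves this way: index, join, sprintf, and substr always need their arguments spelled out.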
Table C-1. Built-in Perl string-handling functions
index STRING, SUBSTRING [,OFFSET]
|
Returns the position of the first SUBSTRING in STRING, where the
first position is zero. If OFFSET is given, it tells index how many
characters to skip before searching:
index('Toad of Toad Hall', 'Toad') gives 0
index('Toad of Toad Hall', 'Toad', 1) gives 8
(-1 is returned if no match is found.)
|
join EXPRESSION, LIST
|
Joins a LIST of strings into a single string, each separated by
EXPRESSION (which can be an empty string, ""):
join ":", "Badger", "Ratty", "Mole" gives Badger:Ratty:Mole
|
lc EXPRESSION
|
Lowercases EXPRESSION:
lc "The Stoats took the Hall" gives the stoats took the hall
|
lcfirst EXPRESSION
|
Lowercases the first letter of EXPRESSION:
lcfirst "MyBeautifulMind" gives myBeautifulMind
|
length EXPRESSION
|
Gives the length of EXPRESSION, in characters:
length "Washerwoman" gives 11
|
reverse EXPRESSION
|
When used in a scalar context and with a single scalar, this reverses
the characters of EXPRESSION:
reverse "Poop poop, said Toad" gives daoT dias ,poop pooP
(reverse is also often used in a list context to reverse arrays,
hashes, and other listy type things.)
|
rindex STRING, SUBSTRING [,POSITION]
|
Similar to index, this returns the position of the rightmost
SUBSTRING in STRING. The optional POSITION is the rightmost position
at which a match is allowed to start:
rindex "Toad of Toad Hall", "Toad" gives 8
rindex "Toad of Toad Hall", "Toad", 7 gives 0
(-1 is returned if no match is found.)
|
split /PATTERN/, EXPRESSION, LIMIT
|
This function is the black sheep of the built-in string handling
world, because it rather naughtily uses regular expressions to
process the /PATTERN/ match, to split
EXPRESSION strings into lists. After
we've covered regular expressions proper,
we'll come back to split, one
of the most useful of the Perl munge operators.
|
sprintf FORMAT, LIST
|
Returns a formatted string in the manner of the ubiquitous
printf conventions from the C programming
language. The main sprintf Perl formatters are
described in Table C-2. This is highly useful for
reports.
|
substr EXPRESSION, OFFSET [,LENGTH [,REPLACEMENT]]
|
Extracts a substring out of EXPRESSION, starting at OFFSET, where the
first position is zero:
substr "Messing about in boats", 8 gives about in boats
If OFFSET is negative, the count starts from the right-hand side of
the string:
substr "Messing about in boats", -8 gives in boats
If LENGTH is omitted, everything to the end of the string is
returned. Otherwise, LENGTH determines the length of the string
returned:
substr "Messing about in boats", 8, 5 gives about
If LENGTH is negative, this is how many characters are left off the
end of the string:
substr "Messing about in boats", 8, -5 gives about in (followed by a
trailing space)
The optional REPLACEMENT will replace the substring it finds in
EXPRESSION, and substr returns the substring that was replaced:
$stoat1 = "Messing about in boats";
$stoat2 = substr $stoat1, 0, 16, "Wonderful";
print $stoat1, "\n";
print $stoat2, "\n";
This produces:
Wonderful boats
Messing about in
An alternative to using REPLACEMENT is to use
substr on the left-hand side of an assignment
operation:
$stoat = "Messing about in boats";
substr ($stoat, 0, 16) = "Wonderful";
print $stoat, "\n";
This produces:
Wonderful boats
|
uc EXPRESSION
|
Uppercases EXPRESSION:
uc "canal barge" gives CANAL BARGE
|
ucfirst EXPRESSION
|
Uppercases the first character of EXPRESSION:
ucfirst "railway engine" gives Railway engine
|
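To show how a few of these functions combine without reaching for a regular expression, here is a short sketch. It uses index with an OFFSET to find every occurrence of a word, then the four-argument form of substr to replace each one. The variable names and the replacement word are chosen purely for illustration.

```perl
use strict;
use warnings;

my $title = 'Toad of Toad Hall';
my $word  = 'Toad';

# Collect every position at which $word occurs, restarting each search
# one character past the previous hit.
my @positions;
my $pos = index($title, $word);
while ($pos != -1) {
    push @positions, $pos;
    $pos = index($title, $word, $pos + 1);
}
print "Found at: @positions\n";    # Found at: 0 8

# Replace the hits from right to left, so that the earlier offsets stay
# valid even though the replacement is longer than the original word.
for my $p (reverse @positions) {
    substr($title, $p, length($word), 'Badger');
}
print "$title\n";                  # Badger of Badger Hall

# Assigning to a scalar puts reverse in scalar context, so it reverses
# the characters rather than the order of a list.
my $backwards = reverse $word;
print "$backwards\n";              # daoT
```

Here reverse appears in both of its roles: in list context it flips the order of @positions, and in scalar context it flips the characters of $word.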
Table C-2. Perl formats for sprintf
%c
|
A character with the given number
|
%s
|
A string
|
%d
|
A signed integer, in decimal
|
%u
|
An unsigned integer, in decimal
|
%o
|
An unsigned integer, in octal
|
%x
|
An unsigned integer, in hexadecimal
|
%e
|
A floating-point number, in scientific notation (e.g., 1.00e+09 for 1
billion)
|
%f
|
A floating-point number, in fixed decimal notation
|
%g
|
A floating-point number, in either %e or
%f notation
|
%X
|
Like %x, but using uppercase letters
|
%E
|
Like %e, but using an uppercase "E" (e.g., 1.00E+09)
|
%G
|
Like %g, but with an uppercase "E" (if applicable)
|
%b
|
An unsigned integer, in binary
|
%p
|
A pointer (outputs the Perl value's address in
hexadecimal)
|
%n
|
This is a special formatter which stores the number of characters
output so far into the next variable in the parameter list
|
%%
|
An ordinary percent sign
|
|
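The following sketch puts several of the formatters from Table C-2 to work. The tablespace names and sizes are invented sample data. The width and precision modifiers (%-10s, %8.2f, %.2e) are standard printf conventions that the table itself does not list.

```perl
use strict;
use warnings;

# Hypothetical sample data: a name and a size in megabytes.
my @rows = (
    [ 'SYSTEM', 412.5 ],
    [ 'USERS',  37.25 ],
);

my @lines;
for my $row (@rows) {
    my ($name, $mb) = @$row;
    # %-10s left-justifies the name in 10 columns;
    # %8.2f right-justifies the number in 8 columns with 2 decimals.
    push @lines, sprintf('%-10s%8.2f MB', $name, $mb);
}
print "$_\n" for @lines;

my $hex_lower = sprintf('%x', 255);     # ff
my $hex_upper = sprintf('%X', 255);     # FF
my $binary    = sprintf('%b', 5);       # 101
my $octal     = sprintf('%o', 8);       # 10
my $char      = sprintf('%c', 65);      # A
my $percent   = sprintf('%d%%', 50);    # 50%
my $sci       = sprintf('%.2e', 1_000_000_000);  # scientific notation

print join(' ', $hex_lower, $hex_upper, $binary, $octal,
                $char, $percent, $sci), "\n";
```

Fixed column widths like these are what make sprintf handy for reports: every row lines up regardless of how long each name or number is.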