[ Team LiB ] Previous Section Next Section

Recipe 1.19 Trimming Blanks from the Ends of a String

1.19.1 Problem

You have read a string that may have leading or trailing whitespace, and you want to remove it.

1.19.2 Solution

Use a pair of pattern substitutions to get rid of them:

$string =~ s/^\s+//;
$string =~ s/\s+$//;

Or write a function that returns the new value:

$string = trim($string);
@many   = trim(@many);

sub trim {
    my @out = @_;
    for (@out) {
        s/^\s+//;          # trim left
        s/\s+$//;          # trim right
    }
    return @out =  = 1 
              ? $out[0]   # only one to return
              : @out;     # or many
}

1.19.3 Discussion

This problem has various solutions, but this one is the most efficient for the common case. This function returns new versions of the strings passed in to it with their leading and trailing whitespace removed. It works on both single strings and lists.

To remove the last character from the string, use the chop function. Be careful not to confuse this with the similar but different chomp function, which removes the last part of the string contained within that variable if and only if it is contained in the $/ variable, "\n" by default. These are often used to remove the trailing newline from input:

# print what's typed, but surrounded by > < symbols
while (<STDIN>) {
    chomp;
    print ">$_<\n";
}

This function can be embellished in any of several ways.

First, what should you do if several strings are passed in, but the return context demands a single scalar? As written, the function given in the Solution does a somewhat silly thing: it (inadvertently) returns a scalar representing the number of strings passed in. This isn't very useful. You could issue a warning or raise an exception. You could also squash the list of return values together.

For strings with spans of extra whitespace at points other than their ends, you could have your function collapse any remaining stretch of whitespace characters in the interior of the string down to a single space each by adding this line as the new last line of the loop:

s/\s+/ /g;                # finally, collapse middle

That way a string like " but\t\tnot here\n" would become "but not here". A more efficient alternative to the three substitution lines:

s/^\s+//;          
s/\s+$//;          
s/\s+/ /g;

would be:

$_ = join(' ', split(' '));

If the function isn't passed any arguments at all, it could act like chop and chomp by defaulting to $_. Incorporating all of these embellishments produces this function:

# 1. trim leading and trailing white space
# 2. collapse internal whitespace to single space each
# 3. take input from $_ if no arguments given
# 4. join return list into single scalar with intervening spaces 
#     if return is scalar context

sub trim {
    my @out = @_ ? @_ : $_;
    $_ = join(' ', split(' ')) for @out;
    return wantarray ? @out : "@out";
}

1.19.4 See Also

The s/// operator in perlre(1) and perlop(1) and Chapter 5 of Programming Perl; the chomp and chop functions in perlfunc(1) and Chapter 29 of Programming Perl; we trim leading and trailing whitespace in the getnum function in Recipe 2.1

    [ Team LiB ] Previous Section Next Section