Perl Cookbook

Chapter 6: Pattern Matching
 

6.2. Matching Letters

Problem

You want to see whether a value consists only of alphabetic characters.

Solution

The obvious character class for matching regular letters isn't good enough in the general case:

if ($var =~ /^[A-Za-z]+$/) {
    # it is purely alphabetic
}

That's because it doesn't respect the user's locale settings. If you need to match letters with diacritics as well, use locale and match against a negated character class:

use locale;
if ($var =~ /^[^\W\d_]+$/) {
    print "var is purely alphabetic\n";
}

Discussion

Perl can't directly express "something alphabetic" independent of locale, so we have to be more clever. The \w regular expression notation matches one alphabetic, numeric, or underscore character (an "alphanumunder"). Therefore, \W matches any one byte that is not one of those. The negated character class [^\W\d_] specifies a byte that must not be a non-alphanumunder, a digit, or an underscore. That leaves us with nothing but alphabetics, which is what we were looking for.
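To make the double negation concrete, here is a minimal sketch that wraps the pattern in a helper and runs it over a few plain ASCII strings. The name is_alphabetic and the sample strings are illustrative assumptions, not part of the recipe; no locale is set, so only unaccented letters are exercised here.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the recipe): returns 1 if $s is made up of
# one or more letters, 0 otherwise.
sub is_alphabetic {
    my ($s) = @_;
    # [^\W\d_] means: a word character (\w) that is neither a digit
    # nor an underscore, which leaves only letters.
    return $s =~ /^[^\W\d_]+$/ ? 1 : 0;
}

# Sample inputs chosen for illustration only.
for my $candidate ("silly", "abc123", "snake_case", "random!stuff#here") {
    my $verdict = is_alphabetic($candidate) ? "alphabetic" : "line noise";
    print "$candidate: $verdict\n";
}
```

Only the first string passes: the second contains digits, the third an underscore, and the fourth punctuation. Because the pattern uses + rather than *, an empty string is rejected as well.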

Here's how you'd use this in a program:

use locale;
use POSIX 'locale_h';

# the following locale string might be different on your system
unless (setlocale(LC_ALL, "fr_CA.ISO8859-1")) {
    die "couldn't set locale to French Canadian\n";
}

while (<DATA>) {
    chomp;
    if (/^[^\W\d_]+$/) {
        print "$_: alphabetic\n";
    } else {
        print "$_: line noise\n";
    }
}

__END__
silly
façade
coöperate
niño
Renée
Molière
hæmoglobin
naïve
tschüß
random!stuff#here
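The program above depends on a locale that may not be installed on every system. As a point of comparison only, and not the technique this recipe describes, Perl versions with Unicode support can test for letters with the \p{L} property instead. The sketch below assumes the script itself is saved as UTF-8 and that the terminal accepts UTF-8 output; the helper name is_unicode_alphabetic is hypothetical.

```perl
use strict;
use warnings;
use utf8;                               # string literals in this file are UTF-8
binmode(STDOUT, ":encoding(UTF-8)");    # encode output as UTF-8 (assumed terminal encoding)

# Hypothetical helper: returns 1 if $s consists solely of Unicode letters.
sub is_unicode_alphabetic {
    my ($s) = @_;
    # \p{L} matches any character in the Unicode "Letter" category,
    # regardless of the current locale.
    return $s =~ /^\p{L}+$/ ? 1 : 0;
}

for my $word ("silly", "façade", "niño", "tschüß", "random!stuff#here") {
    my $verdict = is_unicode_alphabetic($word) ? "alphabetic" : "line noise";
    print "$word: $verdict\n";
}
```

This variant works on decoded character strings rather than locale-encoded bytes, so input read from files would need to be decoded first (for example with an :encoding layer on the filehandle).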

See Also

The treatment of locales in Perl in perllocale(1); your system's locale(3) manpage; Recipe 6.12, which discusses locales in greater depth; the "Perl and the POSIX Locale" section of Chapter 7 of Mastering Regular Expressions

