
Recipe 1.5 Using Named Unicode Characters

1.5.1 Problem

You want to use Unicode names for fancy characters in your code without worrying about their code points.

1.5.2 Solution

Place a use charnames declaration at the top of your file, then freely insert \N{CHARSPEC} escapes into your double-quoted string literals.

1.5.3 Discussion

The use charnames pragma lets you use symbolic names for Unicode characters. These are compile-time constants that you access with the \N{CHARSPEC} double-quoted string sequence. Several subpragmas are supported. The :full subpragma grants access to the full range of character names, but you have to write them out in full, exactly as they occur in the Unicode character database, including the loud, all-capitals notation. The :short subpragma lets you abbreviate a name as SCRIPT:NAME, as in \N{greek:Delta}. Any import without a colon tag is taken to be a script name, which lets you write case-sensitive shortcuts for characters in those scripts without a script prefix.

use charnames ':full';
print "\N{GREEK CAPITAL LETTER DELTA} is called delta.\n";

Δ is called delta.

use charnames ':short';
print "\N{greek:Delta} is an upper-case delta.\n";

Δ is an upper-case delta.

use charnames qw(cyrillic greek);
print "\N{Sigma} and \N{sigma} are Greek sigmas.\n";
print "\N{Be} and \N{be} are Cyrillic bes.\n";

Σ and σ are Greek sigmas.
Б and б are Cyrillic bes.
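Because a \N{CHARSPEC} escape is resolved at compile time, the string it produces should be indistinguishable from one written with a numeric escape or built with chr. The following is a minimal sketch of that equivalence, not part of the original recipe. The helper name delta_three_ways is hypothetical, and the binmode call assumes a terminal that expects UTF-8.

```perl
use strict;
use warnings;
use charnames ':full';

# Assumption: output goes to a UTF-8 terminal. Without an encoding layer,
# printing a character above U+00FF draws a "Wide character" warning.
binmode(STDOUT, ':utf8');

# Hypothetical helper: return the same character spelled three ways.
sub delta_three_ways {
    return (
        "\N{GREEK CAPITAL LETTER DELTA}",   # by name, resolved at compile time
        "\x{394}",                          # by hexadecimal escape
        chr(0x394),                         # built at run time from the code point
    );
}

my ($named, $escaped, $built) = delta_three_ways();

printf "named eq escaped: %s\n", ($named eq $escaped ? "yes" : "no");
printf "named eq built:   %s\n", ($named eq $built   ? "yes" : "no");
printf "%s has length %d and code point U+%04X\n",
    $named, length($named), ord($named);
```

Both comparisons should report yes. The name is only a more readable spelling of the same one-character string.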

Two functions translate between numeric code points and the long names: charnames::viacode returns the name for a given code point, and charnames::vianame returns the code point for a given name. The Unicode documents use the notation U+XXXX to indicate the Unicode character whose code point is XXXX, so we'll use that here in our output.

use charnames qw(:full);
for my $code (0xC4, 0x394) {
    printf "Character U+%04X (%s) is named %s\n",
        $code, chr($code), charnames::viacode($code);
}

Character U+00C4 (Ä) is named LATIN CAPITAL LETTER A WITH DIAERESIS
Character U+0394 (Δ) is named GREEK CAPITAL LETTER DELTA

use charnames qw(:full);
my $name = "MUSIC SHARP SIGN";
my $code = charnames::vianame($name);
printf "%s is character U+%04X (%s)\n",
    $name, $code, chr($code); 

MUSIC SHARP SIGN is character U+266F (♯)
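Both functions return undef when they have no answer, so a program that takes names or code points from outside input may want to check the result before using it. The sketch below shows one way to do that. The helper names describe_code and code_for_name are hypothetical, and "NO SUCH CHARACTER NAME" is a made-up name used only to exercise the failure path.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper: format a code point in U+XXXX notation with its
# name, or a placeholder when charnames::viacode has no name for it.
sub describe_code {
    my ($code) = @_;
    my $name = charnames::viacode($code);
    return sprintf "U+%04X %s", $code, defined $name ? $name : "(no name)";
}

# Hypothetical helper: return the code point for a full character name,
# or undef if the name is unknown.
sub code_for_name {
    my ($name) = @_;
    return charnames::vianame($name);
}

# Round trip: name -> code point -> name, guarding against undef.
for my $name ("GREEK SMALL LETTER SIGMA", "NO SUCH CHARACTER NAME") {
    my $code = code_for_name($name);
    print defined $code
        ? "$name => " . describe_code($code) . "\n"
        : "$name => not found\n";
}
```

The output here is plain ASCII, so no output encoding layer is needed.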

Here's how to find the path to Perl's copy of the Unicode character database:

% perl -MConfig -le 'print "$Config{privlib}/unicore/NamesList.txt"'
/usr/local/lib/perl5/5.8.1/unicore/NamesList.txt

Read this file to learn the character names available to you.
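If you would rather not depend on where that file lives, one alternative is to ask charnames::viacode for the name of every code point in a range and keep the ones that match a pattern. This is only a sketch under that assumption, not the recipe's own method; names_matching is a hypothetical helper, and the range shown is the Greek and Coptic block.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper: scan the inclusive range $first..$last and return
# [code point, name] pairs for every character whose name matches $pattern.
sub names_matching {
    my ($pattern, $first, $last) = @_;
    my @hits;
    for my $code ($first .. $last) {
        my $name = charnames::viacode($code);
        next unless defined $name;      # skip code points with no name
        push @hits, [$code, $name] if $name =~ $pattern;
    }
    return @hits;
}

# List the sigmas in the Greek and Coptic block (U+0370 through U+03FF).
for my $hit (names_matching(qr/\bSIGMA\b/, 0x370, 0x3FF)) {
    printf "U+%04X %s\n", @$hit;
}
```

The exact list depends on the Unicode version bundled with your Perl. Scanning a single block is quick, but a scan over the whole Unicode range makes one lookup per code point and will be noticeably slower.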

1.5.4 See Also

The charnames(3) manpage and Chapter 31 of Programming Perl; the Unicode Character Database at http://www.unicode.org/
