
Appendix C. The Essential Guide to Regular Expressions

The concept of regular expressions (or regexes, as they're often known) is central to the Perl language. Regular expressions have been available for a long time in Unix tools such as grep, sed, awk, and egrep, and they have also made their way into Java and Python. But they are most closely associated with Perl, where they are used extensively for pattern matching. They are also very important for data munging, as we describe in Appendix D.

Regular expressions are patterns of literals and metacharacters that match target combinations of characters embedded within input data. Although the simplest regular expression is nothing more than a literal string, regexes can also be very complex. They can provide amazing efficiency, but can also lead to great frustration. We have found that unless you live in the same universe as Spock or Data, where regexes compete with music and chess for sublime mathematical resonance, they most likely mean pain, bashed foreheads, and late-night viewings of Casablanca and The Matrix to calm the nerves. It's only really by writing a million and one regexes that most people do eventually figure out what the heck is going on, and even then, there's more to learn.
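To make the two extremes concrete, here is a minimal sketch in Perl. The input string and variable names are invented for illustration; the first match uses a purely literal pattern, and the second mixes literals with metacharacters.

```perl
use strict;
use warnings;

# Hypothetical input line, invented for this illustration.
my $input = 'The server restarted at 10:42 after 3 errors';

# Simplest case: the regex is just a literal string.
my $has_literal = $input =~ /restarted/ ? 1 : 0;

# Metacharacters: \d matches a digit and + means "one or more".
# The parentheses capture the two numbers around the literal colon.
my ($hour, $minute) = $input =~ /(\d+):(\d+)/;

print "literal found: $has_literal\n";
print "time: ${hour}:${minute}\n";
```

The second pattern is still mostly readable, but it already shows how quickly a regex moves away from plain text once metacharacters are involved.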

In this appendix, we'll look at the origins of regular expressions and the main concepts underlying their use. We'll also examine Perl's built-in string-handling functions, which often supply enough functionality that you won't need to use regexes at all. We'll discuss the basics of constructing regular expressions and will pay special attention to the use of metacharacters and suffixes. Metacharacters are special characters, such as the asterisk (*), that can be used to drive fuzzy, nonliteral matching. Suffixes (called modifiers in the Perl documentation) are special switches at the end of matches and substitutions that change their exact operation, for example, by making a substitution replace every match in the input rather than only the first one.
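The following sketch, which assumes a made-up input string, puts these three ideas side by side: a built-in string function, the asterisk metacharacter, and the g suffix on a substitution. It is only an illustration of the terms introduced above, not a complete treatment.

```perl
use strict;
use warnings;

# Hypothetical input, invented for this illustration.
my $text = 'colour, color, colour';

# Built-in string function: index() finds a literal substring
# without any regex at all (it returns -1 if nothing is found).
my $pos = index($text, 'color');

# Metacharacter: * means "zero or more of the preceding item",
# so colou*r matches both "color" and "colour".
my @spellings = $text =~ /colou*r/g;

# Without the g suffix, only the first match is replaced.
(my $first_only = $text) =~ s/colour/color/;

# With the g suffix, every match across the input is replaced.
(my $all = $text) =~ s/colour/color/g;

print "position of 'color': $pos\n";
print "matches: @spellings\n";
print "first only: $first_only\n";
print "global: $all\n";
```

Note that the copies are made with `(my $copy = $text) =~ s/.../.../` so that the original string is left untouched for comparison.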

Obviously, in this short appendix we can only scratch the surface of regular expressions. We strongly recommend that you consult the definitive reference on regular expressions, Jeffrey Friedl's excellent Mastering Regular Expressions (O'Reilly & Associates); because of its cover design, it's known as the Owl Book. You can also view the full documentation for Perl regular expressions with the following command:

$ perldoc perlre