[ Team LiB ] |
Recipe 6.21 Program: urlify

This program puts HTML links around URLs in files. It doesn't work on all possible URLs, but it does hit the most common ones. It tries to avoid including end-of-sentence punctuation in the marked-up URL.

It is a typical Perl filter, so it can be fed input from a pipe:

    % gunzip -c ~/mail/archive.gz | urlify > archive.urlified

or by supplying files on the command line:

    % urlify ~/mail/*.inbox > ~/allmail.urlified

The program is shown in Example 6-10.

Example 6-10. urlify

    #!/usr/bin/perl
    # urlify - wrap HTML links around URL-like constructs
    $protos = '(http|telnet|gopher|file|wais|ftp)';
    $ltrs = '\w';
    $gunk = ';/#~:.?+=&%@!\-';
    $punc = '.:?\-';
    $any  = "${ltrs}${gunk}${punc}";
    while (<>) {
        s{
          \b                    # start at word boundary
          (                     # begin $1  {
           $protos   :          # need resource and a colon
           [$any] +?            # followed by one or more
                                #  of any valid character, but
                                #  be conservative and take only
                                #  what you need to....
          )                     # end   $1  }
          (?=                   # look-ahead non-consumptive assertion
           [$punc]*             # either 0 or more punctuation
           [^$any]              # followed by a non-url char
           |                    # or else
           $                    # then end of the string
          )
         }{<A HREF="$1">$1</A>}igox;
        print;
    }
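To see how the lazy quantifier and the look-ahead together keep sentence punctuation out of the link, the substitution can be tried on single lines. The following is a minimal sketch, not part of the recipe: the helper name urlify_line and the sample inputs (including the example.org address) are assumptions made for illustration, and the pattern is the recipe's with the comments stripped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same building blocks as the recipe's program.
my $protos = '(http|telnet|gopher|file|wais|ftp)';
my $ltrs   = '\w';
my $gunk   = ';/#~:.?+=&%@!\-';
my $punc   = '.:?\-';
my $any    = "${ltrs}${gunk}${punc}";

# urlify_line is a hypothetical helper (not in the recipe): it applies
# the recipe's substitution to one line and returns the marked-up copy.
sub urlify_line {
    my ($line) = @_;
    $line =~ s{
        \b
        ( $protos : [$any] +? )   # $1: scheme, colon, as few URL chars as possible
        (?=
            [$punc]* [^$any]      # optional punctuation, then a non-URL char
            |
            $                     # or the end of the string
        )
    }{<A HREF="$1">$1</A>}igx;
    return $line;
}

# The period after the URL ends the sentence, so it stays outside the link.
print urlify_line("Visit http://www.perl.com/CPAN/. Enjoy!\n");

# A comma is not in the URL character set, so the match stops before it.
print urlify_line("Fetch ftp://ftp.example.org/pub/file.txt, then unpack it.\n");
```

In the first call the look-ahead succeeds only once the period is followed by a space, so the link ends at the final slash of the URL and the period is left in the surrounding text. The /o flag from the recipe is dropped here because the sketch gains nothing from it.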