You want to convert ASCII text to HTML.
Use the simple little encoding filter in Example 20.3.
#!/usr/bin/perl -w -p00 # text2html - trivial html encoding of normal text # -p means apply this script to each record. # -00 mean that a record is now a paragraph use HTML::Entities; $_ = encode_entities($_, "\200-\377"); if (/^\s/) { # Paragraphs beginning with whitespace are wrapped in <PRE> s{(.*)$} {<PRE>\n$1</PRE>\n}s; # indented verbatim } else { s{^(>.*)} {$1<BR>}gm; # quoted text s{<URL:(.*?)>} {<A HREF="$1">$1</A>}gs # embedded URL (good) || s{(http:\S+)} {<A HREF="$1">$1</A>}gs; # guessed URL (bad) s{\*(\S+)\*} {<STRONG>$1</STRONG>}g; # this is *bold* here s{\b_(\S+)\_\b} {<EM>$1</EM>}g; # this is _italics_ here s{^} {<P>\n}; # add paragraph tag }
Converting arbitrary plain text to HTML has no general solution because there are too many different, conflicting ways of representing formatting information in a plain text file. The more you know about the input, the better the job you can do of formatting it.
For example, if you knew that you would be fed a mail message, you could add this block to format the mail headers:
BEGIN { print "<TABLE>"; $_ = encode_entities(scalar <>); s/\n\s+/ /g; # continuation lines while ( /^(\S+?:)\s*(.*)$/gm ) { # parse heading print "<TR><TH ALIGN='LEFT'>$1</TH><TD>$2</TD></TR>\n"; } print "</TABLE><HR>"; }
The documentation for the CPAN module HTML::Entities
Copyright © 2002 O'Reilly & Associates. All rights reserved.