17.2 Cleaning Up After Your HTML Editor

Although you can create and edit HTML/XHTML documents with a text editor, such as vi or Notepad, most HTML authors use an application designed for creating web pages. Several are free of charge, many offer a free evaluation period, and most are available for download over the Web. Be forewarned, though: in our experience, you will rarely (if ever) be able to create a web document with one of these editors without having to inspect, add to, edit, and sometimes even repair the HTML source that the editor generates. The following sections discuss a few things that you should know about and watch out for.

17.2.1 Where Did My Document Go?

One of the first things you will notice is that many HTML editors automatically insert markup into your document that you did not explicitly select or write. Remember this very simple HTML document that we started with in Chapter 2?

<html>
<head>
<title>My first HTML document</title>
</head>
<body>
<h2>My first HTML document</h2>
Hello, <i>World Wide Web!</i>
<!-- No "Hello, World" for us -->
<p>
Greetings from<br>
<a href="http://www.ora.com">O'Reilly & Associates</a>
<p>
Composed with care by:
<cite>(insert your name here)</cite>
<br>©2000 and beyond
</body>
</html>

Here is what the source looks like after you load it into Microsoft Word 2000:

<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 9">
<meta name=Originator content="Microsoft Word 9">
<link rel=File-List href="./ch01-1_MS_files/filelist.xml">
<title>My first HTML document</title>
<!--[if gte mso 9]><xml>
 <o:DocumentProperties>
  <o:Author>William Kennedy</o:Author>
  <o:LastAuthor>William Kennedy</o:LastAuthor>
  <o:Revision>2</o:Revision>
  <o:TotalTime>7</o:TotalTime>
  <o:Created>2002-06-19T18:58:00Z</o:Created>
  <o:LastSaved>2002-06-19T18:58:00Z</o:LastSaved>
  <o:Pages>1</o:Pages>
  <o:Words>26</o:Words>
  <o:Characters>152</o:Characters>
  <o:Company>ActivMedia Robotics</o:Company>
  <o:Lines>1</o:Lines>
  <o:Paragraphs>1</o:Paragraphs>
  <o:CharactersWithSpaces>186</o:CharactersWithSpaces>
  <o:Version>9.3821</o:Version>
 </o:DocumentProperties>
</xml><![endif]-->
<style>
<!--
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
    {mso-style-parent:"";
    margin:0in;
    margin-bottom:.0001pt;
    mso-pagination:widow-orphan;
    font-size:12.0pt;
    font-family:"Times New Roman";
    mso-fareast-font-family:"Times New Roman";}
p
    {font-size:12.0pt;
    font-family:"Times New Roman";
    mso-fareast-font-family:"Times New Roman";}
@page Section1
    {size:8.5in 11.0in;
    margin:1.0in 1.25in 1.0in 1.25in;
    mso-header-margin:.5in;
    mso-footer-margin:.5in;
    mso-paper-source:0;}
div.Section1
    {page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
 <o:shapedefaults v:ext="edit" spidmax="1026"/>
</xml><![endif]--><!--[if gte mso 9]><xml>
 <o:shapelayout v:ext="edit">
  <o:idmap v:ext="edit" data="1"/>
 </o:shapelayout></xml><![endif]-->
</head>
<body lang=EN-US link=blue vlink=blue style='tab-interval:.5in'>
<div class=Section1>
<h2>My first HTML document</h2>
<p class=MsoNormal>Hello, <i>World Wide Web</i> </p>
<!-- No "Hello, World" for us -->
<p>Greetings from<br>
<a href="http://www.ora.com">O'Reilly & Associates</a> </p>
<p>Composed with care by: <cite>(insert your name here)</cite> <br>
©2002 and beyond </p>
</div>
</body>
</html>

Yeow! Where did the document go? Excessive markup makes the source document almost humanly impossible to read. Lots of stuff we neither wanted nor asked for was added. What infuriates document purists like us even more is that Word 2000 automatically treats any text document containing HTML markup as fodder for its mill.
You can remove the .html or .htm suffix from the filename or delete <html> and <head> from the document, to no avail — Word will still get you. Microsoft isn't alone in cluttering the source. Most HTML editors add at least a <meta> tag that contains their product information. Many go through and "fix" your document to comply with current standards and practices, too — for example, by adding all those paragraph and list-item end tags that HTML allows you to omit. (From an XHTML standpoint, we admit that this meddling is probably valid.) To its credit, Word runs well, unlike other tools that routinely crashed without warning as we fought with their treatment of the markup. Microsoft even offers a Word plug-in that removes the additional markup, so that you can recover a reasonable facsimile of the original document.[2]
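Word 2000's additions follow recognizable patterns, so you can script part of the cleanup. The Python sketch below is not Microsoft's plug-in. It is a hypothetical filter based on regular expressions, and the function name strip_word_markup and its rule list are our own illustration. It removes the following:

- the Office conditional comments;
- the urn:schemas-microsoft-com namespace declarations;
- the ProgId, Generator, and Originator meta tags;
- the File-List link;
- class=MsoNormal attributes;
- any style element that contains Word's mso- properties.

It assumes input shaped like the Word 2000 output shown earlier. Other versions of Office may generate different markup.

```python
import re

# Hypothetical cleanup rules for some of the clutter Word 2000 adds.
# They target only the patterns seen in the Word 2000 example; other
# Office versions emit different markup, so this is a starting point,
# not a complete filter.
_RULES = [
    # Office-only conditional comments: <!--[if gte mso 9]> ... <![endif]-->
    re.compile(r"<!--\[if [^\]]*\bmso\b[^\]]*\]>.*?<!\[endif\]-->\s*",
               re.DOTALL | re.IGNORECASE),
    # Office XML namespace declarations on the <html> tag
    re.compile(r'\s+xmlns:\w+="urn:schemas-microsoft-com:[^"]*"', re.IGNORECASE),
    # Product-information <meta> tags (matches any editor's, not only Word's)
    re.compile(r'<meta\s+name="?(?:ProgId|Generator|Originator)"?[^>]*>\s*',
               re.IGNORECASE),
    # Link to Word's companion filelist.xml
    re.compile(r'<link\s+rel="?File-List"?[^>]*>\s*', re.IGNORECASE),
    # class=MsoNormal when it is the element's only class (quoted or not)
    re.compile(r'\s+class=(?:"MsoNormal"|MsoNormal\b)', re.IGNORECASE),
]

# Any <style> element; removed only if it contains Word's mso-* properties.
_STYLE_BLOCK = re.compile(r"<style[^>]*>.*?</style>\s*", re.DOTALL | re.IGNORECASE)


def strip_word_markup(html: str) -> str:
    """Return html with some Word-generated clutter removed."""
    for pattern in _RULES:
        html = pattern.sub("", html)
    # Keep author-written style sheets; drop the ones full of mso-* rules.
    return _STYLE_BLOCK.sub(
        lambda m: "" if "mso-" in m.group(0).lower() else m.group(0), html
    )
```

Regular expressions are a blunt tool for HTML in general. They are tolerable here only because the targets are a handful of known, machine-generated patterns. The sketch deliberately leaves the <div class=Section1> wrapper and the <body> attributes alone. Safely removing a start tag means also finding its matching end tag, and a real HTML parser is the better choice for that job.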
17.2.2 When and Why to Edit the Editor

No matter how good the HTML editor is, you'll inevitably have to edit the (albeit cluttered) source it generates. We've had to do it a lot ourselves, and so have all the web developers we've talked with over the last few years. Not all HTML editors provide an easy means to add JavaScript to your documents, and many are not up-to-date with the HTML/XHTML and CSS2 standards. Remember, too, that the popular browsers don't always agree on how to render a tag, and even different versions of the same browser may differ. Furthermore, even the best HTML editors don't necessarily support extensions to the language. So into the source you'll have to go. You might need to include some HTML feature not yet supported by the editor (such as a new CSS2 property), insert an attribute value or keyword, or modify the ones that the editor added.

The tip is this: compose first. Try to start with a clean, finished document. Concentrate on content from the outset, and add the special effects later. Use a good HTML editor from the start, or prepare your documents in two steps with two different tools (a good content editor followed by a good HTML editor). The two-step approach is especially worthwhile if you plan to distribute the document in a format other than HTML.

17.2.3 Use the Best

If you compose web pages, we can't imagine you not using an HTML editor of some sort. The convenience is just too compelling. But choose carefully: some HTML editors are abysmal, and you'll spend more time hunting down misplaced tags and errant attributes than actually creating the document. Top tip: you get what you pay for.

It's no surprise that HTML editors vary greatly in their features. Many editors let you switch the display from the source text to an approximation of how a browser will render the page. Some simply let you add tags and modify attribute values through pull-down menus and hot-key options.
Others are WYSIWYG layout tools that make it easy to include graphics and other multimedia content. Other advanced features include embedding and testing applets and scripts. In general, HTML editors fall into one of two categories: either they are good layout tools, including advanced styling features and tools for dynamic content, or they excel at content creation and management. Obviously, if you are producing flashy, commercial web pages that rely on advanced layout techniques and include lots of different styles and dynamic content, use a good layout tool. If you are producing a content-rich document, use a tool that provides good editorial assistance. No matter which type you use, there are some common considerations to keep in mind when selecting an HTML editor: