Chapter 13. Larger Web Site Examples I

13.1 "Things to Do When Visiting Chicago"

This chapter is the fourth part of our look at Python Internet programming, and continues the last chapter's discussion. In the prior chapter, we explored the fundamentals of server-side CGI scripting in Python. Armed with that knowledge, in this and the following chapter we move on to two larger case studies that underscore advanced CGI topics:

PyMailCgi

This chapter presents PyMailCgi, a web site for reading and sending email that illustrates security concepts, hidden form fields, URL generation, and more. Because this system is similar in spirit to the PyMailGui program shown in Chapter 11, this example also serves as a comparison of web and non-web applications.

PyErrata

Chapter 14 presents PyErrata, a web site for posting book comments and bugs that introduces database concepts in the CGI domain. This system demonstrates common ways to store data persistently on the server between web transactions, and addresses concurrent update problems inherent in the CGI model.

Both of these case studies are based on CGI scripting, but implement full-blown web sites that do something more useful than the last chapter's examples.

As usual, these chapters split their focus between application-level details and Python programming concepts. Because both of the case studies presented are fairly large, they illustrate system design concepts that are important in actual projects. They also say more about CGI scripts in general. PyMailCgi, for example, introduces the notions of state retention in hidden fields and URLs, as well as security concerns and encryption. PyErrata provides a vehicle for exploring persistent database concepts in the context of web sites.

Neither system here is particularly flashy or feature-rich as web sites go (in fact, the initial cut of PyMailCgi was thrown together during a layover at a Chicago airport). Alas, you will find neither dancing bears nor blinking lights at either of these sites. On the other hand, they were written to serve real purposes, speak more to us about CGI scripting, and hint at just how far Python server-side programs can take us. In Chapter 15, we will explore higher-level systems and tools that build upon ideas we will apply here. For now, let's have some fun with Python on the Web.

13.2 The PyMailCgi Web Site

Near the end of Chapter 11, we built a program called PyMailGui that implemented a complete Python+Tk email client GUI (if you didn't read that section, you may want to take a quick look at it now). Here, we're going to do much the same, but on the Web: the system presented in this section, PyMailCgi, is a collection of CGI scripts that implement a simple web-based interface for sending and reading email in any browser.

Our goal in studying this system is partly to learn a few more CGI tricks, partly to learn a bit about designing larger Python systems in general, and partly to underscore the trade-offs between systems implemented for the Web (PyMailCgi) and systems written to run locally (PyMailGui). This chapter hints at some of these trade-offs along the way, and returns to explore them in more depth after the presentation of this system.

13.2.1 Implementation Overview

At the top level, PyMailCgi allows users to view incoming email with the POP interface and to send new mail by SMTP. Users also have the option of replying to, forwarding, or deleting an incoming email while viewing it. As implemented, anyone can send email from the PyMailCgi site, but to view your email, you generally have to install PyMailCgi at your own site with your own mail server information (due to security concerns described later).

Viewing and sending email sounds simple enough, but the interaction involves a number of distinct web pages, each requiring a CGI script or HTML file of its own. In fact, PyMailCgi is a fairly linear system -- in the most complex user interaction scenario, there are six states (and hence six web pages) from start to finish. Because each page is usually generated by a distinct file in the CGI world, that also implies six source files.

To help keep track of how all of PyMailCgi's files fit into the overall system, I wrote the file in Example 13-1 before starting any real programming. It informally sketches the user's flow through the system and the files invoked along the way. You can certainly use more formal notations to describe the flow of control and information through states such as web pages (e.g., dataflow diagrams), but for this simple example this file gets the job done.

Example 13-1. PP2E\Internet\Cgi-Web\PyMailCgi\pageflow.txt
file or script                          creates
--------------                          -------

[pymailcgi.html]                        Root window
 => [onRootViewLink.cgi]                Pop password window
     => [onViewPswdSubmit.cgi]          List window (loads all pop mail)
         => [onViewListLink.cgi]        View Window + pick=del|reply|fwd (fetch)
             => [onViewSubmit.cgi]      Edit window, or delete+confirm (del)
                 => [onSendSubmit.cgi]  Confirmation (sends smtp mail)
                     => back to root

 => [onRootSendLink.cgi]                Edit Window
     => [onSendSubmit.cgi]              Confirmation (sends smtp mail)
         => back to root

This file simply lists all the source files in the system, using => and indentation to denote the scripts they trigger.

For instance, links on the pymailcgi.html root page invoke onRootViewLink.cgi and onRootSendLink.cgi, both executable scripts. The script onRootViewLink.cgi generates a password page, whose Submit button in turn triggers onViewPswdSubmit.cgi, and so on. Notice that both the view and send actions can wind up triggering onSendSubmit.cgi to send a new mail; view operations get there after the user chooses to reply to or forward an incoming mail.
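The same flow can also be written down as data. The following is only a sketch in Python 3 (not the Python of this book's listings): the PAGEFLOW dictionary and the walk function are hypothetical names that are not part of PyMailCgi, and the delete branch of onViewSubmit.cgi is left out for brevity.

```python
# Page flow from Example 13-1 as a mapping from each file to the files its
# page can trigger next. PAGEFLOW and walk() are illustrative names only;
# they are not part of PyMailCgi.
PAGEFLOW = {
    'pymailcgi.html':       ['onRootViewLink.cgi', 'onRootSendLink.cgi'],
    'onRootViewLink.cgi':   ['onViewPswdSubmit.cgi'],
    'onViewPswdSubmit.cgi': ['onViewListLink.cgi'],
    'onViewListLink.cgi':   ['onViewSubmit.cgi'],
    'onViewSubmit.cgi':     ['onSendSubmit.cgi'],   # reply/forward case only
    'onRootSendLink.cgi':   ['onSendSubmit.cgi'],
    'onSendSubmit.cgi':     [],                     # confirmation, then root
}

def walk(start, flow=PAGEFLOW):
    """Return every path from start to a final page, as lists of filenames."""
    nexts = flow[start]
    if not nexts:
        return [[start]]
    return [[start] + rest for nxt in nexts for rest in walk(nxt, flow)]

paths = walk('pymailcgi.html')
for path in paths:
    print(' => '.join(path))
```

The longest path printed has six entries, which matches the six states counted earlier for the most complex interaction.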

In a system like this, CGI scripts make little sense in isolation, so it's a good idea to keep the overall page flow in mind; refer to this file if you get lost. For additional context, Figure 13-1 shows the overall contents of this site, viewed on Windows with the PyEdit "Open" function.

Figure 13-1. PyMailCgi contents

figs/ppy2_1301.gif

The temp directory was used only during development. To install this site, all the files you see here are uploaded to a PyMailCgi subdirectory of my public_html web directory. Besides the page-flow HTML and CGI script files invoked by user interaction, PyMailCgi uses a handful of utility modules as well; the scripts in this chapter import commonhtml, externs, loadmail, and secret.

PyMailCgi also reuses parts of the pymail.py and mailconfig.py modules we wrote in Chapter 11; on my web server, these are installed in a special directory that is not necessarily the same as their location in the examples distribution (they show up in another server directory, not shown in Figure 13-1). As usual, PyMailCgi also uses a variety of standard Python library modules: smtplib, poplib, rfc822, cgi, urllib, time, rotor, and the like.

Carry-on Software

PyMailCgi works as planned and illustrates more CGI and email concepts, but I want to point out a few caveats up front. The application was initially written during a two-hour layover in Chicago's O'Hare airport (though debugging took a few hours more). I wrote it to meet a specific need -- to be able to read and send email from any web browser while traveling around the world teaching Python classes. I didn't design it to be aesthetically pleasing to others and didn't spend much time focusing on its efficiency.

I also kept this example intentionally simple for this book. For example, PyMailCgi doesn't provide all the features of the PyMailGui program in Chapter 11, and reloads email more than it probably should. In other words, you should consider this system a work in progress; it's not yet software worth selling. On the other hand, it does what it was intended to do, and can be customized by tweaking its Python source code -- something that can't be said of all software sold.

13.2.2 Presentation Overview

PyMailCgi is a challenge to present in a book like this, because most of the "action" is encapsulated in shared utility modules (especially one called commonhtml.py ); the CGI scripts that implement user interaction don't do much by themselves. This architecture was chosen deliberately, to make scripts simple and implement a common look-and-feel. But it means you must jump between files to understand how the system works.

To make this example easier to digest, we're going to explore its code in two chunks: page scripts first, and then the utility modules. First, we'll study screen shots of the major web pages served up by the system and the HTML files and top-level Python CGI scripts used to generate them. We begin by following a send mail interaction, and then trace how existing email is processed. Most implementation details will be presented in these sections, but be sure to flip ahead to the utility modules listed later to understand what the scripts are really doing.

I should also point out that this is a fairly complex system, and I won't describe it in exhaustive detail; be sure to read the source code along the way for details not made explicit in the narrative. All of the system's source code appears in this section (and also at http://examples.oreilly.com/python2), and we will study the key concepts in this system here. But as usual with case studies in this book, I assume that you can read Python code by now and will consult the example's source code for more details. Because Python's syntax is so close to executable pseudocode, systems are sometimes better described in Python than in English.

13.3 The Root Page

Let's start off by implementing a main page for this example. The file shown in Example 13-2 is primarily used to publish links to the Send and View functions' pages. It is coded as a static HTML file, because there is nothing to generate on the fly here.

Example 13-2. PP2E\Internet\Cgi-Web\PyMailCgi\pymailcgi.html
<HTML><BODY>
<TITLE>PyMailCgi Main Page</TITLE>
<H1 align=center>PyMailCgi</H1>
<H2 align=center>A POP/SMTP Email Interface</H2>
<P align=center><I>Version 1.0, April 2000</I></P>

<table><tr><td><hr>
<P>
<A href="http://rmi.net/~lutz/about-pp.html">
<IMG src="../PyErrata/ppsmall.gif" align=left 
alt="[Book Cover]" border=1 hspace=10></A>
This site implements a simple web-browser interface to POP/SMTP email
accounts.  Anyone can send email with this interface, but for security
reasons, you cannot view email unless you install the scripts with your
own email account information, in your own server account directory.
PyMailCgi is implemented as a number of Python-coded CGI scripts that run on
a server machine (not your local computer), and generate HTML to interact 
with the client/browser.  See the book <I>Programming Python, 2nd Edition</I>
for more details.</P>

<tr><td><hr>
<h2>Actions</h2>
<P><UL>
<LI><a href="onRootViewLink.cgi">View, Reply, Forward, Delete POP mail</a>
<LI><a href="onRootSendLink.cgi">Send a new email message by SMTP</a>
</UL></P>

<tr><td><hr>
<P>Caveats: PyMailCgi 1.0 was initially written during a 2-hour layover at 
Chicago's O'Hare airport.  This release is not nearly as fast or complete 
as PyMailGui (e.g., each click requires an Internet transaction, there
is no save operation, and email is reloaded often).  On the other hand, 
PyMailCgi runs on any web browser, whether you have Python (and Tk) 
installed on your machine or not.  

<P>Also note that if you use these scripts to read your own email, PyMailCgi
does not guarantee security for your account password, so be careful out there.
See the notes in the View action page as well as the book for more information
on security policies.  Also see:

<UL>
<li>The <I>PyMailGui</I> program in the Email directory, which
        implements a client-side Python+Tk email GUI
<li>The <I>pymail.py</I> program in the Email directory, which 
        provides a simple command-line email interface
<li>The Python imaplib module which supports the IMAP email protocol
        instead of POP
<li>The upcoming openSSL support for secure transactions in the new 
        Python 1.6 socket module  
</UL></P>
</table><hr>

<A href="http://www.python.org">
<IMG SRC="../PyErrata/PythonPoweredSmall.gif" ALIGN=left   
ALT="[Python Logo]" border=0 hspace=15></A> 
<A href="../PyInternetDemos.html">More examples</A>
</BODY></HTML>

The file pymailcgi.html is the system's root page and lives in a PyMailCgi subdirectory of my web directory that is dedicated to this application (and helps keep its files separate from other examples). To access this system, point your browser to:

http://starship.python.net/~lutz/PyMailCgi/pymailcgi.html

If you do, the server will ship back a page like that shown in Figure 13-2.

Figure 13-2. PyMailCgi main page

figs/ppy2_1302.gif

Now, before you click on the View link here expecting to read your own email, I should point out that by default, PyMailCgi allows anybody to send email from this page with the Send link (as we learned earlier, there are no passwords in SMTP). It does not, however, allow arbitrary users on the Web to read their email accounts without typing an explicit and unsafe URL or doing a bit of installation and configuration. This is on purpose, and has to do with security constraints; as we'll see later, I wrote the system such that it never transmits your email username and password together unless the password has been encrypted.

By default, then, this page is set up to read my (the author's) email account, and requires my POP password to do so. Since you probably can't guess my password (and wouldn't find my email helpful if you could), PyMailCgi is not incredibly useful as installed at this site. To use it to read your email instead, you should install the system's source code on your own server and tweak a mail configuration file that we'll see in a moment. For now, let's proceed by using the system as it is installed on my server, with my POP email account; it works the same way, regardless of which account it accesses.

13.4 Sending Mail by SMTP

PyMailCgi supports two main functions (as links on the root page): composing and sending new mail to others, and viewing your incoming mail. The View function leads to pages that let users reply to, forward, and delete existing email. Since the Send function is the simplest, let's start with its pages and scripts first.

13.4.1 The Message Composition Page

The Send function steps users through two pages: one to edit a message and one to confirm delivery. When you click on the Send link on the main page, the script in Example 13-3 runs on the server.

Example 13-3. PP2E\Internet\Cgi-Web\PyMailCgi\onRootSendLink.cgi
#!/usr/bin/python
# On 'send' click in main root window

import commonhtml
from externs import mailconfig

commonhtml.editpage(kind='Write', headers={'From': mailconfig.myaddress})

No, this file wasn't truncated; there's not much to see in this script, because all the action has been encapsulated in the commonhtml and externs modules. All that we can tell here is that the script calls something named editpage to generate a reply page, passing in something called myaddress for its "From" header. That's by design -- by hiding details in utility modules, we make top-level scripts like this much easier to read and write. There are no inputs to this script either; when run, it produces a page for composing a new message, as shown in Figure 13-3.

Figure 13-3. PyMailCgi send (write) page

figs/ppy2_1303.gif

13.4.2 Send Mail Script

Much like the Tkinter-based PyMailGui client program we met in Chapter 11, this page provides fields for entering common header values as well as the text of the message itself. The "From" field is prefilled with a string imported from a module called mailconfig. As we'll discuss in a moment, that module lives in another directory on the server in this system, but its contents are the same as in the PyMailGui example. When we click the Send button of the edit page, Example 13-4 runs on the server.

Example 13-4. PP2E\Internet\Cgi-Web\PyMailCgi\onSendSubmit.cgi
#!/usr/bin/python
# On submit in edit window--finish a write, reply, or forward

import cgi, smtplib, time, string, commonhtml
#commonhtml.dumpstatepage(0)
form = cgi.FieldStorage()                        # parse form input data

# server name from module or get-style url
smtpservername = commonhtml.getstandardsmtpfields(form)

# parms assumed to be in form or url here
from commonhtml import getfield                # fetch value attributes
From = getfield(form, 'From')                  # empty fields may not be sent
To   = getfield(form, 'To')
Cc   = getfield(form, 'Cc')
Subj = getfield(form, 'Subject')
text = getfield(form, 'text')

# caveat: logic borrowed from PyMailGui
date  = time.ctime(time.time())
Cchdr = (Cc and 'Cc: %s\n' % Cc) or ''
hdrs  = ('From: %s\nTo: %s\n%sDate: %s\nSubject: %s\n' 
                 % (From, To, Cchdr, date, Subj))
hdrs  = hdrs + 'X-Mailer: PyMailCgi Version 1.0 (Python)\n'

Ccs = (Cc and string.split(Cc, ';')) or []     # some servers reject ['']
Tos = string.split(To, ';') + Ccs              # cc: hdr line, and To list
Tos = map(string.strip, Tos)                   # some addrs can have ','s

try:                                              # smtplib may raise except
    server = smtplib.SMTP(smtpservername)         # or return failed Tos dict
    failed = server.sendmail(From, Tos, hdrs + text)
    server.quit()
except:
    commonhtml.errorpage('Send mail error')
else:
    if failed:
        errInfo = 'Send mail error\nFailed recipients:\n' + str(failed)
        commonhtml.errorpage(errInfo)
    else:
        commonhtml.confirmationpage('Send mail')

This script gets mail header and text input information from the edit page's form (or from parameters in an explicit URL), and sends the message off using Python's standard smtplib module. We studied smtplib in depth in Chapter 11, so I won't say much more about it now. In fact, the send mail code here looks much like that in PyMailGui (despite what I've told you about code reuse; this code would be better made a utility).
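As a rough illustration of that last point, the header and recipient logic could be pulled into a function of its own. The sketch below is written for Python 3 rather than the Python of this book's listings, and the name build_message is hypothetical. It also inserts an explicit blank line between the headers and the body, which the listing above does not do; treat that as an assumption about what a mail server expects, not as a description of PyMailCgi.

```python
import time

def build_message(From, To, Cc, Subj, text, date=None):
    """Return (recipients, message text) in the style of Example 13-4."""
    date = date or time.ctime(time.time())
    cchdr = 'Cc: %s\n' % Cc if Cc else ''           # omit the header if empty
    hdrs = ('From: %s\nTo: %s\n%sDate: %s\nSubject: %s\n'
            % (From, To, cchdr, date, Subj))
    hdrs += 'X-Mailer: PyMailCgi Version 1.0 (Python)\n'

    ccs = Cc.split(';') if Cc else []               # avoid a [''] recipient
    tos = [addr.strip() for addr in To.split(';') + ccs]
    return tos, hdrs + '\n' + text                  # blank line: assumption

recipients, message = build_message(
    'me@example.com', 'you@example.com; them@example.com', '',
    'test', 'Hello there', date='Mon Jun  5 17:51:11 2000')
print(recipients)
print(message)
```

Because the function only builds strings and lists, it can be exercised without a mail server; its two results are what a caller would hand to the sendmail method of an smtplib.SMTP object.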

A utility in commonhtml ultimately fetches the name of the SMTP server to receive the message from either the mailconfig module or the script's inputs (in a form field or URL parameter). If all goes well, we're presented with a generated confirmation page, as in Figure 13-4.

Figure 13-4. PyMailCgi send confirmation page

figs/ppy2_1304.gif

Notice that there are no usernames or passwords to be found here; as we saw in Chapter 11, SMTP requires only a server that listens on the SMTP port, not a user account or password. As we also saw in that chapter, SMTP send operations that fail either raise a Python exception (e.g., if the server host can't be reached) or return a dictionary of failed recipients.

If there is a problem during mail delivery, we get an error page like the one shown in Figure 13-5. This page reflects a failed recipient -- the else clause of the try statement we used to wrap the send operation. On an actual exception, the Python error message and extra details would be displayed.

Figure 13-5. PyMailCgi send error page

figs/ppy2_1305.gif
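The two failure modes just described (an exception, or a dictionary of refused recipients) can be tried without touching a real mail server. This is a minimal Python 3 sketch, not the book's code: FakeServer and send_and_report are hypothetical names, and the fake object only imitates the sendmail method of smtplib.SMTP.

```python
import smtplib

def send_and_report(server, From, Tos, message):
    """Return a short status string for one send attempt."""
    try:
        failed = server.sendmail(From, Tos, message)
    except (smtplib.SMTPException, OSError) as exc:    # send raised
        return 'error: %s' % exc.__class__.__name__
    else:
        if failed:                                     # some recipients refused
            return 'failed recipients: %s' % sorted(failed)
        return 'ok'

class FakeServer:
    """Stand-in for smtplib.SMTP, used here for illustration only."""
    def __init__(self, refused=None, error=None):
        self.refused = refused or {}
        self.error = error
    def sendmail(self, From, Tos, message):
        if self.error:
            raise self.error
        return self.refused

args = ('me@example.com', ['you@example.com'], 'Subject: hi\n\ntext')
print(send_and_report(FakeServer(), *args))
print(send_and_report(FakeServer(refused={'you@example.com': (550, b'no')}), *args))
print(send_and_report(FakeServer(error=smtplib.SMTPServerDisconnected('lost')), *args))
```

The else clause runs only when sendmail returns normally, which is why a non-empty dictionary is handled there and not in the exception handler.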

Before we move on, you should know that this send mail script is also used to deliver reply and forward messages for incoming POP mail. The user interface for those operations is slightly different than for composing new email from scratch, but as in PyMailGui, the submission handler logic is the same code -- they are really just mail send operations.

It's also worth pointing out that the commonhtml module encapsulates the generation of both the confirmation and error pages, so that all such pages look the same in PyMailCgi no matter where and when they are produced. Logic that generates the mail edit page in commonhtml is reused by the reply and forward actions too (but with different mail headers).

In fact, commonhtml makes all pages look similar -- it also provides common page header (top) and footer (bottom) generation functions, which are used everywhere in the system. You may have already noticed that all the pages so far follow the same pattern: they start with a title and horizontal rule, have something unique in the middle, and end with another rule, followed by a Python icon and link at the bottom. This common look-and-feel is the product of commonhtml; it generates everything but the middle section for every page in the system (except the root page, a static HTML file).

If you are interested in seeing how this encapsulated logic works right now, flip ahead to Example 13-14. We'll explore its code after we study the rest of the mail site's pages.
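Until then, the header, middle, and footer pattern can be sketched in a few lines. This is a Python 3 approximation, not the real commonhtml code: the function names echo the pageheader and pagefooter calls used by the scripts, but the bodies, the render helper, and the choice to return strings instead of printing them are all assumptions, with the HTML modeled on the confirmation page reply shown later in this section.

```python
from html import escape

def pageheader(kind):
    """Top of every page: title, heading, and a horizontal rule."""
    kind = escape(kind)
    return ('<html><head><title>PyMailCgi: %s page (PP2E)</title></head>\n'
            '<body bgcolor="#FFFFFF"><h1>PyMailCgi %s</h1><hr>' % (kind, kind))

def pagefooter(root='pymailcgi.html'):
    """Bottom of every page: rule, Python icon, and a link back to the root."""
    return ('<hr><a href="http://www.python.org">\n'
            '<img src="../PyErrata/PythonPoweredSmall.gif"\n'
            'align=left alt="[Python Logo]" border=0 hspace=15></a>\n'
            '<a href="%s">Back to root page</a>\n'
            '</body></html>' % escape(root, quote=True))

def render(kind, middle):
    """Wrap a page-specific middle section in the shared header and footer."""
    return '\n'.join([pageheader(kind), middle, pagefooter()])

page = render('Confirmation', '<h2>Send mail operation was successful</h2>')
print(page)
```

Each page script then only has to supply its own middle section, which is the point of keeping the look-and-feel in one module.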

13.4.2.1 Using the send mail script outside a browser

I initially wrote the send script to be used only within PyMailCgi, using values typed into the mail edit form. But as we've seen, inputs can be sent in either form fields or URL parameters; because the send mail script looks for its inputs in the CGI request before falling back on the mailconfig module, it's also possible to call this script outside the edit page to send email. For instance, explicitly typing a URL of this nature into your browser (but all on one line and with no intervening spaces):

http://starship.python.net/~lutz/ 
     PyMailCgi/onSendSubmit.cgi?site=smtp.rmi.net&
                                From=[email protected]&
                                To=[email protected]&
                                Subject=test+url&
                                text=Hello+Mark;this+is+Mark

will indeed send an email message as specified by the input parameters at the end. That URL string is a lot to type into a browser's address field, of course, but might be useful if generated automatically by another script. As we saw in Chapter 11, module urllib can then be used to submit such a URL string to the server from within a Python program. Example 13-5 shows one way to do it.

Example 13-5. PP2E\Internet\Cgi-Web\PyMailCgi\sendurl.py
####################################################################
# Send email by building a URL like this from inputs:
# http://starship.python.net/~lutz/ 
#     PyMailCgi/onSendSubmit.cgi?site=smtp.rmi.net&
#                                From=[email protected]&
#                                To=[email protected]&
#                                Subject=test+url&
#                                text=Hello+Mark;this+is+Mark
####################################################################

from urllib import quote_plus, urlopen

url = 'http://starship.python.net/~lutz/PyMailCgi/onSendSubmit.cgi'
url = url + '?site=%s'    % quote_plus(raw_input('Site>'))    
url = url + '&From=%s'    % quote_plus(raw_input('From>'))    
url = url + '&To=%s'      % quote_plus(raw_input('To  >'))    
url = url + '&Subject=%s' % quote_plus(raw_input('Subj>'))    
url = url + '&text=%s'    % quote_plus(raw_input('text>'))    # or input loop

print 'Reply html:'
print urlopen(url).read()      # confirmation or error page html

Running this script from the system command line is yet another way to send an email message -- this time, by contacting our CGI script on a remote server machine to do all the work. Script sendurl.py runs on any machine with Python and sockets, lets us input mail parameters interactively, and invokes another Python script that lives on a remote machine. It prints HTML returned by our CGI script:

C:\...\PP2E\Internet\Cgi-Web\PyMailCgi>python sendurl.py
Site>smtp.rmi.net
From>[email protected]
To  >[email protected]
Subj>test sendurl.py
text>But sir, it's only wafer-thin...
Reply html:
<html><head><title>PyMailCgi: Confirmation page (PP2E)</title></head>
<body bgcolor="#FFFFFF"><h1>PyMailCgi Confirmation</h1><hr>
<h2>Send mail operation was successful</h2>
<p>Press the link below to return to the main page.</p>
</p><hr><a href="http://www.python.org">
<img src="../PyErrata/PythonPoweredSmall.gif"
align=left alt="[Python Logo]" border=0 hspace=15></a>
<a href="pymailcgi.html">Back to root page</a>
</body></html>

The HTML reply printed by this script would normally be rendered into a new web page if caught by a browser. Such cryptic output might be less than ideal, but you could easily search the reply string for its components to determine the result (e.g., using string.find to look for "successful"), parse out its components with Python's standard htmllib module, and so on. The resulting mail message -- viewed, for variety, with Chapter 11's PyMailGui program -- shows up in my account as seen in Figure 13-6.

Figure 13-6. sendurl.py result

figs/ppy2_1306.gif
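The reply check mentioned above can be sketched as follows. The book's htmllib module is not available in Python 3, so this version uses the standard html.parser module instead; the class and function names are hypothetical, and the reply string is the confirmation HTML from the session shown earlier.

```python
from html.parser import HTMLParser

class HeadingGrabber(HTMLParser):
    """Collect the text found inside <h2> elements of a reply page."""
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headings = []
    def handle_starttag(self, tag, attrs):
        if tag == 'h2':
            self.in_h2 = True
            self.headings.append('')
    def handle_endtag(self, tag):
        if tag == 'h2':
            self.in_h2 = False
    def handle_data(self, data):
        if self.in_h2:
            self.headings[-1] += data

def send_succeeded(reply_html):
    """True if any <h2> heading in the reply mentions success."""
    parser = HeadingGrabber()
    parser.feed(reply_html)
    parser.close()
    return any('successful' in heading for heading in parser.headings)

reply = """<html><head><title>PyMailCgi: Confirmation page (PP2E)</title></head>
<body bgcolor="#FFFFFF"><h1>PyMailCgi Confirmation</h1><hr>
<h2>Send mail operation was successful</h2>
<p>Press the link below to return to the main page.</p>
</p><hr><a href="pymailcgi.html">Back to root page</a>
</body></html>"""
print(send_succeeded(reply))
```

Looking only inside the heading is a design choice: a plain substring search over the whole page would also match the word if it happened to appear in an error message or a link.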

Of course, there are other, less remote ways to send email from a client machine. For instance, the Python smtplib module itself depends only upon the client and SMTP server connections being operational, whereas this script also depends on the CGI server machine (requests go from client to CGI server to SMTP server and back). Because our CGI script supports general URLs, though, it can do more than a "mailto:" HTML tag, and can be invoked with urllib outside the context of a running web browser. For instance, scripts like sendurl.py can be used to invoke and test server-side programs.
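For readers working in Python 3, where quote_plus and urlopen have moved to urllib.parse and urllib.request, such a URL might be assembled along these lines. This is a sketch only: build_send_url is a hypothetical helper, the addresses and server name are placeholders, and nothing is sent because the URL is never opened.

```python
from urllib.parse import urlencode

def build_send_url(base, site, From, To, Subject, text):
    """Append the send script's input parameters to its URL."""
    params = [('site', site), ('From', From), ('To', To),
              ('Subject', Subject), ('text', text)]
    return base + '?' + urlencode(params)     # escapes values like quote_plus

url = build_send_url(
    'http://starship.python.net/~lutz/PyMailCgi/onSendSubmit.cgi',
    site='smtp.example.com',
    From='me@example.com',
    To='you@example.com',
    Subject='test url',
    text='Hello; this is a test')
print(url)
```

Passing the finished string to urllib.request.urlopen and reading the result would then fetch the confirmation or error page, much as sendurl.py does.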

13.5 Reading POP Email

So far, we've stepped through the path the system follows to send new mail. Let's now see what happens when we try to view incoming POP mail.

13.5.1 The POP Password Page

If you flip back to the main page in Figure 13-2, you'll see a View link; pressing it triggers the script in Example 13-6 to run on the server:

Example 13-6. PP2E\Internet\Cgi-Web\PyMailCgi\onRootViewLink.cgi
#!/usr/bin/python
##############################################################
# on view link click on main/root html page
# this could almost be a html file because there are likely
# no input params yet, but I wanted to use standard header/
# footer functions and display the site/user names which must 
# be fetched;  On submission, doesn't send the user along with
# password here, and only ever sends both as URL params or 
# hidden fields after the password has been encrypted by a 
# user-uploadable encryption module; put html in commonhtml?
##############################################################

# page template

pswdhtml = """
<form method=post action=%s/onViewPswdSubmit.cgi>
<p>
Please enter POP account password below, for user "%s" and site "%s".
<p><input name=pswd type=password>
<input type=submit value="Submit"></form></p>

<hr><p><i>Security note</i>: The password you enter above will be transmitted 
over the Internet to the server machine, but is not displayed, is never 
transmitted in combination with a username unless it is encrypted, and is 
never stored anywhere: not on the server (it is only passed along as hidden
fields in subsequent pages), and not on the client (no cookies are generated).
This is still not totally safe; use your browser's back button to back out of
PyMailCgi at any time.</p>
"""

# generate the password input page 

import commonhtml                                         # usual parms case:
user, pswd, site = commonhtml.getstandardpopfields({})    # from module here,
commonhtml.pageheader(kind='POP password input')          # from html|url later
print pswdhtml % (commonhtml.urlroot, user, site)
commonhtml.pagefooter()

This script is almost all embedded HTML: the triple-quoted pswdhtml string is printed, with string formatting, in a single step. But because we need to fetch the user and server names to display on the generated page, this is coded as an executable script, not a static HTML file. Module commonhtml either loads user and server names from script inputs (e.g., appended to the script's URL), or imports them from the mailconfig file; either way, we don't want to hardcode them into this script or its HTML, so an HTML file won't do.
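The "inputs first, module second" lookup described here can be sketched as below. This is not the commonhtml code: the function names, the configuration attribute names, and the plain dictionary standing in for parsed form data are all assumptions made for illustration, and the sketch is written for Python 3.

```python
class MailConfigDefaults:
    """Stand-in for the mailconfig module; attribute names are assumed."""
    popservername = 'pop.example.com'
    popusername = 'someuser'

def get_field(inputs, name, default=''):
    """Return a non-empty input value, else the default."""
    value = inputs.get(name)
    return value if value else default

def get_pop_fields(inputs, config=MailConfigDefaults):
    """Return (user, pswd, site), preferring script inputs over the module."""
    user = get_field(inputs, 'user', config.popusername)
    site = get_field(inputs, 'site', config.popservername)
    pswd = get_field(inputs, 'pswd', '')       # never defaulted from a file
    return user, pswd, site

print(get_pop_fields({}))                               # module values only
print(get_pop_fields({'user': 'bob', 'pswd': 'xyz'}))   # inputs override
```

Passing an empty dictionary, as the password page script does, yields the module's settings; later scripts would pass real form data and pick up whatever the page or URL supplied.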

Since this is a script, we can also make use of the commonhtml page header and footer routines to render the generated reply page with the common look-and-feel; this is shown in Figure 13-7.

Figure 13-7. PyMailCgi view password login page

figs/ppy2_1307.gif

At this page, the user is expected to enter the password for the POP email account of the user and server displayed. Notice that the actual password isn't displayed; the input field's HTML specifies type=password, which works just like a normal text field, but shows typed input as stars. (See also Example 11-6 for doing this at a console, and Example 11-23 for doing this in a GUI.)

13.5.2 The Mail Selection List Page

After filling out the last page's password field and pressing its Submit button, the password is shipped off to the script shown in Example 13-7.

Example 13-7. PP2E\Internet\Cgi-Web\PyMailCgi\onViewPswdSubmit.cgi
#!/usr/bin/python
# On submit in pop password input window--make view list

import cgi, StringIO, rfc822, string
import loadmail, commonhtml 
from   secret import encode        # user-defined encoder module
MaxHdr = 35                        # max length of email hdrs in list

# only pswd comes from page here, rest usually in module
formdata = cgi.FieldStorage()
mailuser, mailpswd, mailsite = commonhtml.getstandardpopfields(formdata)

try:
    newmail  = loadmail.loadnewmail(mailsite, mailuser, mailpswd)
    mailnum  = 1
    maillist = []
    for mail in newmail:
        msginfo = []
        hdrs = rfc822.Message(StringIO.StringIO(mail))
        for key in ('Subject', 'From', 'Date'):
            msginfo.append(hdrs.get(key, '?')[:MaxHdr])
        msginfo = string.join(msginfo, ' | ')
        maillist.append((msginfo, commonhtml.urlroot + '/onViewListLink.cgi', 
                                      {'mnum': mailnum,
                                       'user': mailuser,          # data params
                                       'pswd': encode(mailpswd),  # pass in url
                                       'site': mailsite}))        # not inputs
        mailnum = mailnum+1
    commonhtml.listpage(maillist, 'mail selection list')
except:
    commonhtml.errorpage('Error loading mail index')

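This script's main purpose is to generate a selection list page for the user's email account, using the password typed into the prior page (or passed in a URL). As usual with encapsulation, most of the details are hidden in other files: loadmail.loadnewmail fetches the account's mail, commonhtml.listpage generates the list page, and secret.encode encrypts the password before it is embedded in each link's URL.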

The maillist list built here is used to create the body of the next page -- a clickable email message selection list. Each generated hyperlink in the list page references a constructed URL that contains enough information for the next script to fetch and display a particular email message.
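To make the shape of that list concrete, here is one way a list of (text, URL, parameter-dictionary) tuples could be turned into table rows of hyperlinks. It is a Python 3 sketch under stated assumptions: list_rows is a hypothetical function, not the real listpage, and the host name in the sample data is a placeholder.

```python
from html import escape
from urllib.parse import urlencode

def list_rows(linklist):
    """Build one table row of HTML per (text, url, params) tuple."""
    rows = []
    for number, (text, url, params) in enumerate(linklist, start=1):
        link = url + '?' + urlencode(params)          # params go in the URL
        rows.append('<tr><th><a href="%s">View</a> %d<td>%s'
                    % (escape(link, quote=True), number, escape(text)))
    return rows

maillist = [
    ('test | <someone> | Mon Jun  5',
     'http://www.example.com/PyMailCgi/onViewListLink.cgi',
     {'mnum': 1, 'user': 'lutz', 'site': 'pop.example.com'}),
]
rows = list_rows(maillist)
for row in rows:
    print(row)
```

Two kinds of escaping are at work: urlencode protects the parameter values inside the URL, and html.escape protects both the link and the header text inside the page (so each & separator appears as &amp; in the generated HTML).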

If all goes well, the mail selection list page HTML generated by this script is rendered as in Figure 13-8. If you get as much email as I do, you'll probably need to scroll down to see the end of this page. It looks like Figure 13-9, and follows the common look-and-feel for all PyMailCgi pages, thanks to commonhtml.

Figure 13-8. PyMailCgi view selection list page, top

figs/ppy2_1308.gif

Figure 13-9. PyMailCgi view selection list page, bottom

figs/ppy2_1309.gif

If the script can't access your email account (e.g., because you typed the wrong password), then its try statement handler instead produces a commonly formatted error page. Figure 13-10 shows one that gives the Python exception and details as part of the reply after a genuine exception is caught.

Figure 13-10. PyMailCgi login error page

figs/ppy2_1310.gif

13.5.2.1 Passing state information in URL link parameters

The central mechanism at work in Example 13-7 is the generation of URLs that embed message numbers and mail account information. Clicking on any of the View links in the selection list triggers another script, which uses information in the link's URL parameters to fetch and display the selected email. As mentioned in the prior chapter, because the list's links are effectively programmed to "know" how to load a particular message, it's not too far-fetched to refer to them as smart links -- URLs that remember what to do next. Figure 13-11 shows part of the HTML generated by this script.

Figure 13-11. PyMailCgi view list, generated HTML

figs/ppy2_1311.gif

Did you get all that? You may not be able to read generated HTML like this, but your browser can. For the sake of readers afflicted with human parsing limitations, here is what one of those link lines looks like, reformatted with line breaks and spaces to make it easier to understand:

<tr><th><a href="http://starship.python.net/~lutz/
                      PyMailCgi/onViewListLink.cgi
                                       ?user=lutz&
                                        mnum=66&
                                        pswd=%8cg%c2P%1e%f3%5b%c5J%1c%f0&
                                        site=pop.rmi.net">View</a> 66
<td>test sendurl.py | [email protected] | Mon Jun  5 17:51:11 2000

PyMailCgi generates fully specified URLs (with server and pathname values imported from a common module). Clicking on the word "View" in the hyperlink rendered from this HTML code triggers the onViewListLink script as usual, passing it all the parameters embedded at the end of the URL: POP username, the POP message number of the message associated with this link, and POP password and site information. These values will be available in the object returned by cgi.FieldStorage in the next script run. Note that the mnum POP message number parameter differs in each link because each opens a different message when clicked, and that the text after <td> comes from message headers extracted with the rfc822 module.

The commonhtml module escapes all of the link parameters with the urllib module, not cgi.escape, because they are part of a URL. This is obvious only in the pswd password parameter -- its value has been encrypted, but urllib additionally escapes non-safe characters in the encrypted string per URL convention (that's where all those %xx come from). It's okay if the encryptor yields odd -- even non-printable -- characters, because URL encoding makes them legible for transmission. When the password reaches the next script, cgi.FieldStorage undoes URL escape sequences, leaving the encrypted password string without % escapes.
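The escape-and-unescape round trip described above can be tried in isolation. The following is a minimal sketch, assuming Python 3's urllib.parse (the current home of these calls) rather than the urllib and cgi modules used by this chapter's scripts; the byte string is a made-up stand-in for an encrypted password, and parse_qs merely plays the role of the receiving script's form parser.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical stand-in for an encrypted password: arbitrary, partly
# non-printable bytes (not output from the book's secret module).
encrypted = b'\x8cg\xc2P\x1e\xf3[\xc5J\x1c\xf0'

# Build the parameter string the way a link generator would;
# unsafe bytes become %xx escapes.
query = urlencode({'user': 'lutz', 'mnum': 66, 'pswd': encrypted})

# Parse it back, as the next script's form parser would. Decoding as
# latin-1 maps each byte to exactly one character, so nothing is lost.
fields = parse_qs(query, encoding='latin-1')
recovered = fields['pswd'][0].encode('latin-1')
```

The receiving side never sees the % escapes: by the time the value is read out of the parsed fields, it is back to the raw encrypted string and ready to be decrypted.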

It's instructive to see how commonhtml builds up the smart link parameters. Earlier, we learned how to use the urllib.quote_plus call to escape a string for inclusion in URLs:

>>> import urllib
>>> urllib.quote_plus("There's bugger all down here on Earth")
'There%27s+bugger+all+down+here+on+Earth'

Module commonhtml, though, calls the higher-level urllib.urlencode function, which translates a dictionary of name:value pairs into a complete URL parameter string, ready to add after a ? marker in a URL. For instance, here is urlencode in action at the interactive prompt:

>>> parmdict = {'user': 'Brian',
...             'pswd': '#!/spam',
...             'text': 'Say no more, squire!'}

>>> urllib.urlencode(parmdict)
'pswd=%23%21/spam&user=Brian&text=Say+no+more,+squire%21'

>>> "%s?%s" % ("http://scriptname.cgi", urllib.urlencode(parmdict))
'http://scriptname.cgi?pswd=%23%21/spam&user=Brian&text=Say+no+more,+squire%21'

Internally, urlencode passes each name and value in the dictionary to the built-in str function (to make sure they are strings) and then runs each one through urllib.quote_plus as they are added to the result. The CGI script builds up a list of similar dictionaries and passes it to commonhtml to be formatted into a selection list page.[1]
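As a rough illustration of that last step, here is a sketch of turning a list of parameter dictionaries into View links. It assumes Python 3's urllib.parse.urlencode; the function name view_links and the localhost URL are hypothetical, and this is not the code in commonhtml.

```python
from urllib.parse import urlencode

def view_links(urlroot, params_list):
    """Return one HTML anchor per parameter dictionary (illustrative only)."""
    links = []
    for params in params_list:
        # urlencode applies str() and quote_plus to every name and value.
        url = '%s/onViewListLink.cgi?%s' % (urlroot, urlencode(params))
        links.append('<a href="%s">View</a> %s' % (url, params['mnum']))
    return links

links = view_links('http://localhost/PyMailCgi',
                   [{'user': 'Brian', 'mnum': 1, 'text': 'Say no more, squire!'},
                    {'user': 'Brian', 'mnum': 2, 'text': '#!/spam'}])
```

Each dictionary yields one link whose URL carries everything the next script needs, which is all that "smart link" really means here.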

In broader terms, generating URLs with parameters like this is one way to pass state information to the next script (along with databases and hidden form input fields, discussed later). Without such state information, the user would have to re-enter the username, password, and site name on every page they visit along the way. We'll use this technique again in the next case study, to generate links that "know" how to fetch a particular database record.

Incidentally, the list generated by this script is not radically different in functionality from what we built in the PyMailGui program of Chapter 11. Figure 13-12 shows this strictly client-side GUI's view on the same email list displayed in Figures 13-8 and 13-9.

Figure 13-12. PyMailGui displaying the same view list

figs/ppy2_1312.gif

However, PyMailGui uses the Tkinter GUI library to build up a user interface instead of sending HTML to a browser. It also runs entirely on the client and downloads mail from the POP server to the client machine over sockets on demand. In contrast, PyMailCgi runs on the server machine and simply displays mail text on the client's browser -- mail is downloaded from the POP server machine to the starship server, where CGI scripts are run. These architecture differences have some important ramifications, which we'll discuss in a few moments.

13.5.2.2 Security protocols

In onViewPswdSubmit's source code (Example 13-7), notice that password inputs are passed to an encode function as they are added to the parameters dictionary, and hence show up encrypted in hyperlink URLs. They are also URL-encoded for transmission (with % escapes) and are later decoded and decrypted within other scripts as needed to access the POP account. The password encryption step, encode, is at the heart of PyMailCgi's security policy.

Beginning in Python 1.6, the standard socket module will include optional support for OpenSSL, an open source implementation of secure sockets that prevents transmitted data from being intercepted by eavesdroppers on the Net. Unfortunately, this example was developed under Python 1.5.2 and runs on a server whose Python did not have secure socket support built in, so an alternative scheme was devised to minimize the chance that email account information could be stolen off the Net in transit.

Here's how it works. When this script is invoked by the password input page's form, it gets only one input parameter: the password typed into the form. The username is imported from a mailconfig module installed on the server instead of transmitted together with the unencrypted password (that would be much too easy for malicious users to intercept).

To pass the POP username and password to the next page as state information, this script adds them to the end of the mail selection list URLs, but only after the password has been encrypted by secret.encode -- a function in a module that lives on the server and may vary in every location that PyMailCgi is installed. In fact, PyMailCgi was written to not have to know about the password encryptor at all; because the encoder is a separate module, you can provide any flavor you like. Unless you also publish your encoder module, the encoded password shipped with the username won't be of much help to snoopers.
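To make the plug-in idea concrete, here is a deliberately simple sketch of what a site-specific module with an encode/decode pair might look like. Everything in it is an assumption for illustration: the key, the XOR scheme, and the use of Python 3 bytes. XOR with a repeating key is obfuscation only, not real encryption, and this is not the encoder shipped with PyMailCgi.

```python
from itertools import cycle

KEY = b'change-me-per-site'      # hypothetical per-installation key

def encode(pswd):
    """Scramble a password string into bytes (obfuscation, NOT encryption)."""
    data = pswd.encode('utf-8')
    return bytes(b ^ k for b, k in zip(data, cycle(KEY)))

def decode(scrambled):
    """Undo encode: XOR with the same key restores the original bytes."""
    data = bytes(b ^ k for b, k in zip(scrambled, cycle(KEY)))
    return data.decode('utf-8')
```

Because callers only ever import the two function names, a stronger scheme can be swapped in later without touching the page scripts.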

The upshot is that normally, PyMailCgi never sends or receives both user and password values together in a single transaction unless the password is encrypted with an encryptor of your choice. This limits its utility somewhat (since only a single account username can be installed on the server), but the alternative of popping up two pages -- one for password entry and one for user -- is even more unfriendly. In general, if you want to read your mail with the system as coded, you have to install its files on your server, tweak its mailconfig.py to reflect your account details, and change its secret.py encryptor as desired.

One exception: since any CGI script can be invoked with parameters in an explicit URL instead of form field values, and since commonhtml tries to fetch inputs from the form object before importing them from mailconfig, it is possible for any person to use this script to check his or her mail without installing and configuring a copy of PyMailCgi. For example, a URL like the following (but without the linebreak used to make it fit here):

http://starship.python.net/~lutz/PyMailCgi/
 onViewPswdSubmit.cgi?user=lutz&pswd=asif&site=pop.rmi.net

will actually load email into a selection list using whatever user, password, and mail site names are appended. From the selection list, you may then view, reply, forward, and delete email. Notice that at this point in the interaction, the password you send in a URL of this form is not encrypted. Later scripts expect that the password inputs will be sent encrypted, though, which makes it more difficult to use them with explicit URLs (you would need to match the encrypted form produced by the secret module on the server). Passwords are encrypted as they are added to links in the reply page's selection list, and remain encrypted in URLs and hidden form fields thereafter.
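The "form first, then mailconfig" lookup order that makes such URLs work can be sketched as follows. This is an assumption-laden illustration, not the commonhtml code: get_pop_fields and DEFAULTS are hypothetical names, and a plain dictionary stands in for the parsed form or URL inputs.

```python
# Hypothetical defaults standing in for values in a mailconfig module.
DEFAULTS = {'user': 'lutz', 'site': 'pop.rmi.net'}

def get_pop_fields(form, defaults=DEFAULTS):
    """Return (user, pswd, site), preferring request inputs over defaults.

    `form` is any mapping of input names to strings.
    """
    user = form.get('user') or defaults['user']
    site = form.get('site') or defaults['site']
    pswd = form.get('pswd', '')   # no default: it always comes from the request
    return user, pswd, site
```

With only a password in the request, the installed account is used; with all three inputs supplied in a URL, the defaults are bypassed entirely, which is exactly the loophole described above.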

But please don't use a URL like this, unless you don't care about exposing your email password. Really. Sending both your unencrypted mail user ID and password strings across the Net in a URL like this is extremely unsafe and wide open to snoopers. In fact, it's like giving them a loaded gun -- anyone who intercepts this URL will have complete access to your email account. It is made even more treacherous by the fact that this URL format appears in a book that will be widely distributed all around the world.

If you care about security and want to use PyMailCgi, install it on your own server and configure mailconfig and secret. That should at least guarantee that your user and password information will never both be transmitted unencrypted in a single transaction. This scheme still is not foolproof, so be careful out there, folks. Without secure sockets, the Internet is a "use at your own risk" medium.

13.5.3 The Message View Page

Back to our page flow. At this point, we are still viewing the message selection list in Figure 13-8. When we click on one of its generated hyperlinks, the smart URL invokes the script in Example 13-8 on the server, sending the selected message number and mail account information (user, password, and site) as parameters on the end of the script's URL.

Example 13-8. PP2E\Internet\Cgi-Web\PyMailCgi\onViewListLink.cgi
#!/usr/bin/python
############################################################
# On user click of message link in main selection list;
# cgi.FieldStorage undoes any urllib escapes in the link's
# input parameters (%xx and '+' for spaces already undone);
############################################################

import cgi, rfc822, StringIO
import commonhtml, loadmail
from secret import decode
#commonhtml.dumpstatepage(0)

form = cgi.FieldStorage(  )
user, pswd, site = commonhtml.getstandardpopfields(form)
try:
    msgnum   = form['mnum'].value                               # from url link
    newmail  = loadmail.loadnewmail(site, user, decode(pswd))
    textfile = StringIO.StringIO(newmail[int(msgnum) - 1])      # don't eval!
    headers  = rfc822.Message(textfile)
    bodytext = textfile.read(  )
    commonhtml.viewpage(msgnum, headers, bodytext, form)        # encoded pswd
except: 
    commonhtml.errorpage('Error loading message')

Again, most of the work here happens in the loadmail and commonhtml modules, which are listed later in this section (Example 13-12 and Example 13-14). This script adds logic to decode the input password (using the configurable secret encryption module) and extract the selected mail's headers and text using the rfc822 and StringIO modules, just as we did in Chapter 11.[2]
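For readers on a current Python, the header-and-body split can be approximated with the standard email package. This is a sketch under that assumption (the listing above uses rfc822 and StringIO); the message text and addresses are invented for illustration.

```python
from email import message_from_string

# A made-up message standing in for one entry of the loaded mail list.
raw = ('From: sender@example.com\n'
       'To: reader@example.com\n'
       'Subject: test sendurl.py\n'
       '\n'
       'more stuff\n')

newmail = [raw]                  # list of full message texts, as loaded
msgnum = '1'                     # arrives as a string in the link's URL

# POP numbers start at 1, list indexes at 0; int() is safe where eval is not.
msg = message_from_string(newmail[int(msgnum) - 1])
headers = {name: msg[name] for name in ('From', 'To', 'Subject')}
bodytext = msg.get_payload()     # text after the blank line, for simple mail
```

Converting the message number with int rather than eval matters because the value comes straight from a URL that anyone can edit.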

If the message can be loaded and parsed successfully, the result page (shown in Figure 13-13) allows us to view, but not edit, the mail's text. The function commonhtml.viewpage generates a "read-only" HTML option for all the text widgets in this page.

Figure 13-13. PyMailCgi view page

figs/ppy2_1313.gif

View pages like this have a pull-down action selection list near the bottom; if you want to do more, use this list to pick an action (Reply, Forward, or Delete), and click on the Next button to proceed to the next screen. If you're just in a browsing frame of mind, click the "Back to root page" link at the bottom to return to the main page, or use your browser's Back button to return to the selection list page.

13.5.3.1 Passing state information in HTML hidden input fields

What you don't see on the view page in Figure 13-13 is just as important as what you do. We need to refer to Example 13-14 for details, but there's something new going on here. The original message number, as well as the POP user and (still encrypted) password information sent to this script as part of the smart link's URL, wind up being copied into the HTML used to create this view page, as the values of "hidden" input fields in the form. The hidden field generation code in commonhtml looks like this:

    print '<form method=post action="%s/onViewSubmit.cgi">' % urlroot
    print '<input type=hidden name=mnum value="%s">' % msgnum
    print '<input type=hidden name=user value="%s">' % user     # from page|url
    print '<input type=hidden name=site value="%s">' % site     # for deletes
    print '<input type=hidden name=pswd value="%s">' % pswd     # pswd encoded

Much like parameters in generated hyperlink URLs, hidden fields in a page's HTML allow us to embed state information inside this web page itself. Unless you view that page's source, you can't see this state information, because hidden fields are never displayed. But when this form's Submit button is clicked, hidden field values are automatically transmitted to the next script along with the visible fields on the form.
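A generic version of this hidden-field generation might look like the sketch below. It assumes Python 3's html.escape (the successor to cgi.escape), and hidden_fields is a hypothetical helper name rather than anything defined in commonhtml.

```python
from html import escape

def hidden_fields(state):
    """Return one hidden <input> line per name/value pair in `state`."""
    lines = []
    for name, value in state.items():
        # quote=True also escapes '"', which would otherwise end the attribute.
        lines.append('<input type=hidden name=%s value="%s">'
                     % (name, escape(str(value), quote=True)))
    return '\n'.join(lines)

html = hidden_fields({'mnum': 66, 'user': 'lutz', 'site': 'pop.rmi.net'})
```

Whatever is placed in the dictionary comes back to the server as ordinary form inputs when the page is submitted.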

Figure 13-14 shows the source code generated for a different message's view page; the hidden input fields used to pass selected mail state information are embedded near the top.

Figure 13-14. PyMailCgi view page, generated HTML

figs/ppy2_1314.gif

The net effect is that hidden input fields in HTML, just like parameters at the end of generated URLs, act like temporary storage areas and retain state between pages and user interaction steps. Both are the Web's equivalent to programming language variables. They come in handy any time your application needs to remember something between pages.

Hidden fields are especially useful if you cannot invoke the next script from a generated URL hyperlink with parameters. For instance, the next action in our script is a form submit button (Next), not a hyperlink, so hidden fields are used to pass state. As before, without these hidden fields, users would need to re-enter POP account details somewhere on the view page if they were needed by the next script (in our example, they are required if the next action is Delete).

13.5.3.2 Escaping mail text and passwords in HTML

Notice that everything you see on the message view page in Figure 13-13 is escaped with cgi.escape. Header fields and the text of the mail itself might contain characters that are special to HTML and must be translated as usual. For instance, because some mailers allow you to send messages in HTML format, it's possible that an email's text could contain a </textarea> tag, which would throw the reply page hopelessly out of sync if not escaped.

One subtlety here: HTML escapes are important only when text is sent to the browser initially (by the CGI script). If that text is later sent out again to another script (e.g., by sending a reply), the text will be back in its original, non-escaped format when received again on the server. The browser parses out escape codes and does not put them back again when uploading form data, so we don't need to undo escapes later. For example, here is part of the escaped text area sent to a browser during a Reply transaction (use your browser's View Source option to see this live):

<tr><th align=right>Text:
<td><textarea name=text cols=80 rows=10 readonly>
more stuff

--Mark Lutz  (http://rmi.net/~lutz)  [PyMailCgi 1.0]


&gt; -----Original Message-----
&gt; From: [email protected]
&gt; To: [email protected]
&gt; Date: Tue May  2 18:28:41 2000
&gt; 
&gt; &lt;table&gt;&lt;textarea&gt;
&gt; &lt;/textarea&gt;&lt;/table&gt;
&gt; --Mark Lutz  (http://rmi.net/~lutz)  [PyMailCgi 1.0]
&gt; 
&gt; 
&gt; &gt; -----Original Message-----

After this reply is delivered, its text looks as it did before escapes (and exactly as it appeared to the user in the message edit web page):

more stuff

--Mark Lutz  (http://rmi.net/~lutz)  [PyMailCgi 1.0]


> -----Original Message-----
> From: [email protected]
> To: [email protected]
> Date: Tue May  2 18:28:41 2000
> 
> <table><textarea>
> </textarea></table>
> --Mark Lutz  (http://rmi.net/~lutz)  [PyMailCgi 1.0]
> 
> 
> > -----Original Message-----

Did you notice the odd characters in the hidden password field of the generated HTML screen shot (Figure 13-14)? It turns out that the POP password is still encrypted when placed in hidden fields of the HTML. For security, they have to be: values of a page's hidden fields can be seen with a browser's View Source option, and it's not impossible that the text of this page could be intercepted off the Net.

The password is no longer URL-encoded when put in the hidden field, though, even though it was when it appeared at the end of the smart link URL. Depending on your encryption module, the password might now contain non-printable characters when generated as a hidden field value here; the browser doesn't care, as long as the field is run through cgi.escape like everything else added to the HTML reply stream. The commonhtml module is careful to route all text and headers through cgi.escape as the view page is constructed.
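The escaping rules in this section are easy to verify in isolation. The sketch below assumes Python 3's html module in place of cgi.escape, and uses unescape only to simulate what the browser does before it uploads form data.

```python
from html import escape, unescape

# Mail text containing tags that would break the page if sent as is.
mail_text = '<table><textarea>\n</textarea></table>'

# What the server writes into the reply page.
opener, closer = '<textarea name=text>', '</textarea>'
sent = opener + escape(mail_text) + closer

# What the next script receives on submit: the text with entities
# resolved. unescape() here only stands in for the browser's side.
inner = sent[len(opener):-len(closer)]
returned = unescape(inner)
```

Only one real closing tag survives in the page, and the text that comes back on submit matches the original, so no extra unescaping step is needed on the server.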

As a comparison, Figure 13-15 shows what the mail message captured in Figure 13-13 looks like when viewed in PyMailGui, the client-side Tkinter-based email tool from Chapter 11. PyMailGui doesn't need to care about things like passing state in URLs or hidden fields (it saves state in Python variables) or escaping HTML and URL strings (there are no browsers, and no network transmission steps once mail is downloaded). It does require Python to be installed on the client, but we'll get into that in a few pages.

Figure 13-15. PyMailGui viewer, same message

figs/ppy2_1315.gif

13.5.4 The Message Action Pages

At this point in our hypothetical PyMailCgi web interaction, we are viewing an email message (Figure 13-13) that was chosen from the selection list page. On the message view page, selecting an action from the pull-down list and clicking the Next button invokes the script in Example 13-9 on the server to perform a reply, forward, or delete operation for the selected message.

Example 13-9. PP2E\Internet\Cgi-Web\PyMailCgi\onViewSubmit.cgi
#!/usr/bin/python
# On submit in mail view window, action selected=(fwd, reply, delete)

import cgi, string
import commonhtml, secret
from   externs import pymail, mailconfig
from   commonhtml import getfield

def quotetext(form):
    """
    note that headers come from the prior page's form here,
    not from parsing the mail message again; that means that 
    commonhtml.viewpage must pass along date as a hidden field
    """ 
    quoted = '\n-----Original Message-----\n'
    for hdr in ('From', 'To', 'Date'):
        quoted = quoted + '%s: %s\n' % (hdr, getfield(form, hdr))
    quoted = quoted + '\n' +   getfield(form, 'text')
    quoted = '\n' + string.replace(quoted, '\n', '\n> ')
    return quoted

form = cgi.FieldStorage(  )  # parse form or url data
user, pswd, site = commonhtml.getstandardpopfields(form)

try:
    if form['action'].value   == 'Reply':
        headers = {'From':    mailconfig.myaddress,
                   'To':      getfield(form, 'From'),
                   'Cc':      mailconfig.myaddress,
                   'Subject': 'Re: ' + getfield(form, 'Subject')}
        commonhtml.editpage('Reply', headers, quotetext(form))

    elif form['action'].value == 'Forward':
        headers = {'From':    mailconfig.myaddress,
                   'To':      '',
                   'Cc':      mailconfig.myaddress,
                   'Subject': 'Fwd: ' + getfield(form, 'Subject')}
        commonhtml.editpage('Forward', headers, quotetext(form))

    elif form['action'].value == 'Delete':
        msgnum = int(form['mnum'].value)       # or string.atoi, but not eval(  )
        commonhtml.runsilent(                  # mnum field is required here
            pymail.deletemessages,
                (site, user, secret.decode(pswd), [msgnum], 0) )
        commonhtml.confirmationpage('Delete')

    else:
       assert 0, 'Invalid view action requested'
except:
    commonhtml.errorpage('Cannot process view action')

This script receives all information about the selected message as form input field data (some hidden, some not) along with the selected action's name. The next step in the interaction depends upon the action selected.

All these actions use data passed in from the prior page's form, but only the Delete action cares about the POP username and password and must decode the password received (it arrives here from hidden form input fields generated in the prior page's HTML).

13.5.4.1 Reply and forward

If you select Reply as the next action, the message edit page in Figure 13-16 is generated by the script. Text on this page is editable, and pressing this page's Send button again triggers the send mail script we saw in Example 13-4. If all goes well, we'll receive the same confirmation page we got earlier when writing new mail from scratch (Figure 13-4).

Figure 13-16. PyMailCgi reply page

figs/ppy2_1316.gif

Forward operations are virtually the same, except for a few email header differences. All of this busy-ness comes "for free," because Reply and Forward pages are generated by calling commonhtml.editpage, the same utility used to create a new mail composition page. Here, we simply pass the utility preformatted header line strings (e.g., replies add "Re:" to the subject text). We applied the same sort of reuse trick in PyMailGui, but in a different context. In PyMailCgi, one script handles three pages; in PyMailGui, one callback function handles three buttons, but the architecture is similar.

13.5.4.2 Delete

Selecting the Delete action on a message view page and pressing Next will cause the onViewSubmit script to immediately delete the message being viewed. Deletions are performed by calling a reusable delete utility function coded in Chapter 11; the call to the utility is wrapped in a commonhtml.runsilent call that prevents print statements in the utility from showing up in the HTML reply stream (they are just status messages, not HTML code). Figure 13-17 shows a delete operation in action.

Figure 13-17. PyMailCgi view page, delete selected

figs/ppy2_1317.gif

As mentioned, Delete is the only action that uses the POP account information (user, password, and site) that was passed in from hidden fields on the prior (message view) page. By contrast, the Reply and Forward actions format an edit page, which ultimately sends a message to the SMTP server; no POP information is needed or passed. But at this point in the interaction, the POP password has racked up more than a few frequent flyer miles. In fact, it may have crossed phone lines, satellite links, and continents on its journey from machine to machine. This process is illustrated here:

  1. Input (Client): The password starts life by being typed into the login page on the client (or being embedded in an explicit URL), unencrypted. If typed into the input form in a web browser, each character is displayed as a star (*).

  2. Load index (Client to CGI server to POP server): It is next passed from the client to the CGI server, which sends it on to your POP server in order to load a mail index. The client sends only the password, unencrypted.

  3. List page URLs (CGI server to client): To direct the next script's behavior, the password is embedded in the mail selection list web page itself as hyperlink URL parameters, encrypted and URL-encoded.

  4. Load message (Client to CGI server to POP server): When an email is selected from the list, the password is sent to the next script within the script's URL; the CGI script decrypts it and passes it on to the POP server to fetch the selected message.

  5. View page fields (CGI server to client): To direct the next script's behavior, the password is embedded in the view page itself as HTML hidden input fields, encrypted and HTML-escaped.

  6. Delete (Client to CGI server to POP server): Finally, the password is again passed from client to CGI server, this time as hidden form field values; the CGI script decrypts it and passes it to the POP server to delete.

Along the way, scripts have passed the password between pages as both a URL parameter and an HTML hidden input field; either way, they have always passed its encrypted string, and never passed an unencrypted password and username together in any transaction. Upon a Delete request, the password must be decoded here using the secret module before passing it to the POP server. If the script can access the POP server again and delete the selected message, another confirmation page appears, as shown in Figure 13-18.

Figure 13-18. PyMailCgi delete confirmation

figs/ppy2_1318.gif

Note that you really should click "Back to root page" after a successful deletion -- don't use your browser's Back button to return to the message selection list at this point, because the delete has changed the relative numbers of some messages in the list. PyMailGui worked around this problem by only deleting on exit, but PyMailCgi deletes mail immediately since there is no notion of "on exit." Clicking on a view link in an old selection list page may not bring up the message you think it should, if it comes after a message that was deleted.

This is a property of POP email in general: incoming mail simply adds to the mail list with higher message numbers, but deletions remove mail from arbitrary locations in the list and hence change message numbers for all mail following the ones deleted. Even PyMailGui may get some message numbers wrong if mail is deleted by another program while the GUI is open (e.g., in a second PyMailGui instance). Alternatively, both mailers could delete all email off the server as soon as it is downloaded, such that deletions wouldn't impact POP identifiers (Microsoft Outlook uses this scheme, for instance), but this requires additional mechanisms for storing deleted email persistently for later access.
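The renumbering effect is easy to see with a plain list standing in for a mailbox. This toy sketch makes no POP calls at all; the message names and the fetch helper are invented for illustration.

```python
# Messages as a POP server might number them: list position + 1.
mailbox = ['msg A', 'msg B', 'msg C', 'msg D']

def fetch(mailbox, msgnum):
    """Return the message at 1-based message number `msgnum`."""
    return mailbox[msgnum - 1]

stale_link = 3                       # a link generated before the delete
before = fetch(mailbox, stale_link)

del mailbox[2 - 1]                   # message number 2 is deleted
after = fetch(mailbox, stale_link)   # same number, different message
```

A link that embedded message number 3 now opens what used to be message 4, which is why an old selection list page cannot be trusted after a deletion.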

One subtlety: for replies and forwards, the onViewSubmit mail action script builds up a > -quoted representation of the original message, with original "From:", "To:", and "Date:" header lines prepended to the mail's original text. Notice, though, that the original message's headers are fetched from the CGI form input, not by reparsing the original mail (the mail is not readily available at this point). In other words, the script gets mail header values from the form input fields of the view page. Because there is no "Date" field on the view page, the original message's date is also passed along to the action script as a hidden input field to avoid reloading the message. Try tracing through the code in this chapter's listings to see if you can follow dates from page to page.
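As a starting point for that trace, here is the quoting step on its own, fed from a plain dictionary instead of a parsed form. It is a sketch in Python 3 syntax; the dictionary, the sample addresses, and the '?' default for missing fields are assumptions for illustration.

```python
def quotetext(fields):
    """Return a '> '-quoted copy of a message for replies and forwards.

    `fields` is a plain dict standing in for the prior page's form inputs.
    """
    quoted = '\n-----Original Message-----\n'
    for hdr in ('From', 'To', 'Date'):
        quoted += '%s: %s\n' % (hdr, fields.get(hdr, '?'))
    quoted += '\n' + fields.get('text', '')
    # Prefixing every line break quotes each line of the original.
    return '\n' + quoted.replace('\n', '\n> ')

reply_body = quotetext({'From': 'a@example.com', 'To': 'b@example.com',
                        'Date': 'Tue May  2 18:28:41 2000',
                        'text': 'more stuff'})
```

If the view page did not pass Date along as a hidden field, the Date line here would have nothing to show, since the mail itself is not reloaded.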

13.6 Utility Modules

This section presents the source code of the utility modules imported and used by the page scripts shown above. There aren't any new screen shots to see here, because these are utilities, not CGI scripts (notice their .py extensions). Moreover, these modules aren't all that useful to study in isolation, and are included here primarily to be referenced as you go through the CGI scripts' code. See earlier in this chapter for additional details not repeated here.

13.6.1 External Components

When I install PyMailCgi and other server-side programs shown in this book, I simply upload the contents of the Cgi-Web examples directory on my laptop to the top-level web directory on my server account (public_html). The Cgi-Web directory also lives on this book's CD (see http://examples.oreilly.com/python2), a mirror of the one on my PC. I don't copy the entire book examples distribution to my web server, because code outside the Cgi-Web directory isn't designed to run on a web server.

When I first installed PyMailCgi, however, I ran into a problem: it's written to reuse modules coded in other parts of the book, and hence in other directories outside Cgi-Web. For example, it reuses the mailconfig and pymail modules we wrote in Chapter 11, but neither lives in the CGI examples directory. Such external dependencies are usually okay, provided we use package imports or configure sys.path appropriately on startup. In the context of CGI scripts, though, what lives on my development machine may not be what is available on the web server machine where the scripts are installed.

To work around this (and avoid uploading the full book examples distribution to my web server), I define a directory at the top-level of Cgi-Web called Extern, to which any required external modules are copied as needed. For this system, Extern includes a subdirectory called Email, where the mailconfig and pymail modules are copied for upload to the server.

Redundant copies of files are less than ideal, but this can all be automated with install scripts that automatically copy to Extern and then upload Cgi-Web contents via FTP using Python's ftplib module (discussed in Chapter 11). Just in case I change this structure, though, I've encapsulated all external name accesses in the utility module in Example 13-10.

Example 13-10. PP2E\Internet\Cgi-Web\PyMailCgi\externs.py
##############################################################
# Isolate all imports of modules that live outside of the
# PyMailCgi directory.  Normally, these would come
# from PP2E.Internet.Email, but when I install PyMailCgi, 
# I copy just the Cgi-Web directory's contents to public_html
# on the server, so there is no PP2E directory on the server.
# Instead, I either copy the imports referenced in this file to 
# the PyMailCgi parent directory, or tweak the dir appended to
# the sys.path module search path here.  Because all other 
# modules get the externals from here, there is only one place
# to change when they are relocated.  This may be arguably
# gross, but I only put Internet code on the server machine.
##############################################################

import sys
sys.path.append('..')                 # see dir where Email installed on server
from  Extern import Email             # assumes a ../Extern dir with Email dir
from  Extern.Email import pymail      # can use names Email.pymail or pymail
from  Extern.Email import mailconfig

This module appends the parent directory of PyMailCgi to sys.path to make the Extern directory visible (remember, PYTHONPATH might be anything when CGI scripts are run as user "nobody") and preimports all external names needed by PyMailCgi into its own namespace. It also supports future changes; because all external references in PyMailCgi are made through this module, I have to change only this one file if externals are later installed differently.
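One detail worth noting is that a relative entry such as '..' is resolved against the process's current working directory. If that is a concern on your server, a variant that anchors the path to the module's own file location is sketched below; add_parent_to_path is a hypothetical helper, not part of the book's externs module.

```python
import os
import sys

def add_parent_to_path(script_file):
    """Append the parent of the script's directory to sys.path, once."""
    here = os.path.dirname(os.path.abspath(script_file))
    parent = os.path.dirname(here)
    if parent not in sys.path:       # avoid duplicate entries on re-import
        sys.path.append(parent)
    return parent
```

A module would typically call this as add_parent_to_path(__file__) before importing from the sibling directory.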

As a reference, Example 13-11 lists part of the external mailconfig module again. For PyMailCgi, it's copied to Extern, and may be tweaked as desired on the server (for example, the signature string differs slightly in this context). See the pymail.py file in Chapter 11, and consider writing an automatic copy-and-upload script for the Cgi-Web\Extern directory a suggested exercise; it's not proved painful enough to compel me to write one of my own.

Example 13-11. PP2E\Internet\Cgi-Web\Extern\Email\mailconfig.py
############################################
# email scripts get server names from here:
# change to reflect your machine/user names;
# could get these in command line instead
############################################

# SMTP email server machine (send)
smtpservername = 'smtp.rmi.net'          # or starship.python.net, 'localhost'

# POP3 email server machine, user (retrieve)
popservername  = 'pop.rmi.net'           # or starship.python.net, 'localhost'
popusername    = 'lutz'                  # password is requested when run

...rest omitted

# personal info used by PyMailGui to fill in forms;
# sig-- can be a triple-quoted block, ignored if empty string;
# addr--used for initial value of "From" field if not empty,

myaddress   = '[email protected]'
mysignature = '--Mark Lutz  (http://rmi.net/~lutz)  [PyMailCgi 1.0]'

13.6.2 POP Mail Interface

The loadmail utility module in Example 13-12 depends on external files and encapsulates access to mail on the remote POP server machine. It currently exports one function, loadnewmail, which returns a list of all mail in the specified POP account; callers are unaware of whether this mail is fetched over the Net, lives in memory, or is loaded from a persistent storage medium on the CGI server machine. That is by design -- loadmail changes won't impact its clients.

Example 13-12. PP2E\Internet\Cgi-Web\PyMailCgi\loadmail.py
###################################################################
# mail list loader; future--change me to save mail list between
# cgi script runs, to avoid reloading all mail each time; this
# won't impact clients that use the interfaces here if done well;
# for now, to keep this simple, reloads all mail on each operation
###################################################################

from commonhtml import runsilent         # suppress print's (no verbose flag)
from externs    import Email

# load all mail from number 1 up
# this may trigger an exception

def loadnewmail(mailserver, mailuser, mailpswd):
    return runsilent(Email.pymail.loadmessages,
                                  (mailserver, mailuser, mailpswd))

It's not much to look at -- just an interface and calls to other modules. The Email.pymail.loadmessages function (reused here from Chapter 11) uses the Python poplib module to fetch mail over sockets. All this activity is wrapped in a commonhtml.runsilent function call to prevent pymail print statements from going to the HTML reply stream (although any pymail exceptions are allowed to propagate normally).

As it is, though, loadmail loads all incoming email to generate the selection list page, and reloads all email again every time you fetch a message from the list. This scheme can be horribly inefficient if you have lots of email sitting on your server; I've noticed delays on the order of a dozen seconds when my mailbox is full. On the other hand, servers can be slow in general, so the extra time taken to reload mail isn't always significant; I've witnessed similar delays on the server for empty mailboxes and simple HTML pages too.

More importantly, loadmail is intended only as a first-cut mail interface -- something of a usable prototype. If I were to work on this system further, it would be straightforward to cache loaded mail in a file, shelve, or database on the server, for example. Because the interface exported by loadmail would not need to change to introduce a caching mechanism, clients of this module would still work. We'll explore server storage options in the next chapter.
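To make the caching idea concrete, here is a minimal sketch of how a shelve-backed cache might sit behind the same loadnewmail signature. It is not the book's implementation: the make_cached_loader name, the cache key format, and the cache file path are all assumptions, the fetch argument merely stands in for Email.pymail.loadmessages, and the code is written for Python 3.

```python
import os
import shelve
import tempfile

def make_cached_loader(fetch, cachefile):
    """Wrap a fetch(server, user, pswd) function with a shelve-backed cache.

    Hypothetical sketch: 'fetch' stands in for Email.pymail.loadmessages,
    and 'cachefile' is an assumed path writable by the CGI process.
    """
    def loadnewmail(mailserver, mailuser, mailpswd):
        key = '%s@%s' % (mailuser, mailserver)      # one cache entry per account
        with shelve.open(cachefile) as cache:
            if key not in cache:                    # first request: really fetch
                cache[key] = fetch(mailserver, mailuser, mailpswd)
            return cache[key]                       # later requests: reuse
    return loadnewmail

calls = []
def fakefetch(server, user, pswd):                  # stand-in for a real POP fetch
    calls.append(user)
    return ['message 1', 'message 2']

path = os.path.join(tempfile.mkdtemp(), 'mailcache')
loadnewmail = make_cached_loader(fakefetch, path)
loadnewmail('pop.example.com', 'someuser', 'xxx')
loadnewmail('pop.example.com', 'someuser', 'xxx')
print(len(calls))                                   # the fetch ran only once
```

Callers still see a function that takes a server, user, and password and returns a list of messages. A real version would also have to decide when cached mail becomes stale (after deletions or new arrivals, for instance) and cope with concurrent CGI processes touching the same file.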

13.6.3 POP Password Encryption

Time to call the cops. We discussed the approach to password security adopted by PyMailCgi earlier. In brief, it works hard to avoid ever passing the POP account username and password across the Net together in a single transaction, unless the password is encrypted according to module secret.py on the server. This module can be different everywhere PyMailCgi is installed and can be uploaded anew at any time -- encrypted passwords aren't persistent and live only for the duration of one mail-processing interaction session.[3] Example 13-13 is the encryptor module I installed on my server while developing this book.

Example 13-13. PP2E\Internet\Cgi-Web\PyMailCgi\secret.py
###############################################################################
# PyMailCgi encodes the pop password whenever it is sent to/from client over
# the net with a user name as hidden text fields or explicit url params; uses 
# encode/decode functions in this module to encrypt the pswd--upload your own
# version of this module to use a different encryption mechanism; pymail also 
# doesn't save the password on the server, and doesn't echo pswd as typed, but
# this isn't 100% safe--this module file itself might be vulnerable to some
# malicious users; Note: in Python 1.6, the socket module will include standard
# (but optional) support for openSSL sockets on the server, for programming 
# secure Internet transactions in Python; see 1.6 socket module docs;
###############################################################################

forceReadablePassword = 0 
forceRotorEncryption  = 1

import time, string
dayofweek = time.localtime(time.time(  ))[6]

###############################################################################
# string encoding schemes
###############################################################################

if not forceReadablePassword:
    # don't do anything by default: the urllib.quote or
    # cgi.escape calls in commonhtml.py will escape the 
    # password as needed to embed it in a URL or HTML; the 
    # cgi module undoes escapes automatically for us;

    def stringify(old):   return old
    def unstringify(old): return old

else:
    # convert encoded string to/from a string of digit chars,
    # to avoid problems with some special/nonprintable chars,
    # but still leave the result semi-readable (but encrypted);
    # some browsers had problems with escaped ampersands, etc.;

    separator = '-'

    def stringify(old):
        new = ''
        for char in old:
            ascii = str(ord(char)) 
            new   = new + separator + ascii       # '-ascii-ascii-ascii'
        return new

    def unstringify(old):
        new = ''
        for ascii in string.split(old, separator)[1:]:
            new = new + chr(int(ascii))
        return new 

###############################################################################
# encryption schemes
###############################################################################

if (not forceRotorEncryption) and (dayofweek % 2 == 0):
    # use our own scheme on evenly-numbered days (0=monday)
    # caveat: may fail if encode/decode over midnite boundary
 
    def do_encode(pswd):
        res = ''
        for char in pswd:
            res = res + chr(ord(char) + 1)        # add 1 to each ascii code
        return str(res)

    def do_decode(pswd):
        res = ''
        for char in pswd:
            res = res + chr(ord(char) - 1)
        return res

else:
    # use the standard lib's rotor module to encode pswd
    # this does a better job of encryption than code above

    import rotor
    mykey = 'pymailcgi'

    def do_encode(pswd):
        robj = rotor.newrotor(mykey)              # use enigma encryption
        return robj.encrypt(pswd)        
    
    def do_decode(pswd):
        robj = rotor.newrotor(mykey)
        return robj.decrypt(pswd)        

###############################################################################
# top-level entry points
###############################################################################

def encode(pswd):
    return stringify(do_encode(pswd))             # encrypt plus string encode

def decode(pswd):
    return do_decode(unstringify(pswd))

This encryptor module implements two alternative encryption schemes: a simple ASCII character code mapping, and Enigma-style encryption using the standard rotor module. The rotor module implements a sophisticated encryption strategy, based on the "Enigma" encryption machine used by the Nazis to encode messages during World War II. Don't panic, though; Python's rotor module is much less prone to cracking than the Nazis'!

In addition to encryption, this module also implements an encoding method for already-encrypted strings. By default, the encoding functions do nothing, and the system relies on straight URL encoding. An optional encoding scheme translates the encrypted string to a string of ASCII code digits separated by dashes. Either encoding method makes non-printable characters in the encrypted string printable.

13.6.3.1 Default encryption scheme: rotor

To illustrate, let's test this module's tools interactively. First off, we'll experiment with Python's standard rotor module, since it's at the heart of the default encoding scheme. We import the module, make a new rotor object with a key (and optionally, a rotor count), and call methods to encrypt and decrypt:

C:\...\PP2E\Internet\Cgi-Web\PyMailCgi>python
>>> import rotor
>>> r = rotor.newrotor('pymailcgi')        # (key, [,numrotors])
>>> r.encrypt('abc123')                    # may return non-printable chars
' \323an\021\224'

>>> x = r.encrypt('spam123')               # result is same len as input
>>> x
'* _\344\011pY'
>>> len(x)
7
>>> r.decrypt(x)
'spam123'

Notice that the same rotor object can encrypt multiple strings, that the result may contain non-printable characters (printed as \ascii escape codes when displayed, possibly in octal form), and that the result is always the same length as the original string. Most importantly, a string encrypted with rotor can be decrypted in a different process (e.g., in a later CGI script) if we recreate the rotor object:

C:\...\PP2E\Internet\Cgi-Web\PyMailCgi>python
>>> import rotor
>>> r = rotor.newrotor('pymailcgi')        # can be decrypted in new process
>>> r.decrypt('* _\344\011pY')             # use "\ascii" escapes for two chars
'spam123'

Our secret module by default simply uses rotor to encrypt, and does no additional encoding of its own. It relies on URL encoding when the password is embedded in a URL parameter, and HTML escaping when the password is embedded in hidden form fields. For URLs, the following sorts of calls occur:

>>> from secret import encode, decode
>>> x = encode('abc$#<>&+')                 # CGI scripts do this (rotor)
>>> x
' \323a\016\317\326\023\0163'

>>> import urllib                           # urllib.urlencode does this
>>> y = urllib.quote_plus(x)
>>> y
'+%d3a%0e%cf%d6%13%0e3'

>>> a = urllib.unquote_plus(y)              # cgi.FieldStorage does this
>>> a
' \323a\016\317\326\023\0163'

>>> decode(a)                               # CGI scripts do this (rotor)
'abc$#<>&+'

13.6.3.2 Alternative encryption schemes

To show how to write alternative encryptors and encoders, secret also includes a digits-string encoder and a character-code shuffling encryptor; both are enabled with global flag variables at the top of the module:

forceReadablePassword

If set to true, the encrypted password is encoded into a string of ASCII code digits separated by dashes. Defaults to false to fall back on URL and HTML escape encoding.

forceRotorEncryption

If set to false and the encryptor is used on an even-numbered day of the week, the simple character-code encryptor is used instead of rotor. Defaults to true to force rotor encryption.

To show how these alternatives work, let's set forceReadablePassword to 1 and forceRotorEncryption to 0, and reimport. Note that these are global variables that must be set before the module is imported (or reloaded), because they control the selection of alternative def statements. Only one version of each kind of function is ever made by the module:

C:\...\PP2E\Internet\Cgi-Web\PyMailCgi>python
>>> from secret import *
>>> x = encode('abc$#<>&+')
>>> x
'-98-99-100-37-36-61-63-39-44'

>>> y = decode(x)
>>> y
'abc$#<>&+'

This really happens in two steps, though -- encryption and then encoding (the top-level encode and decode functions orchestrate the two steps). Here's what the steps look like when run separately:

>>> t = do_encode('abc$#<>&+')               # just our encryption
>>> t
"bcd%$=?',"
>>> stringify(t)                             # add our own encoding
'-98-99-100-37-36-61-63-39-44'

>>> unstringify(x)                           # undo encoding
"bcd%$=?',"
>>> do_decode(unstringify(x))                # undo both steps
'abc$#<>&+'

This alternative encryption scheme merely adds 1 to each character's ASCII code value, and the encoder inserts the ASCII code integers of the result. It's also possible to combine rotor encryption and our custom encoding (set both forceReadablePassword and forceRotorEncryption to 1), but URL encoding provided by urllib works just as well. Here are a variety of schemes in action; secret.py is edited and saved before each reload:

>>> import secret
>>> secret.encode('spam123')          # default: rotor, no extra encoding
'* _\344\011pY'

>>> reload(secret)                    # forcereadable=1, forcerotor=0
<module 'secret' from 'secret.py'>
>>> secret.encode('spam123')
'-116-113-98-110-50-51-52'

>>> reload(secret)                    # forcereadable=1, forcerotor=1
<module 'secret' from 'secret.py'>
>>> secret.encode('spam123')
'-42-32-95-228-9-112-89'
>>> ord('Y')                          # the last one is really a 'Y'
89

>>> reload(secret)                    # back to default rotor, no stringify
<module 'secret' from 'secret.pyc'>
>>> import urllib
>>> urllib.quote_plus(secret.encode('spam123'))
'%2a+_%e4%09pY'
>>> 0x2a                              # the first is really 42, '*'
42
>>> chr(42)
'*'
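For readers on a current interpreter, the character-shift encryptor and digit-string encoder exercised in these sessions can be approximated in Python 3 as follows. This is a sketch, not a drop-in replacement for secret.py: it omits the rotor branch and the flag variables, and the SEPARATOR constant name is an assumption. Like the original shift scheme, it offers no real protection.

```python
SEPARATOR = '-'

def stringify(text):
    """Encode each character as its decimal code, prefixed by a dash."""
    return ''.join(SEPARATOR + str(ord(ch)) for ch in text)

def unstringify(digits):
    """Invert stringify: split on dashes, skipping the empty first field."""
    return ''.join(chr(int(code)) for code in digits.split(SEPARATOR)[1:])

def do_encode(pswd):
    """Shift every character code up by one (trivially breakable)."""
    return ''.join(chr(ord(ch) + 1) for ch in pswd)

def do_decode(pswd):
    """Shift every character code back down by one."""
    return ''.join(chr(ord(ch) - 1) for ch in pswd)

def encode(pswd):
    return stringify(do_encode(pswd))       # encrypt, then make printable

def decode(text):
    return do_decode(unstringify(text))     # undo both steps in reverse order

print(encode('abc$#<>&+'))                  # -98-99-100-37-36-61-63-39-44
print(decode(encode('spam123')))            # spam123
```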

You can provide any kind of encryption and encoding logic you like in a custom secret.py, as long as it adheres to the expected protocol -- encoders and decoders must receive and return a string. You can also alternate schemes by days of the week as done here (but note that this can fail if your system is being used when the clock turns over at midnight!), and so on. A few final pointers:

Other Python encryption tools

There are additional encryption tools that come with Python or are available for Python on the Web; see http://www.python.org and the library manual for details. Some encryption schemes are considered serious business and may be protected by law from export, but these rules change over time.

Secure sockets support

As mentioned, Python 1.6 (not yet out as I wrote this) will have standard support for OpenSSL secure sockets in the Python socket module. OpenSSL is an open source implementation of the secure sockets protocol (you must fetch and install it separately from Python -- see http://www.openssl.org). Where it can be used, this will provide a better and less limiting solution for securing information like passwords than the manual scheme we've adopted here.

For instance, secure sockets allow usernames and passwords to be entered into and submitted from a single web page, thereby supporting arbitrary mail readers. The best we can do without secure sockets is to either avoid mixing unencrypted user and password values and assume that some account data and encryptors live on the server (as done here), or to have two distinct input pages or URLs (one for each value). Neither scheme is as user-friendly as a secure sockets approach. Most browsers already support SSL; to add it to Python on your server, see the Python 1.6 (and beyond) library manual.

Internet security is a much bigger topic than can be addressed fully here, and we've really only scratched its surface. For additional information on security issues, consult books geared exclusively towards web programming techniques.

On my server, the secret.py file will be changed over time, in case snoopers watch the book's web site. Moreover, its source code cannot be viewed with the getfile CGI script coded in Chapter 12. That means that if you run this system live, passwords in URLs and hidden form fields may look very different than seen in this book. My password will have changed by the time you read these words too, or else it would be possible to know my password from this book alone!

13.6.4 Common Utilities Module

The file commonhtml.py, shown in Example 13-14, is the Grand Central Station of this application -- its code is used and reused by just about every other file in the system. Most of it is self-explanatory, and I've already said most of what I wanted to say about it earlier, in conjunction with the CGI scripts that use it.

I haven't talked about its debugging support, though. Notice that this module redirects sys.stderr to sys.stdout (by assigning sys.stdout to sys.stderr), in an attempt to force the text of Python error messages to show up in the client's browser (remember, uncaught exceptions print details to sys.stderr). That works sometimes in PyMailCgi, but not always -- the error text shows up in a web page only if a pageheader call has already printed a response preamble. If you want to see all error messages, make sure you call pageheader (or print Content-type: lines manually) before any other processing. This module also defines functions that dump lots of raw CGI environment information to the browser (dumpstatepage), and that wrap calls to functions that print status messages so their output isn't added to the HTML stream (runsilent).
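The output-suppression idea behind runsilent can be sketched in Python 3 as well. Note that this version swaps in the standard library's contextlib.redirect_stdout in place of the hand-coded dummy file class that the module itself uses, and that the noisy function is a made-up stand-in for a tool that prints status messages.

```python
import contextlib
import io
import sys

def runsilent(func, args):
    """Call func(*args) with anything it prints discarded; return its result."""
    with contextlib.redirect_stdout(io.StringIO()):   # stdout restored on exit,
        return func(*args)                            # even if func raises

def noisy(x):
    print('status: working on', x)                    # would pollute the reply
    return x * 2

result = runsilent(noisy, (21,))
print(result)                                         # 42
```

As in the original, exceptions raised by the wrapped function still propagate to the caller; only its printed output is dropped.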

I'll leave the discovery of any remaining magic in this code up to you, the reader. You are hereby admonished to go forth and read, refer, and reuse.

Example 13-14. PP2E\Internet\Cgi-Web\PyMailCgi\commonhtml.py
#!/usr/bin/python
#########################################################
# generate standard page header, list, and footer HTML;
# isolates html generation-related details in this file;
# text printed here goes over a socket to the client,
# to create parts of a new web page in the web browser;
# uses one print per line, instead of string blocks;
# uses urllib to escape parms in url links auto from a
# dict, but cgi.escape to put them in html hidden fields;
# some of the tools here are useful outside pymailcgi;
# could also return html generated here instead of 
# printing it, so it could be included in other pages;
# could also structure as a single cgi script that gets
# and tests a next action name as a hidden form field;
# caveat: this system works, but was largely written 
# during a 2-hour layover at the Chicago O'Hare airport:
# some components could probably use a bit of polishing;
# to run standalone on starship via a commandline, type
# "python commonhtml.py"; to run standalone via a remote
# web browser, rename file with .cgi and run fixcgi.py.
#########################################################

import cgi, urllib, string, sys
sys.stderr = sys.stdout           # show error messages in browser
from externs import mailconfig    # from a package somewhere on server

# my address root
urlroot = 'http://starship.python.net/~lutz/PyMailCgi'

def pageheader(app='PyMailCgi', color='#FFFFFF', kind='main', info=''):
    print 'Content-type: text/html\n'
    print '<html><head><title>%s: %s page (PP2E)</title></head>' % (app, kind)
    print '<body bgcolor="%s"><h1>%s %s</h1><hr>' % (color, app, (info or kind))

def pagefooter(root='pymailcgi.html'):
    print '</p><hr><a href="http://www.python.org">'
    print '<img src="../PyErrata/PythonPoweredSmall.gif" '
    print 'align=left alt="[Python Logo]" border=0 hspace=15></a>' 
    print '<a href="%s">Back to root page</a>' % root
    print '</body></html>'

def formatlink(cgiurl, parmdict):
    """
    make "%url?key=val&key=val" query link from a dictionary;
    escapes str(  ) of all key and val with %xx, changes ' ' to +
    note that url escapes are different from html (cgi.escape)
    """ 
    parmtext = urllib.urlencode(parmdict)           # calls urllib.quote_plus
    return '%s?%s' % (cgiurl, parmtext)             # urllib does all the work

def pagelistsimple(linklist):                       # show simple ordered list
    print '<ol>'
    for (text, cgiurl, parmdict) in linklist:
        link = formatlink(cgiurl, parmdict)
        text = cgi.escape(text)
        print '<li><a href="%s">\n    %s</a>' % (link, text)
    print '</ol>'
 
def pagelisttable(linklist):                        # show list in a table
    print '<p><table border>'                       # escape text to be safe
    count = 1
    for (text, cgiurl, parmdict) in linklist:
        link = formatlink(cgiurl, parmdict)
        text = cgi.escape(text)
        print '<tr><th><a href="%s">View</a> %d<td>\n %s' % (link, count, text)
        count = count+1
    print '</table>'

def listpage(linkslist, kind='selection list'):
    pageheader(kind=kind)
    pagelisttable(linkslist)         # [('text', 'cgiurl', {'parm':'value'})]
    pagefooter(  )

def messagearea(headers, text, extra=''):
    print '<table border cellpadding=3>'
    for hdr in ('From', 'To', 'Cc', 'Subject'):
        val = headers.get(hdr, '?')
        val = cgi.escape(val, quote=1)
        print '<tr><th align=right>%s:' % hdr
        print '    <td><input type=text '
        print '    name=%s value="%s" %s size=60>' % (hdr, val, extra)
    print '<tr><th align=right>Text:'
    print '<td><textarea name=text cols=80 rows=10 %s>' % extra
    print '%s\n</textarea></table>' % (cgi.escape(text) or '?')   # if has </>s

def viewpage(msgnum, headers, text, form):
    """
    on View + select (generated link click)
    very subtle thing: at this point, pswd was url encoded in the
    link, and then unencoded by cgi input parser; it's being embedded
    in html here, so we use cgi.escape; this usually sends nonprintable
    chars in the hidden field's html, but works on ie and ns anyhow:
    in url:  ?user=lutz&mnum=3&pswd=%8cg%c2P%1e%f0%5b%c5J%1c%f3&...
    in html: <input type=hidden name=pswd value="...nonprintables..">
    could urllib.quote the html field here too, but must urllib.unquote
    in next script (which precludes passing the inputs in a URL instead 
    of the form); can also fall back on numeric string fmt in secret.py
    """ 
    pageheader(kind='View')
    user, pswd, site = map(cgi.escape, getstandardpopfields(form))
    print '<form method=post action="%s/onViewSubmit.cgi">' % urlroot
    print '<input type=hidden name=mnum value="%s">' % msgnum
    print '<input type=hidden name=user value="%s">' % user     # from page|url
    print '<input type=hidden name=site value="%s">' % site     # for deletes
    print '<input type=hidden name=pswd value="%s">' % pswd     # pswd encoded
    messagearea(headers, text, 'readonly')

    # onViewSubmit.quotetext needs date passed in page
    print '<input type=hidden name=Date value="%s">' % headers.get('Date','?')
    print '<table><tr><th align=right>Action:'
    print '<td><select name=action>'
    print '    <option>Reply<option>Forward<option>Delete</select>'
    print '<input type=submit value="Next">'
    print '</table></form>'                      # no 'reset' needed here
    pagefooter(  )

def editpage(kind, headers={}, text=''):     
    # on Send, View+select+Reply, View+select+Fwd
    pageheader(kind=kind)
    print '<form method=post action="%s/onSendSubmit.cgi">' % urlroot
    if mailconfig.mysignature:
        text = '\n%s\n%s' % (mailconfig.mysignature, text)
    messagearea(headers, text)
    print '<input type=submit value="Send">'
    print '<input type=reset  value="Reset">'
    print '</form>'
    pagefooter(  )

def errorpage(message):
    pageheader(kind='Error')                        # or sys.exc_type/exc_value
    exc_type, exc_value = sys.exc_info(  )[:2]        # but safer,thread-specific
    print '<h2>Error Description</h2><p>', message  
    print '<h2>Python Exception</h2><p>',  cgi.escape(str(exc_type))
    print '<h2>Exception details</h2><p>', cgi.escape(str(exc_value))
    pagefooter(  )

def confirmationpage(kind):
    pageheader(kind='Confirmation')
    print '<h2>%s operation was successful</h2>' % kind
    print '<p>Press the link below to return to the main page.</p>'
    pagefooter(  )

def getfield(form, field, default=''):
    # emulate dictionary get method
    return (form.has_key(field) and form[field].value) or default

def getstandardpopfields(form):
    """
    fields can arrive missing or '' or with a real value
    hard-coded in a url; default to mailconfig settings
    """
    return (getfield(form, 'user', mailconfig.popusername),
            getfield(form, 'pswd', '?'),
            getfield(form, 'site', mailconfig.popservername))

def getstandardsmtpfields(form):
    return  getfield(form, 'site', mailconfig.smtpservername)

def runsilent(func, args):
    """
    run a function without writing stdout
    ex: suppress print's in imported tools
    else they go to the client/browser
    """
    class Silent:
        def write(self, line): pass 
    save_stdout = sys.stdout
    sys.stdout  = Silent(  )                         # send print to dummy object
    try:                                           # which has a write method
        result = apply(func, args)                 # try to return func result
    finally:                                       # but always restore stdout
        sys.stdout = save_stdout
    return result

def dumpstatepage(exhaustive=0):
    """
    for debugging: call me at top of a cgi to
    generate a new page with cgi state details 
    """
    if exhaustive:
        cgi.test(  )                       # show page with form, environ, etc.
    else:                                
        pageheader(kind='state dump')
        form = cgi.FieldStorage(  )        # show just form fields names/values
        cgi.print_form(form)
        pagefooter(  )
    sys.exit(  )
                              
def selftest(showastable=0):                        # make phony web page
    links = [                                       # [(text, url, {parms})]
        ('text1', urlroot + '/page1.cgi', {'a':1}),         
        ('text2', urlroot + '/page1.cgi', {'a':2, 'b':'3'}),
        ('text3', urlroot + '/page2.cgi', {'x':'a b', 'y':'a<b&c', 'z':'?'}),
        ('te<>4', urlroot + '/page2.cgi', {'<x>':'', 'y':'<a>', 'z':None})]
    pageheader(kind='View')
    if showastable:
        pagelisttable(links)
    else:
        pagelistsimple(links)
    pagefooter(  )

if __name__ == '__main__':                          # when run, not imported
    selftest(len(sys.argv) > 1)                     # html goes to stdout
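
The distinction this module draws between URL escaping (for generated links) and HTML escaping (for hidden form fields) can be illustrated with a short Python 3 sketch. It assumes the modern urllib.parse and html modules rather than the urllib and cgi calls used in the listing above, and the hiddenfield helper is hypothetical, not a function in commonhtml.py.

```python
import html
from urllib.parse import urlencode, parse_qs

def formatlink(cgiurl, parmdict):
    """Build 'url?key=val&key=val', URL-escaping every key and value."""
    return '%s?%s' % (cgiurl, urlencode(parmdict))

def hiddenfield(name, value):
    """Build a hidden input tag, HTML-escaping the value (quotes included)."""
    return '<input type=hidden name=%s value="%s">' % (
        name, html.escape(str(value), quote=True))

link = formatlink('page2.cgi', {'x': 'a b', 'y': 'a<b&c'})
print(link)                                   # page2.cgi?x=a+b&y=a%3Cb%26c
print(parse_qs(link.split('?', 1)[1]))        # what the receiving script sees
print(hiddenfield('y', 'a<b&c'))              # value="a&lt;b&amp;c"
```

The same value is escaped differently in each context: %xx codes and + in the URL, &lt; and &amp; entities in the HTML field. In both cases the receiving side sees the original text once the escapes are undone.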

13.7 CGI Script Trade-offs

As shown in this chapter, PyMailCgi is still something of a system in the making, but it does work as advertised: by pointing a browser at the main page's URL, I can check and send email from anywhere I happen to be, as long as I can find a machine with a web browser. In fact, any machine and browser will do: Python doesn't even have to be installed.[4] That's not the case with the PyMailGui client-side program we wrote in Chapter 11.

But before we all jump on the collective Internet bandwagon and utterly abandon traditional APIs like Tkinter, a few words of larger context are in order. Besides illustrating larger CGI applications in general, this example was chosen to underscore some of the trade-offs you run into when building applications to run on the Web. PyMailGui and PyMailCgi do roughly the same things, but are radically different in implementation:

On a basic level, both systems use the Python POP and SMTP modules to fetch and send email through sockets. But the implementation alternatives they represent have some critical ramifications that you should know about when considering delivering systems on the Web:

Performance costs

Networks are slower than CPUs. As implemented, PyMailCgi isn't nearly as fast or as complete as PyMailGui. In PyMailCgi, every time the user clicks a submit button, the request goes across the network. More specifically, every user request incurs a network transfer overhead, every callback handler (usually) takes the form of a newly spawned process on the server, parameters come in as text strings that must be parsed out, and the lack of state information on the server between pages means that mail needs to be reloaded often. In contrast, user clicks in PyMailGui trigger in-process function calls instead of network traffic and process forks, and state is easily saved as Python in-process variables (e.g., the loaded-mail list is retained between clicks). Even with an ultra-fast Internet connection, a server-side CGI system is slower than a client-side program.[5]

Some of these bottlenecks may be designed away at the cost of extra program complexity. For instance, some web servers use threads and process pools to minimize process creation for CGI scripts. Moreover, some state information can be manually passed along from page to page in hidden form fields and generated URL parameters, and state can be saved between pages in a concurrently accessible database to minimize mail reloads (see the PyErrata case study in Chapter 14 for an example). But there's no getting past the fact that routing events over a network to scripts is much slower than calling a Python function directly.

Complexity costs

HTML isn't pretty. Because PyMailCgi must generate HTML to interact with the user in a web browser, it is also more complex (or at least, less readable) than PyMailGui. In some sense, CGI scripts embed HTML code in Python. Because the end result of this is a mixture of two very different languages, creating an interface with HTML in a CGI script can be much less straightforward than making calls to a GUI API such as Tkinter.

Witness, for example, all the care we've taken to escape HTML and URLs in this chapter's examples; such constraints are grounded in the nature of HTML. Furthermore, changing the system to retain loaded-mail list state in a database between pages would introduce further complexities to the CGI-based solution. Secure sockets (e.g., OpenSSL, to be supported in Python 1.6) would eliminate manual encryption costs, but introduce other overheads.

Functionality costs

HTML can only say so much. HTML is a portable way to specify simple pages and forms, but is poor to useless when it comes to describing more complex user interfaces. Because CGI scripts create user interfaces by writing HTML back to a browser, they are highly limited in terms of user-interface constructs.

For example, consider implementing an image-processing and animation program as CGI scripts: HTML doesn't apply once we leave the domain of fill-out forms and simple interactions. This is precisely the limitation that Java applets were designed to address -- programs that are stored on a server but pulled down to run on a client on demand, and given access to a full-featured GUI API for creating richer user interfaces. Nevertheless, strictly server-side programs are inherently limited by the constraints of HTML. The animation scripts we wrote at the end of Chapter 8, for example, are well beyond the scope of server-side scripts.

Portability benefits

All you need is a browser. On the client side, at least. Because PyMailCgi runs over the Web, it can be run on any machine with a web browser, whether that machine has Python and Tkinter installed or not. That is, Python needs to be installed on only one computer: the web server machine where the scripts actually live and run. As long as you know that the users of your system have an Internet browser, installation is simple.

Python and Tkinter, you will recall, are very portable too -- they run on all major window systems (X, Windows, Mac) -- but to run a client-side Python/Tk program such as PyMailGui, you need Python and Tkinter on the client machine itself. Not so with an application built as CGI scripts: they will work on Macintosh, Linux, Windows, and any other machine that can somehow render HTML web pages. In this sense, HTML becomes a sort of portable GUI API language in CGI scripts, interpreted by your web browser. You don't even need the source code or bytecode for the CGI scripts themselves -- they run on a remote server that exists somewhere else on the Net, not on the machine running the browser.

Execution requirements

But you do need a browser. That is, the very nature of web-enabled systems can render them useless in some environments. Despite the pervasiveness of the Internet, there are still plenty of applications that run in settings that don't have web browsers or Internet access. Consider, for instance, embedded systems, real-time systems, and secure government applications. While an Intranet (a local network without external connections) can sometimes make web applications feasible in some such environments, I have recently worked at more than one company whose client sites had no web browsers to speak of. On the other hand, such clients may be more open to installing systems like Python on local machines, as opposed to supporting an internal or external network.

Administration requirements

You really need a server too. You can't write CGI-based systems at all without access to a web server. Further, keeping programs on a centralized server creates some fairly critical administrative overheads. Simply put, in a pure client/server architecture, clients are simpler, but the server becomes a critical path resource and a potential performance bottleneck. If the centralized server goes down, you, your employees, and your customers may be knocked out of commission. Moreover, if enough clients use a shared server at the same time, the speed costs of web-based systems become even more pronounced. In fact, one could make the argument that moving towards a web server architecture is akin to stepping backwards in time -- to the time of centralized mainframes and dumb terminals. Whichever way we step, offloading and distributing processing to client machines at least partially avoids this processing bottleneck.

So what's the best way to build applications for the Internet -- as client-side programs that talk to the Net, or as server-side programs that live and breathe on the Net? Naturally, there is no one answer to that question, since it depends upon each application's unique constraints. Moreover, there are more possible answers to it than we have proposed here; most common CGI problems already have common proposed solutions. For example:

Client-side solutions

Client- and server-side programs can be mixed in many ways. For instance, applet programs live on a server, but are downloaded to and run as client-side programs with access to rich GUI libraries (more on applets when we discuss JPython in Chapter 15). Other technologies, such as embedding JavaScript or Python directly in HTML code, also support client-side execution and richer GUI possibilities; such scripts live in HTML on the server, but run on the client when downloaded and access browser components through an exposed object model (see Section 15.8 near the end of Chapter 15). The emerging Dynamic HTML (DHTML) extensions provide yet another client-side scripting option for changing web pages after they have been constructed. All of these client-side technologies add extra complexities all their own, but ease some of the limitations imposed by straight HTML.

State retention solutions

Some web application servers (e.g., Zope, described in Chapter 15) naturally support state retention between pages by providing concurrently accessible object databases. Some of these systems have a real underlying database component (e.g., Oracle and MySQL); others may make use of files or Python persistent object shelves with appropriate locking (as we'll explore in the next chapter). Scripts can also pass state information around in hidden form fields and generated URL parameters, as done in PyMailCgi, or store it on the client machine itself using the standard cookie protocol.
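As a rough illustration of the hidden-field and URL-parameter options just mentioned, here is a minimal sketch written for a current Python 3 release rather than the Python of this book. The function names, the view.cgi script name, and the user and mnum state items are all hypothetical stand-ins; this is not PyMailCgi's actual code.

```python
from html import escape
from urllib.parse import urlencode


def hidden_fields(state):
    """Render each state item as a hidden <input> tag for a reply form."""
    return "\n".join(
        '<input type="hidden" name="%s" value="%s">'
        % (escape(str(name), quote=True), escape(str(value), quote=True))
        for name, value in state.items()
    )


def state_link(script, state, text):
    """Render a hyperlink whose query string carries the same state."""
    url = script + "?" + urlencode(state)      # URL-encode names and values
    # HTML-escape the finished URL so its & separators become &amp;
    return '<a href="%s">%s</a>' % (escape(url, quote=True), escape(text))


# Hypothetical state: a user name and a message number
state = {"user": "bob", "mnum": 3}
print(hidden_fields(state))
print(state_link("view.cgi", state, "View 3"))
```

Either way, the state comes back to the next script as ordinary input parameters when the form is submitted or the link is clicked. Note that the values travel through the client as strings, so the receiving script must treat them as untrusted input.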

Cookies are bits of information stored on the client upon request from the server. A cookie is created by sending a special header from the server to the client ahead of the response HTML (Set-Cookie: name=value). It is then accessed in CGI scripts as the value of a special environment variable containing cookie data uploaded from the client (HTTP_COOKIE). Search http://www.python.org for more details on using cookies in Python scripts, including the freely available cookie.py module, which automates the cookie translation process.[6] Cookies are more complex than program variables and are somewhat controversial (some see them as intrusive), but they can offload some simple state retention tasks.
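To make the two halves of the cookie exchange concrete, here is a small sketch that uses the http.cookies module of current Python 3 releases, not the cookie.py module named above. The helper function names and the sample cookie values are assumptions made for illustration only.

```python
import os
from http.cookies import SimpleCookie


def set_cookie_header(name, value):
    """Return a 'Set-Cookie: name=value' line for the reply headers."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie.output()           # default header prefix is "Set-Cookie:"


def read_cookies(environ=os.environ):
    """Parse the HTTP_COOKIE variable into a dict of names to values."""
    cookie = SimpleCookie()
    cookie.load(environ.get("HTTP_COOKIE", ""))
    return {name: morsel.value for name, morsel in cookie.items()}


# Reply side: this line would be printed with the other headers,
# before the blank line that separates headers from the HTML
print(set_cookie_header("user", "bob"))

# Request side: a stand-in for the environment a server would provide
print(read_cookies({"HTTP_COOKIE": "user=bob; mnum=3"}))
```

In a real CGI script, read_cookies would be called with no argument so that it consults the process environment set up by the web server.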

HTML generation solutions

Add-ons can also take some of the complexity out of embedding HTML in Python CGI scripts, albeit at some cost to execution speed. For instance, the HTMLgen system described in Chapter 15 lets programs build pages as trees of Python objects that "know" how to produce HTML. When a system like this is employed, Python scripts deal only with objects, not the syntax of HTML itself. Other systems such as PHP and Active Server Pages (described in the same chapter) allow scripting language code to be embedded in HTML and executed on the server, to dynamically generate or determine part of the HTML that is sent back to a client in response to requests.
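The idea of page objects that "know" how to produce HTML can be sketched in a few lines. The Element class below is a hypothetical toy written for Python 3; it is not HTMLgen's API, and the page content is invented for the example.

```python
from html import escape


class Element:
    """A node in a page tree; it renders itself and its children as HTML."""

    def __init__(self, tag, *children, **attrs):
        self.tag, self.children, self.attrs = tag, children, attrs

    def __str__(self):
        # Attribute values are escaped, including any embedded quotes
        attrs = "".join(
            ' %s="%s"' % (name, escape(str(value), quote=True))
            for name, value in self.attrs.items()
        )
        # Nested elements render themselves; plain text is escaped
        body = "".join(
            str(child) if isinstance(child, Element) else escape(str(child))
            for child in self.children
        )
        return "<%s%s>%s</%s>" % (self.tag, attrs, body, self.tag)


# The script builds a tree of objects and never spells out a tag itself
page = Element("html",
               Element("body",
                       Element("h1", "PyMailCgi"),
                       Element("a", "View mail & more", href="view.cgi")))
print(page)
```

Because escaping happens in one place, scripts that build pages this way cannot forget to escape text, at the cost of an extra layer of object creation and method calls on each reply.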

Clearly, Internet technology does imply some design trade-offs, and is still evolving rapidly. It is nevertheless an appropriate delivery context for many (though not all) applications. As with every design choice, you must be the judge. While delivering systems on the Web may have some costs in terms of performance, functionality, and complexity, it is likely that the significance of those overheads will diminish with time.

[1]  Technically, again, you should generally escape & separators in generated URL links by running the URL through cgi.escape, if any parameter's name could be the same as that of an HTML character escape code (e.g., "&amp=high"). See the prior chapter for more details; they aren't escaped here because there are no clashes.

[2]  Notice that the message number arrives as a string and must be converted to an integer in order to be used to fetch the message. But we're careful not to convert with eval here, since this is a string passed over the Net and could have arrived embedded at the end of an arbitrary URL (remember that earlier warning?).

[3]  Note that there are other ways to handle password security, beyond the custom encryption schemes described in this section. For instance, Python's socket module now supports the server-side portion of the OpenSSL secure sockets protocol. With it, scripts may delegate the security task to web browsers and servers. On the other hand, such schemes do not afford as good an excuse to introduce Python's standard encryption tools in this book.

[4]  This property can be especially useful when visiting government institutions, which seem to generally provide web browser accessibility, but restrict administrative functions and broader network connectivity to officially cleared system administrators (and international spies).

[5]  To be fair, some Tkinter operations are sent to the underlying Tcl library as strings too, which must be parsed. This may change in time; but the contrast here is with CGI scripts versus GUI libraries in general, not with a particular library's implementation.

[6]  Also see the new standard cookie module in Python release 2.0.
