21.2 MIME and Email Format Handling
Python supplies the
email package to handle parsing, generation, and
manipulation of MIME files such as email messages, network news
posts, and so on. The Python standard library also contains other
modules that handle some parts of these jobs. However, the new
email package offers a more complete and
systematic approach to these important tasks. I therefore suggest you
use package email, not the older modules that
partially overlap with parts of
email's functionality. Package
email has nothing to do with receiving or sending
email; for such tasks, see modules poplib and
smtplib, covered in Chapter 18.
Instead, package email deals with how you handle
messages after you receive them or before you send
them.
21.2.1 Functions in Package email
Package email supplies two factory functions
returning an instance m of class
email.Message.Message. These functions rely on
class email.Parser.Parser, but the factory
functions are handier and simpler. Therefore, I do not cover module
Parser further in this
book.
message_from_string(s)
Builds m by parsing string
s.
message_from_file(f)
Builds m by parsing the contents of
file-like object f, which must be open for
reading.
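As a minimal sketch of the two factory functions, the following parses the same message text once from a string and once from a file-like object. The addresses and message text are made up for illustration, and the example assumes a Python 3 interpreter, where io.StringIO stands in for a file open for reading.

```python
import email
import io

# A tiny, made-up message: headers, a blank line, then the payload.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Body text.\n"
)

# Parse from a string.
m1 = email.message_from_string(raw)

# Parse from a file-like object open for reading.
m2 = email.message_from_file(io.StringIO(raw))

subject = m1['Subject']
body = m2.get_payload()
```

Both calls should yield equivalent message objects, so which one you use depends only on where the text comes from.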
21.2.2 The email.Message Module
The email.Message
module supplies class Message. All parts of
package email produce, modify, or use instances of
class Message. An instance
m of Message models a
MIME message, including headers and a payload (data content). You can
create m, initially empty, by calling
class Message, which accepts no arguments. More
often, you create m by parsing via
functions message_from_string and
message_from_file of module
email, or by other indirect means such as the
classes covered in "Creating
Messages" later in this chapter.
m's payload can be a
string, a single other instance of Message, or a
list of other Message instances for a multipart
message.
You can set arbitrary headers on email messages
you're building. Several Internet RFCs specify
headers that you can use for a wide variety of purposes. The main
applicable RFC is RFC 2822 (see http://www.faqs.org/rfcs/rfc2822.html). An
instance m of class
Message holds headers as well as a payload.
m is a mapping, with header names as keys
and header value strings as values. The semantics of
m as a mapping are rather different from
those of a dictionary, to make m more
convenient. m's keys are
case-insensitive. m keeps headers in the
order in which you add them, and methods keys,
values, and items return
headers in that order. m can have more
than one header named
key: m[key]
returns an arbitrary one of them, and del
m[key]
deletes all of them.
len(m)
returns the total number of headers, counting duplicates, not just
the number of distinct header names. If there is no header named
key,
m[key]
returns None and does not raise
KeyError (i.e., behaves like
m.get(key)),
and del
m[key]
is a no-operation.
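The mapping behavior described above can be sketched as follows. The example assumes a current Python 3, where the module is spelled email.message rather than email.Message; the header values are invented for illustration.

```python
from email.message import Message

m = Message()
m['Received'] = 'first hop'
m['Received'] = 'second hop'   # adds a second header, does not replace the first
m['Subject'] = 'Test'

total = len(m)                 # counts duplicates
names = m.keys()               # headers in insertion order
subject = m['SUBJECT']         # keys are case-insensitive
missing = m['X-No-Such-Header']  # no KeyError for a missing header

del m['received']              # deletes every header with that name
del m['X-No-Such-Header']      # deleting a missing header does nothing
remaining = m.keys()
```

Because m[key] may return any one of several duplicate headers, code that cares about duplicates should not rely on indexing alone.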
An instance m of
Message supplies the following attributes and
methods dealing with m's
headers and payload.
m.add_header(_name,_value,**_params)
Like
m[_name]=_value,
but you can also supply header parameters as keyword arguments. For
each keyword argument
pname=pvalue,
add_header changes underscores to dashes, then
appends to the header's value a parameter of the
form:
; pname="pvalue"
If pvalue is None,
add_header appends only a parameter
';
pname'.
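A short sketch of add_header, assuming a current Python 3 (module email.message). The X-Example header and its parameters are hypothetical, chosen only to show the underscore-to-dash conversion and the None case.

```python
from email.message import Message

m = Message()

# A typical use: a Content-Disposition header with a filename parameter.
m.add_header('Content-Disposition', 'attachment', filename='report.pdf')
disposition = m['Content-Disposition']

# Hypothetical header: underscores in keyword names become dashes, and a
# None value yields a parameter given by name only.
m.add_header('X-Example', 'value', some_flag=None, max_age='60')
example = m['X-Example']
```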
m.add_payload(payload)
Adds the payload to
m's payload. If
m's payload was
None,
m's payload is now
payload. If
m's payload was a list,
appends payload to the list. If
m's payload was a single
item x,
m's payload becomes the
list
[x,payload],
but only if m's
Content-Type header is missing or has a main type of
multipart. Otherwise, when
m has a single payload and a Content-Type
whose main type is not multipart,
m.add_payload(payload)
raises a MultipartConversionError exception.
m.as_string(unixfrom=False)
Returns the entire message as a string. When
unixfrom is true, also includes a first
line, normally starting with 'From ', known as the
envelope header of the message.
Attribute
m.epilogue can be
None, or a string that becomes part of the
message's string form after the last boundary line.
Mail programs normally don't display this text.
epilogue is a normal attribute of
m: your program can access it when
you're examining an m
that is fully built by whatever means, and your program can bind it
when you're building or modifying
m in your program.
m.get_all(name,default=None)
Returns a list with all values of headers named
name, in the order in which the headers
were added to m. When
m has no header named
name, get_all returns
default.
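For instance, get_all is the natural way to read a header that legitimately occurs several times. This is a small sketch assuming a current Python 3 (module email.message), with made-up header values.

```python
from email.message import Message

m = Message()
m['Received'] = 'from a.example.com'
m['Received'] = 'from b.example.com'

hops = m.get_all('Received')                   # every value, in insertion order
none_found = m.get_all('X-No-Such-Header')     # default is None
fallback = m.get_all('X-No-Such-Header', [])   # explicit default
```

Passing an empty list as the default lets calling code loop over the result without a separate None check.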
m.get_boundary(default=None)
Returns the string value of the
boundary parameter of
m's Content-Type header.
When m has no Content-Type header, or the
header has no boundary parameter,
get_boundary returns
default.
m.get_charsets(default=None)
Returns the list L of string values of
parameter charset of
m's Content-Type headers.
When m is multipart,
L has one item per part, otherwise
L has length 1. For parts that have no
Content-Type, no charset parameter, or a main type
different from 'text', the corresponding item in
L is default.
m.get_filename(default=None)
Returns the string value of the filename parameter
of m's
Content-Disposition header. When m has no
Content-Disposition, or the header has no filename
parameter, get_filename returns
default.
m.get_maintype(default=None)
Returns m's main content
type, a string
'maintype'
taken from header Content-Type converted to lowercase. When
m has no header Content-Type,
get_maintype returns
default.
m.get_param(param,default=None,header='Content-Type')
Returns the string value of the parameter named
param of
m's header named
header. Returns the empty string for a
parameter specified just by name. When m
has no header header, or the header has no
parameter named param,
get_param returns
default.
m.get_params(default=None,header='Content-Type')
Returns the parameters of
m's header named
header, a list of pairs of strings giving
each parameter's name and value. Uses the empty
string as the value for parameters specified just by name. When
m has no header
header, get_params
returns default.
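The parameter accessors can be illustrated on a small parsed message. This sketch assumes a current Python 3; the message text and file name are invented. Note that in current versions get_params also reports the content type itself as the first pair, so the example looks only at the pairs that follow it.

```python
import email

raw = (
    'Content-Type: text/plain; charset="utf-8"; format=flowed\n'
    'Content-Disposition: attachment; filename="notes.txt"\n'
    '\n'
    'hello\n'
)
m = email.message_from_string(raw)

charset = m.get_param('charset')               # from Content-Type by default
fmt = m.get_param('format')
absent = m.get_param('boundary', 'n/a')        # default for a missing parameter
from_disp = m.get_param('filename', header='Content-Disposition')

filename = m.get_filename()                    # shortcut for the line above
boundary = m.get_boundary('none')              # no boundary parameter here
charsets = m.get_charsets()                    # one item: m is not multipart
params = m.get_params()                        # list of (name, value) pairs
```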
m.get_payload(i=None,decode=False)
Returns m's payload. When
m.is_multipart( ) is
False, i must be
None, and
m.get_payload( )
returns m's entire
payload, a string or a Message instance. If
decode is true, and the value of header
Content-Transfer-Encoding is either
'quoted-printable' or 'base64',
m.get_payload also
decodes the payload. If decode is false,
or header Content-Transfer-Encoding is missing or has other values,
m.get_payload returns
the payload unchanged.
When m.is_multipart( )
is True, decode must be
false. When i is
None,
m.get_payload( )
returns m's payload as a
list. Otherwise, m.get_payload(i)
returns the ith item of the
payload, and raises IndexError if
i is out of range.
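A minimal sketch of the decode argument on a non-multipart message, assuming a current Python 3. One difference from the description above is an assumption worth stating: in Python 3 the decoded payload comes back as a bytes object rather than a text string.

```python
import email

# "aGVsbG8=" is the Base 64 encoding of the five bytes b"hello".
raw = (
    "Content-Type: text/plain\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "aGVsbG8=\n"
)
m = email.message_from_string(raw)

encoded = m.get_payload()             # payload unchanged
decoded = m.get_payload(decode=True)  # payload after undoing Base 64
```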
m.get_subtype(default=None)
Returns m's content
subtype, a string
'subtype'
taken from header Content-Type converted to lowercase. When
m has no header Content-Type,
get_subtype returns
default.
m.get_type(default=None)
Returns m's content type,
a string
'maintype/subtype'
taken from header Content-Type converted to lowercase. When
m has no header Content-Type,
get_type returns
default.
m.get_unixfrom( )
Returns the envelope header string for m,
or None if the envelope header was never set.
m.is_multipart( )
Returns True when
m's payload is a list,
otherwise False.
Attribute
m.preamble can be
None or a string that becomes part of the
message's string form before the first boundary
line. Only mail programs that don't support
multipart messages display this text to the user, so you can use this
attribute to alert the user that your message is multipart and that a
different mail program is needed to view it.
preamble is a normal attribute of
m: your program can access it when
you're examining an m
that is fully built by whatever means, and your program can bind it
when you're building or modifying
m in your program.
m.set_boundary(boundary)
Sets the boundary parameter of
m's Content-Type header
to boundary. When
m has no Content-Type header, raises
HeaderParseError.
m.set_payload(payload)
Sets m's payload to
payload, which must be a string or list,
as appropriate.
m.set_unixfrom(unixfrom)
Sets the envelope header string for m.
unixfrom is the entire envelope header
line, including the leading 'From ' but not
including the trailing '\n'.
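The envelope header methods work together with as_string, as this sketch shows. It assumes a current Python 3 (module email.message); the envelope line itself is invented.

```python
from email.message import Message

m = Message()
m['Subject'] = 'Envelope demo'
m.set_payload('Body.\n')

before = m.get_unixfrom()      # never set so far

# The whole envelope line, without the trailing newline.
m.set_unixfrom('From alice@example.com Sat Jan  1 00:00:00 2000')

plain = m.as_string()                       # no envelope line
with_envelope = m.as_string(unixfrom=True)  # envelope line comes first
```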
m.walk( )
Returns an iterator on all parts and subparts of
m, to walk the tree of parts
depth-first.
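To see is_multipart, preamble, get_payload with an index, and walk on one message, consider this sketch. It assumes a current Python 3, where the content type accessor is named get_content_type (the get_type method described above is not available there). The boundary string and part contents are made up.

```python
import email

raw = (
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    '\n'
    'This is a multipart message.\n'
    '--XYZ\n'
    'Content-Type: text/plain\n'
    '\n'
    'part one\n'
    '--XYZ\n'
    'Content-Type: text/html\n'
    '\n'
    '<p>part two</p>\n'
    '--XYZ--\n'
)
m = email.message_from_string(raw)

multipart = m.is_multipart()
boundary = m.get_boundary()
parts = m.get_payload()        # a list, since m is multipart
first = m.get_payload(0)       # the first part, itself a message object
preamble = m.preamble          # text before the first boundary line

# walk visits m itself first, then each part, depth-first.
types = [part.get_content_type() for part in m.walk()]
```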
21.2.3 The email.Generator Module
The email.Generator
module supplies class Generator, which you can use
to generate the textual form of a message
m.
m.as_string and
str(m)
may be sufficient, but class Generator gives you
slightly more flexibility. You instantiate
Generator with a mandatory argument and two
optional ones.
class Generator(outfp,mangle_from_=True,maxheaderlen=78)
outfp is a file or file-like object
supplying method write. When
mangle_from_ is true,
g prepends a '>' to
any line in a message's payload that starts with
'From '. This helps make the
message's textual form more safely parseable.
g wraps each header line at semicolons,
into physical lines of no more than
maxheaderlen characters, for readability.
To use g, just call it:
g(m, unixfrom=False)
This emits m in text form to
outfp, like
outfp.write(m.as_string(unixfrom)).
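A small sketch of Generator with mangle_from_ enabled. It assumes a current Python 3, where the module is spelled email.generator and the message is emitted with the method g.flatten(m, unixfrom=False) rather than by calling g directly. The message content is invented.

```python
import io
from email.generator import Generator
from email.message import Message

m = Message()
m['Subject'] = 'Greeting'
m.set_payload('Hello\nFrom here on, this line would look like an envelope.\n')

outfp = io.StringIO()          # any object with a write method will do
g = Generator(outfp, mangle_from_=True, maxheaderlen=78)
g.flatten(m)                   # emit m in text form to outfp

text = outfp.getvalue()
```

The payload line that starts with 'From ' comes out prefixed with '>', so software that splits mailbox files on envelope lines is less likely to be confused by it.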
21.2.4 Creating Messages
Package email supplies modules with names starting
with 'MIME', each module supplying a subclass of
Message named like the module. These classes make
it easier to create Message instances of various
MIME types. The MIME classes are as follows.
class MIMEAudio(_audiodata,_subtype=None,_encoder=None,**_params)
_audiodata is a byte
string of audio data to pack in a message of MIME type
'audio/_subtype'.
When _subtype is None,
_audiodata must be parseable by standard
Python module sndhdr to determine the subtype;
otherwise MIMEAudio raises a
TypeError. When
_encoder is None,
MIMEAudio encodes data as Base 64, which is
generally optimal. Otherwise, _encoder
must be callable with one parameter m, the
message being constructed; _encoder must
then call m.get_payload(
) to get the payload, encode the payload, put the encoded
form back by calling
m.set_payload, and set
m['Content-Transfer-Encoding']
appropriately. MIMEAudio passes the
_params dictionary of keyword argument
names and values to
m.add_header to
construct m's
Content-Type.
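As a sketch of MIMEAudio with the default encoder, the following packs a few placeholder bytes under an explicit subtype, so no subtype detection is needed. It assumes a current Python 3, where the class lives in email.mime.audio; the byte string is not real audio data.

```python
from email.mime.audio import MIMEAudio

audio_bytes = bytes(range(16))   # placeholder bytes, not a real audio file

# An explicit subtype avoids the need to detect it from the data.
m = MIMEAudio(audio_bytes, _subtype='wav')

ctype = m.get_content_type()
cte = m['Content-Transfer-Encoding']       # set by the default encoder
round_trip = m.get_payload(decode=True)    # undo the Base 64 encoding
```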
class MIMEBase(_maintype,_subtype,**_params)
The base class of all MIME classes; directly subclasses
Message. Instantiating:
m = MIMEBase(main,sub,**parms)
is equivalent to the longer and less convenient idiom:
m = Message( )
m.add_header('Content-Type','%s/%s'%(main,sub),**parms)
m.add_header('Mime-Version','1.0')
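The equivalence can be checked directly. This sketch assumes a current Python 3 (modules email.message and email.mime.base); the name parameter and its value are made up.

```python
from email.message import Message
from email.mime.base import MIMEBase

parms = {'name': 'data.bin'}   # hypothetical Content-Type parameter

# The short form.
m1 = MIMEBase('application', 'octet-stream', **parms)

# The longer idiom, spelled out by hand.
m2 = Message()
m2.add_header('Content-Type', '%s/%s' % ('application', 'octet-stream'), **parms)
m2.add_header('Mime-Version', '1.0')
```

Header lookups are case-insensitive, so both messages answer the same way when asked for Content-Type and MIME-Version.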
class MIMEImage(_imagedata,_subtype=None,_encoder=None,**_params)
Like MIMEAudio, but with maintype
'image' and using standard Python module
imghdr to determine the subtype if needed.
class MIMEMessage(msg,_subtype='rfc822')
Packs msg, which must be an instance of
Message (or a subclass), as the payload of a
message of MIME type
'message/_subtype'.
class MIMEText(_text,_subtype='plain',_charset='us-ascii',_encoder=None)
Packs text string _text as the payload of
a message of MIME type
'text/_subtype'
with the given charset. When
_encoder is None,
MIMEText does not encode the text, which is
generally optimal. Otherwise, _encoder
must be callable with one parameter m, the
message being constructed; _encoder must
then call m.get_payload(
) to get the payload, encode the payload, put the encoded
form back by calling
m.set_payload, and set
m['Content-Transfer-Encoding']
appropriately.
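A brief sketch combining MIMEText and MIMEMessage: it builds a text message, then wraps it inside a message/rfc822 container, as you might when forwarding. It assumes a current Python 3, where the classes live in email.mime.text and email.mime.message and MIMEText takes no _encoder argument. The subjects and body text are invented.

```python
from email.mime.message import MIMEMessage
from email.mime.text import MIMEText

inner = MIMEText('Hello, world.\n', 'plain', 'us-ascii')
inner['Subject'] = 'Original'

# Pack the whole inner message as the payload of an outer one.
outer = MIMEMessage(inner)
outer['Subject'] = 'Fwd: Original'

inner_type = inner.get_content_type()
inner_charset = inner.get_param('charset')
outer_type = outer.get_content_type()
wrapped = outer.get_payload(0)   # the packed message
```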
21.2.5 The email.Encoders Module
The email.Encoders
module supplies functions that take a message
m as their only argument, encode
m's payload, and set
m's headers
appropriately.
encode_base64(m)
Uses Base 64 encoding, optimal for arbitrary binary data.
encode_noop(m)
Does nothing to m's
payload and headers.
encode_quopri(m)
Uses Quoted Printable encoding, optimal for textual data that is not
fully ASCII.
encode_7or8bit(m)
Does nothing to m's
payload, but sets header Content-Transfer-Encoding to
'8bit' if any byte of
m's payload has the high
bit set, or otherwise to '7bit'.
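The encoder functions can be tried on small payloads, as in this sketch. It assumes a current Python 3, where the module is spelled email.encoders and binary payloads are passed as bytes; the payload values are made up.

```python
from email import encoders
from email.message import Message

# Arbitrary binary data: Base 64.
binary = Message()
binary.set_payload(b'\x00\xff raw bytes')
encoders.encode_base64(binary)

# Mostly-ASCII text with one high-bit byte: Quoted Printable.
text = Message()
text.set_payload(b'caf\xe9')
encoders.encode_quopri(text)

# Payload left alone; only the header is set.
ascii_only = Message()
ascii_only.set_payload('plain ASCII')
encoders.encode_7or8bit(ascii_only)

high_bit = Message()
high_bit.set_payload(b'caf\xe9')
encoders.encode_7or8bit(high_bit)
```

In each case the function records its choice in the Content-Transfer-Encoding header, which is what lets get_payload with a true decode argument reverse the encoding later.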
21.2.6 The email.Utils Module
The
email.Utils module supplies miscellaneous
functions useful for email processing.
decode(s)
Decodes string s as per the rules in RFC
2047 and returns the resulting Unicode string.
dump_address_pair(pair)
pair is a pair of strings
(name,email_address).
dump_address_pair returns a string
s with the address to insert in header
fields such as To and Cc. When name is
false (e.g., ''),
dump_address_pair returns
email_address.
encode(s,charset='iso-8859-1',encoding='q')
Encodes string s (which must use the given
charset) as per the rules in RFC 2047.
encoding must be 'q' to
specify Quoted Printable, or 'b' to specify Base
64.
formatdate(timeval=None,localtime=False)
timeval is a number of seconds since the
epoch. When timeval is
None, formatdate uses the
current time. When localtime is true,
formatdate uses the local timezone; otherwise it
uses UTC. formatdate returns a string with the
given time instant formatted in the way specified by RFC 2822.
getaddresses(L)
Parses each item of L, a list of address
strings as used in header fields such as To and
Cc, and returns a list of pairs of strings
(name,email_address).
When getaddresses cannot parse an item of
L as an address,
getaddresses uses (None,None)
as the corresponding item in the list it returns.
mktime_tz(t)
t is a tuple with 10 items, the first 9 in
the same format used in module time covered in
Chapter 12;
t[-1] is a time zone as
an offset in seconds from UTC (with the opposite sign from
time.timezone, as specified by RFC 2822). When
t[-1] is
None, mktime_tz uses the local
time zone. mktime_tz returns a float with the
number of seconds since the epoch, in UTC, corresponding to the time
instant that t denotes.
parseaddr(s)
Parses string s, which contains an address
as typically specified in header fields such as To
and Cc, and returns a pair of strings
(name,email_address).
When parseaddr cannot parse
s as an address,
parseaddr returns (None,None).
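A short sketch of the address-parsing functions, assuming a current Python 3, where the module is spelled email.utils. The names and addresses are invented. One behavior differs from the description above in current versions: an unparseable address yields a pair of empty strings rather than (None,None).

```python
from email import utils

# One address, as found in a To or Cc header field.
name, address = utils.parseaddr('Alice Example <alice@example.com>')

# Several header values at once; a bare address has an empty name.
pairs = utils.getaddresses([
    'Alice Example <alice@example.com>',
    'bob@example.com',
])

# Nothing to parse: current versions return empty strings here.
unparsed = utils.parseaddr('')
```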
parsedate(s)
Parses string s as per the rules in RFC
2822 and returns a tuple t with 9 items,
as used in module time covered in Chapter 12 (the items
t[-3:] are not
meaningful). parsedate also attempts to parse
erroneous variations on RFC 2822 that widespread mailers use. When
parsedate cannot parse
s, parsedate returns
None.
parsedate_tz(s)
Like parsedate, but returns a tuple
t with 10 items, where
t[-1] is
s's time zone as an
offset in seconds from UTC (with the opposite sign from
time.timezone, as specified by RFC 2822), like in
the argument that mktime_tz accepts. Items
t[-4:-1] are not
meaningful. When s has no time zone,
t[-1] is
None.
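The date functions fit together as in this sketch: parse an RFC 2822 date, convert it to seconds since the epoch, and format that instant again in UTC. It assumes a current Python 3 (module email.utils); the timestamp is an arbitrary example.

```python
from email import utils

stamp = 'Fri, 09 Nov 2001 01:08:47 -0500'

t9 = utils.parsedate(stamp)       # 9 items, time zone ignored
t10 = utils.parsedate_tz(stamp)   # 10 items, last one is the zone offset

seconds = utils.mktime_tz(t10)    # seconds since the epoch, in UTC
utc_text = utils.formatdate(seconds)   # the same instant, formatted in UTC

bad = utils.parsedate('not a date')    # unparseable input
```

The offset of -0500 is five hours behind UTC, so the formatted UTC time is five hours later than the wall-clock time in the original string.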
quote(s)
Returns a copy of string s where each
double quote (") becomes '\"'
and each existing backslash is repeated.
unquote(s)
Returns a copy of string s where leading
and trailing double quote characters (") and angle
brackets (<>) are removed if they surround
the rest of s.
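A quick sketch of quote and unquote, assuming a current Python 3 (module email.utils). The sample strings are made up.

```python
from email import utils

original = 'say "hi" \\ bye'      # contains double quotes and one backslash

quoted = utils.quote(original)    # escapes quotes, doubles backslashes

bare = utils.unquote('"hello world"')        # surrounding quotes removed
addr = utils.unquote('<alice@example.com>')  # surrounding brackets removed
untouched = utils.unquote('no delimiters')   # nothing to remove
```

Note that quote only escapes characters; it does not add the surrounding double quotes, which you supply yourself when building a header value.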
21.2.7 The Message Classes of the rfc822 and mimetools Modules
The best way to handle email-like messages
is with package email. However, other modules
covered in Chapter 18 and Chapter 20 use instances of class
rfc822.Message or its subclass
mimetools.Message. This section covers the subset
of these classes' functionality that you need to
make effective use of the modules covered in Chapter 18 and Chapter 20.
An instance m of class
Message is a mapping, with the
headers' names as keys and the corresponding header
value strings as values. Keys and values are strings, and keys are
case-insensitive. m supports all mapping
methods except clear, copy,
popitem, and update.
get and setdefault default to
'', instead of None. Instance
m also supplies convenience methods (e.g.,
to combine getting a header's value and parsing it
as a date or an address). I suggest you use for such purposes the
functions of module email.Utils, covered earlier
in this chapter, and use m just as a
mapping.
When m is an instance of
mimetools.Message, m
supplies additional methods.
m.getmaintype( )
Returns
m's main content type,
taken from header Content-Type converted to lowercase. When
m has no header Content-Type,
getmaintype returns 'text'.
m.getparam(param)
Returns the string value of the
parameter named param of
m's header Content-Type.
m.getsubtype( )
Returns m's content
subtype, taken from header Content-Type converted to lowercase. When
m has no header Content-Type,
getsubtype returns 'plain'.
m.gettype( )
Returns
m's content type, taken
from header Content-Type converted to lowercase. When
m has no header Content-Type,
gettype returns
'text/plain'.