10.12 Unpacking a Multipart MIME Message

Credit: Matthew Dixon Cowles

10.12.1 Problem

You have a multipart MIME message and want to unpack it.

10.12.2 Solution

The walk method of message objects generated by the email module (new as of Python 2.2) makes this task really easy:

```python
import email.Parser
import os, sys

def main():
    if len(sys.argv) == 1:
        print "Usage: %s filename" % os.path.basename(sys.argv[0])
        sys.exit(1)
    mailFile = open(sys.argv[1], "rb")
    p = email.Parser.Parser()
    msg = p.parse(mailFile)
    mailFile.close()
    partCounter = 1
    for part in msg.walk():
        # multipart/* parts are only containers for the real parts; skip them
        if part.get_main_type() == "multipart":
            continue
        name = part.get_param("name")
        if name is None:
            name = "part-%i" % partCounter
        partCounter += 1
        # In real life, make sure that name is a reasonable filename
        # for your OS; otherwise, mangle it until it is!
        f = open(name, "wb")
        f.write(part.get_payload(decode=1))
        f.close()
        print name

if __name__ == "__main__":
    main()
```

10.12.3 Discussion

The email module, new in Python 2.2, makes parsing MIME messages reasonably easy. (See the Library Reference for detailed documentation about the email module.) This recipe shows the easiest way to recursively unbundle a MIME message with the email module: using the walk method of message objects.

You can create a message object in several ways. For example, you can instantiate the email.Message.Message class and build the message object's contents with calls to its add_payload method. In this recipe, I need to read and analyze an existing message, so I work the other way around, calling the parse method of an email.Parser.Parser instance. The parse method takes a file-like object as its argument (in the recipe, I pass it a real file object that I have just opened for binary reading with the built-in open function) and returns a message object, on which you can then call any message object method.

The walk method is a generator, i.e., it returns an iterator object that you can loop over with a for statement.
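To make the generator behavior concrete, here is a minimal sketch of the same loop in modern Python 3, where the old get_main_type() is no longer available and get_content_maintype() plays its role, and where email.message_from_bytes parses raw message bytes. The helper names build_sample and unpack, and the sample message contents, are hypothetical and made up for illustration; the sketch also collects (name, payload) pairs in a list instead of writing files, so the depth-first order in which walk() visits parts is easy to inspect.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_sample():
    """Build a small nested message as raw bytes (hypothetical sample data)."""
    outer = MIMEMultipart("mixed")
    alt = MIMEMultipart("alternative")  # a container nested inside a container
    alt.attach(MIMEText("plain body", "plain"))
    alt.attach(MIMEText("<p>html body</p>", "html"))
    outer.attach(alt)
    # A named binary attachment; MIMEApplication base64-encodes it by default
    outer.attach(MIMEApplication(b"\x00\x01binary", name="data.bin"))
    return outer.as_bytes()


def unpack(raw):
    """Return (name, decoded_payload) for every non-container part, depth-first."""
    msg = message_from_bytes(raw)
    results = []
    part_counter = 1
    for part in msg.walk():
        # multipart/* parts are only glue holding the real parts together
        if part.get_content_maintype() == "multipart":
            continue
        name = part.get_param("name")
        if name is None:
            name = "part-%i" % part_counter
        part_counter += 1
        # decode=True undoes base64/quoted-printable; harmless for 7bit text
        results.append((name, part.get_payload(decode=True)))
    return results


if __name__ == "__main__":
    for name, payload in unpack(build_sample()):
        print(name, payload)
```

With this sample, the two text alternatives come out first as part-1 and part-2, followed by data.bin with its original binary bytes; neither multipart container appears, because both are skipped.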
Usually, you will use this method exactly as I use it in this recipe:

    for part in msg.walk():

The iterator yields, in sequence (depth-first, in case of nesting), the parts that make up the message. If the message is not a container of parts (it has no attachments or alternatives, i.e., msg.is_multipart() is false), there is no problem: the walk method returns an iterator that yields a single element, the message itself. In any case, each element the iterator yields is also a message object (an instance of email.Message.Message), so you can call on it any of the methods a message object supplies.

In a multipart message, parts with a type of 'multipart/something' (i.e., a main type of 'multipart') may be present. In this recipe, I skip them explicitly, since they are just glue holding the true parts together. I use the get_main_type method to obtain the main type and compare it with 'multipart'; if they are equal, I skip this part and move on to the next one with a continue statement.

When I know I have a real part in hand, I locate its name (or synthesize one if it has no name), open a file with that name, and write into it the part's contents (also known as its payload), which I get by calling the get_payload method. I pass the decode=1 argument to ensure that, if the payload was transfer-encoded (for instance, with base64 or quoted-printable), it is decoded back to binary content (e.g., an image, a sound file, or a movie) rather than left in its encoded text form. If the payload is not encoded, decode=1 is innocuous, so I don't have to check before I pass it.

10.12.4 See Also

Recipe 10.11; documentation for the standard library modules email, smtplib, mimetypes, base64, quopri, and cStringIO in the Library Reference.