7.6 The urllib Module

The urllib module provides a unified client interface for HTTP, FTP, and gopher. It automatically picks the right protocol handler based on the uniform resource locator (URL) passed to the library.

Fetching data from a URL is extremely easy. Just call the urlopen function and read from the returned stream object, as shown in Example 7-14.

Example 7-14. Using the urllib Module to Fetch a Remote Resource

File: urllib-example-1.py

    import urllib

    fp = urllib.urlopen("http://www.python.org")

    op = open("out.html", "wb")

    n = 0

    # copy the resource to the output file in 8 KB chunks
    while 1:
        s = fp.read(8192)
        if not s:
            break
        op.write(s)
        n = n + len(s)

    fp.close()
    op.close()

    # the headers and url attributes remain available after close
    for k, v in fp.headers.items():
        print k, "=", v

    print "copied", n, "bytes from", fp.url

    server = Apache/1.3.6 (Unix)
    content-type = text/html
    accept-ranges = bytes
    date = Mon, 11 Oct 1999 20:11:40 GMT
    connection = close
    etag = "741e9-7870-37f356bf"
    content-length = 30832
    last-modified = Thu, 30 Sep 1999 12:25:35 GMT
    copied 30832 bytes from http://www.python.org

Note that the stream object provides some nonstandard attributes. headers is a Message object (as defined by the mimetools module), and url contains the actual URL. The latter is updated if the server redirects the client to a new URL.

The urlopen function is actually a helper function, which creates an instance of the FancyURLopener class and calls its open method. To get special behavior, you can subclass that class. For instance, the class in Example 7-15 automatically logs in to the server when necessary.

Example 7-15. Using the urllib Module with Automatic Authentication

File: urllib-example-3.py

    import urllib

    class myURLOpener(urllib.FancyURLopener):
        # read a URL, with automatic HTTP authentication

        def setpasswd(self, user, passwd):
            self.__user = user
            self.__passwd = passwd

        def prompt_user_passwd(self, host, realm):
            # called by the opener when the server asks for credentials
            return self.__user, self.__passwd

    urlopener = myURLOpener()
    urlopener.setpasswd("mulder", "trustno1")

    fp = urlopener.open("http://www.secretlabs.com")
    print fp.read()
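The examples in this section use the Python 2 form of the module. As a minimal sketch, assuming Python 3 (where urlopen is found in urllib.request), the copy loop of Example 7-14 might be written as shown here. The helper name copy_url, its parameters, and the local file: URL used in the demonstration are illustrative assumptions, not part of the original example; the file: URL is used only so that the sketch runs without network access.

```python
import tempfile
import urllib.request
from pathlib import Path


def copy_url(url, dest_path, chunk_size=8192):
    """Copy the resource at url to dest_path in fixed-size chunks.

    Returns (bytes copied, final URL, headers as a dict with lowercase keys).
    copy_url is a hypothetical helper, not a function of the urllib package.
    """
    total = 0
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # an empty read means end of stream
                break
            out.write(chunk)
            total += len(chunk)
        # The response object carries the same extra attributes the text describes.
        final_url = response.url
        headers = {key.lower(): value for key, value in response.headers.items()}
    return total, final_url, headers


if __name__ == "__main__":
    # Demonstrate with a temporary local file so no network access is needed.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "source.txt"
        source.write_bytes(b"x" * 20000)
        total, final_url, headers = copy_url(source.as_uri(), Path(tmp) / "out.txt")
        for key, value in headers.items():
            print(key, "=", value)
        print("copied", total, "bytes from", final_url)
```

Given an http: URL, the same function would return the headers sent by the server, and the returned URL would reflect any redirect that was followed.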