7.7 The urlparse Module

The urlparse module contains functions to process URLs: splitting a URL into its components, putting the components back together, and resolving a relative URL against a base URL. Example 7-16 demonstrates.

Example 7-16. Using the urlparse Module

File: urlparse-example-1.py

    import urlparse

    print urlparse.urlparse("http://host/path;params?query#fragment")

    ('http', 'host', '/path', 'params', 'query', 'fragment')

A common use is to split an HTTP URL into host and path components (an HTTP request involves asking the host to return data identified by the path), as shown in Example 7-17.

Example 7-17. Using the urlparse Module to Parse HTTP Locators

File: urlparse-example-2.py

    import urlparse

    scheme, host, path, params, query, fragment =\
            urlparse.urlparse("http://host/path;params?query#fragment")

    if scheme == "http":
        print "host", "=>", host
        if params:
            path = path + ";" + params
        if query:
            path = path + "?" + query
        print "path", "=>", path

    host => host
    path => /path;params?query

Alternatively, Example 7-18 shows how you can use the urlunparse function to put the URL back together again.

Example 7-18. Using the urlparse Module to Parse HTTP Locators

File: urlparse-example-3.py

    import urlparse

    scheme, host, path, params, query, fragment =\
            urlparse.urlparse("http://host/path;params?query#fragment")

    if scheme == "http":
        print "host", "=>", host
        print "path", "=>", urlparse.urlunparse(
            (None, None, path, params, query, None)
            )

    host => host
    path => /path;params?query

Example 7-19 uses the urljoin function to combine an absolute URL with a second, possibly relative URL.

Example 7-19. Using the urlparse Module to Combine Relative Locators

File: urlparse-example-4.py

    import urlparse

    base = "http://spam.egg/my/little/pony"

    for path in "/index", "goldfish", "../black/cat":
        print path, "=>", urlparse.urljoin(base, path)

    /index => http://spam.egg/index
    goldfish => http://spam.egg/my/little/goldfish
    ../black/cat => http://spam.egg/my/black/cat
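The examples in this section are written for Python 2. In Python 3 the same functions live in urllib.parse, so the host/path split and the relative-URL join can be sketched roughly as follows. This is a minimal sketch that assumes Python 3; the helper name split_http_url is hypothetical and not part of the standard library, and empty strings are used in place of None for the components that are left out.

```python
from urllib.parse import urljoin, urlparse, urlunparse


def split_http_url(url):
    """Return (host, request_path) for an HTTP URL, or None for other schemes.

    Hypothetical helper for illustration; not part of the standard library.
    """
    parts = urlparse(url)
    if parts.scheme != "http":
        return None
    # Rebuild only the part sent in the request: path, params, and query.
    # Empty strings (not None) stand in for the omitted components.
    request_path = urlunparse(("", "", parts.path, parts.params, parts.query, ""))
    return parts.netloc, request_path


if __name__ == "__main__":
    print(split_http_url("http://host/path;params?query#fragment"))
    # ('host', '/path;params?query')

    base = "http://spam.egg/my/little/pony"
    for path in ("/index", "goldfish", "../black/cat"):
        print(path, "=>", urljoin(base, path))
```

The result of urlparse in Python 3 is a named tuple, so the components can be read by attribute (parts.scheme, parts.netloc) rather than unpacked positionally as in the Python 2 examples.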