11.11 Module: Fetching Latitude/Longitude Data from the Web

Credit: Will Ware

Given a list of cities, Example 11-1 fetches their latitudes and longitudes from one web site (http://www.astro.ch, a database used for astrology, of all things) and uses them to dynamically build a URL for another web site (http://pubweb.parc.xerox.com), which, in turn, creates a map highlighting the cities against the outlines of continents. Maybe someday a program will be clever enough to load the latitudes and longitudes as waypoints into your GPS receiver.

The code can be vastly improved in several ways. The main fragility of the recipe comes from relying on the exact format of the HTML page returned by the www.astro.ch site, particularly in the rather clumsy for x in inf.readlines() loop in the findcity function. If this format ever changes, the recipe will break. You could change the recipe to use htmllib.HTMLParser instead, and be a tad more immune to modest format changes. This helps only a little, however. After all, HTML is meant for human viewers, not for automated parsing and extraction of information. A better approach would be to find a site serving similar information in XML (including, quite possibly, XHTML, the XML/HTML hybrid that combines the strengths of both of its parents) and parse the information with Python's powerful XML tools (covered in Chapter 12).

Despite this defect, the recipe still stands as an example of the kind of opportunity already afforded today by existing services on the Web, without having to wait for the emergence of commercialized web services.

Example 11-1. Fetching latitude/longitude data from the Web

import string, urllib, re, os, exceptions, webbrowser

JUST_THE_US = 0

class CityNotFound(exceptions.Exception): pass

def xerox_parc_url(marklist):
    """ Prepare a URL for the xerox.com map-drawing service, with marks
    at the latitudes and longitudes listed in list-of-pairs marklist.
    """
    avg_lat, avg_lon = max_lat, max_lon = marklist[0]
    marks = ["%f,%f" % marklist[0]]
    for lat, lon in marklist[1:]:
        marks.append(";%f,%f" % (lat, lon))
        avg_lat = avg_lat + lat
        avg_lon = avg_lon + lon
        if lat > max_lat: max_lat = lat
        if lon > max_lon: max_lon = lon
    avg_lat = avg_lat / len(marklist)
    avg_lon = avg_lon / len(marklist)
    if len(marklist) == 1:
        max_lat, max_lon = avg_lat + 1, avg_lon + 1
    diff = max(max_lat - avg_lat, max_lon - avg_lon)
    D = {'height': 4 * diff, 'width': 4 * diff,
         'lat': avg_lat, 'lon': avg_lon,
         'marks': ''.join(marks)}
    if JUST_THE_US:
        url = ("http://pubweb.parc.xerox.com/map/db=usa/ht=%(height)f" +
               "/wd=%(width)f/color=1/mark=%(marks)s/lat=%(lat)f/" +
               "lon=%(lon)f/") % D
    else:
        url = ("http://pubweb.parc.xerox.com/map/color=1/ht=%(height)f" +
               "/wd=%(width)f/color=1/mark=%(marks)s/lat=%(lat)f/" +
               "lon=%(lon)f/") % D
    return url

def findcity(city, state):
    Please_click = re.compile("Please click")
    city_re = re.compile(city)
    state_re = re.compile(state)
    url = ("""http://www.astro.ch/cgi-bin/atlw3/aq.cgi?expr=%s&lang=e"""
           % (string.replace(city, " ", "+") + "%2C+" + state))
    lst = []
    found_please_click = 0
    inf = urllib.FancyURLopener().open(url)
    for x in inf.readlines():
        x = x[:-1]
        if Please_click.search(x) != None:
            # Here is one assumption about unchanging structure
            found_please_click = 1
        if (city_re.search(x) != None and
            state_re.search(x) != None and
            found_please_click):
            # Pick apart the HTML pieces
            L = []
            for y in string.split(x, '<'):
                L = L + string.split(y, '>')
            # Discard any pieces of zero length
            lst.append(filter(None, L))
    inf.close()
    try:
        # Here's a few more assumptions
        x = lst[0]
        lat, lon = x[6], x[10]
    except IndexError:
        raise CityNotFound("not found: %s, %s" % (city, state))
    def getdegrees(x, dividers):
        # x holds degrees, a hemisphere letter, then minutes
        if string.count(x, dividers[0]):
            x = map(int, string.split(x, dividers[0]))
            return x[0] + (x[1] / 60.)
        elif string.count(x, dividers[1]):
            x = map(int, string.split(x, dividers[1]))
            return -(x[0] + (x[1] / 60.))
        else:
            raise CityNotFound("Bogus result (%s)" % x)
    return getdegrees(lat, "ns"), getdegrees(lon, "ew")

def showcities(citylist):
    marklist = []
    for city, state in citylist:
        try:
            lat, lon = findcity(city, state)
            print ("%s, %s:" % (city, state)), lat, lon
            marklist.append((lat, lon))
        except CityNotFound, message:
            print "%s, %s: not in database? (%s)" % (city, state, message)
    url = xerox_parc_url(marklist)
    # Print URL
    # os.system('netscape "%s"' % url)
    webbrowser.open(url)

# Export a few lists for test purposes
citylist = (("Natick", "MA"),
            ("Rhinebeck", "NY"),
            ("New Haven", "CT"),
            ("King of Prussia", "PA"))
citylist1 = (("Mexico City", "Mexico"),
             ("Acapulco", "Mexico"),
             ("Abilene", "Texas"),
             ("Tulum", "Mexico"))
citylist2 = (("Munich", "Germany"),
             ("London", "England"),
             ("Madrid", "Spain"),
             ("Paris", "France"))

if __name__ == '__main__':
    showcities(citylist1)

11.11.1 See Also

Documentation for the standard library module htmllib in the Library Reference; information about the Xerox PARC map viewer is at http://www.parc.xerox.com/istl/projects/mapdocs/; AstroDienst hosts a worldwide server of latitude/longitude data (http://www.astro.com/cgi-bin/atlw3/aq.cgi).
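The getdegrees helper in Example 11-1 implies a compact coordinate notation: whole degrees, a hemisphere letter, then minutes, with south and west mapped to negative values. As a supplement, here is a minimal standalone sketch of that conversion in current Python 3 syntax rather than the Python 2 of the listing. The function name to_decimal_degrees and the sample strings are illustrative assumptions, and the notation itself is inferred from the listing, not from any documentation of the site.

```python
def to_decimal_degrees(text, dividers):
    """Convert a string such as '42n17' to signed decimal degrees.

    dividers is a two-character string: the first letter marks the
    positive hemisphere, the second the negative one ('ns' or 'ew').
    """
    positive, negative = dividers
    for letter, sign in ((positive, 1), (negative, -1)):
        if letter in text:
            # Assumed layout: degrees before the letter, minutes after it.
            degrees, minutes = (int(part) for part in text.split(letter))
            return sign * (degrees + minutes / 60.0)
    raise ValueError("unrecognized coordinate: %r" % text)


# Made-up sample strings in the assumed notation.
latitude = to_decimal_degrees("42n17", "ns")
longitude = to_decimal_degrees("71w21", "ew")
print(latitude, longitude)
```

Raising ValueError here stands in for the recipe's CityNotFound exception, so that the sketch stays self-contained.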
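The discussion suggests replacing the manual splitting on '<' and '>' in findcity with a real HTML parser. The sketch below shows one way that idea might look using html.parser from the Python 3 standard library (htmllib itself exists only in Python 2). The class name TextPieces and the SAMPLE_ROW markup are hypothetical; the actual layout of the page served by the site is not reproduced here, so treat this as an illustration of the technique, not a drop-in replacement.

```python
from html.parser import HTMLParser


class TextPieces(HTMLParser):
    """Collect the non-blank text fragments found between tags."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        # Called for each run of text between tags; skip whitespace-only runs.
        data = data.strip()
        if data:
            self.pieces.append(data)


# Hypothetical sample row, invented for this sketch.
SAMPLE_ROW = ('<tr><td><a href="#">Natick</a></td><td>MA</td>'
              '<td>42n17</td><td>71w21</td></tr>')

parser = TextPieces()
parser.feed(SAMPLE_ROW)
parser.close()
print(parser.pieces)
```

Because the parser ignores tags and attributes, cosmetic changes such as an added attribute or extra whitespace would not disturb the collected pieces, although code that picks fields by position would still depend on the order of the cells.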