Quick and Dirty Lat/Lng Values from the Google Geocoding API

Python    2011-10-11

This past week, I had to assign lat/long values to a large batch of addresses we had sitting around in our database.

To get those values, I posted to the Google Geocoding API and parsed the resulting XML.

This script isn't particularly polished - I didn't spend much time thinking about the perfect way to grab this kind of information, I just dove in. Anyway, I thought it might be useful as a jumping-off point for anyone else looking to do the same thing with a batch of street addresses. Tweak and adjust to whatever suits your needs; as always, your mileage may vary.

I started with a source file that looks a little like this (a dump from the address table - I actually had about 10K records):

id|location_code|name|address1|city|state|postalcode|country
3394|70A|Arrowhead Stadium|1 Arrowhead Dr.|Kansas city|MO|64129|US
77|ARC|Arco Arena|1 Sports Parkway|Sacramento|CA|95834|US
2074|BW1|Burswood Dome|Great Eastern Highway|Perth|WA|6100|AU
2200|LIH|Limburghal|Jaarbeurslaan 6|Genk|Limburg|3600|BE
3199|12J|London Arena|Limeharbour|London|Eng.|E14 9TH|GB
1471|GDE|Rose Garden|One Center Court Suite 200|Portland|OR|97227|US

The script uses a few standard libraries, plus minidom to parse the XML:

import os, urllib, string, fileinput, sys
from xml.dom import minidom

source = 'venues.txt'
target = 'updates.txt'

def processfile(name):
    """ 
    Parse address data from source file
    Generate API requests
    Write updates to include lat/lng values
    """
    records = []
    for line in fileinput.input(name):
        if fileinput.isfirstline():
            columns = line.split('|')

        if not fileinput.isfirstline():
            try:
                line[1]
                # Strip out characters that could cause errors on API request
                line = line.replace('\n', '').replace(".", "")
                line = line.replace(" ", "+").replace("&", "and")
                address = line.split('|')
                # In my case, I needed the venue name in its original form for
                # inclusion in the UPDATE, so I'm returning spaces to field 3
                address[2] = address[2].replace("+", " ")
                # Rebuild each address
                newaddress = []
                newaddress.append(address[3])
                newaddress.append(address[4])
                newaddress.append(address[5])
                newaddress.append(address[6])
                newaddress.append(address[7])
                newaddress.append(address[2])
                # Then append each address back to a new record set
                records.append(newaddress)
            except IndexError:
                print "emptyline"

    requesturl = 'http://maps.googleapis.com/maps/api/geocode/xml?address='

    for row in records:

        # Generate the request URL
        # Example: 
        # http://maps.googleapis.com/maps/api/geocode/xml?address=1+ \
        #        Arrowhead+Dr,Kansas+city,MO,64129,US&sensor=false

        request = '%s%s,%s,%s,%s,%s&sensor=false' %(requesturl,row[0],row[1],
                    row[2],row[3],row[4])

        # Make the request and parse the XML it returns
        dom = minidom.parse(urllib.urlopen(request))
        try:
            # Get values for the 'lat' and 'lng' nodes
            lat = dom.getElementsByTagName('lat')[0].firstChild.nodeValue
            lng = dom.getElementsByTagName('lng')[0].firstChild.nodeValue

            # I also needed to strip the operators out of 2-word city names
            row[1] = row[1].replace("+", " ")

            # And generate UPDATE statements
            update = ('UPDATE venues SET lat="%s", lng="%s" WHERE name="%s"' +
                      'AND city="%s" AND country="%s";\n') %(lat,lng,row[5],
                      row[1],row[4])

        except IndexError:
            # If lat/lng values are not returned in the XML
            update = ('/* ZERO_RESULTS WHERE name="%s" AND city="%s" AND ' +
                      'country="%s" */\n') %(row[5],row[1],row[4])

        with open(target, "a") as f:
            f.write(update)

    return records

def main():
    addresses = processfile(source)

if __name__ == "__main__":
    main()

You'll notice that I have my UPDATE statements writing to a text file, instead of connecting to a database and updating directly - that's just because I wanted to review the updates by hand before running them against my table. But you could just as easily do something like this in place of the 'update' line:

            import MySQLdb as Database
            db = Database.connect("myhost", "myuser", "mypasswd", "mydbname")
            cursor = db.cursor()
            sql = """UPDATE venues SET lat='%s', lng='%s' WHERE name='%s' 
                     AND city='%s' AND country='%s'""" %(lat,lng,row[5],
                     row[1],row[4])
            cursor.execute(sql)
            cursor.close()
            db.close()