Coding for Good: Working with the Sunlight Labs APIs

Aug 29, 2012 Python

If you're looking to get your feet wet when it comes to working with open U.S. government data, I can think of no better place to start than with the Sunlight Labs APIs. They're not kidding when they say that using their APIs is absurdly easy.

Sunlight Labs is a project of the Sunlight Foundation, an organization that has been working for several years to open up public government data** - the kind of data that is freely available on state and federal web sites but is buried behind a Byzantine series of links or is just poorly formatted for analytical use. Sunlight has done the hard work of finding that data and collecting it, and Sunlight Labs has created the tools that make it accessible for all of us to use.

**(Their other projects include Sunlight Reporting Group, Sunlight Live and the Open House Project.)

Currently they provide five APIs accessible with Python:

  1. Sunlight Congress API: returns information about legislators at the federal level
  2. Open States API: exposes similar information at the state level
  3. Capitol Words API: gives you a look at the most-used words in Congressional sessions
  4. Transparency Data API: specific data sets, such as campaign contributions and lobbying records
  5. Real Time Congress API: data such as floor updates, committee hearings, floor video, bills, votes, amendments, and various documents

This example script uses the Open States API to:

  • get all available data about legislators at the state level
  • parse out only what's needed to do a summary count of party affiliations per state
  • and return that information as:
    • a JSON object that can be used for visualizations
    • a table suitable for embedding into an HTML page

To use any of the APIs, you'll first need to get an API key:

http://services.sunlightlabs.com/accounts/register/

It only took a few minutes for my key to arrive by email. Once you've got it, you have a few options for setting it (I used ~/.sunlight.key):

http://python-sunlight.readthedocs.org/en/latest/index.html#usage

Then install the sunlight module (this won't apply to the Transparency Data and Real Time Congress APIs), either with pip install or by checking out the project from GitHub:

http://python-sunlight.readthedocs.org/en/latest/index.html#installation

With all that done, you're ready to go. Let's pop open an interpreter and play around with the given example:

>>> import sunlight
>>> nc_legs = sunlight.openstates.legislators(state='nc')

As you'll see, this returns a list of dicts, each dict containing a lot of publicly available information - such as name, district, office address, party affiliation, in some cases even a picture - about each state legislator in North Carolina:

>>> nc_legs
[{u'leg_id': u'NCL000242', u'first_name': u'Barbara', 
u'last_name': u'Lee', u'middle_name': u'', 
u'district': u'12', u'chamber': u'lower', u'url': 
u'http://www.ncga.state.nc.us/gascripts/members/viewMember.pl?sChamber=House&nUserID=634', 
u'created_at': u'2012-08-10 02:06:05', u'updated_at': u'2012-08-29 02:09:04', 
u'email': u'Barbara.Lee@ncleg.net', u'+notice': u'[\xa0Appointed\xa008/06/2012\xa0]', 
u'state': u'nc', u'offices': [{u'fax': None, u'name': u'Capitol Office', 
u'phone': u'919-733-5995', 
u'address': u'NC House of Representatives\n300 N. Salisbury Street, Room 613\n\nRaleigh, NC 27603-5925', 
u'type': u'capitol', u'email': None}], 
u'full_name': u'Barbara Lee', u'active': True, u'party': u'Democratic', 
u'suffixes': u'', u'id': u'NCL000242', 
u'photo_url': u'http://www.ncga.state.nc.us/House/pictures/hiRes/634.jpg'}, 
...

One simple but powerful API call and we've already got so much information at our fingertips. So what can we do with all that data? Well, since the ultimate goal is to get a count of party affiliations per state, let's start by creating a list of state abbreviations. Then for each state in that list, we can make the same API call to get all the legislative data, and write a subset of that data - the state, the representative's full name, and their party affiliation - to a new dict.

import sunlight

states = ["AL", "AK", "AZ", "AR", ...]  # and so on for the remaining states

def find_state_reps():
    # Start by instantiating the new dict:
    statereps = {}

    for s in states:
        legs = sunlight.openstates.legislators(state=s)
        # If you print 'legs', you'll see a list of dicts with loads
        # of contact information for each state legislator.
        # For my purposes, I'm only collecting name and
        # party affiliation.

        # This dict will hold {name: party} pairs for each state
        reps = {}
        for leg in legs:
            name = leg['full_name']
            try:
                party = leg['party']
            except KeyError:  # In some cases, 'party' is missing
                party = None
            reps[name] = party
        statereps[s] = reps

    # At this point, the 'statereps' dict contains:
    # {'state': {'rep_name': 'party_affiliation'}}
    # for each state.
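The name/party extraction in the inner loop could also be written as a single dict comprehension, with `dict.get` returning None for a missing 'party' key instead of catching KeyError. A minimal sketch, run against hypothetical sample records trimmed to the two fields used here (`name_party_pairs` and `sample_legs` are illustrative names, not part of the sunlight module):

```python
# Hypothetical sample records, trimmed to the two fields this step uses.
# The second record has no 'party' key, like the legislators whose
# party affiliation is missing from the API data.
sample_legs = [
    {'full_name': 'Barbara Lee', 'party': 'Democratic'},
    {'full_name': 'Jane Doe'},
]

def name_party_pairs(legs):
    """Map each legislator's full name to their party (None if missing)."""
    return {leg['full_name']: leg.get('party') for leg in legs}

print(name_party_pairs(sample_legs))
# {'Barbara Lee': 'Democratic', 'Jane Doe': None}
```

One thing both versions share: because names are used as keys, two legislators with the same full name in one state would collapse into a single entry.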

But you know what? Sunlight Labs is providing this API as a free resource, and I don't want to take advantage of their hard work by pounding their servers with a new set of 50 requests every time I run this script. So I'm going to write the dict to a file so that data doesn't have to be pulled from the API again.

    outfile = 'state_reps_list.txt'
    # Save the dict's repr so it can be read back in later
    with open(outfile, 'w') as f:
        f.write(repr(statereps))

Now, as I'm developing, I can just check to see if I have that file in place and use the dict from there. And when it's time to refresh the data, I can just delete the file and hit the API again to rebuild the statereps dict from scratch:

import ast
import os.path

def find_state_reps():
    outfile = 'state_reps_list.txt'

    # If we've already got the data stored in a file,
    # just refer to that file instead of hitting the API again:
    if os.path.exists(outfile):
        # Read the file back in and rebuild the statereps dict.
        # ast.literal_eval only evaluates Python literals, so it's
        # a safer choice than eval() for this.
        with open(outfile) as f:
            statereps = ast.literal_eval(f.read())
    else:
        # Hit the API for the data
        ...

    return statereps
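The same check-the-file-first idea can be pulled out into a small reusable helper. This is a sketch rather than the author's code: `load_or_fetch` and `fake_fetch` are hypothetical names, and it stores the data as JSON instead of a Python repr, so the cache file could also be read by non-Python tools. (JSON only supports string keys, which is fine for this dict.)

```python
import json
import os
import tempfile

def load_or_fetch(path, fetch):
    """Return the data cached in `path` if the file exists; otherwise
    call fetch(), save its result to `path` as JSON, and return it."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = fetch()
    with open(path, 'w') as f:
        json.dump(data, f)
    return data

# Hypothetical stand-in for the real API calls; records each invocation.
calls = []

def fake_fetch():
    calls.append(1)
    return {'WV': {'Mike Green': 'Democratic'}}

# Usage, with a throwaway directory standing in for the working directory:
path = os.path.join(tempfile.mkdtemp(), 'state_reps_list.json')
first = load_or_fetch(path, fake_fetch)   # file missing: calls fake_fetch, writes it
second = load_or_fetch(path, fake_fetch)  # file present: reads it back instead
print(len(calls))  # 1
```

Refreshing works the same way as in the post: delete the file and the next call goes back to `fetch`.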

My statereps dict looks something like this, but obviously contains a lot more data:

{
 'WA': {u'Bruce Chandler': u'Republican', u'Derek Kilmer': u'Democratic', ...}, 
 'WV': {u'Mike Green': u'Democratic', u'Mark Wills': u'Democratic', ...}, 
 ...
}

Now I can pass that data into another function that returns the summary count of party affiliations among state legislators per state (e.g., state: dems=x, repubs=y, other=z):

import re

def partycount(reps_dict):

  partycount = {}

  for s in reps_dict:

    # Create lists to hold the party members on a per-state basis:
    demlist = []
    replist = []
    otherlist = []

    for k in reps_dict[s]:
      # s -> state abbreviation
      # k -> full name
      # reps_dict[s][k] -> party affiliation

      if reps_dict[s][k]:
        # Use the re module to determine if either of these strings
        # appears in the party affiliation value

        dem = re.search('Dem', reps_dict[s][k])
        rep = re.search('Repub', reps_dict[s][k])

        # And funnel those values into the appropriate lists
        if dem:
          # If the legislator's party affiliation contains the substring 'Dem',
          # add their name to the 'dem' list:
          demlist.append(k)
        elif rep:
          # If the legislator's party affiliation contains the substring 'Repub',
          # add their name to the 'rep' list:
          replist.append(k)
        else:
          # If neither substring appears in the legislator's party affiliation,
          # add their name to the 'other' list
          otherlist.append(k)
    c = {}
    # Get the length of each list and you have a count of
    # dems vs. repubs vs. other for this state:
    c['Democrats'] = len(demlist)
    c['Republicans'] = len(replist)
    c['Other'] = len(otherlist)
    partycount[s] = c

  return partycount
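Since the per-state name lists are only used for their lengths, the tallies could also be kept directly. A sketch of that alternative using `collections.Counter`, assuming the same `{'state': {'rep_name': 'party'}}` input and the same substring rules ('Dem' checked first, then 'Repub'); it uses the `in` operator instead of `re.search`, which is equivalent for plain substrings. `classify` and `count_parties` are hypothetical names:

```python
from collections import Counter

def classify(party):
    """Bucket a party string using the same substring checks as above."""
    if 'Dem' in party:
        return 'Democrats'
    if 'Repub' in party:
        return 'Republicans'
    return 'Other'

def count_parties(reps_dict):
    counts = {}
    for state, reps in reps_dict.items():
        # Skip legislators with no party value, as the regex version does.
        tally = Counter(classify(p) for p in reps.values() if p)
        # Counter returns 0 for missing keys, so every column is filled in.
        counts[state] = {
            'Democrats': tally['Democrats'],
            'Republicans': tally['Republicans'],
            'Other': tally['Other'],
        }
    return counts

sample = {'WI': {'A': 'Democratic', 'B': 'Republican', 'C': 'Independent', 'D': None}}
print(count_parties(sample))
# {'WI': {'Democrats': 1, 'Republicans': 1, 'Other': 1}}
```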

And now we've got (yet another) dict that looks like this:

{
 'WA': {'Republicans': 64, 'Other': 0, 'Democrats': 83}, 
 'DE': {'Republicans': 22, 'Other': 0, 'Democrats': 40},
 'DC': {'Republicans': 0, 'Other': 2, 'Democrats': 10},
 'WI': {'Republicans': 74, 'Other': 1, 'Democrats': 55}, 
 ...
}

Before I return that partycount dict, I can insert this somewhat ugly bit of code into the function to generate an HTML page with all that data embedded in a table:

  # This count data could just as easily be output as
  # a template context object, or printed to stdout.
  # Fix the column order so each count lines up with its header:
  columns = ['Republicans', 'Other', 'Democrats']

  output = "<html><body><table>"
  output += "<tr><td><b>STATE</b></td>"
  for col in columns:
    output += "<td><b>%s</b></td>" % col
  output += "</tr>\n"

  # Let's sort the keys while we're at it,
  # so the states appear in alphabetical order:
  for key in sorted(partycount):
    output += "<tr><td align='center'>%s</td>" % key
    # Look each count up by column name rather than relying on the
    # dict's own key order, which need not match the header.
    for col in columns:
      output += "<td align='center'>%s</td>" % partycount[key][col]
    output += "</tr>\n"
  output += "</table></body></html>"

  with open('redvblue.html', 'w') as f:
    f.write(output)
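If the HTML generation were pulled out into its own function, it could be checked without writing a file. A sketch, assuming the same partycount structure; `render_table` is a hypothetical name, it uses `<th>` header cells in place of bold `<td>` cells, and `html.escape` (Python 3 standard library) guards against stray markup characters in the values:

```python
import html

COLUMNS = ['Republicans', 'Other', 'Democrats']

def render_table(counts, columns=COLUMNS):
    """Build an HTML page with one row per state, columns in a fixed order."""
    header = ''.join('<th>%s</th>' % html.escape(c) for c in columns)
    rows = ['<tr><th>STATE</th>%s</tr>' % header]
    for state in sorted(counts):  # alphabetical by state abbreviation
        cells = ''.join("<td align='center'>%d</td>" % counts[state][c]
                        for c in columns)
        rows.append("<tr><td align='center'>%s</td>%s</tr>"
                    % (html.escape(state), cells))
    return '<html><body><table>\n' + '\n'.join(rows) + '\n</table></body></html>'

page = render_table({'DE': {'Republicans': 22, 'Other': 0, 'Democrats': 40},
                     'DC': {'Republicans': 0, 'Other': 2, 'Democrats': 10}})
print(page)
```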

One other thing - I can also take that first statereps dict and convert it to JSON - that might be handy for doing visualizations down the road:

import json

def converttojson(reps_dict):
  """
  Take a dict object and convert it to JSON
  """
  result = json.dumps(reps_dict, sort_keys=False, indent=4)
  return result
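If the JSON is going to be saved and compared between runs, sorting the keys makes the output deterministic. A small sketch using the standard-library `json` module; `to_stable_json` and the sample data are illustrative, not part of the original script:

```python
import json

def to_stable_json(reps_dict):
    """Serialize with sorted keys so identical data always yields
    identical text (handy for diffing snapshots between runs)."""
    return json.dumps(reps_dict, sort_keys=True, indent=4)

sample = {'WV': {'Mike Green': 'Democratic', 'Mark Wills': 'Democratic'},
          'WA': {'Bruce Chandler': 'Republican'}}
text = to_stable_json(sample)

# Round trip: parsing the JSON gives back an equal dict.
assert json.loads(text) == sample
# Keys appear alphabetically in the text, regardless of insertion order.
print(text.index('"WA"') < text.index('"WV"'))  # True
```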

Some resources for doing visualizations with the resulting JSON object:

Here are a few more things that I could see adding to this script:

  • Add a line reading "Data current as of [date]" to the top of the HTML page, using the modification time of the 'state_reps_list.txt' file (or the current date if you're getting the data fresh from the API):
    datetime.datetime.fromtimestamp(os.path.getmtime(outfile))
        
  • Get unemployment data (source: US Department of Labor, Bureau of Labor Statistics) and compare on a per-state basis to see if there is any correlation between unemployment rates and dominance of any particular party at the state level:

    http://www.bls.gov/web/laus/laumstrk.htm

  • Use the Transparency Data API to see how campaign contributions compare from state to state
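The first idea in that list could look something like this. A sketch, assuming the cache file from earlier; `data_timestamp` and `current_as_of_line` are hypothetical helpers, and the date format matches the "current as of" line at the end of the table output:

```python
import datetime
import os
import tempfile

def data_timestamp(path):
    """Modification time of `path`, or the current time if it doesn't exist."""
    if os.path.exists(path):
        return datetime.datetime.fromtimestamp(os.path.getmtime(path))
    return datetime.datetime.now()

def current_as_of_line(path):
    """An HTML paragraph stating when the data was last refreshed."""
    stamp = data_timestamp(path).strftime('%Y-%m-%d %H:%M:%S')
    return '<p>State legislative data current as of %s</p>' % stamp

# Usage, with a temporary file standing in for state_reps_list.txt:
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
print(current_as_of_line(tmp.name))
```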

My complete script, minus the changes mentioned above (which I have already implemented locally), can be found here:

https://gist.github.com/3501470

And incidentally, here's that table output:

STATE  Republicans  Other  Democrats
AK              34      0         26
AL              87      1         51
AR              61      0         74
AZ              61      1         28
CA              42      1         77
CO              48      0         52
CT              66      0        121
DC               0      2         10
DE              22      0         40
FL             109      0         50
GA             147      1         88
HI               9      0         67
IA              84      0         66
ID              85      0         20
IL              78      0         99
IN              97      0         53
KS             123      0         41
KY              63      1         74
LA              82      2         60
MA              37      0        161
MD              55      0        133
ME              97      3         85
MI              90      0         58
MN             109      0         91
MO             132      1         64
MS              95      0         79
MT              95      0         54
NC              98      0         71
ND             106      0         38
NE               0     49          0
NH             310      0        109
NJ              48      0         72
NM              47      1         64
NV              26      0         37
NY              27     16        124
OH              82      0         50
OK             100      0         48
OR              44      0         46
PA             139      0        111
RI              18      1         94
SC             102      0         67
SD              80      1         24
TN              85      0         47
TX             120      0         61
UT              80      0         24
VA              87      1         52
VT              43      4        133
WA              64      0         83
WI              74      1         55
WV              41      0         93
WY              76      0         14

State legislative data current as of 2012-08-29 11:48:26