It’s all very well when someone like the New York Times opens up an API to campaign finance data (announced here), but only the geeks can play, right? It’s not like they’re making the data available in a spreadsheet so that more people have a chance to try and do something with it, is it? Err, well, maybe this’ll help? A way of getting data out from the NYT API and into an online Google spreadsheet, where you can do something with it… (you can see an example here: campaign data in a Google Spreadsheet (via The New York Times Campaign Data API)).
Following up on a comment to Visualising Financial Data In a Google Spreadsheet Motion Chart by Dan Mcquillan: “FWIW, there’s a lot of campaigning potential in these tools”, I just had a quick tinker with another Google spreadsheet function – importXML – that pulls XML data into a Google spreadsheet, and in particular pointed it at the newly released New York Times Campaign Finance API.
To use this API you need to get yourself a key, which requires you to register with the New York Times, and also provide a URL for your application: create a new Google spreadsheet, and use the URL for it (it’ll look something like http://spreadsheets.google.com/ccc?key=p1rHUqg4g4223vNKsJW8GcQ&hl=en_GB) as the URL for your NYT API app.
First up, enter “API KEY” as a label in a cell – I used G22 – and paste your New York Times API Campaign Finance API key for that spreadsheet into the cell next to it – e.g. H32. (My spreadsheet was cobbled together in scratchpad mode, so the cells are all over the place!)
In cell G23 enter the label “Names URL” and in cell H23 the following formula, which constructs a URL for the “Candidate Summaries”:
[aaargghhh – crappy styylesheet… the formula is this, but all on one line:
(The formula concatenates a NYT API calling URL with your API key.)
We’re now going to pull in the list of candidates: in e.g. cell A24 enter the formula:
H23, remember, is a reference to the cell that contains the URL for the API call that grabs all the candidate summaries. The =importXML grabs the XML result from the API call, and the “//candidate” bit looks through the XML file till it finds a “candidate” element. If you look at the XML from the webservice, you’ll see it actually contains several of these:
The spreadsheet actually handles all of these and prints out a summary table for you:
All we did was configure one cell to grab all this information, remember: cell A24 which contains =ImportXML(H23,”//candidate”).
We’re now going to pull in more detailed campaign info for each candidate using the “Candidate Details” API request, which looks something like http://api.nytimes.com/svc/elections/us/v2/president/2008/finances/candidates/romney.xml
(The (lack of) capitalisation of the surname of the candidate in the URL is not necessary – it seems to work okay with “Romney.xml” for example).
To construct the URLs for each candidate, let’s write a formula to construct the URL for one of them.
In my spreadsheet, the Candidate Summary import in cell A24 built a table that filled cells A24:E42, so we’re going to start working in additional columns alongside that table.
You’ll notice that the URL for the detailed report just requires the candidate’s surname, whereas the candidate summary provides the forename as well.
In cell F24, add the following formula: SPLIT(A24, “,”)
What this does is split the contents of cell A24 (e.g. Obama, Barack) at the comma, and populate two cells (in columns F and G) as a result:
If you highlight the cell (F24), click on the little blue square in the bottom right hand corner, and drag down, you should be able to magically copy the formula and so spilt all the candidates’ names.
Now what we need to do is construct the URL that will pull back the detailed campaign information for each candidate. We’ll build the URLs in the next free column along (column H in my spreadsheet – the split name is across columns F and G).
In cell H24, enter the following formula (all on one line – split here for convenience):
H$22 refers to the cell that contains your NYT API key. The $ is there to anchor the row number in the formula, so when you drag the cell to copy it, the copied formula still refers to that exact same cell.
Highlight cell H24 and drag the little blue square down to complete the table and construct URLs for each candidate.
Good stuff :-)
Now we can construct another formula to import the full campaign details for each candidate.
in cell A2 in my spreadsheet, I entered the following:
H24 is the URL for the candidate details API call for the first candidate (Barack Obama in my case).
Hopefully, you should get a row of data pulled into the spreadsheet:
The XML file responsible looks something like this:
Use your eyes and work out what the contents of each cell might refer to:
Click on cell A2, and drag the blue square down to A20. The contents of cell A2 will be copied in a relative way, and should now be importing detailed information for all the candidates.
Now you have the data in a form you can play with :-)
And it should update whenever the New York Times updates the data it exposes through the API.
(I would publish the spreadsheet in all its glory, but then I’d have to give away my API key… Ah, what the heck = here it is: Spreadsheet of Campaign Finances, courtesy of the New York Times. (Maybe I should read the T&C to check this is a fair use…!))
For other New York Times Campaign Data API functions, see the documentation. You should know enough now to be able to work out how to use it…
PS the “//candidate” XPATH stuff can be tricky to get your head round. If anyone wants to post as a comment a list of XPATH routines to pull out different data elements, feel free to do so :-)
If you make use of the data in any interesting ways, please link back here so we can see what you’re getting up to and how you’re using the data…