Another Olympics, so another chance to look at how to construct a real time medal map using the tricks and techniques that are available to us today, cf. the approach I took in 2008. In this post, I’ll show how to create a medal map using Wikipedia data and Google spreadsheet query functions in Google Apps Standard Edition (this will also work in Google Apps for Edu, Business and so on, but not (currently) in the ‘everyday’ edition of Google spreadsheets that you get with a personal Google account. Why not? Because I make use of Google Apps Script, that’s why…)
To start with, let’s get some data. Wikipedia is surprisingly good as a near realtime data source for large sporting events, so let’s see what’s on the Winter Olympics 2010 medals page:
We can pull this data into a Google spreadsheet using the =importHTML() formula:
***NOTE – THE PAGE LAYOUT HAS CHANGED AND WITH IT TABLE NUMBER. THE FORMULA SHOULD NOW READ: [Update: bah – back to table 3 again…oh, the joys or arbitrary HTML scraping!)
In order to plot medals in a “heat map” using the Google heatmap widget:
we need to provide two columns of data: an ISO country code and the number value we want to associate with that country code.
Looking at the medals table we have imported, we see that we have country name and a non-ISO country code.
Let’s start by extracting the country name using a Google Apps Script (tutorials). These scripts are user defined formulae that you can create and then use in the same way that you would use any of the normally provided spreadsheet formulae. Open the script editor from the spreadsheet Tools menu:
Write your function(s):
then call them in the spreadsheet:
The next step is to get the ISO country code for each country. I’m going to do this using a QUERY formula that interrogates a list of ISO country codes in another sheet (see Using Google Spreadsheets Like a Database – The QUERY Formula for more on the Google Spreadsheets QUERY function):
At its heart, a particular QUERY will look something like this:
=QUERY('ISO Country Codes'!A2:B268,"select A,B where A contains 'FRANCE' limit 1")))
The limit 1 ensures we only return a single result.
To construct the QUERY from a country name specisifed in a cell, we embed a CONCATENATE formula to construct the select string:
=QUERY('ISO Country Codes'!A2:B268,concatenate("select A,B where A contains '",upper(H2),"' limit 1")))
The contains operator is case sensitive, so as we have country names in all upper case in our ISO sheet, we convert the country name pulled in from the medal table to upper case as well. (I couldn’t get the more appropriate matches expression to work?)
So now we can get a column of country names (again) along with the corresponding ISO country code:
(The IF statement is there to keep things tidy.)
If we now copy across medal data into a column alongside the coutnry code column, we can create a heat map widget:
This widget is also available as a Google gadget:
Although there is some latency in the system, as changes are made to the original WIkipedia page, the Google spreadsheet should occasionally check it as and when the spreadsheet – or the heat map – is viewed.
So there you have it, a real time Medal Map, with data pulled into a Google spreadsheet from a Wikipedia page using an importHTML function, tidied using a Google Apps script function, used to key a QUERY in an ISO codes spreadsheet/database, and then rendered via a third party Google gadget.
Overall hack time – about half an hour, first time round. Time to write this post – over an hour…
PS see also Data Scraping Wikipedia with Google Spreadsheets