Guardian Game Reviews – with Video Trailers

In what I intend to be the last post in the series for a while – maybe – here’s a quick hack around the Guardian Open-Platform Content API that shows how to “annotate” a particular set of articles (specifically, recent video game reviews) with video trailers for the corresponding game pulled in from Youtube, and then render the whole caboodle in a Grazr widget.

So let’s begin…

Take one Guardian API query: we construct the URI and call the webservice:

Here’s a typical result:

As with the football results map pipe, the linkj-text looks like a good candidate for search query text. So let’s tweak it for use as a search on somewhere like Youtube:

I want to be able to let the user choose trailer or review videos, which is why I cleared the search string of the word “review” first, before adding the user preference back into the string.

Now run a Youtube GData/API search using an old pipe block I found laying around (Youtube video search pipe) and grab the top result:

Now I happen to know that if you give a Grazr widget a feed with enclosure.url attributes that point to a flash file, it will embed the flash resource for you:

So now we can take the RSS output of the pipe and pop it into a Guardian game review with video previews Grazr widget:

(The widget layout is customised as described in part in Setting the 3 Pane divider positions.)

If you want to grab the widget and embed it your own webpages, it’s easy enough to do so (although not on hosted WordPress blogs). Simply click on “Share” and select “Customize” from the widget menu bar:

Then you can customise the widget and grab an embed code, or a link to a full screen view:

Good, eh? ;-)

PS Grazr (which is actually an OPML viewer as well as an RSS viewer) embeds other stuff too. For example, here it as as a slideshare viewer; and here it is showing how to automatically generate embedded players for Youtube videos and MP3 audio files using delicious Feed Enhancer – Auto-Enclosures pipe. (If you’re into a bit of hackery, it’ll carry Scribd iPaper too: Embed Scribd iPaper in Grazr Demo. I’m guessing you should be able to get it to embed arbitrary flash games as well?)

Tinkering with the Guardian Content API – SerendipiTwitterNews

Okay, so here I am again, again, trying yet again to get y’all to see just what I was on about with serendipitwitterous

So think of it this way – you’re on the Tube (route planning courtesy of a hack you knocked up using the TfL API), peering over the shoulder of someone reading the Guardian or the Observer, and an interesting headline catches your eye… You start to read it…. Serendipity has worked her charms again…

Got that? That’s what this new edition pipe/feed hack type thing lets you do – discover interesting news stories from the Guardian (or Observer) courtesy of a little pipe that does term extraction on your latest tweets and then runs them as searches on the OpenPlatform API, with the “/technology” filter set, to turn up news stories from the last 10 days or so that may be loosely related to something you tweeted about…

So here’s where we start – grab your most recent tweets, via a twitter RSS feed:

Run each tweet through the Yahoo term extraction service, and filter out null entries (i.e. ones that donlt contain any text) and any duplicates:

As a result, you might get something like this:

We’re now going to run those extracted terms through the Guardian Open Platform API. Firstly, we need to construct the RESTful URL that will call the service:

You’ll maybe notice a couple of filters in there? One limits stories to articles that have been tagged as Technology stories; the other limits the search to reasonably fresh stories (that is, stories with a datestamp from the last 10 days).

We now use these URIs to call the web service and get some XML back; when testing the service, I noticed some errors that appeared to be caused by the API limiting the number of calls per second to the service (it is still early beta days of testing, after all), so I’m limiting the number of calls I make in rapid succession to just a few queries:

Finally, we map the relevant parts of the XML response to RSS item elements:

And there we have it – a Yahoo pipe that does crude term extraction on your most recent tweets and uses them as query terms for recent tech stories in a search on the Guardian OpenPlatform Content API:

And I call it Guardian Tech SerendipiTwitterNews, okay? ;-)

“JOIN”ing Data from the Guardian Data Store Student Satisfaction Data

I really should make this the last post on the Guardian data store for a while, because I’ve got a stack of other things I really ought to be doing instead, but it struck me that the following demo might open up some peoples eyes as to what’s possible when you have several data sets that play nicely…

It shows you how to take data from two different spreadsheets and link it together to create a third data set that contains elements from the two original ones. (If you just want to cut to the end, here’s a (visualised) reason why it might not be such a happy idea to go to Southampton Solent if you want to study Architecture or Planning: Student Happiness on Planning and Architecture courses. Now ask yourself the question: how would I (err, that is, you) have produced that sort of chart?)

This whole idea is in and around the area of Linked Data, but doing it the hard way…

If you don’t know what Linked Data is, watch this:

So let’s have a look at the Guardian University Tables/ Satisfaction Data that has been uploaded to Google Spreadsheets:

You’ll notice there are lots of sheets in there, covering the different subject areas – like “Architecture”, for instance:

Importantly, the format of the names of the institutions is consistent across spreadsheets.

So I was wondering – if I was a student wanting to study either planning or architecture, how could i find out which institutions had a good satisfaction rating, or low student to staff ratio, across both those subjects? (Maybe I’m not sure which subject I want to do, so if I choose the wrong one and try to switch course, I know I’m not going to switch into a duff department.)

That is, I’d like to be able to see a single table containing the data from both the overall results table as well as the Architecture and Planning tables.

Now if the data was in a database, and if I spoke SQL, this would be quite easy to do (hint condition: look up sql JOIN). But I only have my browser to hand, so what to do…?

Dabble DB provides one answer… How? Here’s how…

Start off by creating a new application:

I’m going to seed it with a table containing the names of HEIs as listed in column B of the overall data table by importing just that column, as CSV data, from the Google spreadsheet:

Pull the data in:

So now we have the data:

Okay – let’s import a couple more tables, say the data for Planning and Architecture areas.

First, Planning – here’s the CSV:

Click on “More…” and you’ll be offered the chance to Add a New Category.

Take that opportunity:-)

You hopefully get the idea…

Exactly as before…

Now do the same for the Architecture data:

So now i have three tables – known as categories – in Dabble DB.

Let’s link them… that is, let’s make the data from one category available to another.

Firstly, I’ll link the Architecture data to the table that just lists the HEIs – click on the Name of Institution column to pop-up a menu and select Configure:

We’re going to Link the column to Entries in another one:

in particular, we’re going to tell Dabble DB that the Names of Institutions in the Architecture table are the same things as the institions in the HEI category/table:

If you look at the HEIs category, you’ll see the Architecture column has been brought in:

We can now do the same for Planning (remember, pop up the Name of Institution menu and Configure it to Link Entries).

The next step is to pull in some data from the two linked categories. How about we compare the Teaching Satisfaction scores for these two subjects?

Click on the column header for one of the linked categories – say the planning one, select Add Derived Field and then the field you want to pull in:

The data gets pulled in…

(Oops – this is all a bit sparse; maybe I should have used a different filed, such as Average Entry Tariff? Never mind, let’s push on…)

Add the corresponding derived field for the Architecture courses:

If you click on the “unsaved View” label, you can save this data table:

To tidy up the table, let’s hide the duplicated Name columns and resave:

To give something like this:

A nice feature of Dabble DB is that is makes it easy to export data from any given view:

So if we grab the CSV URI:

We can take it to, I dunno, Many Eyes Wikified?

Here it comes…:

Many eyes wikified data

Create a placeholder for a visulisation (the data page is ousefulTestboard/StudentHappinessPlanningArchitecture):

Or just type the text yourself:

Click through to create the viz:

We’ll have the scatter plot:

The empty cells in the data columns may cause Many Eyes Wikified to think the data is Text – it’s not, it’s Number:

Now customise the view… I could just have every spot the same, but Architecture is my first preference, so let’s just highlight the places where students are happiest doing that subject (click through to play with the visualisation):

UCL seems to do best:

So here’s a recap – we’ve essentially JOINed data from two separate spreadsheets from Google Spreadsheets to create a new data table in Dabble DB, then visualised it in Many Eyes.

Can you see now why privacy hacks don’t like the idea of linked data in government, or across companies?

So here’s you’re weekend homework – create a data set that identifies entry requirements across the various engineering subject areas, and try to find a way of visualising it ;-)

Creating Google Charts From CSV Data Inside a Yahoo Pipe

One of the problems with processing CSV data using a Yahoo pipe – as described in Filtering Guardian Data Blog/Google Spreadsheet Data With Yahoo! Pipes – is that the RSS output of the pipe is not very well formed. It’s fine for outputting as CSV, or even JSON, but it’s not so hot as RSS.

So in this post, I’ll show you how to get a proper RSS feed out, with a meaningful title and sensible description for each item, and as an added bonus, a chart image generated from the CSV data using the Google Chart API.

So to start, let’s grab some CSV (I’m using the Guardian Data Store RAE 2008 data on Google Spreadhseets again), name the columns ourselves, and generate a sensible title for each feed item:

For the item description data, I’m going to create a simple listing of the data values:

(Don’t worry about where the chartURI came from just for the moment…)

What this loop element does is create a string for each item in the feed, and then output is as the description element for the corresponding item. The data elements within each item are displayed in a faux table within the description. (Note that I could have constructed a proper HTML data table had I wanted to…)

So here’s description (without the chartURI image element shown):

That is:

So that’s part 1 – constructing title and description elements for each feed item, where a feed item is derived from a CSV row.

Now let’s see how to create a chart image from the data using the Google Chart API.

The Google Charts API provides a RESTful API to a chart generating service – which is to say: you can give it a URI containing some data and specifying a chart type, and it will give you an image back in return, with your data graphed appropriately (see also: RESTful Image Generation – When Text Just Won’t Do).

So for example, here’s a pie chart:

And here’s its URI:×300&chd=t:10,10,45,35,0&chl=1*|2*|3*|4*|unclassifed&chtt=RAE2008%20(Open%20University%20)%20-%20Art%20and%20Design

So using our EYES, we can see different things in that URI:

  • the domain that serves the image:
  • the chart type: cht=p
  • the chart size/dimensions: &chs=450×300
  • the chart data: &chd=t:10,10,45,35,0
  • the chart labels: &chl=1*|2*|3*|4*|unclassifed
  • and the chart title:

And now we can construct URIs of that form within the pipe:

Now I do a couple o fbits of tidying up – the chartURI string just needs making into a nice URI – replace the spaces with their ASCII code representation, for example:

As we currently have no link element defined for the feed items, lets use the chart URI:


So there we have it – and RSS feed from the pipe containing the “tabulated” data from each CSV row, and a pie chart to display it:

From the pipe RSS feed URI:
you’ll notice the search term for the particular HEI you’re interested in, which means you should be easily able to hack it to give the appropriate feed for your own HEI:-)

Filtering Guardian Data Blog/Google Spreadsheet Data With Yahoo! Pipes

In Using Many Eyes Wikified to Visualise Guardian Data Store Data on Google Docs I showed how to pull data from Google spreadsheets (uploaded there by the Guardian as part of their Data Store initiative and visualise it using Many Eyes Wikified.

Unfortunately, one of the downsides of using Many Eyes Wikified is that you can’t filter the imported data or select subsets of rows from it (nor can you choose to just ignore particular columns in the visualisation editor – which would be a really handy thing to be able to do, and probably quite easy to implement?)

So for example, when looking at the RAE 2008 Data, it might be quite useful to be able to just visualise the data for a single institution, such as the Open University. But how can we do this?

One way would be to find a way of searching the data in the spreadsheet, and then only extracting the rows that contained the search term – such as “Open University” – in the institution name column. But I don’t know if that’s possible in Google Spreadsheets, (though it is possible in Zoho DB, which supports all manner of queries in SQL like dialects; but that’s for another day…;-).

An alternative way is to grab the whole of the spreadsheet and then just filter out all the rows that don’t contain the filter term in the desired column… which is an idea that came into my mind last night on my way home, and which appears to work quite well, as this Filtering UK RAE 2008 CSV Data from Google Docs pipe shows:

If your pipe imported a CSV file, as this one does (in fact, if the items being pushed out of the pipe have a y:row attribute set giving a row/count number for each item) then you can get a CSV file out of the pipe too:

Which in this case means we can filter through just the RAE 2008 data for a particular institution, grab the CSV URL for that data, and pull it into Many Eyes wikified in the same way as described before:

Here it is:

And once we have a wiki data page for it, we can visualise it – I’ve popped some examples up here: OU RAE 2008 performance.

For example, how about a bubble diagram view showing our 4* performance across the various units of assessment:

(The number is the percentage of submissions in that unit of assessment achieving the specified grade.)

Or how about this one – an interactive bar chart showing the percentages of 3* and 4* submissions in each unit of assessment:

If you look back at the pipe interface, you’ll see you can use the same pipe to pull out the data for any of the HEIs:

The pipe output CSV URI also makes this clear:

In the next post on this topic, I’ll show you how to create a rather more interesting RSS feed from this sort of pipe, including some Google chart URIs created dynamically within the pipe…

But for now, it’s time to go foraging for dinner…

Using Many Eyes Wikified to Visualise Guardian Data Store Data on Google Docs

Last week, I posted a quick demo of how to visualise data stored in a Google spreadsheet in Many Eyes Wikified (HEFCE Grant Funding, in Pictures).

The data I used was the latest batch of HEFCE teaching funding data, but Joss soon tweeted to say he’d got Research funding data up on Google spreadsheets, and could I do something with that? You can see the results here: Visualising UK HEI Research Funding data on many Eyes Wikified (Joss has also had a go: RAE: UK research funding results visualised).

Anyway, today the Guardian announced a new content API (more on that later – authorised developer keys are still like gold dust), as well as the Guardian data store (strapline: “facts you can use”) and the associated Data Store Blog.

Interestingly, the data is being stored on Google docs, in part because Google spreadsheets offer an API and a wide variety of export formats.

As regular readers will know, one of the export formats from Google spreadsheets is CSV – Comma Separated Variable data – which just so happens to be liked by services such as Dabble DB and Many Eyes. I’ll try to come up with a demo of how to mash-up several different data sets in Dabble DB over the next few days, but as I’ve a spare half-hour now, I thought I’d post a qiuck demo of how to visualise some of the Guardian data store spreadsheet data in Many Eyes Wikified.

So to start, let’s look at the the RAE2008 results data – University research department rankings (you can find the actual data here:

If you speak URL, you’ll know that you can get the CSV version of the data by adding &output=csv to the URL, like this:

Inspection of the CSV output suggests there’s some crap at the top we don’t want – i.e. not actual column headings – as well as the the end of the file:

(Note this “crap” is actually important metadata – it describes the data and its provenance – but it’s not the actual data we want to visualise).

Grabbing the actualt data, without the metadata, can be achieve by grabbing a particular range of cells using the &range= URL argument. Inspection of the table suggests that meaningful data can be found in the columnar range of A to H; guesswork and a bit of binary search identifies the actual range of cell data as A2:H2365 – so we can export JUST the data, as CSV, using the URL

If you create a new page on Many Eyes Wikified, this data can be imported into a wiki page there as follows:

We can now use this data page as the basis of a set of Many Eyes visualisations. Noting that the “relative URL address” of the data page is ousefulTestboard/GuardianUKRAERankings2008 (the full URL of the wikified data page is, create a new page and put a visualisation placeholder or two in it:

Saving that page – and clicking through on the visualisation placeholder links – means you can now create your visualisation (Many Eyes seems to try to guess what visualisation you want if you use an appropriate visulisation name?):

Select the settings you want for you visualisation, and hit save:

A visualisation page will be created automatically, and a smaller, embedded version of the visualisation will appear in the wiki page:

If you visit the visualisation page – for example this Treemap visualisation, you should find it is fully interactive – which means you can explore the data for yourself, as I’ll show in a later post…

See more examples here: RAE 2008 Treemap; RAE 2008 Bubble Diagram; RAE 2008 bar chart.