Filtering Guardian Data Blog/Google Spreadsheet Data With Yahoo! Pipes

In Using Many Eyes Wikified to Visualise Guardian Data Store Data on Google Docs I showed how to pull data from Google spreadsheets (uploaded there by the Guardian as part of their Data Store initiative and visualise it using Many Eyes Wikified.

Unfortunately, one of the downsides of using Many Eyes Wikified is that you can’t filter the imported data or select subsets of rows from it (nor can you choose to just ignore particular columns in the visualisation editor – which would be a really handy thing to be able to do, and probably quite easy to implement?)

So for example, when looking at the RAE 2008 Data, it might be quite useful to be able to just visualise the data for a single institution, such as the Open University. But how can we do this?

One way would be to find a way of searching the data in the spreadsheet, and then only extracting the rows that contained the search term – such as “Open University” – in the institution name column. But I don’t know if that’s possible in Google Spreadsheets, (though it is possible in Zoho DB, which supports all manner of queries in SQL like dialects; but that’s for another day…;-).

An alternative way is to grab the whole of the spreadsheet and then just filter out all the rows that don’t contain the filter term in the desired column… which is an idea that came into my mind last night on my way home, and which appears to work quite well, as this Filtering UK RAE 2008 CSV Data from Google Docs pipe shows:

If your pipe imported a CSV file, as this one does (in fact, if the items being pushed out of the pipe have a y:row attribute set giving a row/count number for each item) then you can get a CSV file out of the pipe too:

Which in this case means we can filter through just the RAE 2008 data for a particular institution, grab the CSV URL for that data, and pull it into Many Eyes wikified in the same way as described before:

Here it is:

And once we have a wiki data page for it, we can visualise it – I’ve popped some examples up here: OU RAE 2008 performance.

For example, how about a bubble diagram view showing our 4* performance across the various units of assessment:

(The number is the percentage of submissions in that unit of assessment achieving the specified grade.)

Or how about this one – an interactive bar chart showing the percentages of 3* and 4* submissions in each unit of assessment:

If you look back at the pipe interface, you’ll see you can use the same pipe to pull out the data for any of the HEIs:

The pipe output CSV URI also makes this clear:
http://pipes.yahoo.com/pipes/pipe.run?_id=6d9579da965cf16e883e363eb26e9e63&_render=csv&hei=Leicester

In the next post on this topic, I’ll show you how to create a rather more interesting RSS feed from this sort of pipe, including some Google chart URIs created dynamically within the pipe…

But for now, it’s time to go foraging for dinner…

Getting Lots of Results Out of a Google Custom Search Engine (CSE) via RSS

In Getting an RSS Feed Out of a Google Custom Search Engine (CSE), I described a Yahoo! pipe that can be used as a way of getting a RSS feed out of a Google Custom Search Engine using the Google Ajax Search API.

One of the limitations of the API appears to be that it only returns 8 search results at a time. although these can be paged.

So for example, if you run a normal Google search that returns lots of results, those results are presented over several results pages. If you hover over the links for the different pages, and look at the status bar at the bottom of your browser where the link is displayed, you’ll see that the URL for each page of results is largely the same; the difference comes in the &start= argument in the URI that says which number search result should be at the top of the page; something like this:

The same argument – start can be used to page the CSE results from the AJAX API; which means we can add this in to the URI that calls the Google AJAX Search API within a pipe:

This gives us a quick fix for getting more than 8 results out of a CSE: use the get 8 CSE results starting at a given result pipe to get the first 8 results (counted as results 0..7), then another copy of the pipe to get results 9-16 (counted as 8..15 – i.e. starting at result 8), a second copy of the pipe to get results 17-25, and so on, and then aggregate all the results…

Here’s an example – lots of Google CSE results as RSS pipe:

Notice that each CSE calling pipe is called with the same CSE ID and the same search query, but different start numbers.

This resipe hopefully also gives you the clue that you could use the Union pipe block to merge results from different CSEs (just make sure you use the right CSE ID and the right start values!).

Merging Several Calendar iCal Feeds With Yahoo Pipes

Following up on Displaying Events from Multiple Google Calendars in a Single Embedded Calendar View, and picking up on a quip Jim Groom made in the post that started this thread (“Patrick suggested Yahoo Pipes!, you ever experiment with this? “), I did have a quick play with pipes, and this is what I found..,

The “Fetch Feed” block is happy to accept iCal feeds, as this iCal Merge pipe demonstrates:

(I grabbed the iCal feeds from pages linked to from the Stanford events page. A websearch for “ical lectures events” should pull up other sources;-)

If you import an iCal feed into a Yahoo pipe, you get an iCal output format option:

You can then render this feed in an online calendar such as 30 boxes: pipes merged iCal feeds in 30 boxes (here’s the 30 boxes config page for that calendar).

(NB it’s worth noting that 30 boxes will let you generate a calendar view that will merge up to 3 iCal feeds anyway.)

Using the Pipe’s output iCal URL to try to add the merged calendar feed to Google Calendar didn’t seem to work, but when I converted the URL to a TinyURL (http://tinyurl.com/67bg2d) and used that as the import URL, it worked fine.

Do this:

then this:

and get this:

(I couldn’t get the Yahoo pipe iCal feed to work in iCal on my Mac, nor could I resyndicate the feed from the Google Calendar. I think the problem is with the way the Pipes output URL is constructed… which could be worked around by relaying/republishing the Pipe iCal feed through something with a nice URL, maybe?)

That okay for you, Reverend? :-)

PS having to add the feeds by hand to the pipe is a pain. So how about if we list a set of iCal feeds in an RSS feed (which could be a shared bookmark feed, built around a common tag), then pull that bookmark feed (such as the feed from a delicious page (e.g. http://delicious.com/psychemedia/ical+feedtest)) into a pipe and use it to identify what iCal feeds to pull into the pipe?

Got that? The Loop block grabs the URL for each iCal feed listed in the input RSS feed, and pulls in the corresponding iCal events. It seems to work okay, too:-) That is, the feed powered iCal merge pipe will aggregate events from all the iCal feed listed in the RSS feed that is pulled into the pipe.

So now the workflow, which could possibly be tidied a little, is this:
– bookmark iCal feed URLs to a common somewhere (this can be as weak as shared tags, which are then used as the basis for aggregation of feed URLs);
– take the feed from that common somewhere and pop it into the feed powered iCal merge pipe.
– get the TinyURL of the iCal output from the pipe, and subscribe to it in Google Calendar, (for a personal calendar view).

Hmm… we still can’t publish the Google Calendar though, because we don’t “own” the calendar dates (the iCal feed does)? But I guess we can still use 30boxes as the display surface, and provide a button to add the calendar to Google Calendar?

OKAY – it seems that when you import the feed, it makes sense to tick the box that says “allow other people to find this calendar”:

… because then you can generate some embed code for the calendar, provide a link for anyone else to see the calendar (like this one), and use the tidied up iCal feed that Google calendar now provides to view the calendar in something like iCal:

PPS To make things a little easier, I tweaked the feed powered pipe so now you can just provide it with an RSS feed that points to one or more iCal feeds:

I also added a block to sort the dates in ascending date order. It’s simple enough to add the feed to iGoogle etc, or as a badge in your blog, using the Yahoo Pipes display helper tools:

Hmm, it would be nice if Pipes also offered a “calendar” output view when it knew there was iCal data around, just like it generates a map for when it sniffs geo-data, and a slideshow view when it detects appropriately addressed media objects? Any chance of that, I wonder?

Looking Up Alternative Copies of a Book on Amazon, via ThingISBN

As Amazon improves access to the long tail of books through Amazon’s marketplace sellers and maybe even their ownership of Abebooks, it’s increasingly easy to find multiple editions of the same book. So when I followed a link to a book that Mike Ellis recommended last week (to The Victorian Internet in fact) and found that none of the editions of the book were in stock, as new, on Amazon, I had the tangential thought that it’d be quite handy to have a service that would take an ISBN and then look up the prices for all the various editions of that book on Amazon.

Given an ISBN for a book, there are at least a couple of ways of finding the ISBNs for other editions of the book – the Worldcat xISBN service, and ThingISBN from LibraryThing (now part owned by Amazon through Amazon’s ownership of Abebooks; for who else Amazon owns, see Amazon “Edge Services” – Digital Manufacturing).

So here’s a couple of Yahoo pipes for looking up the alternative editions of a book on the Amazon website, after discovering those editions from ThingISBN.

First of all a pipe that takes an ISBN and looks up alternative editions using ThingISBN:

What this pipe does is construct a URL that calls for the list of alternative ISBNs for a given ISBN. That is, it constructs a URL of the form http://www.librarything.com/api/thingISBN/ISBNHERE, which returns an XML file containing the alternative ISBNs (example), grabs the XML file back using the Fetch Data block, renames the internal representation of the grabbed XML so that the pipe will generate a valid RSS feed, and output the result.

So now we have an RSS feed that contains a list of alternative ISBNs, via ThingISBN, for a given ISBN.

Now to find out how much these books cost on Amazon. For that, we shall find it convenient to construct a pipe that will look up the details of a book on Amazon using the Amazon Associates web service, given an ISBN. (For a brief intro to Amazon Associates web services, see Calling Amazon Associates/Ecommerce Web Services from a Google Spreadsheet.)

Here’s a pipe to do that:

(If you use the AWSzone scratchpad to construct a URL that calls the Amazon web service with a look up for book by ISBN, you can just paste it into the “Base” entry form in the Pipe’s URL Builder block and hit return, and it will explode the arguments into the appropriate slots for you.)

So now we have a pipe that will look up the details of a book on Amazon given its ISBN.

We can now put the ThingISBN pipe and the Amazon ISBN lookup pipe together, to create a compound pipe that will lookup details for all the alternative versions of a particular book, given that particular book’s ISBN:

Okay – so now we have a pipe that takes an ISBN, looks up the alternative ISBNs using ThingISBN, then grabs details for each of those alternatives from Amazon…

Now what? Well, if you use this pipe in your own mashup, you may find that if you construct a URL that calls a pipe with a given ISBN, if you don’t handle the ISBN properly in your own code, you can pass a badly formed ISBN to the pipe. The most common example of this is dropping a leading 0 on the ISBN – so e.g. you pass 441172717 rather than 0441172717.

Now it just so happens that LibraryThing offers another webservice that can correct this sort of error – ISBN check API – and it’s easy enough to create a pipe to call it:

Good – so now we can defensively programme the front end of our pipe to handle badly formed ISBNs by sticking this pipe at the front of the compound pipe that calls ThingISBN and then loops through Amazon calls.

But there’s something we can do at the other end of the pipe too, and that is make use of a ‘slideshow’ feature that Yahoo pipes offers as an interface to the pipe. If the elements of a feed contain image items that are packaged in an appropriate way, the Yahoo pipes interface will automatically create a slidesho of those images.

What this means is that if we package URLs that point to the book cover image of each alternative version of a book, we can get a slideshow of the bookcovers of all the alternative editions of that book.

Here’s just such a pipe:

And here’s the example output:

If you click on the “Get as Badge” option, you can then embed this slideshow on your own website or start page:

For example, here I’ve added the slideshow to my iGoogle page:

Now to my mind, that’s quite a fun (and practical) way of introducing quite a few ideas about webservice orchestration that can be unpacked at a later date. But of course, it’s not very academic, so it’s unlikely to appear in a course near you anytime soon… ;-) But I’d argue that it does stand up as a demo that could be given to show people how much fun this stuff can be to play with, before we inflict SOAP and WS-* on them…