Tinkering with the Guardian Content API – SerendipiTwitterNews

Okay, so here I am again, trying yet again to get y’all to see just what I was on about with serendipitwitterous.

So think of it this way – you’re on the Tube (route planning courtesy of a hack you knocked up using the TfL API), peering over the shoulder of someone reading the Guardian or the Observer, and an interesting headline catches your eye… You start to read it…. Serendipity has worked her charms again…

Got that? That’s what this new edition pipe/feed hack type thing lets you do – discover interesting news stories from the Guardian (or Observer) courtesy of a little pipe that does term extraction on your latest tweets and then runs them as searches on the OpenPlatform API, with the “/technology” filter set, to turn up news stories from the last 10 days or so that may be loosely related to something you tweeted about…

So here’s where we start – grab your most recent tweets via a Twitter RSS feed.
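Here is a minimal sketch of this first step in Python, outside of Pipes. Twitter's per-user RSS feeds have long since been switched off, so this assumes you already have the feed XML as a string. The function name `tweets_from_rss`, the made-up sample feed, and the "username: tweet text" title format are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def tweets_from_rss(rss_xml: str) -> list[str]:
    """Return the tweet text from each <item><title> of an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    tweets = []
    for item in root.iter("item"):
        title = item.findtext("title")  # None if the item has no <title>
        if not title:
            continue
        # Assumption: titles look like "username: tweet text"; keep only the text.
        tweets.append(title.split(": ", 1)[-1].strip())
    return tweets


# Made-up sample feed in the assumed format.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Twitter / example_user</title>
<item><title>example_user: Playing with the Guardian Open Platform API</title></item>
<item><title>example_user: Yahoo Pipes term extraction is handy</title></item>
</channel></rss>"""

print(tweets_from_rss(SAMPLE_RSS))
```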

Run each tweet through the Yahoo term extraction service, and filter out null entries (i.e. ones that don’t contain any text) and any duplicates.
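The Yahoo term extraction service has since been retired, so the sketch below swaps in a deliberately crude stand-in. The hypothetical `crude_terms` just keeps longer words that aren't in a small stopword list. The point is to show the filtering that follows: tweets that yield no terms are dropped as null entries, and a repeated term is kept only the first time it appears.

```python
import re

# Hypothetical stand-in for the retired Yahoo term extraction service:
# keep lowercase words longer than three characters that aren't stopwords.
STOPWORDS = {"with", "this", "that", "from", "have", "just", "about", "what", "your"}


def crude_terms(text: str) -> list[str]:
    words = re.findall(r"[A-Za-z][A-Za-z0-9]+", text.lower())
    return [w for w in words if len(w) > 3 and w not in STOPWORDS]


def extract_unique_terms(tweets: list[str]) -> list[str]:
    terms = []
    seen = set()
    for tweet in tweets:
        extracted = crude_terms(tweet)
        if not extracted:  # a "null entry": nothing usable in this tweet
            continue
        for term in extracted:
            if term not in seen:  # drop duplicates, keeping the first occurrence
                seen.add(term)
                terms.append(term)
    return terms


print(extract_unique_terms([
    "Playing with the Guardian Open Platform API",
    "Guardian API and Yahoo Pipes",
    "lol :-)",
]))
```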

As a result, you get a list of candidate terms pulled out of your recent tweets.

We’re now going to run those extracted terms through the Guardian Open Platform API. First, we need to construct the RESTful URL that will call the service.

You’ll maybe notice a couple of filters in there? One limits stories to articles that have been tagged as Technology stories; the other limits the search to reasonably fresh stories (that is, stories with a datestamp from the last 10 days).
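Here is a sketch of building such a search URL in Python. Note the assumptions: the parameter names (`q`, `section`, `from-date`, `api-key`) follow the present-day Guardian Content API at content.guardianapis.com rather than the beta Open Platform endpoint used in this pipe, and mapping the "/technology" filter onto `section=technology` is an assumed equivalent. `build_search_url` is a hypothetical helper. The date is passed in explicitly so the 10-day window is easy to check; in practice you'd pass `date.today()`.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumption: present-day Content API endpoint and parameter names,
# not the 2009 beta endpoint the original pipe called.
BASE_URL = "https://content.guardianapis.com/search"


def build_search_url(term: str, api_key: str, today: date, days_back: int = 10) -> str:
    from_date = today - timedelta(days=days_back)
    params = {
        "q": term,
        "section": "technology",             # assumed equivalent of the "/technology" filter
        "from-date": from_date.isoformat(),  # only stories from the last `days_back` days
        "api-key": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("open data", "YOUR-API-KEY", date(2009, 3, 20)))
```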

We now use these URLs to call the web service and get some XML back. When testing the service, I noticed some errors that appeared to be caused by the API limiting the number of calls per second (it is still in early beta testing, after all), so I’m limiting the number of calls I make in rapid succession to just a few queries.
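One way to sketch that throttling outside Pipes is to cap the number of queries. As an addition not in the original pipe, this version can also pause between calls. `throttled_fetch`, `max_calls` and `delay_s` are hypothetical names, and the defaults are guesses rather than documented API limits. The `fetch` argument is whatever actually makes the HTTP request, for example a small function wrapping `urllib.request.urlopen`.

```python
import time
from typing import Callable, Iterable


def throttled_fetch(urls: Iterable[str], fetch: Callable[[str], str],
                    max_calls: int = 3, delay_s: float = 1.0) -> list[str]:
    """Call fetch() on at most max_calls URLs, pausing delay_s seconds between calls."""
    results = []
    for i, url in enumerate(urls):
        if i >= max_calls:
            break  # cap the number of queries made in quick succession
        if i > 0:
            time.sleep(delay_s)  # optional extra: space the calls out
        results.append(fetch(url))
    return results


# Stand-in fetch function so the example runs without network access.
print(throttled_fetch(["a", "b", "c", "d", "e"], str.upper, max_calls=3, delay_s=0))
```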

Finally, we map the relevant parts of the XML response to RSS item elements.
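A sketch of the mapping step in Python follows. The response shape it expects is an assumption for illustration and should be checked against what the API actually returns: `content` elements carrying `web-title`, `web-url` and `web-publication-date` attributes, with ISO 8601 UTC timestamps ending in "Z". The sample response is made up. Dates are converted to the RFC 822 style that an RSS `pubDate` expects.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def to_rfc822(iso_stamp: str) -> str:
    """Convert an (assumed) ISO 8601 UTC stamp like 2009-03-18T12:00:00Z to RSS pubDate style."""
    dt = datetime.strptime(iso_stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return format_datetime(dt)


def results_to_rss_items(response_xml: str) -> list[ET.Element]:
    root = ET.fromstring(response_xml)
    items = []
    # Assumed shape: each result is a <content> element with web-* attributes.
    for content in root.iter("content"):
        item = ET.Element("item")
        ET.SubElement(item, "title").text = content.get("web-title", "")
        ET.SubElement(item, "link").text = content.get("web-url", "")
        stamp = content.get("web-publication-date")
        if stamp:
            ET.SubElement(item, "pubDate").text = to_rfc822(stamp)
        items.append(item)
    return items


# Made-up sample response in the assumed shape.
SAMPLE_RESPONSE = """<response status="ok"><results>
<content web-title="Example technology story" web-url="https://www.theguardian.com/technology/example" web-publication-date="2009-03-18T12:00:00Z"/>
</results></response>"""

for item in results_to_rss_items(SAMPLE_RESPONSE):
    print(ET.tostring(item, encoding="unicode"))
```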

And there we have it – a Yahoo pipe that does crude term extraction on your most recent tweets and uses the extracted terms as queries in a search for recent tech stories on the Guardian OpenPlatform Content API.

And I call it Guardian Tech SerendipiTwitterNews, okay? ;-)

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
