Tagged: Guardian content API

Tinkering with the Guardian Platform API – Tag Signals

Given a company or personal name, what’s a quick way of generating meaningful tags around what it’s publicly known for, or associated with?

Over the last couple of weeks or so, I’ve been doodling around a few ideas with Miguel Andres-Clavera from the JWT (London) Innovation Lab looking for novel ways of working out how brands and companies seem to be positioned by virtue of their social media followers, as well as their press mentions.

Here’s a quick review of one of those doodles: looking up tags associated with Guardian news articles that mention a particular search term (such as a company, or personal name) as a way of getting a crude snapshot of how the Guardian ‘positions’ that referent in its news articles.

It’s been some time since I played with the Guardian Platform API, but the API explorer makes it pretty easy to automatically generate some (the Python library for the Guardian Platform API appears to have rotted somewhat with various updates made to the API after its initial public testing period).

Guardian OpenPlatfrom API

Here’s a snapshot over recent articles mentioning “The Open University” (bipartite article-tag graph):

Open university - article-tag graph

Here’s a view of the co-occurrence tag graph:

'OPen University

The code is available as a Gist: Guardian Platform API Tag Grapher

As with many of my OUseful tools and techniques, this view over the data is intended to be used as a sensemaking tool as much as anything. In this case, the aim is to help folk get an idea of how, for example, “The Open University” is emergently positioned in the context of Guardian articles. As with the other ‘discovering social media positioning’ techniques I’m working on, I see the approach useful not so much for reporting, but more as a way of helping us understand how communities position brands/companies etc relative to each other, or relative to particular ideas/concepts.

It’s maybe also worth noting that the Guardian Platform article tag positioning view described above makes use of curated metadata published by the Guardian as the basis of the map. (I also tried running full text articles through the Reuters OpenCalais service, and extracting entity data (‘implicit metadata’) that way, but the results were generally a bit cluttered. (I think I’d need to clean the article text a little first before passing it to the OpenCalais service.)) That is, we draw on the ‘expert’ tagging applied to the articles, and whatever sense is made of the article during the tagging process, to construct our own sensemaking view over a wider set of articles that all refer to the topic of interest.

PS would anyone from the Guardian care to comment on the process by which tags are applied to articles?

PPS a couple more… here’s how the Guardian position JISC recently…

JISC Positioning... Guardian

And here’s how “student fees” has recently been positioned:

In the context of tuition fees - openplatform tag-tag graph

Hmmm…

Tinkering with the Guardian Content API – SerendipiTwitterNews

Okay, so here I am again, again, trying yet again to get y’all to see just what I was on about with serendipitwitterous

So think of it this way – you’re on the Tube (route planning courtesy of a hack you knocked up using the TfL API), peering over the shoulder of someone reading the Guardian or the Observer, and an interesting headline catches your eye… You start to read it…. Serendipity has worked her charms again…

Got that? That’s what this new edition pipe/feed hack type thing lets you do – discover interesting news stories from the Guardian (or Observer) courtesy of a little pipe that does term extraction on your latest tweets and then runs them as searches on the OpenPlatform API, with the “/technology” filter set, to turn up news stories from the last 10 days or so that may be loosely related to something you tweeted about…

So here’s where we start – grab your most recent tweets, via a twitter RSS feed:

Run each tweet through the Yahoo term extraction service, and filter out null entries (i.e. ones that donlt contain any text) and any duplicates:

As a result, you might get something like this:

We’re now going to run those extracted terms through the Guardian Open Platform API. Firstly, we need to construct the RESTful URL that will call the service:

You’ll maybe notice a couple of filters in there? One limits stories to articles that have been tagged as Technology stories; the other limits the search to reasonably fresh stories (that is, stories with a datestamp from the last 10 days).

We now use these URIs to call the web service and get some XML back; when testing the service, I noticed some errors that appeared to be caused by the API limiting the number of calls per second to the service (it is still early beta days of testing, after all), so I’m limiting the number of calls I make in rapid succession to just a few queries:

Finally, we map the relevant parts of the XML response to RSS item elements:

And there we have it – a Yahoo pipe that does crude term extraction on your most recent tweets and uses them as query terms for recent tech stories in a search on the Guardian OpenPlatform Content API:

And I call it Guardian Tech SerendipiTwitterNews, okay? ;-)