Parliamentary Committees Treemap

Can’t sleep, can’t work do “proper work”, so I’ve taken the day off as holiday to play out some more of the thoughts and momentum that cropped up during Dev8D… Like this treemap of parliamentary committee membership (as of 1/2/10).

Parliamenttary committees treemap

Here’s the tale…;-)

Whilst resisting playing with the new Guardian Politics API (and absolutely dreading the consequences any other topic specific APIs they open up might have on my time!) I did have a quick peek at at it to see whether or not it had any details about Parliamentary committees. It didn’t, but just looking upped my desire enough to pop over to the TheyWorkForYou API to see if they had any appropriate calls: what I wanted was a list of committees, and the membership thereof… (if you’ve read Council Committee Treemaps From OpenlyLocal you’ll maybe guess why!;-)

And it seems they do – getCommittee will “[f]etch the members of the committee that match [a specified] name – if more than one committee matches, return their names” and for a specified committee “[r]eturn the members of the committee as they were on [a specified] date”.

What I wanted was to be able to “select all” committees, which there’s no explicit switch for; but a search on the letter ‘e’ did turn up quite a comprehensive list, so as long as they’re using a free text search, I guess this has the same effect!

So I can get a list of committees, and now I want the members of each one. Pipework, I think:-)

First up, fetch a list of the committees from the TheyWorkForYou API. Then, for each committee, generate the URL that will pull pack the committee membership on a particular date.

For each committee, we now annotate the committee item with the membership of the committee and do a little tidying.

(In terms of API calls, this pipe makes N+1 calls to the TheyWorkForYou API, where the first call returns N committees.)

This pipe gives a JSON feed containing a list of committees annotated with the names and ID of committee members. The structure is not totally dissimilar to the structure I used for the Openlylocal committees, so I should be able to reuse that code if I can make the representations match.

So how do they differ? In no particular order, the differences are:
– the Openlylocal committee representation elaborates each member with a .party attribute which does not exist in the TWFY member lists, as well as using .first_name and last_name rather than .name to name the committee member;
– the TheyWorkForYou list does not have an .id attribute for each committee, which the Openlylocal committee list does have, and which is required for the treemap tooltips to work.

First task, then, is to annotate the committee membership list with each member’s party. A call to another TheyWorkForYou API call (http://www.theyworkforyou.com/api/docs/getMPsInfo returns a list of MPs and their party.

It’s easy enough to elaborate the committee list with this information – create a party lookup table and then annotate as required… While we’re at it, we can do a fudge with the name and id…;-)

var partyLookup=[];
for (var i=0; i< mpDetails.length; i++){
 partyLookup[mpDetails[i].person_id]=mpDetails[i].party;
}

for (i=0; i<mysocMPcommittees.value.items.length; i++){
  var currComm=mysocMPcommittees.value.items[i];
  currComm.id="cid"+i;
  
  for (var j=0;j<currComm.members.length; j++){
     currComm.members[j].party=partyLookup[currComm.members[j].person_id];
    
    currComm.members[j].first_name="";
    currComm.members[j].last_name=currComm.members[j].name;
  }
}

It was then easy enough to just plug this into the code I’d used to display the Openlylocal treemap. You can see it here.

Looking over the TheyWorkForYou API, I guess it would be possible to do a similar sort of thing to map the parties of people who spoke in debates? (I’m not sure if there’s data on votes/divisions with a party breakdown? If so, that would be good to plot too, with hierarchy based on : division, ayes/nays, party, so we could see defections, as well as relative sizes of the votes?

The Yahoo Pipes Documentation Project – Initial Thoughts

One of the issues I keep coming across when trying to get folk interested in using Yahoo Pipes is the question of what happens if Yahoo Pipes Dies? I tend to post a lot of screenshots from some of the pipes I build, so I’d stand some chance of recreating them elsewhere, but this is not generally the case for most pipes users.

If we could compile a version of pipe into code that could be run on our own servers, then we could use Yahoo Pipes as a rapid prototyping tool and code generator, rather than as a runtime platfrom. This might then make the environment more palatable to developers working in a conservative regime (e.g. Starting to Think About a Yahoo Pipes Code Generator).

As a stop gap, I thought it would be worth starting up a little Yahoo Pipes Documentation Project to produce some sort of documentation around Yahoo pipes. Grabbing JSON Data from One Web Page and Displaying it in Another contains a couple of bookmarks that demonstrate a simple way of grabbing the content of a pipe.

Here’s a first attempt reporter, done as a bookmarklet that can be applied when editing a pipe. The result is dsiplayed in the browser console:

javascript:(function(){var p=editor.pipe.working;var d="----- ";var c=console.log;c(d+d);var q,r,w,i,m,k,K,ok,oK,x,X;var D=false;var ins=[];var outs=[];var b=[];var o=[];if(D)c("CONNECTIONS");m=p.wires;for (i=0;i<m.length;i++){if(D)c(d);q=m[i].src;r=m[i].tgt;K=r.moduleid;k=q.moduleid;if (!(b[K])){b[K]={};b[K].conns=[]};if (!(b[k])){b[k]={};b[k].conns=[]};ok={};oK={};x=b[k].conns.length;X=b[K].conns.length;b[k].conns[x]={};b[k].conns[x].o=q.id;b[k].conns[x].typ="out";b[k].conns[x].t=K;b[k].conns[x].T=r.id;b[K].conns[X]={};b[K].conns[X].typ="in";b[K].conns[X].f=k;b[K].conns[X].F=q.id;b[K].conns[X].i=r.id;if(D)c("FROM "+k+" ("+q.id+") TO "+K+" ("+r.id+")");}if(D)c(d+d); m=p.modules;c('MODULES');for (i=0;i<m.length;i++){q=m[i];if (!(b[q.id]))b[q.id]=[];b[q.id].n=q.type};for (i=0;i < m.length;i++){c(d);q=m[i];k=q.id;c("MODULE: "+k+" ("+q.type+")");c("-- PROPERTIES:");for (w in q)c(d+w+" "+q[w].toSource()+" ");c("-- CONNECTIONS:");if (b[k])if(b[k].conns)for (j=0;j<b[k].conns.length;j++){K=b[k].conns[j];if(K.typ=="in"){w=d+'<INPUT> [ '+K.i+' ]';if (K.f)w+=' from '+K.f+' [ '+b[K.f].n+" : "+K.F+' ]';c(w)};if(K.typ=="out")c(d+'<OUTPUT> [ '+K.o+' ] connects to '+K.t+' [ '+b[K.t].n+' : '+K.T+' ]')}};c(d+d)})()

Here’s a snippet of the sort of display it gives:

PS @hapdaniel just alerted me to a way I can get hold of a description of a pipe and make it available via a JSON feed. I’ll get a post up about how to do that when I get a chance… Note that this approach deprecates the method above and the approach alluded to in Grabbing JSON data post. (In the meantime, here’s a taster…)

Grabbing JSON Data from One Web Page and Displaying it in Another

Lots of web pages represent data within the page as a javascript object. But if you want to make use of that data in another page, how can you do that?

A case in point is Yahoo Pipes. The only place I’m currently aware of where we can look at how a particular Yahoo pipe is constructed is the Yahoo Pipes editor. The pipe is represented as a Javascript object within the page (as described in Starting to Think About a Yahoo Pipes Code Generator), but it’s effectively locked into the page.

So here’s a trick for liberating that representation…

Firstly, we need to know what the name of the object is. In the case of Yahoo Pipes, the pipe’s definition is contained in the editor.pipe.definition [NO: it’s in editor.pipe.working] object.

In order to send the object to another page on the web, the first thing we need to do is generate a text string view of it that we can POST to another web page. This serialised representation of the object can be obtained by calling the .toSource() function on it.

The following bookmarklets show what that representation looks like.

<!– *** [UPDATE: the following bookmarks don't provide a complete description of the pipe – .toSource() doesnlt appear to dig into arrays… ]*** <- WRONG…I thought the missing data is in the terminaldata but it isn’t.. hmmm… –> UPDATE – found it? editor.pipe.module_info DOUBLE UPDATE: nah… that is more the UI side of things.., so where are the actual pipe RULEs defined (e.g. the rules in a Regular Expression block
UPDATE – found the RULE data – *** UPDATE 2 – Found it… I should be using editor.pipe.working NOT editor.pipe.definition

Firstly, we can display the serialised representation in a browser alert box:

javascript:(function(){alert(editor.pipe.working.toSource())})()

Alternatively, we can view it in the browser console (for example, in Firefox, we might do this via the Firebug plugin):

javascript:(function(){console.log(editor.pipe.working.toSource())})()

The object actually contains several other objects, not all of which are directly relevant to the logical definition of the pipe (e.g. they are more to do with layout), so we can modify the console logging bookmarklet to make it easier to see the two objects we are interested in – the definitions of each of the pipe blocks (that is, the pipe editor.pipe.definition.modules), and the connections that exist between the modules (editor.pipe.definition.wires; [UPDATE: we also need the terminaldata]):

javascript:(function(){var c=console.log;var p=editor.pipe.working;c('MODULES: '+p.modules.toSource());c('WIRES: '+p.wires.toSource());c('TERMINALS: '+p.terminaldata.toSource())})()


[terminaldata not shown]

To actually send the representation to another web page, we can use a bookmarklet to dynamically create a form element, attach the serialised object to it as a form argument, append the form to the page and then submit it:

javascript:(function(){var ouseful={};ouseful=editor.pipe.working;ouseful=ouseful.toSource(); var oi=document.createElement('form');oi.setAttribute('method','post');oi.setAttribute('name','oif');oi.setAttribute('action','http://ouseful.open.ac.uk/ypdp/jsonpost.php');var oie=document.createElement('input');oie.setAttribute('type','text');oie.setAttribute('name','data');oie.setAttribute('value',ouseful);oi.appendChild(oie);document.body.appendChild(oi);document.oif.submit();})()

In this case, the page I am submitting the form to is a PHP page. The code to accept the POST serilaised object, and then republish as a javascript object wrapped in a callback function (i.e. package it so it can be copied and then used within a web page).

&lt;?php
$str= $_POST['data'];
$str = substr($str, 1, strlen($str) - 2); // remove outer ( and )
$str=stripslashes($str);
echo &quot;ypdp(&quot;.$str.&quot;)&quot;;
?&gt;

[Note that I did try to parse the object using PHP, but I kept hitting all sorts of errors with the parsing of it… The simplest approach was just to retransmit the object as Javascript so it could be handled by a browser.]

If we want to display the serialsed version of the object in another page, rather than in an alert box or the browser console, we need to pass the the serialised object within the URI using an HTTP GET to the other page, so we can generate a link to it. For long pipes, this might break..*

*(Anyone know of an equivalent to a URL shortening service that will accept HTTP POST arguments and give you a short URL that will do a POST on your behalf? [As well as the POST payload we’d need to pass the target URL (i.e. the address to which the POST data is to be sent), to the shortener. It would then give you a short URL, such that when you click on it it will POST the data to the desired target URL. I suppose another approach would be a service that will store the post data for you, give you a short URI in return, and then you call the short URI with the address of the page you want the data posted to as a key?)

PS If you do run the bookmarklet to generate a URI that contains the serialised version of the pipe, (that is, use a GET method in the form and a $_GET handler in the PHP script), you can load the object (wrapped in the ypdp() callback function) into your own page via a <script> element in the normal way, by setting the src attribute of the script to the URI that includes the serialsed version of the pipe description.

Starting to Think About a Yahoo Pipes Code Generator

Following a marathon session demoing Yahoo Pipes yesterday (the slides I didn’t really use but pretty much covered are available here) I thought I’d start to have a look at what would be involved in generating a Pipes2PHP, Pipes2Py, or Pipes2JS conversion tool as I’ve alluded to before (What Happens If Yahoo! Pipes Dies?)…

So how are pipes represented within the Yahoo Pipes environment? With a little bit of digging around using the Firebug extension to Firefox, we can inspect the Javascript object representation of a pipe (that is, the thing that is used to represent the pipework and that gets saved to the server whenever we save a pipe).

So to start, let’s look at the following simple pipe:

SImple pipe

Here’s a Firebug view showing the path (editor.pipe.definition should be: editor.pipe.working) to the representation of a pipe:

And here’s what we see being passed to the Yahoo pipes server when the pipe is saved…

Here’s how it looks as a Javascript object:

"modules":[{"type":"fetch","id":"sw-502","conf":{"URL":{"value":"http://writetoreply.com/feed","type":"url"}}},{"type":"output","id":"_OUTPUT","conf":{}},{"type":"filter","id":"sw-513","conf":{"MODE":{"type":"text","value":"permit"},"COMBINE":{"type":"text","value":"and"},"RULE":[{"field":{"value":"description","type":"text"},"op":{"type":"text","value":"contains"},"value":{"value":"the","type":"text"}}]}}],"terminaldata":[],"wires":[{"id":"_w3","src":{"id":"_OUTPUT","moduleid":"sw-502"},"tgt":{"id":"_INPUT","moduleid":"sw-513"}},{"id":"_w6","src":{"id":"_OUTPUT","moduleid":"sw-513"},"tgt":{"id":"_INPUT","moduleid":"_OUTPUT"}}

Let’s try to pick that apart a little… firstly, all the modules are defined. Here’s the Fetch module:

{
 "type":"fetch",
 "id":"sw-502",
 "conf":{
  "URL":{
   "value":"http://writetoreply.com/feed",
   "type":"url"
  }
 }
}

The output module:

{
 "type":"output",
 "id":"_OUTPUT",
 "conf":{}
}

The filter module:

{
 "type":"filter",
 "id":"sw-513",
 "conf":{
  "MODE":{"type":"text","value":"permit"},
  "COMBINE":{"type":"text","value":"and"},
  "RULE":[{
   "field":{"value":"description","type":"text"},
   "op":{"type":"text","value":"contains"},
   "value":{"value":"the","type":"text"}
  }]
 }
}

Each of these blocks (that is, modules) has a unique id. The wires then specify how these modules are connected.

So here’s the wire that connects the output of the fetch block to the input of the filter module:

{
 "id":"_w3",
 "src":{
  "id":"_OUTPUT",
  "moduleid":"sw-502"
 },
 "tgt":{
  "id":"_INPUT",
  "moduleid":"sw-513"
 }
}

And here we connect the output of the filter to the input of the output block:

{
 "id":"_w6",
 "src":{
  "id":"_OUTPUT",
  "moduleid":"sw-513"
 },
 "tgt":{
  "id":"_INPUT",
  "moduleid":"_OUTPUT"
 }
}

***UPDATE – I’m not sure if we also need to look at the terminaldata information. I seem to have lost sight of where the multiple “RULES” that might appear inside a block are described…? Ah…. editor.pipe.module_info? Hmm, not – that is more the UI side of things.., so where are the actual pipe RULEs defined (e.g. the rules in a Regular Expression block?)***

*** UPDATE 2 – Found it… I should be using editor.pipe.working NOT editor.pipe.definition

So what would a code generator need to do? I’m guessing one way would be to do something like this…

  • for each module, create an equivalent function by populating a templated function with the appropriate arguments e.g.
    f_sw-502(){ returnfetchURL(“http://writetoreply.org/feed&#8221;) }
  • for each wire, do something along the lines of f_sw-513(f_sw-502()); it’s been a long day, so I’m not sure how to deal with modules that have multiple inputs? But this is just the start, right…? (If anyone else is now intrigued enough to start thinking about building a code generator from a pipes representation, please let me know…;-)

As to why this approach might be useful?
– saving a copy of the Javascript representation of a pipe gives us an archival copy of the algorithm, albeit in a javascripty objecty way…
– if we have a code generator, we can use Yahoo Pipes as a rapid prototyping tool to create code that can be locally hosted.

PS a question that was raised a couple of times in the session yesterday related to whether or not Yahoo pipes can be run behind a corporate firewall. I don’t think it can, but does anyone know for sure? Is there a commercial offering available, for example, so corporate folk can run their own instance of pipes in the privacy of their own network?

PPS here’s a handy trick… when in a Yahoo pipes page, pop up the description of the pipe with this javascript call in a Firefox location bar:
javascript:alert(editor.pipe.definition.toSource());

Creating Database Query Forms in Google Spreadsheets – Sort Of

It’s all very well using a Google spreadsheet as a database, but sometimes you just want to provide a simple form to let people run a particular query. Here’s a quick way of doing that within a Spreadsheet…

So for example: Can you help me crowd source a solution?. The problem is as follows:

Students will make five choices from a list of over 200 projects that have been anonymised… We will give each project a code, and have already entered all the details into an excel sheet so we can tie the project code to the supervisor.

We need a solution that will enable students to enter their project code and then have the title of the project displayed as a check to make sure they have entered the code correctly. The list of projects is just too long for a drop down list, even when split by department (around 50 in each).

Does anyone have any suggestions of tools that we can use for students to submit this type of information, so that we get it in a format that we can use, and they get confirmation of the project titles they have chosen? A simple google form isn’t going to hack it!

Here’s one way…

Create a “form” – the text entry cell can be highlighted by setting the background colour from the spreadsheet toolbar:

Construct a query. In this case, I need to select three results columns (H, I and J) from another sheet (‘Sheet1’, the one that acts as the database and contains the project codes) so the query will be of the form “select H,I,J where H contains “BIOCHEM”; the search term (“BIOCHEM”) is pulled in from the query form show above:

=concatenate(“select H,I,J where H contains ‘”,B2,”‘”)

(As a rule of thumb, if you want your query to select cells A, D, AC, the range set in the first part of the query that defines the database should span the first to the last column in the select range (Sheet1!A:AC, for example).)

By using the contains relation, this query will generate a set of results that are, in effect, a list of auto-complete suggestions as the result of a searching on a partially stated query term.

Assuming I have placed the query in cell A4, I can automatically get the results from the query as follows:

Note that it would be possible to hide the query generator (the contents of cell A4) in another sheet and just have the search box and the results displayed in the user interface sheet.

Another approach is to query the spreadsheet via its API.

So for example, if the original spreadsheet database was published as a public document, we could also grab the results as an HTML table via an API using a URI of the form:

http://spreadsheets.google.com/tq?tqx=out:html
&tq=select%20H%2CI%2CJ%20where%20H%20contains%20%22SEARCHTERM%22
&key=SPREADSHEETKEY

Setting out:csv would return the results in comma separated variable format, so we could create a Yahoo pipes interface to query the form, for example:

Here’s how:

What would be really useful would be if the Google/Yahoo widget options for the feed respected the form elements, rather than just generating a widget that displays the feed corresponding to the current Run of the pipe with the provided search terms.

Building such a widget is something I need to put on my to do list, I guess?! Sigh…

Visualising Traffic Count Data from transport.data.gov.uk

In a couple of posts last year (Hackable SPARQL Queries: Parameter Spotting Tutorial and First Dabblings With Pipelinked Linked Data) I started to explore how it might be possible to use Yahoo Pipes as an environment for sharing – and chaining together (albeit inefficiently) – queries to the data.gov.uk open transport data datastore.

Those posts concentrated on querying the datastore in order to find the location of traffic monitoring points according to various search criteria. In this post, I’ll show you one way of visualising traffic count data from a traffic count point using Many Eyes Wikified.

The first thing we need to do is come up with a query that will pull traffic monitoring data back from the transport datastore. My first point of call for finding a query to get me started is usually to search over the data.gov.uk Google group archive in my mailbox. As ever, @jenit had posted a ‘getting started’ solution:-)
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;
PREFIX traffic: <http://transport.data.gov.uk/0/ontology/traffic#&gt;
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#&gt;
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#&gt;

SELECT ?direction (SUM(?value) AS ?total)
WHERE {
<http://transport.data.gov.uk/0/id/traffic-count-point/7422&gt;
traffic:count ?count .
?count a traffic:Count ;
traffic:category <http://transport.data.gov.uk/0/category/bus&gt; ;
traffic:direction ?direction ;
traffic:hour ?hour ;
rdf:value ?value .
FILTER (
REGEX(str(?hour), "^2008-")
)
}
GROUP BY ?direction

I tweaked this a little (using guesswork and pattern matching, rather than understanding, Chinese Room style;-) to come up with a tweaked query that appears to pull out traffic count data for different categories of vehicle on a particular day from a particular monitoring point:

SELECT ?vehicle ?direction ?hour (SUM(?value) AS ?total)
WHERE {
<http://transport.data.gov.uk/0/id/traffic-count-point/7422&gt;
traffic:count ?count .
?count a traffic:Count ;
traffic:category ?vehicle ;
traffic:direction ?direction ;
traffic:hour ?hour ;
rdf:value ?value .
FILTER (
REGEX(str(?hour), "^2008-05-02T") )
}
GROUP BY ?vehicle ?direction ?hour

I then turned this into a Yahoo Pipes query block, using the date and traffic monitoring point as input parameters:

Here’s how to do that:

Having got the query into the pipes ecosystem, I knew I should be able to get the data out of the pipe as JSON data, or as a CSV feed, which could then be wired into other web pages or web applications. However, to get the CSV output working, it seemed like I needed to force some column headings by defining attributes within each feed item:

To tidy up the output a little bit more, we can sort it according to time and day and traffic count by vehicle type:

It’s then easy enough to grab the CSV output of the pipe (grab the RSS or JSON URI and just change the &_render=rss or &_render=json part of the URI to &_render=csv) and wire it into somewhere else – such as into Many Eyes WIkified:

Doing a couple of quick views over the data in Many Eyes wikified, it seemed as if there was some duplication of counting, in that the numbner of motor vehicles appeared to be the sum of a number of more specific vehicle types:

Data in Many Eyes WIkified

Looking at the car, van, bus, HGV and motorbike figures we get:

SUmmed totals

So I made the judgement call to remove the possibly duplicate motor-vehicle data from the data feed and reimported the data into Many Eyes WIkified (by adding some nonsense characters (e.g. &bleurghh to the feed URI so that Many Eyes thought it was a new feed.)

It was then easy enough to create some interactive visualisations around the traffic data point. SO for example, here we have a bubble chart:

Do you spot anything about traffic flow going North versus South at 8am compared to 5pm?

Lets explore that in a little more detail with a matrix chart:

Traffic data matrix chart http://wikified.researchlabs.ibm.com/ousefuldatagovuk/TrafficDataPoints:TrafficBubble

This shows us the relative counts for different vehicle types, again by time of day. Notice the different distribution of van traffic compared to car traffic over the course of the day.

A treemap gives a slightly different take on the same data – again, we can see how there is a difference between North and South flowing volumes at different times of day within each category:

One thing that jumps out at me from the treemap is how symmetrical everything seems to be at noon?!

All the above visualisations are interactive, so click through on any of the images to get to the interactibve version (Java required).

As to how to find traffic monitoring point IDs – try this.

PS a disadvantage of the above recipe is that to generate a visualisation for a different traffic point, I’d need to use the desired parameters when grabbing the CSV feed from the pipe, and then create new Many Eyes Wikified data pages and visualisation pages. However, using nothing more than a couple of web tools, I have managed to prototype a working mockup of a visualisation dashboard for traffic count data that could be given to a developer as a reference specification for a “proper” application. And in the meantime, it’s still useful as a recipe… maybe?

PPS While I was playing with this over the weekend, it struck me that if school geography projects ever do traffic monitoring surveys, it’s now possible for them to get hold of “real” data. If there are any school geography teachers out there who’d like to bounce around ways of making this data useful in a school context, please get in touch via a comment below :-)

Calling One YQL Query Keyed by Another in Yahoo Pipes

The following isn’t on of my hacks – it comes from @pjdonnelly – but it contains a pattern I’d like to remember so I’m blogging it anyway…;-)

The issue it addresses is how to make on YQL query based on another. Once you see the pattern, it’s obvious…

Call the first query to retrieve a set of results that contain a key value you want to use in the second “as-if nested” query:

(The environment variable is store://datatables.org/alltableswithkeys.)

then simply use a Loop block to construct a query string for each item based on a key value contained in that item, (genrating queries of the form select source from flickr.photos.sizes where photo_id="4179541452" and label = "Medium") and another loop to run the query:

Very nice… and another example of a pragmatic approach to pipelinked data queries. ;-)

First Dabblings With Pipelinked Linked Data

One of the promises of the Linked Data lobby is the ability to combine data from different datasets that share common elements, although this ability is not limited to Linked Data (see, for example, Mash/Combining Data from Three Separate Sources Using Dabble DB). In this post, I’ll describe a quick experiment in using Yahoo Pipes to combine data from two different data sources and briefly consider the extent to which plug’n’play data can lower the barriers to entry for exploring the potential of Linked Data.

The datasets I’ll join are both data.gov.uk Linked Data datstores – the transport datastore and the Edubase/Education datastore. The task I’ve set myself is to look for traffic monitoring points in the vicinity of one or more schools and to produce a map that looks something like this:

So to get started, let’s grab a list of schools… The Talis blog post SPARQLing data.gov.uk: Edubase Data contains several example queries over the education datastore. The query I’ll use is derived trivially from one of those examples; in particular, it grabs the name and location of the two newest schools in the UK:
prefix sch-ont: <http://education.data.gov.uk/def/school/&gt;
prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#&gt;
SELECT ?school ?name ?date ?lat ?long WHERE {
?school a sch-ont:School;
sch-ont:establishmentName ?name;
sch-ont:openDate ?date;
geo:lat ?lat;
geo:long ?long.
} ORDER BY DESC(?date) LIMIT 2

Pasting the query into the SPARYQL Pipe -map previewer shows a couple of points on a map, as expected.

So how can we look for traffic monitoring points located in the same area as a school? One of the big problems I have with Linked Data is finding out what the shared elements are between data sets (I don’t have a rule of thumb for doing this yet) so it’s time for some detective work – looking through example SPARQL queries on the two datasets, ploughing through the data.gov.uk Google group, and so on. Searching based on lat/long location data, e.g. within bounding box, is one possibility, but it’d be neater, to start with at least, to try to used a shared “area”, such as the same parish, or other common administrative area.

After some digging, here’s what I came up with: this snippet from a post to the data.gov.uk Google group relating to the transport datastore:
#If you’re prepared to search by (local authority) area instead of by a bounding box,
….
geo:long ?long ;
<http://geo.data.gov.uk/0/ontology/geo#area&gt; <http://geo.data.gov.uk/0/id/area/00DA&gt;;
traffic:count ?count .

and this one from the aforementioned Talis Edubase post relating to the education datastore:
prefix sch-ont:
SELECT ?name ?lowage ?highage ?capacity ?ratio WHERE {
?school a sch-ont:School;
sch-ont:districtAdministrative >http://statistics.data.gov.uk/id/local-authority-district/00HA&gt; .

The similar format of the area codes, and the similarity in language (“prepared to search by (local authority) area” and “id/local-authority-district/”) suggest to me that this two things actually refer to the same thing (I asked @jenit … it seems they do…)

So, here’s a recipe for searching for traffic monitoring locations in the same local authority district as a recently opened school. Firstly, modify the SPARQL query shown above so that it also returns the local authority area:

SELECT ?school ?name ?date ?district ?lat ?long WHERE {
?school a sch-ont:School;
sch-ont:establishmentName ?name;
sch-ont:openDate ?date;
sch-ont:districtAdministrative ?district;
geo:lat ?lat;
geo:long ?long.
} ORDER BY DESC(?date) LIMIT 2

The result looks something like this:

Secondly, construct a test query on the transport datastore (http://services.data.gov.uk/transport/sparql) to pull out traffic monitoring points, along with their locations, using a local area URI as the search key:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;
PREFIX traffic: <http://transport.data.gov.uk/0/ontology/traffic#&gt;
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#&gt;
PREFIX area: <http://geo.data.gov.uk/0/ontology/geo#&gt;
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#&gt;
SELECT ?point ?lat ?long WHERE
{ ?point a traffic:CountPoint ;
geo:lat ?lat ;
geo:long ?long ;
<http://geo.data.gov.uk/0/ontology/geo#area&gt; <http://geo.data.gov.uk/0/id/area/00CG&gt;. }

We can create a pipe based around this query that takes an adminstrative area identifier, runs the query through a SPARYQL pipe, (SPARQL and YQL pipe) and returns the traffic monitoring points in that area:

The regular expression block is a hack used to put the region identifier into the form that is required by the transport endpoint if it is passed in using the form required by the education datastore.

Now we’re going to take results from the recent schools query and then look up the traffic monitoring points in that area via the pipe shown above:

The SPARYQL query at the top of the pipe runs the Edubase query and is then split – the same items are passed into ach of th two parts of the pipe, but thy are processed differently. In the left hand branch, we treat the lat and long elements from the Edubase query in order to create y:location elements that the pipe knows how to process as go elements (e.g. in the creation of a KML output from the pipe).

The right hand branch does something different: the loop block works through the list of recently opened schools on school at a time, and for each one looks up the region identifier and passes it to the traffic monitoring points by region pipe. The school item is then replaced by the list of traffic monitoring points in that region.

You can try the pipe out here: traffic monitoring near most recently opened schools

So that’s one way of doing it. Another way is to take the lat/long of each school and pass that information to a pipe that looks up the traffic monitoring points within a bounding box centered on the original location co-ordinates. This gives us a little more control over the notion of ‘traffic monitoring points in the vicinity of a school’.

Again we see a repeat of the fork and merge pattern used above, although this time th right hand branch is passed to a pip that looks up points within a bounding box specified by the latitude and longitude of each school. A third parameter specifies the size of the bounding box:

Notice from the preview of the pipe output how we have details from the left hand branch – the recently opened schools – as well as the right hand branch – the neighbouring traffic monitoring points. Here’s the result again:

As with any map previewing pipe, a KML feed is available that allows the results to be displayed in a(n embeddable) Google map:

(Quick tip: if a Google map chokes on a Yahoo pipes KML URI, use a URL shortener like TinyURL or bit.ly rto get a shortened version of the Yahoo Pipes KML URL, and then post that into the Google maps search box:-)

So there we have it – my take on using Yahoo Pipes to “join” two, err, Linked Data datasets on data.gov.uk :-) I call it pipelinked data :-)

PS some readers may remember how services like Google Fusion Tables can also be used to “join” tabular datasets sharing common columns (e.g. Data Supported Decision Making – What Prospects Does Your University Offer). Well, it seems as if the Google folks have just opened up an API to Google Fusion Tables. Now it may well be that Linked Data is the one true path to enlighentment, but don’t forget that there are many more mortals than there are astronauts…)

PPS for the promised bit on “lower[ing] the barriers to entry for exploring the potential of Linked Data”, that’ll have to wait for another post…

Hackable SPARQL Queries: Parameter Spotting Tutorial

Whenever I come across a new website or search tool, one of the first things I do is have a look at the URIs of resource pages and search results to see: a) whether I can make sense of them (that is, are they in any sense human readable), and b) whether they are “hackable”, to the extent that I can change certain parts of the URI in particular way and have a pretty good idea what the resulting page will look like.

If the URI is hackable, then it often means that it can be parameterised, in the sense that I can construct valid URIs from some sort of template within which part of the URI path, or one of the URI arguments, is replaced using a variable that can be assigned a particular value as required.

So for example, a search for the term ouseful in Google delivers the results page with URI that looks like:
http://www.google.com/search?client=safari&rls=en&q=ouseful&ie=UTF-8&oe=UTF-8

Comparing the search term that I entered (ouseful) with the URI, it’s easy to see how the search term is used in order to create the results page URI:
http://www.google.com/search?client=safari&rls=en&q=SEARCH_TERM_HERE&ie=UTF-8&oe=UTF-8

This technique applies equally to looking at SPARQL search queries, so here’s a worked through example that makes use of a query on the Talis n2 blog (I tend to use SparqlProxy for running SPARQL queries):
#List the uri, latitude and longitude for road traffic monitoring points on the M5
PREFIX road: <http://transport.data.gov.uk/0/ontology/roads#&gt;
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;
PREFIX geo: <http://geo.data.gov.uk/0/ontology/geo#&gt;
PREFIX wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#&gt;
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#&gt;
SELECT ?point ?lat ?long WHERE {
?x a road:Road.
?x road:number "M5"^^xsd:NCName.
?x geo:point ?point.
?point wgs84:lat ?lat.
?point wgs84:long ?long.
}

Looking carefully at the descriptive comment:

#List the uri, latitude and longitude for road traffic monitoring points on the M5

and the query:

...
?x road:number "M5"^^xsd:NCName.
...

we see how it is possible to parameterise the query such that we can replace the “M5” string with a variable and use it to pass in the details of (presumably) any UK road number.

In Yahoo Pipes, here’s what the parameterisation looks like – we construct the query string and pass in a value for the desired road number from a user text input (split the query string after ?x road:number “):

The rest of the pipe is built around the SPARYQL pattern that I have described before (e.g. Getting Started with data.gov.uk, Triplr SPARYQL and Yahoo Pipes):

By renaming the latitude and longitude value elements as y:location.lat and y:location.lon, the pipe infrastructure can do itself and provide us with a map based preview of the pipe output, as well as a KML output that can be viewed in Google maps (simply paste thee KML URI into the Google maps search box and use it as the search term) or Google Earth, for example:

Inspection of he the pipe’s KML output URL:
http://pipes.yahoo.com/pipes/pipe.run?
_id=78f6547cc12ac3ebcb84144ec3e37205
&_render=kml&roadnum=M5

shows that is is also hackable. Can you see how to change it so that it will return the traffic monitoring points on the A1, bearing in mind it currently refers to the M5?

So there we have it – given an example SPARQL query for road traffic monitoring locations on thee M5, we can parameterise the query by observation and construct a pipe that gives a map based preview, as well as a KML version of the output, all in less time than it takes to document how it was done… :-)

Here’s another example. This time the original query comes from @tommyh (geeky related stuff here;-); the query pulls a list of motorway service station locations from dbpedia:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX dbpprop: <http://dbpedia.org/property/&gt;
PREFIX yago-class: <http://dbpedia.org/class/yago/&gt;
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#&gt;
SELECT ?services ?label ?road ?lat ?long
WHERE {
?services dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:infobox_motorway_services&gt; .
?services rdfs:label ?label
OPTIONAL {
?services dbpprop:road ?road .
?services dbpprop:lat ?lat .
?services dbpprop:long ?long .
} .
FILTER (isIRI(?road)) .
}
ORDER BY ASC(?label)

The results look like:

So how can we weak the original query to search for motorway services on the M1? By inspection of the query, we see the search is looking for services on any ?road (and more than that, on any isIRI(?road), whatever that means?!;-) Looking at the results, we see that the roads are identified in the form:
<http://dbpedia.org/resource/M40_motorway&gt;

So we can tweak the query with an additional condition that requires a particular road. For example:

WHERE {
?services dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:infobox_motorway_services&gt; .
?services rdfs:label ?label .
?services dbpprop:road <http://dbpedia.org/resource/M1_motorway&gt;
OPTIONAL {
?services dbpprop:lat ?lat .
?services dbpprop:long ?long .
}
}

(I think we can drop the original FILTER too?)

To parameterise this query, we just ned to feed in the desired road number here:

<http://dbpedia.org/resource/ROADNUMBER_motorway>

Alternatively, we can hack in a regular expression to filter the results by road number – e.g. using the M1 again:

WHERE {
?services dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:infobox_motorway_services&gt; .
?services rdfs:label ?label .
?services dbpprop:road ?road
OPTIONAL {
?services dbpprop:lat ?lat .
?services dbpprop:long ?long .
} .
FILTER (isIRI(?road) && regex(?road,"M1_")) .
}

This time, the parametrisation would occur here:
<em FILTER (isIRI(?road) && regex(?road,"ROADNUMBER_”))

Note that if we just did the regular expression on “M1” rather than “M1_” we’d get back results for the M11 etc as well…

In the spirit of exploration, let’s se if we can guess at/pattern match towards a little bit more. (Note guessing may or may not work – but if it doesn’t, you won’t break anything!)

The line:
?services rdfs:label ?label
would seem to suggest that human readable labels corresponding to URI identifiers may be recorded using the rdfs:label relation. So let’s see:

Create a ?roadname variable in the query and see if ?road rdfs:label ?roadname manages to pull out a useful label:
SELECT ?services ?label ?roadname ?road ?lat ?long
WHERE {
?services dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:infobox_motorway_services&gt; .
?services rdfs:label ?label .
?services dbpprop:road ?road .
?road rdfs:label ?roadname
OPTIONAL

Ooh… that seems to work (in this case, at least… maybe it’s a dbpedia convention, maybe it’s a general convention, who knows?!:-)

But it’s a little messy, with different language variants also listed. However, another trick in my toolbox is memory. I remember seeing a filter option in a query once before:
&& lang(?someLabel)=’en’

Let’s try it – change the filter terms to:
FILTER (isIRI(?road) && regex(?road,”M1_”) && lang(?roadname)=’en’) .
and see what happens:

So now I have a query that I can use to find motorway service station locations on a particular UK motorway, and get the name of the motorway back as part of the results. And all with only a modicum of knowledge/understanding of SPARQL… Instead, I relied on pattern matching, a memory of a fragment of a previous query and a bit of trial and error…

PS If you want to try out hacking around with a few other SPARQL quries, I’ve started collecting some likely candidates: Bookmarking and Sharing Open Data Queries