OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for December 2009

Calling One YQL Query Keyed by Another in Yahoo Pipes


The following isn’t one of my hacks – it comes from @pjdonnelly – but it contains a pattern I’d like to remember so I’m blogging it anyway…;-)

The issue it addresses is how to make one YQL query based on another. Once you see the pattern, it’s obvious…

Call the first query to retrieve a set of results that contain a key value you want to use in the second “as-if nested” query:

(The environment variable is store://datatables.org/alltableswithkeys.)

then simply use a Loop block to construct a query string for each item based on a key value contained in that item (generating queries of the form select source from flickr.photos.sizes where photo_id="4179541452" and label = "Medium"), and another loop to run the query:
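For my own future reference, the query-building step in that first Loop block might be sketched in Python along the following lines. This is only an illustration of the pattern, not the pipe itself: the function name is made up, the second photo id is invented for the example, and the second loop (actually running each query against YQL) is left out.

```python
def build_size_queries(photo_ids, label="Medium"):
    """Build one flickr.photos.sizes YQL query string per photo id."""
    template = (
        'select source from flickr.photos.sizes '
        'where photo_id="{pid}" and label = "{label}"'
    )
    return [template.format(pid=pid, label=label) for pid in photo_ids]


# Pretend these ids came back from the first (outer) query.
# The second id is a made-up value for illustration.
ids_from_first_query = ["4179541452", "4179541453"]
for query in build_size_queries(ids_from_first_query):
    print(query)
```

Each generated string would then be handed to whatever runs the YQL query, one item at a time.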

Very nice… and another example of a pragmatic approach to pipelinked data queries. ;-)

Written by Tony Hirst

December 21, 2009 at 12:05 pm

Posted in Pipework

Time for a University Prepress?

with 2 comments

When I first joined the OU as a lecturer, I was self-motivated, research active, publishing to peer reviewed academic conferences outside of the context of a formal research group. That didn’t last more than a couple of years, though… In that context, and at that time, one of the things that struck me about the OU was that research active academics were expected to produce written work for publication in two ways: for research, through academic conferences and journals; and for teaching, via OU course materials.

The internal course material production route was, and still is, managed through a process of course team review in the authoring stage and then supported by editors, artists and picture researchers for publication, although I don’t remember so much involvement from media project managers ten years or so ago, if they even existed then? Pagination and layout were managed elsewhere, and for authors who struggled to use the provided document templates, the editor was on hand for technical review, typos, grammar and reference checking, and a course secretary could be brought in to style the document appropriately. Third party rights were handled by the course manager, and so on.

In contrast, researchers had to research and write their papers, produce images, charts, tables as required, and style the document as a camera ready document using a provided style sheet. In addition, published researchers would also review (and essentially help edit) works submitted to other journals and conferences. The publisher contributed nothing except perhaps project management and the production and distribution of the actual print material (though I seem to remember getting offprints, receiving requests for them, and mailing them out with an OU stamp on an OU envelope).

Although I haven’t published research formally for some time, I suspect the same is still largely true nowadays…

Given that the OU is a publication house, publishing research and teaching materials as a way of generating income, I wonder if there is an opportunity for the Library to support the research publication process by providing specialist support for research authors, including optimising their papers for discovery!

At the current time, many academic libraries host their institution’s repository, providing a central location in which copies of academic research publications produced by members of that institution are lodged. Some academic publishers even offer an ‘added value’ service in their publication route whereby a published article, as written, corrected, laid out, paginated, rights cleared, and rights waived by the author (and reviewed for free by one or more of their peers) will be submitted back to the institution’s repository.

[Cue bad Catherine Tate impression]: what a f*****g liberty… [!]

So as the year ends, here’s a thought I’ve ranted to several people over the year: academic libraries should seize the initiative from the academic publishers, adopt the view that the content being produced by the academy is valuable to publishers as well as academics, that the reputation of journals is in part built on the reputation of the institutions and academics responsible for producing the research papers, and set up a system in which:

- academics submit articles to the repository using an institutional XML template (no more faffing around with different style sheets from different publishers), at which point they are released using a preview stylesheet as a preprint;

- journals to which articles are to be submitted are required to collect the articles from the repository. Layout and pagination is for them to do, before getting it signed off by the author;

- optionally, journal editors might be invited to bid for the right to publish an article formally. The benefit of formal publication for the publisher is that when a work is cited, the journal gets the credit for having published the work.

That is all… ;-)

PS RAE/REF style accounting could also be used in part to set journal pricing and payments. Crap journals that no-one cites content in would get nothing. Well cited journals would be recompensed more generously… There would of course be opportunities for gaming the system, but addressing this would be similar in kind to implementing measures that search engines based on PageRank style algorithms take against link farms, etc.

Written by Tony Hirst

December 17, 2009 at 1:20 pm

First Dabblings With Pipelinked Linked Data

with 9 comments

One of the promises of the Linked Data lobby is the ability to combine data from different datasets that share common elements, although this ability is not limited to Linked Data (see, for example, Mash/Combining Data from Three Separate Sources Using Dabble DB). In this post, I’ll describe a quick experiment in using Yahoo Pipes to combine data from two different data sources and briefly consider the extent to which plug’n'play data can lower the barriers to entry for exploring the potential of Linked Data.

The datasets I’ll join are both data.gov.uk Linked Data datastores – the transport datastore and the Edubase/Education datastore. The task I’ve set myself is to look for traffic monitoring points in the vicinity of one or more schools and to produce a map that looks something like this:

So to get started, let’s grab a list of schools… The Talis blog post SPARQLing data.gov.uk: Edubase Data contains several example queries over the education datastore. The query I’ll use is derived trivially from one of those examples; in particular, it grabs the name and location of the two newest schools in the UK:
prefix sch-ont: <http://education.data.gov.uk/def/school/>
prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?school ?name ?date ?lat ?long WHERE {
?school a sch-ont:School;
sch-ont:establishmentName ?name;
sch-ont:openDate ?date;
geo:lat ?lat;
geo:long ?long.
} ORDER BY DESC(?date) LIMIT 2

Pasting the query into the SPARYQL Pipe -map previewer shows a couple of points on a map, as expected.

So how can we look for traffic monitoring points located in the same area as a school? One of the big problems I have with Linked Data is finding out what the shared elements are between data sets (I don’t have a rule of thumb for doing this yet) so it’s time for some detective work – looking through example SPARQL queries on the two datasets, ploughing through the data.gov.uk Google group, and so on. Searching based on lat/long location data, e.g. within a bounding box, is one possibility, but it’d be neater, to start with at least, to try to use a shared “area”, such as the same parish, or other common administrative area.

After some digging, here’s what I came up with: this snippet from a post to the data.gov.uk Google group relating to the transport datastore:
#If you’re prepared to search by (local authority) area instead of by a bounding box,
….
geo:long ?long ;
<http://geo.data.gov.uk/0/ontology/geo#area> <http://geo.data.gov.uk/0/id/area/00DA>;
traffic:count ?count .

and this one from the aforementioned Talis Edubase post relating to the education datastore:
prefix sch-ont: <http://education.data.gov.uk/def/school/>
SELECT ?name ?lowage ?highage ?capacity ?ratio WHERE {
?school a sch-ont:School;
sch-ont:districtAdministrative <http://statistics.data.gov.uk/id/local-authority-district/00HA> .

The similar format of the area codes, and the similarity in language (“prepared to search by (local authority) area” and “id/local-authority-district/”) suggest to me that these two things actually refer to the same thing (I asked @jenit … it seems they do…)

So, here’s a recipe for searching for traffic monitoring locations in the same local authority district as a recently opened school. Firstly, modify the SPARQL query shown above so that it also returns the local authority area:

prefix sch-ont: <http://education.data.gov.uk/def/school/>
prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?school ?name ?date ?district ?lat ?long WHERE {
?school a sch-ont:School;
sch-ont:establishmentName ?name;
sch-ont:openDate ?date;
sch-ont:districtAdministrative ?district;
geo:lat ?lat;
geo:long ?long.
} ORDER BY DESC(?date) LIMIT 2

The result looks something like this:

Secondly, construct a test query on the transport datastore (http://services.data.gov.uk/transport/sparql) to pull out traffic monitoring points, along with their locations, using a local area URI as the search key:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX traffic: <http://transport.data.gov.uk/0/ontology/traffic#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX area: <http://geo.data.gov.uk/0/ontology/geo#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?point ?lat ?long WHERE
{ ?point a traffic:CountPoint ;
geo:lat ?lat ;
geo:long ?long ;
<http://geo.data.gov.uk/0/ontology/geo#area> <http://geo.data.gov.uk/0/id/area/00CG>. }

We can create a pipe based around this query that takes an administrative area identifier, runs the query through a SPARYQL pipe (SPARQL and YQL pipe), and returns the traffic monitoring points in that area:

The regular expression block is a hack used to put the region identifier into the form that is required by the transport endpoint if it is passed in using the form required by the education datastore.
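As a rough Python sketch of what that hack and the parameterised query amount to (the two URI prefixes are taken from the example snippets quoted earlier in this post; the function names are made up, and I am assuming that swapping one prefix for the other is all the rewrite needs to do):

```python
import re

# URI prefixes as they appear in the example queries quoted in this post.
EDUCATION_PREFIX = "http://statistics.data.gov.uk/id/local-authority-district/"
TRANSPORT_PREFIX = "http://geo.data.gov.uk/0/id/area/"

QUERY_TEMPLATE = """PREFIX traffic: <http://transport.data.gov.uk/0/ontology/traffic#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?point ?lat ?long WHERE {{
  ?point a traffic:CountPoint ;
    geo:lat ?lat ;
    geo:long ?long ;
    <http://geo.data.gov.uk/0/ontology/geo#area> <{area_uri}> .
}}"""


def to_transport_area(district_uri):
    """Rewrite an education-style district URI as a transport-style area URI.

    URIs that are already in the transport form are returned unchanged.
    """
    return re.sub("^" + re.escape(EDUCATION_PREFIX), TRANSPORT_PREFIX, district_uri)


def traffic_points_query(district_uri):
    """Return the traffic monitoring point query for the given district."""
    return QUERY_TEMPLATE.format(area_uri=to_transport_area(district_uri))


print(traffic_points_query(EDUCATION_PREFIX + "00HA"))
```

The resulting query string is what would be passed on to the transport endpoint.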

Now we’re going to take results from the recent schools query and then look up the traffic monitoring points in that area via the pipe shown above:

The SPARYQL query at the top of the pipe runs the Edubase query and is then split – the same items are passed into each of the two parts of the pipe, but they are processed differently. In the left hand branch, we treat the lat and long elements from the Edubase query in order to create y:location elements that the pipe knows how to process as geo elements (e.g. in the creation of a KML output from the pipe).

The right hand branch does something different: the loop block works through the list of recently opened schools one school at a time, and for each one looks up the region identifier and passes it to the traffic monitoring points by region pipe. The school item is then replaced by the list of traffic monitoring points in that region.
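In Python terms, the fork and merge might look something like the sketch below. The lookup function is a stand-in for the “traffic monitoring points by region” pipe, and the item keys and sample data are invented purely for illustration.

```python
def right_branch(schools, lookup_points):
    """Replace each school item with the traffic points found for its district."""
    points = []
    for school in schools:
        points.extend(lookup_points(school["district"]))
    return points


def fork_and_merge(schools, lookup_points):
    """Left branch passes the schools through; right branch looks up points."""
    return list(schools) + right_branch(schools, lookup_points)


def fake_lookup(district_uri):
    # Stand-in for the "points by region" pipe; the data here is made up.
    sample = {
        "district/A": [{"point": "p1"}, {"point": "p2"}],
        "district/B": [{"point": "p3"}],
    }
    return sample.get(district_uri, [])


schools = [
    {"name": "School A", "district": "district/A"},
    {"name": "School B", "district": "district/B"},
]
for item in fork_and_merge(schools, fake_lookup):
    print(item)
```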

You can try the pipe out here: traffic monitoring near most recently opened schools

So that’s one way of doing it. Another way is to take the lat/long of each school and pass that information to a pipe that looks up the traffic monitoring points within a bounding box centered on the original location co-ordinates. This gives us a little more control over the notion of ‘traffic monitoring points in the vicinity of a school’.

Again we see a repeat of the fork and merge pattern used above, although this time the right hand branch is passed to a pipe that looks up points within a bounding box specified by the latitude and longitude of each school. A third parameter specifies the size of the bounding box:
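A minimal sketch of the bounding box idea in Python, assuming the size parameter is the length of a side in decimal degrees (I have not checked exactly how the pipe interprets it, so treat that, and the function names, as assumptions):

```python
def bounding_box(lat, lon, size):
    """Return (min_lat, min_lon, max_lat, max_lon) for a square box.

    The box is centred on (lat, lon); `size` is assumed to be the side
    length in decimal degrees.
    """
    half = size / 2.0
    return (lat - half, lon - half, lat + half, lon + half)


def in_box(point_lat, point_lon, box):
    """True if the point falls inside (or on the edge of) the box."""
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= point_lat <= max_lat and min_lon <= point_lon <= max_lon


box = bounding_box(52.0, -1.0, 0.5)
print(box)
print(in_box(52.1, -1.1, box))
```

A box in degrees is not square on the ground, of course, but it is good enough for a notion of “in the vicinity”.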

Notice from the preview of the pipe output how we have details from the left hand branch – the recently opened schools – as well as the right hand branch – the neighbouring traffic monitoring points. Here’s the result again:

As with any map previewing pipe, a KML feed is available that allows the results to be displayed in a(n embeddable) Google map:

(Quick tip: if a Google map chokes on a Yahoo pipes KML URI, use a URL shortener like TinyURL or bit.ly to get a shortened version of the Yahoo Pipes KML URL, and then post that into the Google maps search box:-)

So there we have it – my take on using Yahoo Pipes to “join” two, err, Linked Data datasets on data.gov.uk :-) I call it pipelinked data :-)

PS some readers may remember how services like Google Fusion Tables can also be used to “join” tabular datasets sharing common columns (e.g. Data Supported Decision Making – What Prospects Does Your University Offer). Well, it seems as if the Google folks have just opened up an API to Google Fusion Tables. Now it may well be that Linked Data is the one true path to enlightenment, but don’t forget that there are many more mortals than there are astronauts…

PPS for the promised bit on “lower[ing] the barriers to entry for exploring the potential of Linked Data”, that’ll have to wait for another post…

Written by Tony Hirst

December 15, 2009 at 8:43 pm

Posted in Pipework, Tinkering


Hackable SPARQL Queries: Parameter Spotting Tutorial

with one comment

Whenever I come across a new website or search tool, one of the first things I do is have a look at the URIs of resource pages and search results to see: a) whether I can make sense of them (that is, are they in any sense human readable), and b) whether they are “hackable”, to the extent that I can change certain parts of the URI in a particular way and have a pretty good idea what the resulting page will look like.

If the URI is hackable, then it often means that it can be parameterised, in the sense that I can construct valid URIs from some sort of template within which part of the URI path, or one of the URI arguments, is replaced using a variable that can be assigned a particular value as required.

So for example, a search for the term ouseful in Google delivers the results page with a URI that looks like:
http://www.google.com/search?client=safari&rls=en&q=ouseful&ie=UTF-8&oe=UTF-8

Comparing the search term that I entered (ouseful) with the URI, it’s easy to see how the search term is used in order to create the results page URI:
http://www.google.com/search?client=safari&rls=en&q=SEARCH_TERM_HERE&ie=UTF-8&oe=UTF-8
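Treating the URI as a template in this way is easy to sketch in Python using the standard library. The parameter names are simply the ones visible in the URI above; the function name is made up for the example.

```python
from urllib.parse import urlencode


def google_search_url(term):
    """Fill the search term into the results page URI template."""
    params = [
        ("client", "safari"),
        ("rls", "en"),
        ("q", term),  # the only part of the template that varies
        ("ie", "UTF-8"),
        ("oe", "UTF-8"),
    ]
    return "http://www.google.com/search?" + urlencode(params)


print(google_search_url("ouseful"))
```

Using urlencode rather than plain string concatenation means that search terms containing spaces or punctuation are escaped properly.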

This technique applies equally to looking at SPARQL search queries, so here’s a worked through example that makes use of a query on the Talis n2 blog (I tend to use SparqlProxy for running SPARQL queries):
#List the uri, latitude and longitude for road traffic monitoring points on the M5
PREFIX road: <http://transport.data.gov.uk/0/ontology/roads#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX geo: <http://geo.data.gov.uk/0/ontology/geo#>
PREFIX wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?point ?lat ?long WHERE {
?x a road:Road.
?x road:number "M5"^^xsd:NCName.
?x geo:point ?point.
?point wgs84:lat ?lat.
?point wgs84:long ?long.
}

Looking carefully at the descriptive comment:

#List the uri, latitude and longitude for road traffic monitoring points on the M5

and the query:

...
?x road:number "M5"^^xsd:NCName.
...

we see how it is possible to parameterise the query such that we can replace the “M5” string with a variable and use it to pass in the details of (presumably) any UK road number.

In Yahoo Pipes, here’s what the parameterisation looks like – we construct the query string and pass in a value for the desired road number from a user text input (split the query string after ?x road:number “):
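The same split-and-join construction can be sketched in Python. The check on the road number is my own addition (the pipe does nothing of the sort) to stop odd input from breaking out of the quoted string; the names are made up for the example.

```python
import re

# Everything up to and including the opening quote of the road number.
QUERY_HEAD = '''PREFIX road: <http://transport.data.gov.uk/0/ontology/roads#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX geo: <http://geo.data.gov.uk/0/ontology/geo#>
PREFIX wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?point ?lat ?long WHERE {
?x a road:Road.
?x road:number "'''

# Everything from the closing quote onwards.
QUERY_TAIL = '''"^^xsd:NCName.
?x geo:point ?point.
?point wgs84:lat ?lat.
?point wgs84:long ?long.
}'''


def road_points_query(road_number):
    """Build the monitoring points query for a road such as 'M5' or 'A1'."""
    # Assumption: road numbers are plain letters and digits.
    if not re.fullmatch(r"[A-Za-z0-9]+", road_number):
        raise ValueError("unexpected road number: %r" % road_number)
    return QUERY_HEAD + road_number + QUERY_TAIL


print(road_points_query("A1"))
```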

The rest of the pipe is built around the SPARYQL pattern that I have described before (e.g. Getting Started with data.gov.uk, Triplr SPARYQL and Yahoo Pipes):

By renaming the latitude and longitude value elements as y:location.lat and y:location.lon, the pipe infrastructure can do the rest itself and provide us with a map based preview of the pipe output, as well as a KML output that can be viewed in Google maps (simply paste the KML URI into the Google maps search box and use it as the search term) or Google Earth, for example:

Inspection of the pipe’s KML output URL:
http://pipes.yahoo.com/pipes/pipe.run?
_id=78f6547cc12ac3ebcb84144ec3e37205
&_render=kml&roadnum=M5

shows that it is also hackable. Can you see how to change it so that it will return the traffic monitoring points on the A1, bearing in mind it currently refers to the M5?
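If you would rather make that sort of change programmatically than by hand, one possible approach in Python is to rewrite just the one argument and leave the rest of the URL alone. The function name is made up; the URL is the one shown above, joined onto a single line.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

PIPE_KML_URL = (
    "http://pipes.yahoo.com/pipes/pipe.run?"
    "_id=78f6547cc12ac3ebcb84144ec3e37205"
    "&_render=kml&roadnum=M5"
)


def with_road(url, road_number):
    """Return the URL with its roadnum argument replaced."""
    parts = urlsplit(url)
    params = [
        (key, road_number if key == "roadnum" else value)
        for key, value in parse_qsl(parts.query)
    ]
    return urlunsplit(parts._replace(query=urlencode(params)))


print(with_road(PIPE_KML_URL, "A1"))
```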

So there we have it – given an example SPARQL query for road traffic monitoring locations on the M5, we can parameterise the query by observation and construct a pipe that gives a map based preview, as well as a KML version of the output, all in less time than it takes to document how it was done… :-)

Here’s another example. This time the original query comes from @tommyh (geeky related stuff here;-); the query pulls a list of motorway service station locations from dbpedia:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX yago-class: <http://dbpedia.org/class/yago/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?services ?label ?road ?lat ?long
WHERE {
?services dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:infobox_motorway_services> .
?services rdfs:label ?label
OPTIONAL {
?services dbpprop:road ?road .
?services dbpprop:lat ?lat .
?services dbpprop:long ?long .
} .
FILTER (isIRI(?road)) .
}
ORDER BY ASC(?label)

The results look like:

So how can we tweak the original query to search for motorway services on the M1? By inspection of the query, we see the search is looking for services on any ?road (and more than that, on any isIRI(?road), whatever that means?!;-) Looking at the results, we see that the roads are identified in the form:
<http://dbpedia.org/resource/M40_motorway>

So we can tweak the query with an additional condition that requires a particular road. For example:

WHERE {
?services dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:infobox_motorway_services> .
?services rdfs:label ?label .
?services dbpprop:road <http://dbpedia.org/resource/M1_motorway>
OPTIONAL {
?services dbpprop:lat ?lat .
?services dbpprop:long ?long .
}
}

(I think we can drop the original FILTER too?)

To parameterise this query, we just need to feed in the desired road number here:

<http://dbpedia.org/resource/ROADNUMBER_motorway>

Alternatively, we can hack in a regular expression to filter the results by road number – e.g. using the M1 again:

WHERE {
?services dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:infobox_motorway_services> .
?services rdfs:label ?label .
?services dbpprop:road ?road
OPTIONAL {
?services dbpprop:lat ?lat .
?services dbpprop:long ?long .
} .
FILTER (isIRI(?road) && regex(?road,"M1_")) .
}

This time, the parametrisation would occur here:
FILTER (isIRI(?road) && regex(?road,"ROADNUMBER_"))

Note that if we just did the regular expression on “M1” rather than “M1_” we’d get back results for the M11 etc as well…
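The difference is easy to see with a quick Python experiment. The road URIs below follow the form seen in the results, but the list itself is just made up for the illustration, and Python’s re.search is standing in for the SPARQL regex function.

```python
import re

# Illustrative URIs in the form the dbpedia results use.
roads = [
    "http://dbpedia.org/resource/M1_motorway",
    "http://dbpedia.org/resource/M11_motorway",
    "http://dbpedia.org/resource/M40_motorway",
]


def matching(pattern, uris):
    """Return the URIs that contain a match for the pattern anywhere."""
    return [uri for uri in uris if re.search(pattern, uri)]


print(matching("M1", roads))   # picks up the M11 as well as the M1
print(matching("M1_", roads))  # only the M1
```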

In the spirit of exploration, let’s see if we can guess at/pattern match towards a little bit more. (Note guessing may or may not work – but if it doesn’t, you won’t break anything!)

The line:
?services rdfs:label ?label
would seem to suggest that human readable labels corresponding to URI identifiers may be recorded using the rdfs:label relation. So let’s see:

Create a ?roadname variable in the query and see if ?road rdfs:label ?roadname manages to pull out a useful label:
SELECT ?services ?label ?roadname ?road ?lat ?long
WHERE {
?services dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:infobox_motorway_services> .
?services rdfs:label ?label .
?services dbpprop:road ?road .
?road rdfs:label ?roadname
OPTIONAL

Ooh… that seems to work (in this case, at least… maybe it’s a dbpedia convention, maybe it’s a general convention, who knows?!:-)

But it’s a little messy, with different language variants also listed. However, another trick in my toolbox is memory. I remember seeing a filter option in a query once before:
&& lang(?someLabel)='en'

Let’s try it – change the filter terms to:
FILTER (isIRI(?road) && regex(?road,"M1_") && lang(?roadname)='en') .
and see what happens:

So now I have a query that I can use to find motorway service station locations on a particular UK motorway, and get the name of the motorway back as part of the results. And all with only a modicum of knowledge/understanding of SPARQL… Instead, I relied on pattern matching, a memory of a fragment of a previous query and a bit of trial and error…
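Pulling the fragments together, a parameterised version of the final query might be generated with something like the Python sketch below. The way the pieces are assembled is my reading of the steps above rather than a copy of the query as run, the check on the road number is an extra, and the function name is made up.

```python
import re

PREFIXES = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpprop: <http://dbpedia.org/property/>"""


def services_query(road_number, lang="en"):
    """Build the motorway services query for a road such as 'M1'."""
    # Assumption: road numbers and language tags are plain letters/digits.
    if not re.fullmatch(r"[A-Za-z0-9]+", road_number):
        raise ValueError("unexpected road number: %r" % road_number)
    if not re.fullmatch(r"[A-Za-z]+", lang):
        raise ValueError("unexpected language tag: %r" % lang)
    return f"""{PREFIXES}
SELECT ?services ?label ?roadname ?road ?lat ?long
WHERE {{
?services dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:infobox_motorway_services> .
?services rdfs:label ?label .
?services dbpprop:road ?road .
?road rdfs:label ?roadname
OPTIONAL {{
?services dbpprop:lat ?lat .
?services dbpprop:long ?long .
}} .
FILTER (isIRI(?road) && regex(?road,"{road_number}_") && lang(?roadname)='{lang}') .
}}"""


print(services_query("M1"))
```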

PS If you want to try out hacking around with a few other SPARQL queries, I’ve started collecting some likely candidates: Bookmarking and Sharing Open Data Queries

Written by Tony Hirst

December 14, 2009 at 1:39 pm

Posted in Tinkering, Pipework, Data


Bookmarking and Sharing Open Data Queries

with 2 comments

Over the last few months, I had several aborted attempts at trying to get to grips with SPARQL’n'RDF, two key ingredients in the Linked Data initiative. So as the sort of self-directed learner who often relies on learning by example, I’ve put together a Google form to collect together example SPARQL and Google Spreadsheet (aka Guardian Datastore) queries that I can remix and reuse for my own purposes.

Here’s an example of part of the form:

The form collects a description of the query, its endpoint, and ontologies used in the query, the query itself, and optionally a link to an example output from the query, as well as other bits of info (e.g. there’s a place for a link to a blog post describing the query).

Here are some of the bookmarked queries:

At the moment, the saved queries can only be viewed in the spreadsheet, but with time allowing I hope to build a front end/explorer that will allow you to run the queries, see preview results of the queries etc etc. (Note this is intended as a tool for !astronauts to get started with/teach about/learn about/explore various datasets. Folk who would be put off by first being exposed to the RDF’n'SPARQL, rather than seeing the data in a table, plotted on a chart, etc etc. Remember, most people who see lat/long data in a table do not see in their mind’s eye a map with corresponding markers on it; they see a list of largely meaningless numbers…)

You can find the form here: QUERY Sharing Form and the results here: Example Queries. UPDATE: to run a query, you need to find the correct endpoint and then combine the PREFIX column entries with the SPARQL/SELECT common stuff.

e.g. this should form the basis of a valid query in the form here:

PREFIX road: <http://transport.data.gov.uk/0/ontology/roads#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX geo: <http://geo.data.gov.uk/0/ontology/geo#>
PREFIX wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?point ?lat ?long WHERE {
  ?x a road:Road.
  ?x road:number "M5"^^xsd:NCName.
  ?x geo:point ?point.
  ?point wgs84:lat ?lat.
  ?point wgs84:long ?long.
}
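As a sketch of what “running” a saved query involves, here is one way of assembling the prefix entries and the SELECT body in Python and turning them into a GET URL. The endpoint is the transport one mentioned in other posts here, and I am assuming it accepts the query in a standard `query` argument as the SPARQL protocol describes; the function names are made up.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://services.data.gov.uk/transport/sparql"

# Prefix name -> namespace URI, as they might be stored in the PREFIX column.
PREFIXES = {
    "road": "http://transport.data.gov.uk/0/ontology/roads#",
    "geo": "http://geo.data.gov.uk/0/ontology/geo#",
    "wgs84": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

SELECT_BODY = """SELECT ?point ?lat ?long WHERE {
  ?x a road:Road.
  ?x road:number "M5"^^xsd:NCName.
  ?x geo:point ?point.
  ?point wgs84:lat ?lat.
  ?point wgs84:long ?long.
}"""


def build_query(prefixes, select_body):
    """Put the PREFIX lines in front of the SELECT part of the query."""
    lines = [f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items()]
    return "\n".join(lines + [select_body])


def sparql_get_url(endpoint, query):
    """URL-encode the query into a GET request URL for the endpoint."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, build_query(PREFIXES, SELECT_BODY))
# Round trip: decoding the URL gives back the query text unchanged.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```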

There aren’t many examples in there at the moment, but the form is an open one and you can use it if you like… I’m also taking suggestions for how to improve the form so that other folk might be tempted into using it… ;-) (Note that I intend to keep tweaking the spreadsheet as I use it in order to make it more useful for me, if no-one else…)

The spreadsheet that collects the results should also be open as a read only document (let me know if you try to use it and have any problems doing so) so feel free to browse through the examples. I intend to put a front end of sorts onto the spreadsheet at some point using the Google Visualisation API (cf. my original (and currently stalled) Guardian DataStore Explorer), but if you want to beat me to it, go for it :-)

Written by Tony Hirst

December 13, 2009 at 3:10 pm

Posted in Data


Keeping Your Facebook Updates Private

with 2 comments

So it seems as if Facebook is trying to encourage everyone to open up a little, and just share… Ah, bless… I suppose it is getting near to Christmas, after all…

So if you don’t want the world and Google to know everything you’re posting about on Facebook, and you are quite happy with privacy settings as they currently are, thank you very much, here’s what I (think) you need to do… Continue to the next step and change the settings from Everyone:

to Old Settings:

When you hover over the Old Settings radio button, a tooltip should pop up telling you what your current settings are. If anything looks odd, make a note of it so that you can change the setting later.

If you think you’d like to make things available to Everyone, bear in mind these important things to remember:

Information you choose to share with Everyone is available to everyone on the internet.

And when you install an application:

When you visit a Facebook-enhanced application, it will be able to access your publicly available information, which includes Name, Profile Photo, Gender, Current City, Networks, Friend List, and Pages. This information is considered visible to Everyone.

To save the settings, click to do exactly what it says on the button:

If, whilst changing the settings, you noticed that an Old Setting tooltip suggested that your current privacy settings were different to what you thought they were, you’ll need to go in to the Privacy Settings panel, which you can find from the Settings on the toolbar at the top of each Facebook page:

Looking at the actual privacy settings page, there are several menu options that lead to yet more menu options and then screenfuls of different settings…

When I have a spare 2-3 hours, I’ll try to post a summary of them… (unless anyone already knows of a good tutorial on “managing your Facebook privacy settings”?) For now, though, I’m afraid you’re on your own trying to track down the setting you disagreed with so that you can change it to a setting you do want to have…

Written by Tony Hirst

December 10, 2009 at 10:06 am

Posted in Evilness, Infoskills


Programming Pipes With Delicious and Sharing data.gov.uk SPARQL Queries As A Result


In the post Sharing Linked Data Queries With Mortals… I described a rather clunky pattern for sharing SPARQL queries onto a data.gov.uk sparql endpoint using delicious, along with a Yahoo pipe to generate a SPARQLProxy URI that identified a CSV formatted output from the query that could be consumed in something like Google spreadsheets (e.g. Viewing SPARQLed data.gov.uk Data in a Google Spreadsheet).

In this post, I’ll refine that pattern a little more and show how to use delicious to bookmark a “processed” form of the output of the query, along with all the ingredients needed to generate that output. In a later post (hopefully before Christmas) I’ll try to show how the pattern can be used to share queries into other datastores, such as Google visualization API queries into a Google spreadsheet.

[The post describes two independent operations - firstly, how to generate a sparqlproxy uri from a set of delicious tags; secondly, how to generate a map from sparqlproxy output.]

In what follows, I’m using delicious as a database, and delicious tags as machine tags that can be used as arguments within a Yahoo pipe. The intention is not to suggest that this is even a good way of sharing SPARQL queries and demo outputs, but it does show an improvised way of how to share them. It also provides just enough raw material to allow UI designers to think how we might capture, share and display sample use cases that go from SPARQL query to human meaningful output.

So here’s what a “finished” delicious bookmark looks like:

The three “machine tags”:
- endpoint:http://services.data.gov.uk/transport/sparql
- output:csv
- query:http://codepad.org/hPo2XIzx/raw.txt
are used as instructions in a Yahoo pipe that generates a query using SPARQLProxy:

A feed of bookmarked queries is pulled in from delicious, and checked to see that all the programming arguments that are required are there. (The sparlproxy_demo tag requirement is just a convenience.)
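The tag handling can be sketched in Python as follows. This only covers pulling the three arguments out of a bookmark’s tags and checking that they are all there; constructing the SPARQLProxy URI itself is not shown, and the function names are made up.

```python
REQUIRED = ("endpoint", "output", "query")


def parse_machine_tags(tags):
    """Extract name:value machine tags from a list of bookmark tags.

    Tags are split on the first colon only, because the values are often
    URLs that contain colons of their own.
    """
    parsed = {}
    for tag in tags:
        name, sep, value = tag.partition(":")
        if sep and name in REQUIRED:
            parsed[name] = value
    return parsed


def has_all_arguments(parsed):
    """True if every required machine tag was present on the bookmark."""
    return all(name in parsed for name in REQUIRED)


bookmark_tags = [
    "endpoint:http://services.data.gov.uk/transport/sparql",
    "output:csv",
    "query:http://codepad.org/hPo2XIzx/raw.txt",
    "sparlproxy_demo",
]
arguments = parse_machine_tags(bookmark_tags)
print(arguments, has_all_arguments(arguments))
```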

If all the arguments are available, their values are captured and passed to a routine to construct the SPARQLProxy URIs:

The query bookmarked by the tags is a query onto the transport database that pulls out the locations of traffic monitoring points on the M5. The bookmarked URI is a demonstration of how to use the output of that query. In the current example, the bookmarked demo URI looks like this:

http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=
&q=http:%2F%2Fpipes.yahoo.com%2Fpipes%2Fpipe.run%3F_id%3D1c77faa95919df9dbea678a6bf4881c6%26_render%3Dkml
&sll=37.0625,-95.677068&sspn=44.25371,70.751953&ie=UTF8&z=8

That is, it is a bookmark to a Google map that is displaying a KML feed pulled in from a Yahoo pipe.
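You can check this by decoding the URI. A short Python sketch using the standard library (the URL is the one shown above, joined onto a single line; the function name is made up):

```python
from urllib.parse import urlsplit, parse_qs

BOOKMARKED_URL = (
    "http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode="
    "&q=http:%2F%2Fpipes.yahoo.com%2Fpipes%2Fpipe.run%3F_id%3D"
    "1c77faa95919df9dbea678a6bf4881c6%26_render%3Dkml"
    "&sll=37.0625,-95.677068&sspn=44.25371,70.751953&ie=UTF8&z=8"
)


def embedded_feed_url(map_url):
    """Return the decoded value of the map URL's q argument."""
    return parse_qs(urlsplit(map_url).query)["q"][0]


feed_url = embedded_feed_url(BOOKMARKED_URL)
print(feed_url)
# The feed URL has arguments of its own: the pipe id and the output format.
print(parse_qs(urlsplit(feed_url).query))
```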

This is the Google Map:

This is the Yahoo pipe the KML is pulled from:

And this is a peek inside that pipe:

The URI for the fetched CSV file is one that was generated from the bookmarked query tags by the first pipe shown above.

So to recap – this bookmark:

links through to a Google map showing traffic monitoring locations on the M5. The map is an ‘end-user’ demo that shows how Government data may be displayed in a meaningful way. In addition, the tags in the bookmark carry enough information for the user to construct a SPARQL query that will generate the data that is displayed by the map. A utility Yahoo pipe (ID 22059ac6f967d7117b1262586c92b371) can take a delicious RSS feed containing the bookmark and generate a SPARQLProxy URI that calls the tagged endpoint with the tagged query and generates the tag specified output. There is then a break in the chain. A utility pipe capable of handling SPARQLProxy generated CSV from data.gov.uk transport data generates a KML version of the data, which is then passed to a Google map.

So what? So I don’t know what… but it has got me thinking that what might be useful is a quick way of sharing:
- info that is sufficient to generate and run a particular SPARQL query.
- along with a link to an ‘end user’ demo showing how that data might be used or displayed
- all in one handy package…

Which is what the above does… sort of.. though it’s still way too complicated for mortals…

PS It occurs to me that it might be possible to define a pipeline using delicious tags too? Maybe something like:
pipeline:22059ac6f967d7117b1262586c92b371?name=psychemedia&tag=transportDemo//
1c77faa95919df9dbea678a6bf4881c6?_render=kml//
gmap

where the first step in the pipeline (22059ac6f967d7117b1262586c92b371?name=psychemedia&tag=transportDemo) says “run the delicious RSS feed from psychemedia/transportDemo (assuming I had recursively bookmarked that record with transportDemo) through the pipe with ID 22059ac6f967d7117b1262586c92b371 to generate the SPARQLProxy URI”, then pass that to the pipe with ID 1c77faa95919df9dbea678a6bf4881c6, which should generate a KML output, which in turn should be sent to a Google map?!
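To make the idea a little more concrete, here is how such a tag might be unpicked in Python. The whole pipeline tag format is just the speculation above, so this is a sketch of a hypothetical convention: steps separated by //, each with optional arguments after a ?, and the tag written on a single line.

```python
from urllib.parse import parse_qsl


def parse_pipeline(tag):
    """Split a hypothetical pipeline: tag into (step, arguments) pairs."""
    prefix = "pipeline:"
    if not tag.startswith(prefix):
        raise ValueError("not a pipeline tag")
    steps = []
    for raw_step in tag[len(prefix):].split("//"):
        name, _, query = raw_step.partition("?")
        steps.append((name, dict(parse_qsl(query))))
    return steps


tag = (
    "pipeline:22059ac6f967d7117b1262586c92b371?name=psychemedia&tag=transportDemo//"
    "1c77faa95919df9dbea678a6bf4881c6?_render=kml//"
    "gmap"
)
for step, arguments in parse_pipeline(tag):
    print(step, arguments)
```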

Written by Tony Hirst

December 9, 2009 at 11:28 pm

Meanwhile, Over on the Arcadia Blog(s)… Redux

with 2 comments

A month or so ago, I posted a round-up of items I’d published on the various Arcadia Project blogs (Meanwhile, Over on the Arcadia Blog(s)…). Here’s a follow up to that one, providing a quick review of the various Arcadia posts I’ve produced since then, posts that might in other circumstances have normally appeared on this blog.

PS For completeness in this summary of posts I’ve recently blogged elsewhere, there’s a smattering of stuff on the WriteToReply/Actually blog:

Phew… next week, back to normal – ish – though I intend to carry on posting library related stuff on the Arcadia blogs.

Written by Tony Hirst

December 9, 2009 at 9:40 pm

Posted in Anything you want

A Final Nail in the Coffin of “Google Ground Truth”?

with 2 comments

I’ve written before about how Google’s personalisation features threaten the notion of some sort of “Google Ground Truth”, the ability for two different individuals in different locations to enter the same term into the Google search box, and get back similar results (e.g. Another Nail in the Coffin of “Google Ground Truth”?).

So what threats are there? Google Personalised Search for logged in Google users is one obvious source of differences, as are regional differences from the different national search engines (e.g. google.ca versus google.co.uk).

With more and more browsers becoming location aware, I wonder whether we will increasingly see regional, or even hyperlocal, differences in standard web search based on browser location (something that presumably already exists in the local search engines).

Social signals (links from your friends or amplified by them) and real time signals also act as potential sources of difference for personalised ranking factors.

And for users engaged in a search session, the ranking of results you see in the third search in a session may even be influenced by the terms (and results you clicked on?!) in the first or second queries of that session.

Anyway, it seems that as of the weekend, there is another threat – perhaps a final threat – to that notion: Personalized Search for everyone:

Previously, we only offered Personalized Search for signed-in users, and only when they had Web History enabled on their Google Accounts. What we’re doing today is expanding Personalized Search so that we can provide it to signed-out users as well. This addition enables us to customize search results for you based upon 180 days of search activity linked to an anonymous cookie in your browser. It’s completely separate from your Google Account and Web History (which are only available to signed-in users). You’ll know when we customize results because a “View customizations” link will appear on the top right of the search results page. Clicking the link will let you see how we’ve customized your results and also let you turn off this type of customization.

Chris Lott also made a very perceptive comment:

PS It also looks like Google are looking for even more traffic data to help feed their stats collection’n'analysis engines: Introducing Google Public DNS

PPS it seems that Google just announced real time search results integration into the Google homepage. It’s still rolling out, but here’s a preview of what the integration looks like:

Read more at Relevance meets the real-time web. Exciting times…

PPPS Seems like there’s no global, or necessarily even national, ground truth in Google Suggest results either: Google localised Suggest

Written by Tony Hirst

December 7, 2009 at 8:04 pm

Sharing Linked Data Queries With Mortals…

with 2 comments

So I know, I know, I b****y well know how important the whole RDF thing is for Linked Data, and I know we’re not there yet with respect to actual people using data pulled from data.gov.* sources, and I’m starting to be persuaded that maybe data.gov.uk is only there to feed the growth of semantic web developer capacity but ultimately, ultimately, it will probably be folk who can’t cope with anything other than a spreadsheet who are going to have to use this data…

…so the spirit in which this is offered is one of just trying to protect the interests of potential future end users while the geeky tech developer astronauts (astronauts being @iand’s term;-) do the groundwork’n'spadework and make design decisions whose full impact may not otherwise be realised for a little while yet…

So what am I offering…? A quick’n'dirty way of sharing bookmarks into the sparqlproxy Web Service that I posted about in Viewing SPARQLed data.gov.uk Data in a Google Spreadsheet.

So how does it work? Like this…

The geeky tech SPARQL speaking astronaut writes their SPARQL query and posts it into codepad:

They grab the link to the raw text and bookmark it in delicious; the SPARQL endpoint for the query is pasted into the description, and a brief description of the query into the title; the required output is identified using an output: machine tag (e.g. output:, output:sparql, output:html, output:csv, output:xml, output:exhibit or output:gvds):

(An alternative might be to have the endpoint as the title, and the description as the description, or a brief description as a title, a full description as a description, and an endpoint: “machine tag” for the endpoint, but this was just a proof of concept, right? ;-)
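
As a rough sketch of the bookmarking convention described above (link = raw query text, description = endpoint, title = brief description, output: machine tag = required output), the following Python shows how one bookmark might be mapped onto the pieces needed to run the query. The shape of the bookmark dictionary, the example URLs and the fallback to html when no usable output: tag is present are all assumptions for illustration, not anything delicious itself defines.

```python
OUTPUT_PREFIX = "output:"


def bookmark_to_query_spec(bookmark, default_output="html"):
    """Read the query, endpoint and output type off a single bookmark.

    `bookmark` is a hypothetical dict with 'url', 'title', 'description'
    and 'tags' keys, standing in for one item of a delicious feed.
    """
    output = default_output
    for tag in bookmark["tags"]:
        # Only tags with something after 'output:' set the output type;
        # treating a bare 'output:' as "use the default" is an assumption.
        if tag.startswith(OUTPUT_PREFIX) and len(tag) > len(OUTPUT_PREFIX):
            output = tag[len(OUTPUT_PREFIX):]
    return {
        "query_uri": bookmark["url"],                  # link to the raw query text
        "endpoint": bookmark["description"].strip(),   # SPARQL endpoint
        "label": bookmark["title"],                    # brief description
        "output": output,
    }


# Placeholder values only; these are not real bookmarks or endpoints.
bookmark = {
    "url": "http://example.org/raw-query.txt",
    "title": "Traffic monitoring points",
    "description": " http://example.org/sparql ",
    "tags": ["sparqlproxy_demo", "output:csv"],
}
spec = bookmark_to_query_spec(bookmark)
print(spec)
```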

The following pipe constructs the SPARQLProxy query for each bookmark using the specified query, endpoint and output type (at the moment, the pipe also requires a sparqlproxy_demo tag to be present):
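
The pipe itself was shown as a screenshot, but the same step can be sketched in a few lines of Python: filter on the sparqlproxy_demo tag, then assemble a proxy URL from the query link, endpoint and output type. The base URL below is a placeholder, and the parameter names query-uri, service-uri and output are assumptions about how the proxy is called, so check them against the sparqlproxy documentation before relying on them.

```python
from urllib.parse import urlencode

# Placeholder: substitute the address of the sparqlproxy service you use.
SPARQLPROXY_BASE = "https://example.org/sparqlproxy"
REQUIRED_TAG = "sparqlproxy_demo"


def build_sparqlproxy_url(query_uri, endpoint, output, base=SPARQLPROXY_BASE):
    """Assemble a proxy URL from a query link, an endpoint and an output type.

    The parameter names are assumed, not taken from the original pipe.
    """
    params = {
        "query-uri": query_uri,   # where the raw SPARQL query text lives
        "service-uri": endpoint,  # the SPARQL endpoint to run it against
        "output": output,         # e.g. csv, html, xml
    }
    return base + "?" + urlencode(params)


def proxy_url_for(tags, query_uri, endpoint, output):
    """Return a proxy URL, or None if the bookmark lacks the demo tag."""
    if REQUIRED_TAG not in tags:
        return None
    return build_sparqlproxy_url(query_uri, endpoint, output)


url = proxy_url_for(
    ["sparqlproxy_demo", "output:csv"],
    "http://example.org/raw-query.txt",  # placeholder query link
    "http://example.org/sparql",         # placeholder endpoint
    "csv",
)
print(url)
```

Using urlencode keeps the query link and endpoint safely escaped when they are nested inside another URL, which is the fiddly bit when doing this by hand.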

A link to the result of the query, suitably transformed, is then rewritten as the link in the output feed.

A bit of tidying up on the pipe lets you specify a delicious user and/or tag as the origin of the bookmarked links…

So there we have it, an easy way to share SPARQLed queries and get access to “human usable” outputs…

PS there’s no reason, in the recipe above, not to also use the sparql endpoint URI in a tag or machine tag too, to allow for queries run over the same bookmark to be collected together and pulled out of delicious by tag/endpoint…

Written by Tony Hirst

December 4, 2009 at 9:33 pm
