OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for November 2010

Accessing Government Education Data in Scraperwiki via the Edubase/Education Datastore API


There’s lots of education data out there, but do we always need to scrape it from LEA websites? How can we easily access the data that’s in the central government datastore, and bring it into an environment we’re happy working from?

Although lots of school data has been made available as data for some time now, its publication as Linked Data means there’s a considerable barrier to entry in terms of functional access to, and use of, the data. (For an example of how to interrogate the Edubase Linked Data API in Scraperwiki, see Accessing Linked Data in Scraperwiki via YQL.) As an attempt to mask some of the horrors of SPARQL from mortal users, @jenit et al worked on a simpler webservice that lets you access the data.gov.uk education datastore via a “friendly”, human readable URL, such as the following:

  • http://services.data.gov.uk/education/api/school/constituency-name/Horsham: list of schools within the constituency of Horsham
  • /education/api/school/local-authority-name/{la}: schools by local authority name
  • /education/api/school/district/{district}: schools by district ID (I think? Not sure exactly what IDs these might be?)
  • /education/api/school/area/{minLat},{minLong};{maxLat},{maxLong}: schools within a particular geographical area, as described by a latitude/longitude bounding box.
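
As a rough sketch of how these URL patterns might be assembled in code (the helper names school_path, area_path and api_url are made up for illustration, and percent-encoding spaces in names is an assumption about what the API expects):

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2, as used on Scraperwiki at the time

API_BASE = 'http://services.data.gov.uk/education/api/'


def school_path(kind, value):
    """Path for a named lookup, e.g. kind='constituency-name', value='Horsham'."""
    # Assumption: names containing spaces should be percent-encoded
    return 'school/%s/%s' % (kind, quote(value))


def area_path(min_lat, min_long, max_lat, max_long):
    """Path for a bounding box lookup: {minLat},{minLong};{maxLat},{maxLong}."""
    return 'school/area/%s,%s;%s,%s' % (min_lat, min_long, max_lat, max_long)


def api_url(path, fmt='json'):
    """Full URL for a path, asking for a particular format via the file suffix."""
    return '%s%s.%s' % (API_BASE, path, fmt)


horsham_url = api_url(school_path('constituency-name', 'Horsham'))
```

Here horsham_url is the same Horsham URL as in the first bullet, with .json added to ask for a JSON response.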

There’s a wealth of data that can be returned about a school, so various views over the data are also supported using a URL variable (for example, ?_view=provision or &_view=performance):

short – shows very basic information
medium – shows a few more fundamental details about the schools, such as its address
provision – describes the kind and number of children that they take
location – describes where the school is
performance – gives information related to their performance
admin – gives administrative information
all – gives you everything that’s known about each school

If you know any particular data attributes you want to further filter the results on, they can be specified literally. For example, the following (far from complete) list of attributes gives some idea of what’s possible, this time passed via explicit URL args:

  • ?nurseryProvision=true
  • &gender.label=Girls
  • &ofstedSpecialMeasures=true
  • for searching number ranges, the min- and max- prefixes may be applied to certain parameters. For example: &max-statutoryHighAge=10 searches for schools where statutoryHighAge<=10
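
To make the view and filter arguments a little more concrete, here is a minimal sketch of a query string builder. The function name edubase_query and its argument layout are invented for illustration; only the _view variable, the literal attribute filters and the min-/max- prefixes come from the API description above.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def edubase_query(view=None, filters=None, ranges=None):
    """Build the query string for an Edubase API call.

    view    -- a view name such as 'provision' or 'performance'
    filters -- dict of exact-match attributes, e.g. {'gender.label': 'Girls'}
    ranges  -- dict of attribute -> (minimum, maximum); use None to skip a bound
    """
    params = []
    if view:
        params.append(('_view', view))
    for name, value in sorted((filters or {}).items()):
        params.append((name, value))
    for name, (low, high) in sorted((ranges or {}).items()):
        if low is not None:
            params.append(('min-' + name, low))
        if high is not None:
            params.append(('max-' + name, high))
    return urlencode(params)


query = edubase_query(view='provision',
                      filters={'nurseryProvision': 'true'},
                      ranges={'statutoryHighAge': (7, 10)})
```

The resulting string can be appended to an API URL after a ? character.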

Jeni did a great write up of the API at A Developers’ Guide to the Linked Data APIs – Jeni Tennison (which I cribbed from heavily in the above;-). You can find a full overview of the education API documentation here: Linked Data API Configuration APIs: Edubase API

So… how can we use this in Scraperwiki? Here’s a demo:

import simplejson
import urllib
import scraperwiki

#------- USER SETTINGS ------
# Original API documentation at: http://services.data.gov.uk/education/api/api-config#schools
# Original blog post by @jenit describing the API used: http://data.gov.uk/blog/guest-post-developers-guide-linked-data-apis-jeni-tennison
# Original blog post describing this Scraperwiki page: http://blog.ouseful.info/2010/11/03/accessing-government-education-data-in-scraperwiki-via-the-edubaseeducation-datastore-api/

# The main query
eduPath='school/constituency-name/Horsham'

# Filters, as a list:
eduFilters=['min-statutoryHighAge=7','max-statutoryHighAge=10']

# _views - not considered yet...

# key and label data is displayed in the console for each result, and added to the Scraperwiki database
# keys are the top level attributes we want to display. For a result item, display each item[key]
keys=['establishmentNumber','label']

# labels are used to display labels of top level items, e.g. item[label]['label']
labels=['typeOfEstablishment','phaseOfEducation']
# Note, if you have item[path][wherever][label], or deeper down a path, we don't handle that (yet?!)

# The school ID will always be added to the Scraperwiki database (it's the database ID for a record).
# If latitude/longitude data is available, it will also be added to the database.


# Note that the script doesn't yet handle multiple pages of results either...

#-------------------------- 
  
# This function displays the results, and also adds results to the Scraperwiki database.
# We always look for school ID (this is the table ID) and latlng for mapping, if that data exists
def printDetails(item,keys=['establishmentNumber','label'],labels=[]):
    txt=[]
    record={}
    for key in keys:
        if key in item:
            txt.append(str(item[key]))
            record[key]=item[key]
        else:
            record[key]=''
    if 'establishmentNumber' not in keys:
        record['establishmentNumber']=item['establishmentNumber']
    for attribute in labels:
        if attribute in item:
            txt.append(item[attribute]['label'])
            record[attribute]=item[attribute]['label']
        else:
            record[attribute]=''
    # Only attach coordinates when both latitude and longitude are present
    if 'lat' in item and 'long' in item:
        latlng=(item['lat'],item['long'])
        scraperwiki.datastore.save(["establishmentNumber"], record,latlng=latlng)
    else:
        scraperwiki.datastore.save(["establishmentNumber"], record)
    print ', '.join(txt)
    
    
# This is where we construct the Edubase Linked Data API URL, and then call it, returning JSON
# Need to find a way of handling results spread over several results pages
data=simplejson.load(urllib.urlopen('http://services.data.gov.uk/education/api/'+eduPath+'.json'+'?'+'&'.join(eduFilters)))['result']
items=data["items"]

for item in items:
    printDetails(item,keys,labels)
    print item
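
The demo only reads item[key] and item[label]['label']. One possible way of reaching further down a result item is a small path-following helper, sketched below. The name get_path is made up, and the 'address'/'town' keys in the sample item are hypothetical stand-ins rather than fields known to be in the API response.

```python
def get_path(item, path, default=''):
    """Follow a list of keys down a nested result item.

    Returns default as soon as any step of the path is missing.
    """
    current = item
    for key in path:
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return default
    return current


# A hand-made item for illustration only, not real API output
sample = {
    'label': 'Example School',
    'typeOfEstablishment': {'label': 'Community School'},
    'address': {'town': {'label': 'Horsham'}},
}

type_label = get_path(sample, ['typeOfEstablishment', 'label'])
town_label = get_path(sample, ['address', 'town', 'label'])
missing = get_path(sample, ['address', 'county', 'label'])
```

Returning a default rather than raising an error mirrors the way the demo stores an empty string when a key is absent.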

You can find the code running on Scraperwiki here: ouseful scraperwiki – playing with Education datastore API

Here’s an example of what gets put in the Scraperwiki database:

Example scraperwiki datatable - education datastore API

Hopefully what this demo does is show how you can start exploring the Education datastore in Scraperwiki without having to do too much. More explanation/guidance, or at least further examples, are required in order to demonstrate:
- the construction of valid “eduPath” statements, if possible showing how they can reuse identifier codes from other sources;
- the use of different _views, and maybe handlers for those views that add all the data to the Scraperwiki database automagically;
- how to inspect returned results so you can identify what keys and labels can be used from a result when you want to construct your own Scraperwiki database records;
- handlers for data down the result item path (i.e. more than handlers just for item[key] and item[label]['label'], but also item[here][there], item[here][there][everywhere]['label'] etc.)
- results are only pulled back from the first page of results; need to find some way of handling results over multiple pages, maybe limiting results to a max number of results within that. (Maybe the tweepy Cursor code could be reused for this???)
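
On the last of those points, here is one possible shape for a pager. It is only a sketch under a loud assumption: that each response's 'result' object carries a 'next' URL whenever there is a further page. That matches my understanding of Linked Data API paging, but it hasn't been checked against this service. The function name iter_items and the injected fetch callable are made up for illustration.

```python
def iter_items(first_url, fetch, max_items=None):
    """Yield result items page by page, following 'next' links.

    fetch     -- callable taking a URL and returning the parsed JSON response
    max_items -- optional cap on the total number of items yielded
    """
    if max_items is not None and max_items <= 0:
        return
    url = first_url
    seen = set()  # guards against a 'next' link that loops back on itself
    count = 0
    while url and url not in seen:
        seen.add(url)
        result = fetch(url)['result']
        for item in result.get('items', []):
            yield item
            count += 1
            if max_items is not None and count >= max_items:
                return
        # Assumption: 'next' is present only when another page exists
        url = result.get('next')


# Offline stand-in for the web service: two fake pages keyed by URL
fake_pages = {
    'page0': {'result': {'items': ['a', 'b'], 'next': 'page1'}},
    'page1': {'result': {'items': ['c']}},
}

all_items = list(iter_items('page0', fake_pages.__getitem__))
first_two = list(iter_items('page0', fake_pages.__getitem__, max_items=2))
```

In the demo script above, fetch could be something like lambda url: simplejson.load(urllib.urlopen(url)).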

Written by Tony Hirst

November 3, 2010 at 1:14 pm

Posted in Data


Accessing Linked Data in Scraperwiki via YQL


A comment from @frabcus earlier today alerted me to the fact that the Scraperwiki team had taken me up on my suggestion that they make the Python YQL library available in the Scraperwiki environment, so I thought I ought to come up with an example of using it…

YQL provides a general purpose standard query interface “to the web”, interfacing with all manner of native APIs and providing a common way of querying with them, and receiving responses from them. YQL is extensible too – If there isn’t a wrapper for your favourite API, you can write one yourself and submit it to the community. (For a good overview of the rationale for, and philosophy behind YQL, see Christian Heilmann’s the Why of YQL.)

Browsing through the various community tables, I found one for handling SPARQL queries. The YQL wrapper expects a SPARQL query and an endpoint URL, and will return the results in the YQL standard form. (Here’s an example SPARQL query in the YQL developer console using the data.gov.uk education datastore.)

The YQL query format is:
select * from sparql.search where query=”YOUR_SPARQL_QUERY” and service=”SPARQL_ENDPOINT_URL”
and can be called in Python YQL in the following way (Python YQL usage):

import yql

def run_sparql_query(query, endpoint):
    y = yql.Public()
    query='select * from sparql.search where query="'+query+'" and service="'+endpoint+'"'
    env = "http://datatables.org/alltables.env"
    return y.execute(query, env=env)
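
One thing to watch with the string concatenation above is that a SPARQL query containing a double quote would break the YQL statement. A possible alternative, sketched here, is to pass the values as @placeholders, borrowing the form shown in the flickr.photos.search comment in the code later in this post. Whether the sparql.search community table accepts placeholders this way is an untested assumption, and the names sparql_call and SPARQL_STATEMENT are made up.

```python
YQL_ENV = 'http://datatables.org/alltables.env'
SPARQL_STATEMENT = ('select * from sparql.search '
                    'where query=@query and service=@service')


def sparql_call(query, endpoint):
    """Return (statement, params) for a placeholder-based sparql.search call."""
    params = {'query': query, 'service': endpoint}
    return SPARQL_STATEMENT, params


statement, params = sparql_call('select ?s where { ?s ?p "x" } limit 1',
                                'http://example.org/sparql')

# With the Python YQL library this would presumably be run as:
#   y = yql.Public()
#   result = y.execute(statement, params, env=YQL_ENV)
```

The statement itself never contains the user-supplied query, so quoting inside the SPARQL no longer matters to the YQL parser.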

For a couple of weeks now, I’ve been looking for an opportunity to try to do something – anything – with the newly released Ordnance Survey Linked Data (read @gothwin’s introduction to it for more details: /location /location /location – exploring Ordnance Survey Linked Data – Part 2).

One of the things the OS Linked Data looks exceedingly good for is acting as glue, mapping between different representations for geographical and organisational areas; the data can also return regions that neighbour on a region, which could make for some interesting “next door to each other” ward, district or county level comparisons.

One of the most obvious ways in to the data is via a postcode. The following Linked Data query to the ordnance survey SPARQL endpoint (http://api.talis.com/stores/ordnance-survey/services/sparql) returns the OS district ID, ward and district name that a postcode exists in:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX postcode: <http://data.ordnancesurvey.co.uk/ontology/postcode/>

select ?district ?wardname ?districtname where { <http://data.ordnancesurvey.co.uk/id/postcodeunit/MK76AA>
postcode:district ?district; postcode:ward ?ward.
?district skos:prefLabel ?districtname.
?ward skos:prefLabel ?wardname
}

Here it is running in the YQL developer console:

OS Postcode query in YQL developer console

Just by the by, we can create a query alias for that query if we want, by changing the postcode (MK76AA in the example) to @postcode. This gives us a URL argument/variable called postcode whose value gets substituted into the query whenever we call it:

http://query.yahooapis.com/v1/public/yql/psychemedia/ospostcodelookupdemo1?postcode=INSERTPOSTCODEHERE&env=http://datatables.org/alltables.env

[Note we manually need to add the environment variable &env=http://datatables.org/alltables.env to the URL created by the query alias generator/wizard.]
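
A minimal sketch of building that alias URL for a given postcode follows. The function name alias_url is made up, and stripping spaces and upper-casing the postcode are assumptions based on the MK76AA form used in the query.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

ALIAS_BASE = ('http://query.yahooapis.com/v1/public/yql/'
              'psychemedia/ospostcodelookupdemo1')
YQL_ENV = 'http://datatables.org/alltables.env'


def alias_url(postcode):
    """URL for the query alias, with the env argument added by hand."""
    compact = postcode.replace(' ', '').upper()
    # urlencode percent-encodes the env URL, which is equivalent to the raw form
    return ALIAS_BASE + '?' + urlencode([('postcode', compact), ('env', YQL_ENV)])


example_url = alias_url('mk7 6aa')
```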

YQL query alias for sparql query

So… that’s SPARQL in YQL – but how can we use it in Scraperwiki… The newly added YQL wrapper makes it easy.. here’s an example, based on the above:

import scraperwiki
import yql

os_endpoint='http://api.talis.com/stores/ordnance-survey/services/sparql'

os_query='''
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX postcode: <http://data.ordnancesurvey.co.uk/ontology/postcode/>

select ?district ?wardname ?districtname where {
<http://data.ordnancesurvey.co.uk/id/postcodeunit/MAGIC_POSTCODE> postcode:district ?district; postcode:ward ?ward.
?district skos:prefLabel ?districtname.
?ward skos:prefLabel ?wardname
}
'''
postcode="MK7 6AA"

os_query=os_query.replace('MAGIC_POSTCODE',postcode.replace(' ',''))

def run_sparql_query(query, endpoint):
    y = yql.Public()
    query='select * from sparql.search where query="'+query+'" and service="'+endpoint+'"'
    env = "http://datatables.org/alltables.env"
    return y.execute(query, env=env)

result=run_sparql_query(os_query, os_endpoint)

for row in result.rows:
    print postcode,'is in the',row['result']['wardname']['value'],'ward of',row['result']['districtname']['value']
    record={ "id":postcode, "ward":row['result']['wardname']['value'],"district":row['result']['districtname']['value']}
    scraperwiki.datastore.save(["id"], record) 

I use the MAGIC_POSTCODE substitution to give me the freedom to create a procedure that will take in a postcode argument and add it into the query. Note that I am probably breaking all sorts of Linked Data rules by constructing the URL that uniquely identifies (reifies?) the postcode in the ordnance survey URL namespace (that is, I construct something like <http://data.ordnancesurvey.co.uk/id/postcodeunit/MK76AA>, which contravenes the “URIs are opaque” rule that some folk advocate, but I’m a pragmatist;-)
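
Such a procedure might look something like the sketch below. The names normalise_postcode and build_os_query are invented here, and the check that a postcode is 5 to 7 letters and digits once spaces are removed is a deliberately loose assumption, added mainly to stop stray characters such as > ending up inside the query.

```python
import re

OS_QUERY_TEMPLATE = '''
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX postcode: <http://data.ordnancesurvey.co.uk/ontology/postcode/>

select ?district ?wardname ?districtname where {
<http://data.ordnancesurvey.co.uk/id/postcodeunit/MAGIC_POSTCODE> postcode:district ?district; postcode:ward ?ward.
?district skos:prefLabel ?districtname.
?ward skos:prefLabel ?wardname
}
'''


def normalise_postcode(postcode):
    """Strip whitespace and upper-case a postcode, e.g. 'mk7 6aa' -> 'MK76AA'."""
    compact = re.sub(r'\s+', '', postcode).upper()
    # Loose sanity check only; this does not validate real postcodes
    if not re.match(r'^[A-Z0-9]{5,7}$', compact):
        raise ValueError('does not look like a postcode: %r' % (postcode,))
    return compact


def build_os_query(postcode):
    """Drop a normalised postcode into the query template."""
    return OS_QUERY_TEMPLATE.replace('MAGIC_POSTCODE', normalise_postcode(postcode))


os_query = build_os_query('mk7 6aa')
```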

Anyway, here’s a Scraperwiki example that scrapes a postcode from a web page, and looks up some of its details via the OS: simple Ordnance Survey Linked Data postcode lookup

The next thing I wanted to do was use two different Linked Data services. Here’s the setting. Suppose I know a postcode, and I want to lookup all the secondary schools in the council area that postcode exists in. How do I do that?

The data.gov.uk education datastore lets you look up schools in a council area given the council ID. Simon Hume gives some example queries to the education datastore here: Using SPARQL & the data.gov.uk school data. The following is a typical example:

prefix sch-ont: <http://education.data.gov.uk/def/school/>

SELECT ?name ?reference ?date WHERE {
?school a sch-ont:School;
sch-ont:establishmentName ?name;
sch-ont:uniqueReferenceNumber ?reference ;
sch-ont:districtAdministrative <http://statistics.data.gov.uk/id/local-authority-district/00MG> ;
sch-ont:openDate ?date ;
sch-ont:phaseOfEducation <http://education.data.gov.uk/def/school/PhaseOfEducation_Secondary> .
}

Here, the secondary schools are being identified according to the district area they are in (00MG in this case).

But all I have is the postcode… Can Linked Data help me get from MK7 6AA to 00MG (or more specifically, from <http://data.ordnancesurvey.co.uk/id/postcodeunit/MAGIC_POSTCODE> to <http://statistics.data.gov.uk/id/local-authority-district/00MG>?)

Here’s what the OS knows about a postcode:

What the OS knows about a postcode

If we click on the District link, we can see what the OS knows about a district:

Local authority area code lookup in OS Linked Data

The Census Code corresponds to the local council id code used in the Education datastore (thanks to John Goodwin for pointing that out…). The identifier doesn’t provide a Linked Data URI, but we can construct one out of the code value:
<http://statistics.data.gov.uk/id/local-authority-district/MAGIC_DISTRICTCODE>

(Note that the statistics.data.gov.uk lookup on the district code does include a sameas URL link back to the OS identifier.)

Here’s how we can get hold of the district code – it’s the admingeo:hasCensusCode you’re looking for:

os_query='''
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX admingeo: <http://data.ordnancesurvey.co.uk/ontology/admingeo/>
PREFIX postcode: <http://data.ordnancesurvey.co.uk/ontology/postcode/>

select ?district ?nsdistrict ?wardname ?districtname where {
<http://data.ordnancesurvey.co.uk/id/postcodeunit/MAGIC_POSTCODE> postcode:district ?district; postcode:ward ?ward.
?district skos:prefLabel ?districtname.
?ward skos:prefLabel ?wardname .
?district admingeo:hasCensusCode ?nsdistrict.
}
'''

postcode='MK7 6AA'
os_endpoint='http://api.talis.com/stores/ordnance-survey/services/sparql'
os_query=os_query.replace('MAGIC_POSTCODE',postcode.replace(' ',''))

result=run_sparql_query(os_query, os_endpoint)

for row in result.rows:
    print row['result']['nsdistrict']['value']
    districtcode=row['result']['nsdistrict']['value']
    print postcode,'is in the',row['result']['wardname']['value'],'ward of',row['result']['districtname']['value']
    record={ "id":postcode, "ward":row['result']['wardname']['value'],"district":row['result']['districtname']['value']} 

So what does that mean… well, we managed to look up the district code from a postcode using the Ordnance Survey API, which means we can insert that code into a lookup on the education datastore to find schools in that council area:

def run_sparql_query(query, endpoint):
    '''
    # The following string replacement construction may be handy
    query = 'select * from flickr.photos.search where text=@text limit 3';
    y.execute(query, {"text": "panda"})
    '''
    y = yql.Public()
    query='select * from sparql.search where query="'+query+'" and service="'+endpoint+'"'
    env = "http://datatables.org/alltables.env"
    return y.execute(query, env=env)

edu_endpoint='http://services.data.gov.uk/education/sparql'    

edu_query='''
prefix sch-ont:  <http://education.data.gov.uk/def/school/>

SELECT ?name ?reference ?date WHERE {
?school a sch-ont:School;
sch-ont:establishmentName ?name;
sch-ont:uniqueReferenceNumber ?reference ;
sch-ont:districtAdministrative <http://statistics.data.gov.uk/id/local-authority-district/MAGIC_DISTRICTCODE> ;
sch-ont:openDate ?date ;
sch-ont:phaseOfEducation <http://education.data.gov.uk/def/school/PhaseOfEducation_Secondary>.
}
'''
districtcode='00MG'
edu_query=edu_query.replace('MAGIC_DISTRICTCODE',districtcode)
result=run_sparql_query(edu_query, edu_endpoint)
for row in result.rows:
    for school in row['result']:
        print school['name']['value'],school['reference']['value'],school['date']['value']
        record={ "id":school['reference']['value'],"name":school['name']['value'],"openingDate":school['date']['value']}
        scraperwiki.datastore.save(["id"], record) 

Here’s a Scraperwiki example showing the two separate Linked Data calls chained together (click on the “Edit” tab to see the code).
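
For reference, the chaining can be sketched in a self-contained way as below. The query runner is passed in as an argument so the logic can be tried without calling YQL, and the helper names (as_list, bindings, secondary_schools_for) are made up. The sketch assumes that each YQL row holds its SPARQL bindings under 'result', either as a single dict or as a list of dicts, which is how the two snippets above treat it.

```python
OS_ENDPOINT = 'http://api.talis.com/stores/ordnance-survey/services/sparql'
EDU_ENDPOINT = 'http://services.data.gov.uk/education/sparql'

OS_QUERY = '''
PREFIX admingeo: <http://data.ordnancesurvey.co.uk/ontology/admingeo/>
PREFIX postcode: <http://data.ordnancesurvey.co.uk/ontology/postcode/>
select ?nsdistrict where {
<http://data.ordnancesurvey.co.uk/id/postcodeunit/MAGIC_POSTCODE> postcode:district ?district.
?district admingeo:hasCensusCode ?nsdistrict.
}
'''

EDU_QUERY = '''
prefix sch-ont: <http://education.data.gov.uk/def/school/>
SELECT ?name ?reference ?date WHERE {
?school a sch-ont:School;
sch-ont:establishmentName ?name;
sch-ont:uniqueReferenceNumber ?reference ;
sch-ont:districtAdministrative <http://statistics.data.gov.uk/id/local-authority-district/MAGIC_DISTRICTCODE> ;
sch-ont:openDate ?date ;
sch-ont:phaseOfEducation <http://education.data.gov.uk/def/school/PhaseOfEducation_Secondary>.
}
'''


def as_list(value):
    """Wrap a single binding in a list; pass lists through; None becomes []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def bindings(rows):
    """Flatten YQL-style rows into a flat list of SPARQL bindings."""
    found = []
    for row in rows:
        found.extend(as_list(row.get('result')))
    return found


def secondary_schools_for(postcode, run_query):
    """Postcode -> district census code -> (name, reference, date) tuples.

    run_query(query, endpoint) must return an iterable of YQL-style rows.
    """
    os_query = OS_QUERY.replace('MAGIC_POSTCODE', postcode.replace(' ', ''))
    districts = bindings(run_query(os_query, OS_ENDPOINT))
    if not districts:
        return []
    code = districts[0]['nsdistrict']['value']
    edu_query = EDU_QUERY.replace('MAGIC_DISTRICTCODE', code)
    return [(s['name']['value'], s['reference']['value'], s['date']['value'])
            for s in bindings(run_query(edu_query, EDU_ENDPOINT))]
```

With the run_sparql_query function from this post, the runner could be something like lambda q, e: run_sparql_query(q, e).rows.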

Linked Data in Scraperwiki

Okay – so that’s easy enough (?!;-). We’ve seen:
- that Scraperwiki supports calls to YQL;
- how to make SPARQL/Linked Data queries from Scraperwiki using YQL;
- how to get data from one Linked Data query and use it in another.

A big problem though is how do you know whether there is a linked data path from a data element in one Linked Data store (e.g. from a postcode lookup in the Ordnance Survey data) through to another datastore (e.g. district area codes in the education datastore), where you is a mere mortal and not a Linked Data guru?! Answers on the back of a postcard, please, or via the comments below;-)

PS whilst doing a little digging around, I came across some geo-referencing guidance on the National Statistics website that suggests that postcode areas might change over time (they also publish current and previous postcode info). So what do we assume about the status (currency, validity) of the Ordnance Survey postcode data?

PPS Just by the by, this may be useful to folk looking for Linked Data context around local councils: @pezholio’s First steps to councils publishing their own linked data

Written by Tony Hirst

November 2, 2010 at 10:18 am

Tesco the Tech Company…?


In passing, a handful of things that recently caught my eye on Nick Lansley’s Tech for Tesco blog:

- Tesco Freeview experiment: apparently, “Tesco.com R&D has been given access to a 32kbps [broadcast] digital stream …” So? Nick Lansley explains further:

[M]ost Freeview set-top boxes can see a “Channel Zero” on channel 306 (multiplex C) but most set-top boxes can’t pick up (or indeed understand) the information contained in it. The [Tesco Technika or Dion branded box[es] with ‘Channel Zero’] … can read the content of this channel – it’s this channel I have been given access as a conduit to delivering content.

I can imagine getting marketing to sponsor a cookery show and allow compatible set-top box (or TV) users to get the ingredients listed on the screen at the push of a button and they use the remote control to quickly add one or more of them to their online grocery basket without getting in the way of the watching the show. Importantly, this would work whether the show is being watched live or played back via PVR (on future PVR-enabled boxes).

Interesting…. And also interesting to see how this compares with the pretensions of a “global online university” that has had a “close” relationship (i.e. gives them cash) with the BBC for years… ;-)
(Just by the by, I’m also reminded that Tesco has started producing straight to DVD films for sale exclusively in Tesco Stores (Tesco goes to Trolleywood), and wonder: will Channel 0 stream video trailers too…?!;-)

- How to make “Sat-nav” work inside a Tesco Store: over recent months, folk at several HEIs have started mulling over the notion of on-campus location services (e.g. to my knowledge at least: @stuartbrown and @liamgh at the OU, @alexbilbie at Lincoln; any others?). Once you get indoors, there’s a problem though, because GPS doesn’t work when line of sight is lost to the satellites… which makes indoor use difficult… One alternative is to use wifi triangulation, detecting the relative signal strengths of various wifi hotspots whose location you know, and calculating location based on that.

Tesco - wifi hotspot triangulation

Which is what this post describes, along with several example use cases. (Again, just by the by, I notice Tesco has a previous patent in the wifi area: PERFORMANCE ENHANCING WIRELESS NETWORK CONFIGURATION.) Of course, if your phone knows where you are, then so does the Tesco App. But then again, in-shopping-centre localisation/shopper tracking is old news (old Sunday Times article: Shops track customers via mobile phone).
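
As a rough illustration of the general idea of locating yourself from hotspots at known positions (and definitely not a description of Tesco’s actual method, which isn’t given here), one very simple technique is a weighted centroid: average the hotspot positions, weighting each by how strongly it is heard. The function name and the use of plain positive numbers for strength are assumptions; real signal strengths are usually reported in dBm and would need converting first.

```python
def weighted_centroid(readings):
    """Estimate a position from (x, y, strength) readings.

    x, y     -- known hotspot position, in any consistent units
    strength -- positive number, larger meaning a stronger signal
    """
    total = float(sum(strength for _, _, strength in readings))
    if total <= 0:
        raise ValueError('need at least one reading with positive strength')
    x = sum(px * strength for px, _, strength in readings) / total
    y = sum(py * strength for _, py, strength in readings) / total
    return (x, y)


# Two hotspots 10 units apart; the first is heard three times as strongly
estimate = weighted_centroid([(0, 0, 3), (10, 0, 1)])
```

The estimate is pulled towards the hotspot that is heard more strongly, which is the intuition behind using relative signal strengths.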

- QR Codes are all the rage at the moment, aren’t they? For example, QR codes are now appearing on Tesco print ads, and on-phone barcode scanners are appearing too (which make it easy to add things to your shopping list when you’re at home, or possibly also to take things off your shopping list once you add them to your basket in store…). Of course, QR codes are just one integration point between the physical world and the digital:

Tesco mobile

- SMS is pretty much universal, whereas smartphones aren’t. Here’s an example for a link request using SMS: Search the Tesco Recipe site using an SMS text message. How does it work?

- Type ‘COOK’ followed by two or three of the key ingredients you have observed.
- Send the message to 83726 – that’s “TESCO” spelt out on your phone’s keyboard.

and get a link back to a URL on the Tesco recipe site for a recipe containing those ingredients. All that’s needed for a full SMS round trip is a collection of tiny recipes, such as those published by @cookbook… (e.g. as described in this New York Times article: Take 1 Recipe, Mince, Reduce, Serve)
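
The 83726 shortcode is simply TESCO typed on a standard phone keypad, which is easy to check with a few lines of code (the function name keypad_digits is made up for this illustration):

```python
# Standard phone keypad letter groups
KEYPAD = {
    '2': 'ABC', '3': 'DEF', '4': 'GHI', '5': 'JKL',
    '6': 'MNO', '7': 'PQRS', '8': 'TUV', '9': 'WXYZ',
}
LETTER_TO_DIGIT = dict((letter, digit)
                       for digit, letters in KEYPAD.items()
                       for letter in letters)


def keypad_digits(word):
    """Spell a word out as the keypad digits its letters sit on."""
    return ''.join(LETTER_TO_DIGIT[letter] for letter in word.upper())


shortcode = keypad_digits('TESCO')  # '83726'
```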

It’s easy for the online echo chamber to focus in on what Facebook and Google are up to… But don’t forget the real world… there’s a huge potential for evil there too…!;-)

Written by Tony Hirst

November 1, 2010 at 12:04 pm

Posted in Thinkses

