Accessing Government Education Data in Scraperwiki via the Edubase/Education Datastore API

There’s lots of education data out there, but do we always need to scrape it from LEA websites? How can we easily access the data that’s in the central government datastore, and bring it into an environment we’re happy working from?

Although lots of school data has been made available as data for some time now, it’s publication as Linked Data means there’s a considerable barrier to entry in terms of functional access to, and use of, the data. (For an example of how to interrogate the Edubase Linked Data API in Scraperwiki, see Accessing Linked Data in Scraperwiki via YQL.) As an attempt to mask some of the horrors of SPARQL from mortal users, @jenit et al worked on a simpler webservice that meant you could access the data.gov.uk education datastore via a “friendly”, human readable URL, such as the following:

  • http://services.data.gov.uk/education/api/school/constituency-name/Horsham: list of schools within the constituency of Horsham
  • /education/api/school/local-authority-name/{la}: schools by local authority name
  • /education/api/school/district/{district} : schools by district ID (I think? Not sure exactly what ID’s these might be?)
  • /education/api/school/area/{minLat},{minLong};{maxLat},{maxLong}: schools within a particular geogrpahical area, as described by a latitude/longitude bounding box.

There’s a wealth of data that can be returned about a school, so various views over the data are also supported using a URL variable (for example, ?_view=provision or &_view=performance

short – shows very basic information
medium – shows a few more fundamental details about the schools, such as its address
provision – describes the kind and number of children that they take
location – describes where the school is
performance – gives information related to their performance
admin – gives administrative information
all – gives you everything that’s known about each school

If you know any particular data attributes you want to further filter the results on, they can be specified literally. For example, the following (far from complete) list of attributes gives some idea of what’s possible, this time passed via explicit URL args:

  • ?nurseryProvision=true
  • &gender.label=Girls
  • ofstedSpecialMeasures=true
  • for searching number ranges, the min- and max- prefixes may be applied to certain parameters. For example: &max-statutoryHighAge=10 searches for schools where statutoryHighAge<=10

Jeni did a great write up of the API at A Developers’ Guide to the Linked Data APIs – Jeni Tennison (which I cribbed from heavily in the above;-). You can find a full overview of the education API documentation here: Linked Data API Configuration APIs: Edubase API

So… how can we use this in Scraperwiki? Here’s a demo:

import simplejson
import urllib
import scraperwiki

#------- USER SETTINGS ------
# Original API documentation at: http://services.data.gov.uk/education/api/api-config#schools
# Original blog post by @jenit describing the API used: http://data.gov.uk/blog/guest-post-developers-guide-linked-data-apis-jeni-tennison
# Original blog post describing this Scraperwiki page: https://blog.ouseful.info/2010/11/03/accessing-government-education-data-in-scraperwiki-via-the-edubaseeducation-datastore-api/

# The main query
eduPath='school/constituency-name/Horsham'

# Filters, as a list:
eduFilters=['min-statutoryHighAge=7','max-statutoryHighAge=10']

# _views - not considered yet...

# key and label data is displayed in the console for each result, and added to the Scraperwiki database
# keys are the top level attributes we want to display. For a result item, display each item[key]
keys=['establishmentNumber','label']

# labels are used to display labels of top level items, e.g. item[label]['label']
labels=['typeOfEstablishment','phaseOfEducation']
# Note, if you have item[path][wherever][label], or deeper down a path, we don't handle that (yet?!)

# The school ID will always be added to the Scraperwiki database (it's the database ID for a record).
# If latitude/longitude data is available, it will also be added to the database.


# Note that the script doesn't yet handle multiple pages of results either...

#-------------------------- 
  
# This function displays the results, and also adds results to the Scraperwiki database.
# We always look for school ID (this is the table ID) and latlng for mapping, if that data exists
def printDetails(item,keys=['establishmentNumber','label'],labels=[]):
    txt=[]
    record={}
    for key in keys:
        if key in item:
            txt.append(str(item[key]))
            record[key]=item[key]
        else:
            record[key]=''
    if 'establishmentNumber' not in keys:
        record['establishmentNumber']=item['establishmentNumber']
    for attribute in labels:
        if attribute in item:
            txt.append(item[attribute]['label'])
            record[attribute]=item[attribute]['label']
        else:
            record[attribute]=''
    if 'lat' in item:
        latlng=(item['lat'],item['long'])
        scraperwiki.datastore.save(["establishmentNumber"], record,latlng=latlng)
    else:
        scraperwiki.datastore.save(["establishmentNumber"], record)
        pass
    print ', '.join(txt)    
    
    
# This is where we construct the Edubase Linked Data API URL, and then call it, returning JSON
# Need to find a way of handling results spread over several results pages
data=simplejson.load(urllib.urlopen('http://services.data.gov.uk/education/api/'+eduPath+'.json'+'?'+'&'.join(eduFilters)))['result']
items=data["items"]

for item in items:
    printDetails(item,keys,labels)
    print item

You can find the code running on Scraperwiki here: ouseful scraperwiki – playing with Education datastore API

Here’s an example of what gets put in the Scaperwiki database:

Example scraperwiki datatable - education datastore API

Hopefully what this demo does is show how you can start exploring the Education datastore in Scraperwiki withougt having to do too much. More explanation/guidance, or at least futher examples, are required in order to demonstrate:
– the construction of valid “eduPath” statements, if possible showing how they can reuse identifier codes from other sources;
– the use of different _views, and maybe handlers for those views that add all the data to the Scraperwiki database automagically;
– how to inspect returned results so you can identify what keys and labels can be used from a result when you want to construct your own Scraperwiki database records;
– handlers for data down the result item path (i.e. more than handlers just for item[key] and item[label][‘label’], but also item[here][there], item[here][there][everywhere][‘label’] etc.)
– results are only pulled back from the first page of results; need to find some way of handling results over multiple pages, maybe limiting results to a max number of results within that. (Maybe the tweepy Cursor code could be reused for this???)