A few days ago, I had reason to start pondering URI schemes for open data released by educational institutions. The OU, like a couple of other HEIs, is looking at structuring – and opening up – various sorts of data, and there are also mutterings around what a data.ac.uk styled site might have to offer.
Being a lazy sort, it seems to me that in figuring out how we might collate data from across the ac.uk environment, we could look to the gov.uk environment. So, for example, data.gov.uk acts as a central index over data from both central and local government, which each have their own concerns and which, within a type, are likely to share some common features: all local councils will have some of the same sorts of data to share, government departments might share some requirements for consistent, centralised reporting (such as website costs and usage) as well as their own peculiar data releases, and so on. In the ac.uk context, we have the HEIs (and FE colleges) in one set, and research councils and other project-related funding bodies in another.
If we look to local council data, we can also spot intermediate layers appearing that apply a canonical structure to a range of variously published data from the local councils. For example, Openly Local is making a play to act as the canonical source for a whole range of local council data across all the UK’s councils; the Police API “allows you to retrieve information about neighbourhood areas in all 43 English & Welsh police forces”; RateMyPlace is a “one stop shop for information on Food Safety Inspections in Staffordshire”, aggregating information from several councils and representing it via a single API; and so on. (For an example of how different councils can publish ostensibly the same data in a wide variety of formats, see Library Location Data on data.gov.uk.)
Looking at the list of local councils with open data sites as collected on the OpenlyLocal open data scoreboard (and as extracted from the OpenlyLocal API via a Yahoo Pipe), do any conventions appear to be emerging in the location of local council open data homepages?
– http://www.aberdeencity.gov.uk/open_data/open_data_home.asp (Aberdeen City Council)
– http://www.bournemouth.gov.uk/Data/ (Bournemouth Borough Council)
– http://www.bristol.gov.uk/opendata (Bristol City Council)
– http://www.darlington.gov.uk/Generic/Info/opendata.htm (Darlington Borough Council)
– http://www.eaststaffsbc.gov.uk/opendata/Pages/default.aspx (East Staffordshire Borough Council)
– http://eastsussex.gov.uk/about/standards/opendata.htm (East Sussex County Council)
– http://www.eden.gov.uk/about-this-site/open-data/ (Eden District Council)
– http://data.london.gov.uk/ (Greater London Authority)
– http://picandmix.org.uk/ (Kent County Council)
– http://www2.lichfielddc.gov.uk/data/ (Lichfield District Council)
– http://data.lincoln.gov.uk/ (Lincoln City Council)
– http://www.brent.gov.uk/xml (London Borough of Brent)
– http://www.hillingdon.gov.uk/data (London Borough of Hillingdon)
– http://www.sutton.gov.uk/index.aspx?articleid=10077 (London Borough of Sutton)
– http://www.rbwm.gov.uk/web/transparency.htm (Royal Borough of Windsor and Maidenhead)
– http://www.salford.gov.uk/opendata.htm (Salford City Council)
– http://www.stratford.gov.uk/opendata (Stratford-on-Avon)
– http://www.sunderland.gov.uk/localpublicdata (Sunderland City Council)
– http://www.trafford.gov.uk/opendata/ (Trafford Council)
– http://opendata.walsall.org.uk/ (Walsall Metropolitan Borough Council)
– http://opendata.warwickshire.gov.uk/ (Warwickshire County Council)
– http://www.westberks.gov.uk/index.aspx?articleid=20365 (West Berkshire Council)
With only a small number of councils fully engaged with open data so far, no dominant top level naming scheme has yet appeared, although there are a couple of early runners:
- /opendata  (e.g. http://www.stratford.gov.uk/opendata)
- /data  (e.g. http://www.hillingdon.gov.uk/data)
- data.  (e.g. http://data.london.gov.uk/)
- opendata.  (e.g. http://opendata.walsall.org.uk/)
As yet, there is no agreement on the following naming approaches:
- /opendata/Pages  (e.g. http://www.eaststaffsbc.gov.uk/opendata/Pages/default.aspx)
- /Data (e.g. http://www.bournemouth.gov.uk/Data/)
- /localpublicdata (e.g. http://www.sunderland.gov.uk/localpublicdata)
- /xml (e.g. http://www.brent.gov.uk/xml)
- /open_data/ (e.g. http://www.aberdeencity.gov.uk/open_data/)
Several other councils appear to be offering a single page to handle open data issues, at least for the moment (e.g. http://www.salford.gov.uk/opendata.htm or http://www.westberks.gov.uk/index.aspx?articleid=20365), or even a separate domain for their data site (e.g. http://picandmix.org.uk/).
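As a rough illustration of how the patterns above might be tallied automatically, here is a minimal Python sketch that buckets an open data homepage URL by where the "data" marker sits. The function name `classify_data_home`, the bucket labels and the list of marker words are all my own assumptions for the example, not part of any OpenlyLocal tooling. Note that it folds case, so /Data and /data land in the same bucket.

```python
from collections import Counter
from urllib.parse import urlparse

# Marker words treated as "this is the open data area" (an assumption for this sketch).
DATA_MARKERS = {"data", "opendata", "open_data", "open-data"}


def classify_data_home(url):
    """Return a rough label for the naming convention used by an open data homepage URL."""
    parts = urlparse(url)
    first_label = parts.netloc.lower().split(".")[0]
    segments = [s for s in parts.path.split("/") if s]

    # e.g. data.london.gov.uk or opendata.walsall.org.uk
    if first_label in DATA_MARKERS:
        return "subdomain:" + first_label
    # Nothing after the host, and no data-style subdomain: probably a dedicated site.
    if not segments:
        return "site-root"
    # e.g. /opendata, /data or /open_data as the first path segment (case folded).
    first_segment = segments[0].lower()
    if first_segment in DATA_MARKERS:
        return "path:/" + first_segment
    # Single pages such as /opendata.htm, or anything else.
    return "other"


sample = [
    "http://www.stratford.gov.uk/opendata",
    "http://www.bristol.gov.uk/opendata",
    "http://www.hillingdon.gov.uk/data",
    "http://data.london.gov.uk/",
    "http://opendata.walsall.org.uk/",
    "http://picandmix.org.uk/",
    "http://www.salford.gov.uk/opendata.htm",
]
tally = Counter(classify_data_home(u) for u in sample)
print(tally.most_common())
```

Running something like this over the full scoreboard list would give a quick count of which conventions are actually in the lead, though the buckets are crude and would need tuning by eye.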
Does any of this matter? At the top level, I’m not sure it does, except in setting expectations and providing a sound footing for a scaleable URI scheme. The Cabinet Office Guidance on designing URI sets, which outlines many considerations that need to be taken into account when defining URI schemes particularly for use as identifiers in RDF inspired Linked Data, suggests that domains should “[e]xpect to be maintained in perpetuity” and that “the choice of domain should provide the confidence to the consumer, …, the domain itself … convey[ing] an assurance of quality and longevity.”
I suspect that, pragmatically, the majority of data released in the short term will be published as Excel spreadsheets or informally formatted CSV/TSV data, with some sites publishing raw XML. (As Library Location Data on data.gov.uk describes, even when councils ostensibly release the same sort of data, there is no guarantee that they will do it in similar ways: the 5 councils publishing the locations of local libraries used 5 different data formats…) It is unlikely that councils will be early adopters of Linked Data across the board. (If they were, it might be seen as excluding users in the short term: while many people are familiar with working with spreadsheets, a widely adopted “end user” technology for people who work with data in their day job, familiar routes in to and out of Linked Data stores are not there yet…) That said, if local councils do end up wanting to publish data with well formed URIs into the Linked Data space, it would be handy if their current URI scheme was designed with that in mind, and in such a way that the minting of future Linked Data URIs isn’t likely to conflict or clash.
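One way to think about "conflict or clash" is to check whether any URL a site already publishes starts with a path segment that might later be wanted for Linked Data identifiers. The sketch below assumes, purely for illustration, that a site wants to keep the first-level segments `id`, `doc`, `def` and `set` free; that set of names, the function name `clashes_with_reserved` and the example.gov.uk URL are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical first-level path segments to keep free for future Linked Data URIs.
RESERVED_PREFIXES = {"id", "doc", "def", "set"}


def clashes_with_reserved(url, reserved=RESERVED_PREFIXES):
    """Return True if the URL's first path segment is one of the reserved prefixes."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return bool(segments) and segments[0].lower() in reserved


# An existing data homepage that leaves the reserved prefixes alone...
print(clashes_with_reserved("http://www.hillingdon.gov.uk/data"))
# ...and a made-up page that already occupies one of them.
print(clashes_with_reserved("http://www.example.gov.uk/doc/annual-report.pdf"))
```

A check like this only looks at the first path segment, so it says nothing about clashes deeper in the hierarchy or about subdomain choices.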