Public Data Principles: RSS Autodiscovery on Government Department Websites?

Looking over the UK Gov Transparency Board’s draft Public Data Principles, one of the suggested principles (#) proposes that:

Public data underlying the Government’s own websites will be published in reusable form for others to use – anything published on Government websites should be available as data for others to reuse. Public bodies should not require people to come to their websites to obtain information.

One example of how this might work is the Direct Gov Syndication API, but maybe there are some simpler alternatives? Like RSS…

So for example, over on Mash the State, Adrian Short had a go at hassling local councils into publishing RSS feeds by the end of 2009, although not many of them took up the challenge at the time… (maybe the new principles will nudge them towards doing this?) Here, for example, are some obvious starting points:
– council news (here’s an example council news feed from Shropshire Council);
– recent planning applications (here’s an example Planning RSS feed from Lichfield District Council);
– current roadworks (here’s an example traffic/roadworks feed from Glasgow City Council);
– council jobs (here’s an example council advertised jobs feed from Sutton Council);
– current consultations (here’s an example open consultations feed from Bristol City Council).

In accord with another of the draft Public Data principles (#),

Public data will be timely and fine grained – Data will be released as quickly as possible after its collection and in as fine a detail as is possible. Speed may mean that the first release may have inaccuracies; more accurate versions will be released when available.
Release data quickly, and then re-publish it in linked data form – Linked data standards allow the most powerful and easiest re-use of data. However most existing internal public sector data is not in linked data form. Rather than delay any release of the data, our recommendation is to release it ‘as is’ as soon as possible, and then work to convert it to a better format.

even if the published feeds could be better (e.g. planning feeds might benefit from geo-data that allows planning notices to be displayed at an appropriate location on a map), there’s no reason not to start opening up this “data” now in a way that supports syndication.
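To illustrate the kind of geo-annotation a planning feed might carry, here’s a minimal sketch that builds an RSS item with a GeoRSS point using Python’s standard library. The application reference, URL and coordinates are all made up for the example:

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)

# Build a minimal RSS 2.0 document with one geo-annotated planning item.
rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Recent planning applications"

item = ET.SubElement(channel, "item")
ET.SubElement(item, "title").text = "10/00123/FUL: Two storey extension"
ET.SubElement(item, "link").text = "http://example.gov.uk/planning/10-00123-FUL"
# georss:point gives consumers a lat/long for plotting the notice on a map
ET.SubElement(item, f"{{{GEORSS_NS}}}point").text = "52.6816 -1.8262"

print(ET.tostring(rss, encoding="unicode"))
```

A mapping mashup consuming such a feed only needs to look for the georss:point element alongside the usual title and link.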

At a government departmental level, one of the things I’ve been interested in tracking previously has been government consultation announcements. It’s possible to search for these, and generate email alerts and RSS subscriptions, via Tell Them What You Think. A list of government department consultation websites can also be found on Direct Gov: list of Government consultation websites. (To make that list a little more portable, I popped it onto my Doodlings area of WriteToReply (WTR: Government consultation websites); courtesy of the magic of the theme we run there, it’s easy enough to get an RSS feed out with each department listed as a separate item (although rather than resolving to the consultation web page URLs, the feed links point back to the corresponding paragraph on Doodlings): some sort of feed of Government consultation websites.)

If each of those consultation websites published an autodiscoverable RSS feed containing the currently open consultations (and maybe even made that data available as a calendar feed as well, with consultation opening and closing dates specified), it would be simple for aggregating services like Tell Them What You Think, or announcement services like a Direct Gov “New Consultations” feature, to consume and re-present this information in an alternative context.
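As a rough sketch of what such a calendar feed might look like, here’s some Python that renders a hypothetical consultation (the UID, title and dates are invented) as an iCalendar event with all-day start and end dates:

```python
from datetime import date


def consultation_vevent(uid, summary, opens, closes):
    """Render one consultation as an iCalendar VEVENT with all-day dates."""
    return "\r\n".join([
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"SUMMARY:{summary}",
        f"DTSTART;VALUE=DATE:{opens:%Y%m%d}",
        f"DTEND;VALUE=DATE:{closes:%Y%m%d}",
        "END:VEVENT",
    ])


# Wrap the event(s) in a minimal VCALENDAR envelope.
calendar = "\r\n".join([
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "PRODID:-//example.gov.uk//consultations//EN",
    consultation_vevent(
        "consultation-42@example.gov.uk",
        "Consultation on local data publishing",
        date(2010, 7, 1),
        date(2010, 9, 30),
    ),
    "END:VCALENDAR",
])
print(calendar)
```

An aggregator subscribing to a feed like this could tell at a glance which consultations are currently open and when each one closes.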

(Note that consultation websites should also be making consultation information available on consultation web pages in a machine readable way using RDFa. E.g. see @lesteph’s Adding RDFa to a consultation.)

Services like Tell Them What You Think often rely on screenscraping routines that break whenever a website’s design changes; if the information were published via RSS, those services could continue operating as long as the feed URLs remained unchanged. (Of course, it might be that aggregating services parse the content of RSS feeds in particular ways to extract structured information from them, essentially scraping the feed contents; in those cases, if the way feed content was presented were to change, the services would still break…)

Anyway, to return to the draft Public Data Principle I opened this post with, RSS (and related protocols such as Atom) can go a long way towards helping achieve the aim that “[p]ublic bodies should not require people to come to their websites to obtain information”.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

7 thoughts on “Public Data Principles: RSS Autodiscovery on Government Department Websites?”

  1. Amen to RSS on government sites, but I’ve never quite followed why you’re so keen on feeds being autodiscoverable. Useful, sure, but since you often need to point an app at a domain and perhaps choose the most suitable out of several feeds, what’s the benefit of linking it up in the page head?

    Re: consultations – the idea is that a machine (perhaps Directgov’s engine) could come and explore links from a known consultation listing page, and crawl the individual documents from that, using the RDFa information to handle opening and closing dates (and therefore status), contact details and so on. As it happens, at DIUS we used to manually roll an Atom feed (hang on a sec – oh, it’s still there) which listed a load of consultations, in the pre-RDFa (and pre-CMS!) days. This was then used by TellThemWhatYouThink to access our listing easily and accurately.

  2. Why autodiscoverability?

    – future proofing for one… browsers have been aware of RSS for ages, but most folk are blind to the fact and don’t make use of it. One day, someone may figure out how to make RSS work for everyday users;
    – pragmatics, for another. E.g. given a list of homepage URLs for councils, OpenlyLocal could discover and republish additional feed URLs, as well as syndicate content. Sometimes you can make a good guess from a feed URL or title what it’s about (sometimes you can’t). If there were a de facto convention about what to put in the autodiscovery title element, it would make it easier for machines to locate and subscribe to relevant feeds given just a website homepage.

    On several occasions in the past, I’ve taken lists of URLs and run them through a Yahoo pipe with a feed autodiscovery block, and as a result generated aggregated news feeds effortlessly.
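    Feed autodiscovery of the sort that Yahoo Pipes block performs boils down to scanning a page head for link tags with rel="alternate" and a feed media type. Here’s a minimal sketch using Python’s standard library HTML parser; the council page and feed URL are invented for the example:

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect (title, href) pairs from <link rel="alternate"> feed tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        if a.get("rel", "").lower() == "alternate" and a.get("type") in FEED_TYPES:
            self.feeds.append((a.get("title", ""), a.get("href")))


page = """<html><head>
<link rel="alternate" type="application/rss+xml"
      title="Council news" href="http://example.gov.uk/news/rss" />
</head><body>...</body></html>"""

finder = FeedLinkFinder()
finder.feed(page)
print(finder.feeds)  # [('Council news', 'http://example.gov.uk/news/rss')]
```

    Run over a list of council homepages, something like this is all an aggregator needs to find (and, given a title convention, classify) the feeds each site offers.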

    Re: the RDFa: agreed, that can provide an unobtrusive way of making machine readable data available to harvesters. The only problem with it (as with putting autodiscovery link tags in page headers) is that it’s yet another tweak that has to be made to a page template. And e.g. in a local council context, that may be difficult to get at, as this post suggests:

    In terms of the Public Data principle that opens the above post, I think RSS provides a pragmatic solution to it.

    If you can add links to the page head, then why not support autodiscovery too? E.g. if you have crazy URL schemes, autodiscovery may be able to help a client do some sort of ad hoc content negotiation, for example getting hold of an RSS version of a page’s content rather than the default heavily templated HTML page, with all its attendant chrome. (So for example, if I could put autodiscovery links into this hosted WordPress blog, I’d add one pointing to the “RSS version of this page” link that appears in the Page hacks in the right hand sidebar.)


  3. Hi Tony,

    Found this via Steph’s site. Nice piece.

    RDFa makes complete sense, we’ve put our money where our mouth is in this respect:

    Interestingly, we’re also dealing with an RSS situation at the moment, syndicating public contributions. We chose by design to show only moderated contributions – I should emphasise this was our choice, not a client request. It’s a tricky balance though; I’m concerned to be as open as possible within the limits of practicality around offensive content. The utility of the feeds for mashing up, analysis etc. then also depends on the speed of moderation, and going faster has a cost / workload implication for moderators.

    1. Hi Andy,

      Thanks for the comment. I think one of the things we need to think about on WriteToReply is the way we make use of RDFa so we can play nicely in a linked way with other consultation related activities going on out there. One approach might be to have some sort of metadata cover sheet or solicitation form that could be configured with operational consultation data (or import it from the RDFa on a parent consultation site) and then republish it as metadata in a standard way.

      WRT RSS, there is always the danger – particularly now with PuSH (PubSubHubbub) – that an inappropriate comment can go a long way round the web very quickly… Post-moderation is always going to be at risk of this, I think? I think things like Twitter have a policy defined so that if a user deletes a tweet, downstream services should respect that deletion, but I’m not sure a similar protocol has been defined as part of PubSubHubbub?


Comments are closed.
