OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for the ‘Policy’ Category

Data Sharing is Good, Right? Or is HM Gov Evil?

I made a couple of soft resolutions to myself at the start of this year, one of which was to try to take more interest in policy matters, particularly in areas that impact upon the web and “information”. But I suspect that getting my head round the implications of proposed new legislation is going to be non-trivial.

For example, the MySpace generation believes that sharing personal information in public is the thing you do, right? But what about when government agencies can freely share your personal data between themselves?

Because it seems that Her Majesty’s Government also thinks in the naive MySpace generation way… A couple of days ago, the Coroners and Justice Bill was introduced to Parliament containing a proposed amendment to the Data Protection Act:

152 Information sharing
After section 50 of the Data Protection Act 1998 (c.29) insert—
50A Power to enable information sharing
(1) Subject to the following provisions of this Part, a designated authority may by order (an “information-sharing order”) enable any person to share information which consists of or includes personal data.

(3) For the purposes of this Part a person shares information if the person
(a) discloses the information by transmission, dissemination or otherwise making it available, or
(b) consults or uses the information for a purpose other than the purpose for which the information was obtained.”

I’m not sure what this might mean in practice (if you can think of any scenarios, please post them as a comment), so I’ll try to keep an ear out for what examples are given by Members when the Bill goes through its readings.

It seems, though, that there are “Explanatory notes” that explain the intention behind some of the proposals: Explanatory notes (Clause 152: Information Sharing):

691. Section 50A(1) creates an order-making power to enable a person to share information that consists of, or includes, personal data. The power is conferred on a designated authority. “Designated authority” is defined in new section 50A(2) as the Scottish Ministers, the Welsh Ministers, a Northern Ireland Department or an appropriate Minister. Section 50C determines when each of these designated authorities is entitled to make an order. An order under section 50A is known as an information-sharing order.

692. New section 50A(3) sets out the definition of data-sharing for the purposes of this section. Sharing in this section includes both the disclosure of data between two or more persons (such as when one company provides its client list to another company for commercial purposes), as well as where a single person uses some data for a purpose other than that which it was obtained for (for example where a Government Department obtains information for the purposes of exercising one particular statutory function such as the collection of tax but then later wishes to use the same information for another statutory function such as the provision of benefits and credits).

Here’s a bit more from the introduced Bill itself:

50B Information-sharing orders: supplementary provision
(1) An information-sharing order may—
(a) confer powers on the person in respect of whom it is made;
(b) remove or modify any prohibition or restriction imposed (whether by virtue of an enactment or otherwise) on the sharing of the information by that person or on further or onward disclosure of the information;
(c) confer powers on any person to enable further or onward disclosure of the information;
(d) prohibit or restrict further or onward disclosure of the information;
(e) impose conditions on the sharing of information;
(f) provide for a person to exercise a discretion in dealing with any matter;
(g) enable information to be shared by, or disclosed to, the designated authority;
(h) modify any enactment.

Now I’m not a lawyer, and I don’t speak Legislation, but what do paragraphs b and c mean exactly? In “real terms”? And how do they operate differently to g? Read them again… go on… read them…

(And “confer powers”? WTF? Like Heroes, or something?! Heh, heh… but seriously, folks, how far does that “powers” word unpack…?)

And some more:
(1) An information-sharing order may—
(b) remove or modify any prohibition or restriction imposed (whether by virtue of an enactment or otherwise) on the sharing of the information by that person or on further or onward disclosure of the information;
(c) confer powers on any person to enable further or onward disclosure of the information;

(g) enable information to be shared by, or disclosed to, the designated authority;

Paragraph f looks to me (but what do I know?) like a get-out/escape clause, if “exercise a discretion” means “I’ll do what the f**k I want” (which is how I’d naively translate it). Maybe a lawyer could correct my interpretation (“I’ll do what the f**k I want if I can bully or bluster you into believing I had a good reason at the time”, maybe?).

Here are the “explanatory notes”:

697. New section 50B of the 1998 Act makes supplemental provisions in relation to the information-sharing order powers in section 50A, and includes a non-exhaustive list of the kinds of provisions that may be included in an order under section 50A. New section 50B(1) provides that an order may remove or modify any legal barrier to information-sharing. This could be by repealing or amending other primary legislation, changing any other rule of law (for example, the application of the common law of confidentiality to defined circumstances), or creating a new power to share information where that power is currently absent. This section also provides for the conferral of powers on particular persons, the imposition of prohibitions and restrictions upon disclosure or sharing, and the provisions of a power to allow persons to exercise a discretion in dealing with such matters.

When popular websites change their terms and conditions, it sometimes hits the blogosphere. But we have to remember that the web isn’t for everyone, and that most people don’t use the web as aggressively as do most OUseful.info readers. But if you live in the UK, the above “terms and conditions” could well apply to you one day soon. So maybe we should be taking more of an interest? (Or maybe y’all do, and it’s just me who’s late to the party…?)

One way forward might be if we encourage the growth of initiatives like the following (reviewed here) by making use of the resources provided and adding our own commentary/contribution to the discussion, as well as airing these matters in a wider forum than the Westminster village?

which leads to a page of White Paper Resources for Media and Bloggers. Resources that us bloggers can use to build a post around (remembering, of course, who actually produced the resources…), or at least that will act as a starting point for developing an understanding of what the government thinks it’s trying to achieve with yet more legislation. (See also: New Opportunities – Government 2.0 sites.)

(Remember also the reach out into the blogosphere by the debate on the future of Higher Education? We all got involved, right? Err….)

Anyway, anyway, if Tesco Clubcard’s T’s and C’s included the “data-sharing” provisions outlined above, would you be any more or less concerned?

Nah, probably not; who cares anyway…?!

PS if you do want to follow the Bill along, you can track its progress here:
Coroners and Justice Bill 2008-09: Progress of Bill including links to debates

I didn’t notice an RSS feed, though? If I get a chance, I’ll try to put one together over the weekend… (although, arguably, doing it now would be more useful – and more timely – than revising online course materials (for a deadline I’ve already missed) that go live to students for the first time in, err, March 2010, I believe?!;-)

Written by Tony Hirst

January 16, 2009 at 10:30 am

Posted in Evilness, Policy

Glanceable Committee Memberships with Treemaps

A quickie post, this one, to complement a post from a long time ago where I plotted out – as a network – the links between people who served on the same committee on the Isle of Wight Council (Visualising CoAuthors in Open Repository Online Papers, Part 3, half way through the post).

In this case, I trawled the Isle of Wight Council committees to populate the rows of a spreadsheet with column headings “Committee Name” and “Councillor”.

Pasting the results into Many Eyes gives an IW Council membership dataset that can be easily visualised. So for example, here’s a glanceable treemap showing the membership of each committee:

The search tool adds yet another dimension to the visualisation, in this case allowing us to pick out the various committees that the searched-for individual sits on.

Here’s a glanceable treemap showing the committees each councillor is a member of:

It strikes me that if the search tool supported Boolean expressions, such as AND and OR (maybe with each term being realised by a different colour bounding box?), it would be possible to explore the variation – or similarity – in make-up of different committees? On the first tree map, this approach would make it obvious which committees the same groups of people were sitting on?

And why would we want to do this? To identify potential clashes of interest, maybe, or a lack of variation in the composition of different committees that might, ideally, be independent of each other?
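The overlap question can also be asked directly of the two-column spreadsheet, without waiting for a Boolean search box. What follows is only a minimal sketch: the committee and councillor names are made up, and the function names are hypothetical, not anything Many Eyes provides.

```python
from collections import defaultdict
from itertools import combinations

# Made-up rows in the "Committee Name", "Councillor" layout described above.
rows = [
    ("Planning", "Cllr A"),
    ("Planning", "Cllr B"),
    ("Planning", "Cllr C"),
    ("Licensing", "Cllr A"),
    ("Licensing", "Cllr B"),
    ("Audit", "Cllr D"),
]


def members_by_committee(rows):
    """Group councillors under each committee (the hierarchy of the first treemap)."""
    grouped = defaultdict(set)
    for committee, councillor in rows:
        grouped[committee].add(councillor)
    return dict(grouped)


def shared_members(rows):
    """For each pair of committees, list the councillors who sit on both."""
    grouped = members_by_committee(rows)
    overlaps = {}
    for first, second in combinations(sorted(grouped), 2):
        common = grouped[first] & grouped[second]
        if common:
            overlaps[(first, second)] = sorted(common)
    return overlaps


print(shared_members(rows))
```

Pairs of committees with a long list of shared members are the ones worth a closer look for the "lack of variation" concern.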

PS Hmm, I suppose you could use a similar visualisation to look at the distribution of named directors across FTSE 100 companies and their subsidiaries, suppliers and competitors, for example? ;-) Does anyone have simple lists of such information in a spreadsheet anywhere?;-)

Written by Tony Hirst

January 29, 2009 at 1:56 pm

Posted in Data, Policy, Visualisation

Public Policy Engagement with Commentariat

One of the weak resolutions that I made to myself at the start of the year was that I would try to take a little bit more interest in UK national policy development decisions that: a) affect all of us; and b) that I might be “qualified” in some sense to comment on.

So it’s quite handy that UK gov appears to be exploring ways of engaging with online communities.

Last year, I commented on the debate on the future of higher education, which used a public blog as one of the avenues for engagement…

…and a couple of days ago, Twittering MP @tom_watson announced the Power of Information Task Force report. Beta

The what???

A “commentariat” enhanced WordPress version of the Power of Information Task Force report that uses a version of the Commentpress extension to allow readers to comment on the report at the level of “meaningful chunks” (that is, chunks that are larger than paragraphs and smaller than whole posts).

We are publishing this report in beta before we hand it in formally to the Government. We wanted to give the the community that has contributed to the Taskforce’s work the chance to make suggestions while the report is in draft. The report will be here for comment for two weeks. We shall make small improvements as we go along. Then we shall consider the views raised, adapt the report if we think it helps makes the case to the Government and hand it in to the Cabinet Office. So please go ahead and comment.

You can read more about the Commentariat theme (which has been released as an open theme by its DIUS developers :-) on @lesteph’s blog: Introducing Commentariat & the POI Taskforce Report.

One thing I’d quite like to see is a daily/serialised feed for the report so that I could read it over several days in a series of manageable chunks. After all, us natives of the blogosphere all have acquired attention deficit disorder, don’t we, and can’t cope with reading more than 500 words on any topic all in one go…;-) (Seriously, though, drip feeding the report gets a different dynamic going with the reader that might be worth exploring?)

NB Even with these DIUS Interactive initiatives, it seems that MPs don’t think that the DIUS folks are fully entering into the spirit of online engagement as much as they might (“DIUS ‘has not yet found its feet’”):

[T]he MPs, in their review of the department’s annual report, said that “we had high hopes of DIUS demonstrating innovatory methods of operation”.
“We were disappointed in the examples of innovation in its own operations DIUS cited: use of new social media, ‘hot-desking’ and remote working, which for many are far from new,” they said.

And the recent Ofcom review and the Carter “Digital Britain” interim report are fine examples of the “use of new social media”?! Did you see the number of mentions each one gave to those media my colleague John Naughton would call “pull media” compared to the traditional “push media” model of traditional broadcast? (Martin Belam provides the summary stats in Digital Britain Interim report – first impressions if you didn’t…)

(Just by the by, the DIUS geeks had already run a Commentpress-powered consultation (now closed) with the Innovation Nation: Interactive report. They also have a DIUS Netvibes dashboard running, and a Google CSE running over the DIUS empire… Uploads to the DIUS Youtube channel appear to have stalled recently, though… These attempts at engagement stand in stark contrast to the way the Lords Communications Committee has encouraged the community to comment on its recent report on Government Communications?! Or maybe HMGov doesn’t see value in soliciting discussion and commentary around reports? FFS, Be less boring.)

So where do we come in (we being you, and me, and any other readers of blogs like this…)? Well it’s down to us to start engaging back, isn’t it… After all, it takes two (or more) to have a conversation… As the DIUS folks explore ways of engaging with the feeds’n’comments world we live in, at least at a technology level (using feeds and blog machinery), we have to work with them to bring them into the conversations we are having and engage with them as they are trying to engage with us. There’s bound to be a bit of fumbling at first, but we know we’re just making it up all the time anyway, right?!;-)

PS it seems like the DIUS folks are also trying to open things up at the document level? ConsultationXML: getting reusable data out of horrid PDFs. But I’m too tired to chase this through just now to find out exactly what they’re up to… G’night, all…

PPS how could I forget this? Directgov innovate:

Directgov have created the innovate.direct.gov.uk developer network to inform the greater developer community about available resources, to provide a platform to connect with one another, and to showcase new ideas with the aim of supporting and encouraging innovation.

Over time we will provide content feeds and API’s allowing people to develop new and interesting ideas and applications for use by the greater community.

Until it gets up to speed, one of the best places to find government APIs is probably still the Show Us A Better Way site…

PPPS See also: New Opportunities: Resources for media and bloggers – a good attempt at making blog friendly resources available for the New Opportunities White Paper.

Written by Tony Hirst

February 3, 2009 at 12:23 am

Posted in Policy

Comment on “Digital Britain” at WriteToReply.org

In a scathing review of Stephen Carter’s “Digital Britain” interim report – Reporting behind closed doors – technology columnist Bill Thompson noted how difficult it is for the digerati to comment back on the report:

The widespread coverage has certainly provided a rich source of suggestions, comments, ideas and critical reviews to feed into the next stage of the process.

Unfortunately for those who lack access to mainstream media outlets like newspapers and broadcasters or their associated websites, there is no easy way to respond directly to its author. The report website has no information at all on how to make a contribution, and you’ll have to read through 72 pages of the report before you find a suggestion that “organisations or individuals interested in joining the discussion should register their interest at digitalbritain@berr.gsi.gov.uk”

Apparently the Digital Britain team will follow up these expressions of interest, which is nice of them, and we must just hope that Carter and his expert panel will be carefully reviewing every blog post and online comment to ensure they don’t miss anything important.

But it doesn’t have to be this way, as some of the consultation initiatives coming out of DIUS show (Public Policy Engagement with Commentariat).

So a couple of days ago I posted the following tweet:

And I got this reply…

…which was quickly followed by this one…

And now, two evenings (incl. a rather late night, last night), a lunchbreak and morning coffee later, Joss has writetoreply.org up and running (I got in the way not getting Daily Feeds working;-), a commentpress site for commenting on public documents.

And the first report to be hosted there? Digital Britain – The Interim Report, of course:-)

So if you want to comment on the report, as @billt surely does, head over to http://writetoreply.org/ now and follow the link for the Digital Britain, Interim Report; or go there directly: Digital Britain, Interim Report on writetoreply.org.

We can’t guarantee that anyone who actually produced the report will read the comments, of course, but there is a comment feed for them to subscribe to if they want to;-)

Written by Tony Hirst

February 4, 2009 at 10:38 am

Posted in Policy

writetoreply.org – Some Quick Thoughts

So it’s been a fun couple of days getting the writetoreply.org site up and seeing the first few comments roll in to the commentable version of the Digital Britain Interim report.

We made the Guardian Technology blog tonight – Digital Britain: Comments please! – and I can only reiterate the point Jack Schofield made in it:

So far, however, WriteToReply.org has only had 10 comments, spread over six sections and dozens of paragraphs.

I hope this is because not enough people know about it, rather than because not enough people care.

So have you commented yet? (I will as soon as I finish commenting on the POIT report, which I’m still half-way through!;-)

As we get more comments, we may be able to roll out a few new features, and it will also give us something to work with on a comment dashboard/reporting pattern that we can make available to the report’s authors.

Also, be warned that I’m not going to post too much here about the site – we’ll be starting a blog [UPDATE: available at http://writetoreply.org/actually] on the writetoreply site itself in a day or two to capture what we’re learning and what we’re thinking – so if you’re interested in keeping close tabs on what we’re up to, I’d suggest following @writetoreply on Twitter. (I will post round up/summary linking reports here, though, so you’ll still get to see glimpses of what we’re doing ;-)

If you want to get involved with brainstorming ideas for the site – or suggesting reports to host there – please send a message to @WriteToReply or contribute to the wiki: WriteToReply wiki

One thing I do want to mention here – almost as a note to self, because I’ll pursue this more on the WriteToReply blog – is that even if we don’t get many comments on the site, there is still value in it being there…


Because each paragraph is identified by a named anchor, each paragraph is linked to by a unique URI; for example, here’s a link to Action 1 of the Digital Britain Interim Report:

What this means is that if people want to comment about a particular section, action or paragraph within the report on their own blog or other publication, they can link to it.

Like in this post from the Nominet blog:

A Storm in a Teacup or a Perfect Storm?

Which results in a Trackback on the WriteToReply site, that is included in the comment feed, and that looks like this:

(Note that this is where we have to start upping the spam/trackback spam defense tools!;-)

What this means is that the paragraph, action point, section or whatever can become a linked resource, or linked context, and can support remote commenting.

And in turn, the remark made on the third party site can become a linked annotation to the corresponding part of the original report…


Well, through the judicious use of trackbacks, link: search limits on the bigger search engines, and link searches in services like BackType (that I discovered via Euan Semple:-), we’ll find ways of pulling those remote comments and discussions into the writetoreply environment (hopefully…?!;-)

So even if you don’t want to comment on the Digital Britain Interim report on the WriteToReply site, but you do care, why not post your thoughts on your own blog, and link your thoughts directly back to the appropriate part of the report on WriteToReply?

(And remember, the final report will have consequences, so if you have something to contribute, make sure you do… :-)

Written by Tony Hirst

February 5, 2009 at 11:29 pm

Posted in Infoskills, Policy

UK Gov Getting into the Web…?!

Last week I posted about An Example Netvibes Dashboard for the Digital Britain Interim Report on WriteToReply on Actually…, the WriteToReply blog.

It seems that there’s also an official (?) Digital Britain Pageflakes dashboard too (nice to see some WTR feeds make it on there;-):

And it seems that the Cabinet Office are also using Netvibes, along with a whole host of feed powered goodness on a public dashboard being used to support the Open Source, Open Standards and Re–Use: Government Action Plan:

Here’s how it’s described:

To help bring together the online debate around this Action Plan, we’ve set up a public page which contains links to blog posts, news stories and tweets about UK government, open source and open standards. If you write about this online, please use the tag #ukgovOSS to help us find your comment.

Although the ukgovOSS Action Plan was only published yesterday, we’ve already re-published it on WriteToReply: WriteToReply: Open Source, Open Standards and Re–Use: Government Action Plan, which means there’s lots of lovely feed goodness, an e-book version of the Action Plan, and so on.

I also popped up a quick Netvibes demo tab:

This includes a full feed (via a feed reverser pipe) of the document content – so you can read it within the Netvibes context:

along with separate section level comment feeds pulled in from the WriteToReply website.

As soon as we’ve settled on some “tab patterns”, we’ll start publishing tabs to an official WriteToReply Netvibes page.

We’ve also started talking to the Cabinet Office folks about how we can work with them as part of their outreach site.

One possible way forward is for us to syndicate comments, possibly directly to the Cabinet Office Netvibes page. We already have a sort of precedent for this – WriteToReply comments on the Digital Britain Interim Report are being syndicated on the official Have Your Say: Digital Britain website:

Not bad progress for a couple of weeks’ worth, methinks?!

Written by Tony Hirst

February 25, 2009 at 6:04 pm

Posted in Policy

The Fake Digital Britain Report

Jumping on the “Fake” bandwagon, we’ve decided to do a little experiment over on WriteToReply, by providing t’community who complained bitterly about the Digital Britain Interim report an opportunity to come up with something better…

And so, I’d like to announce The Fake Digital Britain Report wiki.

So if you think that we need 2Gbps rather than 2Mbps broadband access, then argue your case on the wiki pages…

The initial section headings are taken from the original WTR republication of the report (“Digital Britain Interim Report” on WriteToReply), although of course, they are subject to change… (A lot of people were complaining that the UK games industry was not well represented in the interim report, so now they have an opportunity to add in the missing section…;-)

As ever, a feed is available from the fake report in the form of a changes feed for the wiki: Recent changes to “The Fake Digital Britain Report” feed.

Another thing we’re trying to do with the Fake Digital Britain report is find a way of supporting the wiki activity by pulling in comments made to the report on WriteToReply to the “Fake Digital Britain Report” discussion page:

This is achieved using the MediaWiki Extension:RSS:

The re-use of the original section headings in the wiki page means that there’s also a sensible mapping to the comments in the discussion page, which are pulled in at the section level from WTR.

PS We’re also going to have a look at the Wiki Article Feeds Extension to see if we can do anything interesting with that… In the meantime, we’ve already got a demonstration of how to pull a MediaWiki page into a WordPress page here: Guidelines for re-publishers (scraped from the wiki) (uses the Append Wiki page plugin, I think?).

Who knew that blikis could be so much fun…?;-)

Written by Tony Hirst

March 1, 2009 at 7:56 pm

MPs Expenses by Constituency (Sort Of…)

A few weeks ago, I posted several maps visualising MPs’ expenses (Visualising MPs’ Expenses Using Scatter Plots, Charts and Maps). A couple of days later, I created another map that I didn’t post at the time, partly because it’s very approximate, but it does demonstrate something I haven’t logged on OUseful.info before – how to do overlays on Google maps…

So here’s the link: MPs expenses block map.

The blocks are defined using the bounding box co-ordinates for each MP’s constituency as made available by TheyWorkForYou (specifically, using the getGeometry API call).

The data set for the map was constructed by adding bounding box data for each constituency to a Dabble DB table, and then joining it with expenses data from another table.
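The join itself was done in Dabble DB, but the same idea can be sketched in plain Python. Everything here is illustrative: the constituency names and figures are made up, and the bounding box field names (`min_lat`, `max_lat`, `min_lon`, `max_lon`) are an assumption rather than a confirmed description of what getGeometry returns.

```python
# Made-up sample rows; real values would come from the bounding box and expenses tables.
# The field names (min_lat, max_lat, min_lon, max_lon) are assumptions for illustration.
bounding_boxes = {
    "Constituency A": {"min_lat": 50.5, "max_lat": 50.8, "min_lon": -1.6, "max_lon": -1.0},
    "Constituency B": {"min_lat": 51.9, "max_lat": 52.1, "min_lon": -0.9, "max_lon": -0.6},
}
expenses = {"Constituency A": 120000, "Constituency C": 95000}


def join_on_constituency(boxes, expenses):
    """Inner join: keep only constituencies that appear in both tables."""
    joined = []
    for name in sorted(boxes):
        if name in expenses:
            joined.append({"constituency": name, "expenses": expenses[name], **boxes[name]})
    return joined


def block_corners(row):
    """Corners of the rectangular overlay as (lat, lon), clockwise from the south-west."""
    return [
        (row["min_lat"], row["min_lon"]),
        (row["max_lat"], row["min_lon"]),
        (row["max_lat"], row["max_lon"]),
        (row["min_lat"], row["max_lon"]),
    ]


rows = join_on_constituency(bounding_boxes, expenses)
for row in rows:
    print(row["constituency"], row["expenses"], block_corners(row))
```

Each list of corners could then be handed to whatever polygon overlay the mapping library offers. Because a bounding box is only a rectangle around the constituency, the blocks will overlap their neighbours, which is why the map is "very approximate".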

PS following this tweet from @ElrikMerlin “Oh, that IS cool. What happens if you colour boxes by party and simply have area proportional to amount?” I knocked up a quick proportional symbol map that shows the total travel expenses claimed by party, where the circle diameter is proportional to the total expenses and the colour denotes the party.

MPs total travel expenses by party
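For anyone wanting to try something similar, the sizing step is a one-liner. This sketch uses made-up party totals and hypothetical function names. It shows diameter scaling as described above, plus the area scaling the tweet asked about, where the diameter grows with the square root of the amount.

```python
import math

# Made-up totals, purely for illustration.
party_totals = {"Party X": 400000.0, "Party Y": 100000.0}


def diameter_scaled(totals, max_diameter=100.0):
    """Circle diameter proportional to the total (the approach described above)."""
    biggest = max(totals.values())
    return {party: max_diameter * total / biggest for party, total in totals.items()}


def area_scaled(totals, max_diameter=100.0):
    """Circle area proportional to the total, so diameter follows the square root."""
    biggest = max(totals.values())
    return {party: max_diameter * math.sqrt(total / biggest) for party, total in totals.items()}


print(diameter_scaled(party_totals))
print(area_scaled(party_totals))
```

With diameter scaling, a party claiming a quarter of the largest total gets a circle with a sixteenth of the area, so small totals look smaller than they are. Area scaling gives that party a circle with a quarter of the area.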

But that’s enough for now… this is supposed to be a holiday weekend, after all…!

Written by Tony Hirst

May 3, 2009 at 12:38 pm

Plug’n’Play Public Data

Whilst listening to Radio 4’s Today programme this morning, I was pleasantly surprised to hear an interview with Hans Rosling about making stats’n’data relevant to Joe Public (you can find the interview, along with a video overview of the Gapminder software, here: Can statistics be beautiful?).

The last few weeks have seen the US Government getting into the data transparency business with the launch of data.gov whose purpose is “to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government”. The site offers access to a wide range of US Government datasets in a range of formats – XML, CSV, KML etc. (The site also gives links to widgets and other (parent?) sites that expose data.)

Providing URIs directly to CSV files, for example, means that it is trivial to pull the data into online spreadsheets/databases, such as Google spreadsheets, or Dabble DB, or visualisation tools such as Many Eyes Wikified; and for smaller files, Yahoo Pipes provides a way of converting CSV or XML files to JSON that can be easily pulled in to a web page.
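The CSV to JSON step is not specific to Pipes. As a rough sketch of what such a conversion does, here is the same thing using only the Python standard library. The sample data is made up, and `csv_to_json` is a hypothetical helper name.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Turn CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)


# Made-up sample; in practice the text would be fetched from the dataset's URI.
sample = "region,year,value\nNorth,2008,12\nSouth,2008,7\n"
print(csv_to_json(sample))
```

Note that every value comes out as a string, because CSV carries no type information. Anything that needs numbers has to convert them itself.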

Realising that there may be some business in public data, Microsoft, Amazon and Google have all been sniffing around this area too: for Microsoft, it’s the Open Government Data Initiative (OGDI), for Amazon, it’s big data via Public Datasets on AWS, and for Google… well, Google. They bought Rosling’s Trendalyser, of course, and recently made a brief announcement about Public Data on Google, as well as Google Squared, which is still yet to be seen in public. With the publication of a Java support library for the Google Visualisation API open wire protocol/query language, you can see them trying to get their hooks into other people’s data. (The thing is, the query language is just so darned useful;-) Wolfram Alpha recently opened up their computational search over a wide range of curated data sets, and Yahoo? They’re trying to encourage people to make glue, I think, with YQL, YQL Execute and YQL Open Data Tables.

In the UK, we have the National Statistics website (I’m not even going to link to it, it’s that horrible..) as well as a scattered collection of resources as listed on the Rewired State: APIs wiki page; and, of course, the first steps of a news media curated datastore from the Guardian.

But maybe things are set to change? In a post on the Cabinet Office Digital Engagement blog, Information and how to make it useful, Richard (Stirling?) picks up on Recommendation 14 of the POIT (Power of Information Taskforce) Review Final Report, which states:

Recommendation 14
The government should ensure that public information data sets are easy to find and use. The government should create a place or places online where public information can be stored and maintained (a ‘repository’) or its location and characteristics listed (an online catalogue). Prototypes should be running in 2009.

and proposes starting a conversation about “a UK version of data.gov”:

What characteristics would be most useful to you – feeds (ATOM or RSS) or bulk download by e.g. FTP, etc?
Should this be an index or a repository?
Should this serve particular types of data e.g. XML, JSON or RDF?
What examples should we be looking at (beyond data.gov e.g. http://ideas.welcomebackstage.com/data)?
Does this need it’s own domain, or should it sit on an existing supersite (e.g. http://direct.gov.uk)?

I posted my starter for 10 thoughts as a comment to that post (currently either spamtrapped, or laughed out of court), but there’s already some interesting discussion started there, as well as a thoughtful response on Steph Gray’s Helpful Technology blog (Cui bono? The problem with opening up data) which picks up on “some more fundamental problems than whether we publish the data in JSON or RSS” such as:

- Which data?
- Who decides whether to publish?
- Who benefits?
- Who pays?
- For how long?

My own stance is from a purely playful, and maybe even a little pragmatic, position: so what?

There are quite a few ways of interpreting this question of course, but the direction I’ll come at it (in this post at least) is in terms of use by people whose job it isn’t…

Someone like me… so a population of one, then… ;-)

So what do I know? I know how to cut and paste URLs in to things, and I know how to copy other peoples’ code and spot what bits I need to change so that it does “stuff with my stuff”.

I know that I can import CSV and Excel spreadsheets that are hosted online from their URL into Google spreadsheets, and from a URL as CSV into something like Dabble DB (which also lets me essentially merge data from two sources into a new data table). Yahoo Pipes also consumes CSV. I know that I can get CSV out of a Google spreadsheet or Dabble DB (or from a Yahoo pipe if CSV went in). I know that I can plot KML or geoRSS files on a Google map simply by pasting the URL into a Google map search box. I know I can get simple XML into a Google spreadsheet, and more general XML into a Yahoo Pipe. I know that YQL will also let me interrogate XML files and emit the results as XML or JSON. Pipes is good at emitting JSON too. (JSON is handy because you can pull it into a web page without requiring any help from a script running on a server.) I’ve recently discovered that the Google Visualisation API query language and open wire protocol lets me run queries on datastores that support it, such as Google spreadsheets and Pachube. I know that Many Eyes Wikified will ingest CSV and then allow me to easily create a set of interactive visualisations.

So what would I want from a UK version of data.gov, and why?

- CSV, XML and JSON output, with KML/GeoRSS where appropriate, keyed by a simple URI term;
- a sensible (i.e. a readable, hackable) URI pattern for extracting data: good examples are the BBC Programmes website and Google spreadsheets (e.g. where you can specify cell ranges);
- data available from a URI via an HTTP GET (not POST; GETable resources are easily pulled into other services, POST requested ones aren’t; don’t even think about SOAP;-);
- if possible, being able to query data or extract subsets of it: YQL and the Google Viz API query language show a possible way forward here. Supporting the Google open-wire protocol, or defining YQL open data tables for data sets brings the data into an environment where it can be interrogated or subsetted. (Pulling cell ranges from spreadsheets is only useful where the cells you want are contiguous.)
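To make the "readable, hackable" point concrete, here is a minimal sketch of the kind of URI pattern the list above is asking for. Nothing here exists: the `data.example.gov.uk` host, the dataset name and the filter parameters are all made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical base address; no such service is being described here.
BASE = "http://data.example.gov.uk"
FORMATS = {"csv", "xml", "json", "kml"}


def dataset_uri(dataset, fmt="csv", **filters):
    """Build a GETable URI: /<dataset>.<format>?<filters>, with filters in a stable order."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    uri = f"{BASE}/{dataset}.{fmt}"
    if filters:
        uri += "?" + urlencode(sorted(filters.items()))
    return uri


print(dataset_uri("mps-expenses"))
print(dataset_uri("mps-expenses", "json", year=2008, party="Labour"))
```

The design choice being illustrated is that switching format means changing one suffix, and subsetting means adding a query parameter, so the URI can be edited by hand and pasted straight into a spreadsheet, a pipe or a map.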

Although it pains me to suggest hooking into yet more of the Googleverse, a UK version of data.gov could do worse than support the Google visualization API open-wire protocol. Why? Well, for example, with only an hour or two’s coding, I was able to pull together a site that added a front end on to the Guardian datastore files on Google spreadsheets: First Steps Towards a Generic Google Spreadsheets Query Tool, or At Least, A Guardian Datastore Interactive Playground (Okay, okay, I know – it shows that I only spent a couple of hours on it… but it was enough to demonstrate a sort of working rapid prototype…;-)
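For a flavour of why the query language is so handy, a query is just a SQL-like string sent as a request parameter. In the sketch below, the `tq` parameter name and the `select … where …` syntax come from the Google Visualization API query language, but the endpoint address, the `key` parameter and the spreadsheet key are assumptions and placeholders, so check the current documentation before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint, for illustration only; the real address may differ.
ENDPOINT = "http://spreadsheets.google.com/tq"


def query_url(spreadsheet_key, query):
    """Encode a query-language string into a GETable URL."""
    return ENDPOINT + "?" + urlencode({"key": spreadsheet_key, "tq": query})


# "SPREADSHEET_KEY" is a placeholder, not a real spreadsheet.
url = query_url("SPREADSHEET_KEY", "select A, B where C > 100")
print(url)

# Decoding the URL again recovers the original query string.
print(parse_qs(urlsplit(url).query)["tq"][0])
```

Because the whole query lives in the URL, a front end like the one described above only has to build a string and fetch it.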

As to whether the data is useful, or who’s going to use it, or why they’re going to use it, I don’t know: but I suspect that if it isn’t easy to use, then people won’t. If one of the aims of data.gov style approaches is to engage people in conversations with data, we need to make it easy for them. Essentially, we want people to engage in – not quite ‘enterprise mashups’, more civic mashups. I’m not sure who these people are likely to be – activists, policy wonks, journalists, concerned citizens, academics, students – but they’re probably not qualified statisticians with a blackbelt in R or SPSS.

So for example, even the Guardian datastore data is quite hard to play with for most people (it’s just a set of spreadsheets, right? So what can I actually do with them?). In contrast, the New York Times Visualization Lab folks have started looking at making it easier for readers to interrogate the data in a visual way with Many Eyes Wikified, which is one reason I started trying to think about what a query’n’visualisation API to the Guardian datastore might look like…

PS just in case the Linked Data folks feel left out, I still think RDF and semweb geekery is way too confusing for mortals. Things like SPARCool are starting to help, but IMHO it’s still way too quirky syntactically for a quick hit… SQL and SQL like languages are hard enough, especially when you bear in mind that most people don’t know (or care) that advanced search exists on web search engines, let alone what it does or how to use it.

PPS see also National Research Council Canada: Gateway to Scientific Data (via Lorcan Dempsey).

Written by Tony Hirst

June 1, 2009 at 11:27 am

Posted in Data, Policy
