OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for June 2010

Don’t Tell Us What To Do – Let Us Surprise You…

The dreaded lurgy beat me yesterday and so far today, so I’ve spent the last 36 hours drifting in and out of fitful sleep with a podcast backing track…

Here’s something I woke up to a few minutes in that might well have an audio quote or two worth grabbing – Clay Shirky on why you shouldn’t tell people what to do if you want them to participate collaboratively… this maybe has consequences for the way we think of designing forum related discussion activities in online courses?


Clay Shirky, Technology Insight, O’Reilly Media Gov 2.0 Summit, September 2009 [via IT Conversations]

Written by Tony Hirst

June 11, 2010 at 11:59 am

onFormSubmit – Raising Web Scale Events in Google Spreadsheets

What happens if you want to actually do something with a particular response from a web survey form at the time it is submitted, other than just collect it?

One of the handy things about Google Spreadsheets is the ability to create interactive web survey forms that can collect data that is then posted into a corresponding spreadsheet. Around the time of the Google I/O event, several event related features were released as part of Google Apps Script, the JavaScript scripting framework that supports an increasing number of Google apps. And by “event” I don’t mean something like the upcoming Isle of Wight Festival – I mean computational events, that can be used to trigger other computational actions…

One of the new events is onFormSubmit, which I finally got round to playing with last night. Here’s my “Hello World” example:

So here’s the code:

//Test function for Google Apps Script onFormSubmit
//Two sheets in a single spreadsheet doc
//First sheet corresponds to form
//Second sheet just displays one of the elements from the most recent form submission
// the function testOnSub() has a trigger associated with it: 'From spreadsheet' 'On form submit'
function testOnSub() {
  var ss = SpreadsheetApp.openById(SPREADSHEET_KEY);
  var sheet=ss.getSheets()[1];
  var form=ss.getSheets()[0];
  var lr=form.getLastRow();
  var el=form.getRange(lr,2,1,1).getValue();
  var t=el;
  sheet.getRange(1,1,1,1).setValue(t);
}
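
One way to make the "grab the latest response" step easier to check outside the spreadsheet is to pull it into a plain function that works on a 2D array of the kind `getValues()` returns. This is only a sketch: `latestResponseValue` is a hypothetical helper name of my own, and it assumes the first row of the form sheet is a header row.

```javascript
// Hypothetical helper (not part of Apps Script).
// rows: 2D array with the header row first, e.g. from
//       sheet.getDataRange().getValues()
// column: 1-based column number, to match getRange()
function latestResponseValue(rows, column) {
  if (rows.length < 2) {
    return null; // header only: nothing has been submitted yet
  }
  var last = rows[rows.length - 1];
  return last[column - 1];
}
```

Inside testOnSub(), calling `latestResponseValue(form.getDataRange().getValues(), 2)` should pick out the same cell as the getLastRow()/getRange() pair, provided the sheet holds nothing but form responses.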

Here’s how to set it…

Google apps script - spreadsheet events

What next? Earlier this week, I watched a compelling presentation from @progrium, based around the following slide deck:

Among the handy tools demonstrated (I loved the idea of clickhooks (src), clickable links with webhook callback actions) was a webhook debugging tool, postbin. What this tool does is just capture and redisplay stuff that is posted to it… which makes it ideal for a quick demo…

So for example, suppose I have a Google form set up, and I want to perform a particular action using a third party webservice on some element contained in the form submission, or maybe only on certain items according to what information was submitted via the form, as soon as the form is submitted. Here’s one way of doing that (code on gisthub):

// Simple spreadsheet, with first sheet containing form submission responses
// when the form is submitted:
// 1) grab the latest response,
// 2) post it to a third party service via an HTTP POST
function testWebhook() {
  var ss = SpreadsheetApp.openById(SPREADSHEET_ID);
  var form=ss.getSheets()[0];
  var lr=form.getLastRow();
  var el=form.getRange(lr,2,1,1).getValue();
  var t=el;
  //The following escape palaver is gleaned from a Google help forum...
  var p="val1="+encodeURIComponent(t).replace(/%20/g, "+")+"&val2="+encodeURIComponent(form.getRange(lr,3,1,1).getValue()).replace(/%20/g, "+");

  // Here's where we do the callback...
  var x=UrlFetchApp.fetch('http://www.postbin.org/YOURPOSTBINID',{method: 'post', payload: p});
}
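
The escape palaver in the payload line can be wrapped up in a small helper so that adding a third or fourth field doesn't mean more copy and paste. A minimal sketch, assuming the receiving service wants an ordinary form-encoded body; `toFormBody` is a made-up name, not an Apps Script function:

```javascript
// Hypothetical helper: builds a form-encoded body from a plain object,
// using the same %20 -> + replacement as the payload line above.
function toFormBody(params) {
  var encode = function (value) {
    return encodeURIComponent(String(value)).replace(/%20/g, '+');
  };
  return Object.keys(params).map(function (key) {
    return encode(key) + '=' + encode(params[key]);
  }).join('&');
}
```

With that in place, the payload could be built as `var p = toFormBody({val1: t, val2: form.getRange(lr,3,1,1).getValue()});`.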

Attach the on form submit trigger event to the function, and here’s the response when we submit a form:

Postbin response from Google spreadsheet onFormSubmit callback

Clever, eh?

So what does this mean? It means that I can set up a Google Survey form and as soon as anyone posts a submission, I can process it, either within the Google Apps environment using Google Apps script, or using third party services that accept an HTTP post input.
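
If only certain submissions should trigger a callback, the decision about where (or whether) to post can be kept separate from the posting itself. The following is a sketch under assumptions: the `topic` field, the rule table and the example.com URLs are all invented for illustration.

```javascript
// Hypothetical routing table: each rule pairs a test on the submitted row
// with the URL of a service to notify. The URLs are placeholders.
var ROUTES = [
  { matches: function (row) { return row.topic === 'bug'; },  url: 'http://example.com/bugs' },
  { matches: function (row) { return row.topic === 'idea'; }, url: 'http://example.com/ideas' }
];

// Returns the URLs of every service that should hear about this submission.
function webhooksFor(row, routes) {
  return routes
    .filter(function (rule) { return rule.matches(row); })
    .map(function (rule) { return rule.url; });
}
```

The form submit handler would then loop over `webhooksFor(row, ROUTES)` and call UrlFetchApp.fetch() once per URL, doing nothing when the list comes back empty.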

As Jeff Lindsay suggests, the evented web is increasingly a reality…

Written by Tony Hirst

June 9, 2010 at 12:54 pm

Posted in Google Apps, Tinkering

Liberating Data from the Guardian… Has it Really Come to This?

When the data is the story, should a news organisation make it available? When the Telegraph started trawling through MPs’ expenses data it had bought from a source, industry commentators started asking questions around whether it was the Telegraph’s duty to release that data (e.g. Has Telegraph failed by keeping expenses process and data to itself?).

Today, the Guardian released its University guide 2011: University league table, as a table:

Guardian university tables, sort of

Yes, this is data, sort of (though the javascript applied to the table means that it’s hard to just select and copy the data from the page – unless you turn javascript off, of course):

Data grab

but it’s not like the data that the Guardian republishes in its datastore, as it did with these league tables…:

Guardian datastore

…which was actually a republication of data from the THES… ;-)

I’ve been wondering for some time when this sort of apparent duplicity was going to occur… the Guardian datastore has been doing a great job of making data available (as evidenced by its award from the Royal Statistical Society last week, which noted: “there was commendable openness with data, providing it in easily accessible ways”) but when the data is “commercially valuable” data to the Guardian, presumably in terms of being able to attract eyeballs to Guardian Education web pages, there seems to be some delay in getting the data onto the datastore… (at least, it isn’t there yet/wasn’t published contemporaneously with the original story…)

I have to admit I’m a bit wary about writing this post – I don’t want to throw any spanners in the works as far as harming the work being done by the Datastore team – but I can’t not…

So what do we learn from this about the economics of data in a news environment?

- data has creation costs;
- there may be a return to be had from maintaining limited, privileged or exclusive access to the data as data OR as information, where information is interpreted, contextualised or visualised data, or is valuable in the short term (as for example, in the case of financial news). By withholding access to data, publishers maintain the ability to generate views or analysis of the data that they can create stories, or attractive website content, around. (Just by the by, I noticed that an interactive Many Eyes widget was embedded in a Guardian Datablog post today:-)
- if you’ve incurred the creation cost, maybe you have a right to a limited period of exclusivity with respect to profiting from that content. This is what intellectual property rights try to guarantee, at least until the Mickey Mouse lawyers get upset about losing their exclusive right to profit from the content.

I think (I think) what the Guardian is doing is not so different to what the Telegraph did. A cost was incurred, and now there is a (hopefully limited) period in which the paper is trying to generate some sort of return. But there’s a problem, I think, with the way it looks, especially given the way the Guardian has been championing open data access. Maybe the data should have been posted to the datablog, but with access permissions denied until a stated date, so that at least people could see the data was going to be made available.

What this has also thrown up, for me at least, is the question as to what sort of “contract” the datablog might have, implied or otherwise, with third parties who develop visualisations based on data in the Guardian Datastore, particularly if those visualisations are embeddable and capable of generating traffic (i.e. eyeballs, = ad impressions, = income…).

It also gets me wondering; does there need to be a separate datastore? Or is the ideal case where the stories themselves are linking out to datasets directly? (I suppose that would make it hard to locate the data? On second thoughts, the directory datastore approach is much better…)

Related: Time for data.ac.uk? Or a local data.open.ac.uk?

PS I toyed with the idea of republishing all the data from the Guardian Education pages in a spreadsheet somewhere, and then taking my chances with the lawyers in the court of public opinion, but instead, here’s a howto:

Scraping data from the Grauniad

So just create a Google spreadsheet (you don’t even need an account: just go to docs.google.com/demo), double click on cell A1 and enter:

=ImportHtml("http://www.guardian.co.uk/education/table/2010/jun/04/university-league-table","table",1)

and then you’ll be presented with the data, in a handy spreadsheet form, from:
http://www.guardian.co.uk/education/table/2010/jun/04/university-league-table

For the subject pages – e.g. Agriculture, Forestry and Food, paste in something like:

=ImportHtml("http://www.guardian.co.uk/education/table/2010/jun/04/university-guide-agriculture-forestry-and-food","table",1)

You can probably see the pattern… ;-)

(You might want to select all the previously filled cells and clear them first so you don’t get the data sets messed up. If you’ve got your own spreadsheet, you could always create a new sheet for each table. (It is also possible to automate the scraping of all the tables using Google Apps script: Screenscraping With Google Spreadsheets App Script and the =importHTML Formula gives an example how…))
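
The pattern in those two formulas can be captured in a few lines of script, which is handy if you want to generate a formula per subject rather than typing each one. This is a sketch only: the URL prefix is lifted from the single subject example above, and I'm assuming (without having checked every page) that other subject pages follow the same lower-case, hyphenated naming.

```javascript
// Assumed URL prefix, taken from the one subject example above.
var SUBJECT_BASE = 'http://www.guardian.co.uk/education/table/2010/jun/04/university-guide-';

// Turn a subject name into a URL slug: lower case, with runs of
// non-alphanumeric characters collapsed to a single hyphen.
function subjectSlug(name) {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-|-$/g, '');
}

// Build the spreadsheet formula that pulls in the first table on the page.
function importHtmlFormula(subjectName) {
  return '=ImportHtml("' + SUBJECT_BASE + subjectSlug(subjectName) + '","table",1)';
}
```

Passing in "Agriculture, Forestry and Food" reproduces the subject formula given above; whether any other subject name maps onto a real page is something to check by hand.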

An alternative route to the data is via YQL:

Scraping HTML table data in YQL

Enjoy…;-) And if you do grab the data and produce some interesting visualisations, feel free to post a link back here… ;-) To give you some ideas, here are a few examples of education data related visualisations I’ve played around with previously.

PPS it’ll be interesting to see if this post gets picked up by the Datablog, or popped into the Guardian Technology newsbucket… ;-) Heh heh…

Written by Tony Hirst

June 8, 2010 at 1:21 pm

Time for data.ac.uk? Or a local data.open.ac.uk?

Over a swift half at the end of the rather wonderful Liver’n'Mash, Mike Nolan chatted through some of his thoughts around his presentation on data.ac.uk that I hadn’t been able to get to see.

From looking back over Mike’s presentation slides, he seems to be advocating “let’s do what we can, as soon as possible”, particularly with respect to sharing things like information about courses and events, as well as other data from university information systems. That emphasis seems to me to be on syndicating information already available on university websites in a more data-like way (RSS news feeds, for example, or calendar feeds); this is similar to the approach taken by the new DirectGov API, I think?

(By the by, I came across this related presentation earlier today, something I’d prepared for the SocialLearn project, as was, a couple of years ago: Portable Course Data.)

As part of the same conversation, Brian Kelly suggested that just as the open data lobby had been calling for government to open up its data, government might well respond by calling on public sector organisations to open up their data. This has already started to happen, for example with a letter from Downing Street last week calling on local councils to get ready to open up some of their financial and organisational chart data.

I think Brian is right in suggesting that Higher Education should brace itself to expect similar treatment… (A lot of this data is already out there, it has to be said. For example, here’s a spreadsheet detailing VCs’ pay.)

So what is my take on how to get started with data.ac.uk, or a more local version, such as data.open.ac.uk?

To my mind, the quickest start is to just republish data that is already available in data form. So for example:

- student satisfaction data is available from the Direct Gov Unistats service (OU data [XLS]; general download list);
- funding data about current grants is provided on research council sites. The EPSRC, for example, provide a way of accessing spreadsheets for funding received by various OU departments: OU Awards from the EPSRC (see more generally the full list of funded organisations); (if you know similar ways of getting similar data from other research councils, or funders such as JISC, please post a link in the comments to this post:-)
- financial data, where already published; the OU’s public financial statements can be found on the Freedom of Information minisite, for example (OU FOI: financial statements);
- organisational data, where already published. Again the OU seems to be ahead of the game on this one via the FOI site: OU FOI: organisational structure; (the FOI site also includes pay grade details, so you’ll be able to see just how overpaid I really am, despite all my wittering;-)
- RAE (Research Assessment Exercise) data: one possible source of this information is the Guardian DataStore (Guardian datastore: RAE data, original data from rae.ac.uk [XLS]).

(From that quick list, the OU seems to be doing really well via the OU FOI website. Are other HEIs as far on as this, I wonder, or does having Open in the university name create raised expectations around the OU on matters such as this?!)

The Guardian has also republished quite a range of additional HE related data in its datastore, some of which I’ve even played with before… e.g. Does Funding Equal Happiness in Higher Education? (though there have been one or two, err, niggles with the data… in previous spreadsheets;-) or for a fuller list: OUseful visualisations around education data.

Another possible source of data in a raw form is the data.gov.uk education datastore (an example can be found via here), which makes me wonder about the extent to which a data.ac.uk website might just be an HE/FE view over that wider datastore. (Related: @kitwallace on University data.) And then maybe, hence: would data.*.ac.uk be a view over data.ac.uk for a particular institution? Or *.sch.ac.uk a view over a data.sch.ac.uk view over the full education datastore?

As to how best to publish the data? That’ll probably take another post, though a really quick win could be achieved by just grabbing the appropriate data from a Guardian datastore spreadsheet on Google docs, putting it into another Google doc, and then just embedding it in a page…;-)

PS In his post, Mike mentioned an old hack of mine that searched for autodiscoverable RSS feeds on *.ac.uk websites. I’d also done one that puts up screenshots of 404 pages… Maybe I need one that looks for the existence of data.*.ac.uk subdomains?!

PPS Finally, it’s probably worth just paying heed to notions of Good and bad Transparency. The line I’m suggesting above is one of convenient discovery as much as anything else, pulling (links to) all the data sets related to an institution into an area of the institution’s own website. Cf. the similar approach taken by data.gov.uk, which is to act primarily as a directory layer, as well as hosting national level datastores for particular datasets.

Written by Tony Hirst

June 7, 2010 at 5:38 pm

Ba dum… Education for the Open Web Fellowship: Uncourse Edu

A couple of weeks ago, I started getting tweets and emails linking to a call for an Education for the Open Web Fellowship from the Mozilla and Shuttleworth Foundations.

The way I read the call was that the fellowship provides an opportunity for an advocate of open ed on the web to do their thing with the backing of a programme that sees value in that approach…

…and so, I’ve popped an (un)application in (though not helped with having spent the weekend in a sick bed… bleurrrgh… man flu ;-) It’s not as polished as it should be, and it could be argued that it’s unfinished, but that is, erm, part of the point… After all, my take on the Fellowship is that the funders are seeking to act as a patron to a person and helping them achieve as much as they can, howsoever they can, as much as it is supporting a very specific project? (And if I’m wrong, then it’s right that my application is wrong, right?!;-)

The proposal – Uncourse Edu – is just an extension of what it is I spend much of my time doing anyway, as well as an attempt to advocate the approach through living it: trying to see what some of the future consequences of emerging tech might be, and demonstrating them (albeit often in a way that feels too technical to most) in a loosely educational context. As well as being my personal notebook, an intended spin-off of this blog is to try to help drive down barriers to use of web technologies, or demonstrate how technologies that are currently only available to skilled developers are becoming more widely usable, and access to them as building blocks is being “democratised”. As to what the barriers to adoption are, I see them as being at least two-fold: one is ease of use (how easy the technology is to actually use); the second is attitude: many people just aren’t, or don’t feel they’re allowed to be, playful. This stops them innovating in the workplace, as well as learning for themselves. (So for example, I’m not an auto-didact, I’m a free player…;-)

The Fellowship applications are templated (loosely) and submitted via the Drumbeat project pitching platform. This platform allows folk to pitch projects and hopefully gather support around a project idea, as well as soliciting (small amounts of) funding to help run a project. (It’d be interesting if in any future rounds of JISC Rapid Innovation Funding, projects were solicited this way and one of the marking criteria was the amount of support a pitched proposal received?)

I’m not sure if my application is allowed to change, but if it doesn’t get locked by the Drumbeat platform it may well do so… (Hopefully I’ll get to do at least another iteration of the text today…) In particular, I really need to post my own video about the project (that was my undone weekend task:-(

Of course, if you want to help out producing the video, and maybe even helping shape the project description, then why not join the project? Here’s the link again: Uncourse Edu.

PS I think there’s a package on this week’s OU co-produced episode of Digital Planet on BBC World Service (see also: Digital Planet on open2) that includes an interview with Mark Shuttleworth and a discussion about some of the work the Shuttleworth Foundation gets up to… (first broadcast is tomorrow, with repeats throughout the week).

DISCLAIMER: I’m the OU academic contact for the Digital Planet.

Written by Tony Hirst

June 7, 2010 at 12:52 pm

Manchester Digital Presentation – Open Council Data

I spent much of yesterday trying to come up with some sort of storyline for a presentation I’m giving at a Manchester Digital Open Data: Concept & Practice event tomorrow evening on Open Civic Data, quite thankful that I’d bought myself a bit of slack with the original title: “Open Data Surprise”…

Anyway, a draft of the slides is here (Manchester opendata presentation):

though as ever, they don’t necessarily give the full picture without me talking over them…

The gist of the presentation is basically as follows: there is an increasing number of local council websites out there making data available. By data, I mean stuff that developers, or tinkerers such as myself, can wrangle with, or council officers actually use as part of their job. The data that’s provided can be thought of along a spectrum ranging from fixed archival data (i.e. reports, things that aren’t going to change) to timely data, such as dates of forthcoming council elections, or details of current or planned roadworks. Somewhere in-between is data that changes on a regular cycle, such as the list of councillors, for example. The most timely of all data is live data, such as bus locations, or their estimated arrival time at a particular bus stop.

Lots of councils are starting to offer “data” via maps. What they are actually doing is providing information in a natural way, which is not a Bad Thing, although it’s not a Data Thing – if I wanted to create my own map view of the data, it’s generally not easy for me to do so. Providing data files that contain latitude and longitude is one way that councils can make the data available to users, but there is then a barrier to entry in terms of who can realistically make use of that data. Publishing geo-content in the KML format is one way we can improve this, because tools such as Google Earth provide a ready way of rendering KML feeds that is accessible to many more users.
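
To give a feel for how small the step from "a file with latitude and longitude columns" to KML can be, here's a minimal sketch. The record shape (`name`, `lat`, `lon`) and the function names are my own assumptions; the one detail worth stating with confidence is that KML writes coordinates as longitude first, then latitude.

```javascript
// Escape the five XML special characters so names are safe inside tags.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// places: array of {name, lat, lon} records (a hypothetical shape).
// KML coordinates are written longitude,latitude.
function toKml(places) {
  var placemarks = places.map(function (p) {
    return '  <Placemark><name>' + escapeXml(p.name) + '</name>' +
           '<Point><coordinates>' + p.lon + ',' + p.lat +
           '</coordinates></Point></Placemark>';
  });
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
         '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n' +
         placemarks.join('\n') + '\n</Document>\n</kml>';
}
```

Something like `toKml([{name: 'Town Hall & Library', lat: 53.4794, lon: -2.2453}])` (a made-up record) gives a one-placemark document that could be saved as a .kml file and opened in Google Earth.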

As a pragmatist, I believe that most people who use data do so in the context of a spreadsheet. This suggests that we need to make data available in that format, notwithstanding the problems that might arise from the difficulties of keeping an audit trail of the origins of that data in that format once files become merged. As a realist, I appreciate that most people don’t know that it’s possible to visualise their data, or what sorts of insight might be afforded by visualising the data. Nor do they know how it is becoming increasingly easy to create visualisations on top of data presented in an appropriate way. Just as using KML formats allows almost anyone to create their own Google Map by pasting a KML file URL into a Google Maps search box, so the use of simple data formats such as CSV allows users to pull data into visualisation environments such as IBM’s Many Eyes. (For more on this line of thinking, see Programming, Not Coding: Infoskills for Journalists (and Librarians..?!;-)). As a data junkie, I think that the data should be “linkable” across data sets, and also queryable. As a contrarian, I think that Linked Data is maybe not the way forward at this time, at least in terms of consumer end-user evangelism/advocacy… (see also: So What Is It About Linked Data that Makes it Linked Data™?)

The data that is starting to be published by many councils typically corresponds to the various functions of the council – finance, education, transport, planning, cultural and leisure services, for example. Through publishing this data in an open way, third party vertical providers can both aggregate data as information across councils, as well as adding contextual value. Some councils are entering into partnership with other councils to develop vertical services to which the council can publish its data, before pulling it back into the council’s own website via a data feed. And as to whose data it is anyway, it might be ours, but it’s also theirs: data as the business of government. Which makes me think: the most effective council data stores will be the ones that are used by councils as data consumers in their own right, rather than just as data publishers*.

(* There is a corollary here with open educational resources, I think? Institutions that make effective use of OERs are institutions who use at least their own OERs, as well as publishing them…?)

Recent communications from Downing Street suggest the new coalition government is serious in its aim to open up public data (though as @lesteph points out, this move towards radical transparency is not without its own attendant risks), so data releases are going to happen. The question is, are we going to help that data flow so that it can get to where it needs to go?

Written by Tony Hirst

June 2, 2010 at 12:27 pm

Posted in Anything you want, Infoskills

Plotting Comment Networks in Gephi, Part II – Merging Datasets Using Google Fusion Tables

Following on from the previous post on comment graphs, here’s something a little more interesting.

The data I’m working with is a CSV file that contains a list of data pairs; for each comment on a photo, it gives the user ID of the person making the comment and the PhotoID the comment was made on. In a second file, I have pairs of photo ID and the user ID of the person who took the photo. I’m thinking it would be interesting to look directly at the edges between user IDs, using a directed graph in which the edges go from the person who made a comment to the person who uploaded the photo being commented on.

So how to generate this merged data file? One non-programmatic way that occurred to me was to use a Google Fusion Table. Simply upload the two data files, and then create a new one that merges the data around the photo ID (that is, photo ID is the common term that joins together the ID of someone commenting, and the person who uploaded the photo being commented on):

Google Fusion Table

That gives a merged data table that looks something like this:

Merged data

This data can be exported, and the photo ID column removed to give a two column CSV file that contains the user ID of someone who took a photo, paired (optionally) with the ID of a person who commented on that photo.

Where multiple people commented on the same photo, multiple rows will result.
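
For anyone who'd rather do the merge in code than in Fusion Tables, the same join can be sketched in a few lines. This is my own approximation of what the merge produces, not a description of how Fusion Tables works internally; the field names (`photoId`, `ownerId`, `commenterId`) are assumptions about how the two CSV files might be parsed.

```javascript
// photos:   [{photoId, ownerId}]      one row per photo
// comments: [{commenterId, photoId}]  one row per comment
// Returns [ownerId, commenterId] pairs. A photo with no comments yields
// [ownerId, ''] so that every photographer still appears in the output.
function mergeOnPhotoId(photos, comments) {
  var commentersByPhoto = {};
  comments.forEach(function (c) {
    if (!commentersByPhoto[c.photoId]) {
      commentersByPhoto[c.photoId] = [];
    }
    commentersByPhoto[c.photoId].push(c.commenterId);
  });

  var rows = [];
  photos.forEach(function (p) {
    var commenters = commentersByPhoto[p.photoId] || [''];
    commenters.forEach(function (commenterId) {
      rows.push([p.ownerId, commenterId]);
    });
  });
  return rows;
}
```

Starting from the photo table, as here, is what gives the "optional" second column; starting from the comment table instead would drop photos that nobody commented on.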

Loading the data into Gephi, colouring nodes by out-degree (from photographer ID to commenter ID) and sizing them by in-degree (number of incoming comments) gives something like this:

comment/photographer network
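
Gephi works out the degree figures for you, but as a sanity check on the exported file it can help to count them by hand. A minimal sketch, assuming the edges are [source, target] pairs in whatever column order the CSV uses, and that an empty target marks a row with no second ID:

```javascript
// edges: array of [sourceId, targetId] pairs. Rows with an empty target
// still register the source as a node, but add no edge.
function degrees(edges) {
  var stats = {};
  function node(id) {
    if (!stats[id]) {
      stats[id] = { inDegree: 0, outDegree: 0 };
    }
    return stats[id];
  }
  edges.forEach(function (edge) {
    var source = node(edge[0]);
    if (edge[1] === '') {
      return; // no second ID on this row
    }
    source.outDegree += 1;
    node(edge[1]).inDegree += 1;
  });
  return stats;
}
```

Any node whose `inDegree` comes back as zero is one that never appears in the second column, which is also the set a zero in-degree filter in Gephi should pick out.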

If we auto-select neighbours and colour the edges according to direction, we get:

Neighbours in a directed graph (gephi)

We can also see what happens if we try to cluster the network using the Modularity filter – several partitions are identified, and expanding one lets us see which users were grouped together, presumably because there was a high degree of commenting between them:

Gephi - clustering using modularity statistic

If we now run the Network Diameter statistic, we can look at the Betweenness Centrality across the commenting network, colouring for InDegree and sizing for Betweenness:

Gephi - betweenness centrality

The resulting chart shows us which individuals are most active in terms of commenting on, and being commented upon, by other members of the network.

We can generate a similar sort of representation based on favouriting behaviour too. In this case, I started off with the Favouriter table, and then merged in the Photo user ID table, which meant that every favourite had the user ID of the photographer associated with it. Compare that with the case above, where I started with the photo table and then merged in the commenter IDs, which meant there were some rows that only had a photographer ID (i.e. some photos had no comments). Which means, if we filter the nodes in the comment graph based on in-degree zero, we can view the individuals who have received no comments? (Which may be because they didn’t upload any photos? E.g. they might be moderators?)

In the following image, we see individuals whose photos have been heavily favourited. In the top left we see an individual who has favourited lots of photos (lots of links going out) but not been favourited in return:

Favouriting behaviour in gephi

If we rerun the Network stats, and look at Betweenness, we can see which individuals are favouriting widely across the whole userbase:

Betweenness

So there we have it. Using Google Fusion tables, we can generate a user-to-user graph that relates the IDs of users commenting on (or favouriting) each other’s photos, based on two separate data sets: one that relates the user ID of a commenter to a photo; and a second that relates a photo to the ID of the user who uploaded the photo. The resulting userID-to-userID graph data allows us to use Gephi to plot diagrams that use directed edges to show person A favourited person B’s photo, or person A had a photo commented on by person B.

Written by Tony Hirst

June 1, 2010 at 2:03 pm
