What Are JISC’s Funding Priorities?

I’ve just got back home from a rather wonderful week away at the JISC Developer Happiness Days (dev8D), getting a life (of a sort?!;-) so now it’s time to get back to the blog…

My head’s still full of things newly learned from the last few days, so while I digest it, here’s a quick taster of something I hope to dabble a little more with over the next week for the developer decathlon, along with the SplashURL.net idea (which reminds me of my to do list…oops…)

A glimpse of shiny things to do with JISC project data (scraped from Ross’s Simal site… [updated simal url]; see also: Prod).

Firstly, a Many Eyes tag cloud showing staffing on projects by theme:

Secondly, a Many Eyes pie chart showing the relative number of projects by theme:

As ever, the data may not be that reliable/complete, because I believe it’s a best effort scrape of the JISC website. Now if only they made their data available in a nice way???;-)

Following a session in the “Dragon’s Den” – where Rachel Bruce told me that these charts might be used for good as well as, err, heckling, I guess, Mark van Harmalen suggested that I should probably pay lip service to who potential users might be, and Jim Downing suggested that I could do something similar for research council projects – I also started having a play with data pulled from the EPSRC website.

So for example, here’s a treemap showing current EPSRC Chemistry programme area grants >2M UKP by subprogramme area:

And if you were wondering who got the cash in the Chemistry area, here’s a bubble chart showing projects held by named PIs, along with their relative value:

If you try out the interactive visualisation on Many Eyes, you can hover over each person bubble to see what projects they hold and how much they’re worth:

PS thanks to Dave Flanders and all at JISC for putting the dev8D event on and managing to keep everything running so smoothly over the week:-) Happiness 11/10…

Mash/Combining Data from Three Separate Sources Using Dabble DB

Over dinner with friends a couple of nights ago, I was asked how I typically approach problem solving tasks. Thinking about it, it’s a bottom-up AND top-down approach where I attack both ends of the problem (the “what I’ve got now” end and the “ultimate vision”) at the same time, in the hope that the tiny steps taken from each end meet up somewhere in the middle…

So for example, in the dev8D Dragon’s Den I mentioned the desire to put together a thematic choropleth map depicting the funding that’s going into different UK Government office regions as a result of JISC or EPSRC project awards. Here’s how I’ve started to work out how to do that…

(What follows gets a little involved at times, so the main trick to look out for is how to create a single data table by mashing together data from three separate data tables.)

At one end, is the output surface. A quick scout around turned up no flash components or KML overlays I could use on Google maps or ThematicMapping (ffs why can’t National Statistics make some free warez available???) so I opted for the amMap interactive map instead.

To plot the map, I need to be able to sum the value of project grants over lead HEIs within particular GORs (got that?;-) So where’s the data?

All over the place, that’s where…

  • EPSRC Support By Organisation shows the total amount of current project funding awarded to each HEI by EPSRC;

    Hmm, no GOR, no geolocation data… Which means I need a mapping from HEI to GOR…

  • …but the closest I can find is a listing of the postcodes of each HEI: HERO screenscraper, and even that’s a scrape of another service…

    (Thanks @lesteph;-)

  • and finally, here’s a mapping from postcode areas to GORs: postcode area lookup table.

    There’s a warning though: please note “regions” were recorded for my own visual aid and are NOT an attempt to tie in with current UK Administrative Regions. Hmm – okay – add that one to the caveats/risk assessment. If the map turns out very wrong, that’s EPSRC’s problem, right, for not making the data available in a clean way?!;-)

Okay, so those are the data sources: one contains HEI names and project funding data, one contains HEI names, location data (well, postcodes) and homepage URIs, and one contains mappings from postcode towns to UK regions (which loosely relate, possibly, to GORs).

Now at this point I’ve already decided that I want to try to use Dabble DB to somehow conflate the data from these three separate sources (though I’m not totally sure how… it’s just something I seem to remember from somewhere and somewhen a long time ago that Dabble DB supports, if there are common fields – and matching strings – across different data tables).

Getting the data into Dabble DB is a copy and paste operation, but I’m going to take an intermediate step, highlighting and copying the tables from the separate web pages and pasting them into a Google spreadsheet. Why? Because I already know that this works and it’ll also let me cast an eye over the data to make sure it looks about right.

Looking at the HEI names from EPSRC and the HERO screenscrape, they don’t really match though, which means that Dabble DB won’t be able to use HEI names to identify common rows in the HE location and EPSRC project tables. However, the HERO screenscrape page does have the HEI homepage URI, and a look beneath the “Go to Site” link on the EPSRC page shows that those links point to the HEI homepage…

…which means I should be able to link items in the EPSRC projects listing to items in the HEI location table by virtue of common homepage URIs.

A quick Javascript bookmarklet hack using this bookmarklet:

javascript:(function (){var a=document.getElementsByTagName('a'); for (var i=0;i<a.length;i++){if (a[i].firstChild){var n=a[i].firstChild.nodeValue; if (n) if ((n.match("site"))) a[i].innerHTML=a[i].href;}};})()

and the URIs are exposed, so I can copy and paste the table and drop it into a spreadsheet, with the HERO data and postcode/region data in separate sheets.
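For readability, here’s the same bookmarklet unpacked, with comments (the logic is unchanged):

javascript:(function(){
 // walk all the links in the page...
 var a = document.getElementsByTagName('a');
 for (var i = 0; i < a.length; i++){
  if (a[i].firstChild){
   var n = a[i].firstChild.nodeValue;
   // ...and rewrite the text of any link whose text contains "site"
   // so it displays the URI the link points to instead
   if (n && n.match("site")) a[i].innerHTML = a[i].href;
  }
 }
})()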

A quick look over the URIs from both sources in the spreadsheets shows minor differences though – some URIs end with a “/” and others don’t (there are also a few broken scrapes that I tidy by hand); now if Dabble DB uses strict string matching to relate data in one table to data in another table (which I’d guess is likely) then missed matches will presumably occur?

So just to be safe, we need a data cleaning stage. To do this, I copy the data from the URI column in each spreadsheet, drop it into my TextWrangler text editor, and just clean up all the URIs so they end with a trailing / by searching for \.uk$ and replacing it with .uk/

Then I copy the URIs from the text editor and paste them back into the appropriate column in the appropriate spreadsheet.

Looking at the postcode/GOR table, I need to get one or two letter postal town identifiers from the HEI postcodes, so to do this I copy the postcode column from the spreadsheet, and paste it into my text editor. This time I do a regular expression powered search and replace using this regexp: ([A-Z]+).* and replacing with \1
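(If you’d rather script those two clean-up steps than do them by hand in a text editor, here’s a minimal Javascript sketch of the same two transformations:)

// make sure a homepage URI ends with a trailing slash, so that strict
// string matching in Dabble DB pairs rows from the two tables
function normaliseURI(uri){
 return uri.match(/\/$/) ? uri : uri + '/';
}

// pull the leading letters (the postal town identifier) out of a
// postcode, e.g. postcodeArea("MK7 6AA") gives "MK"
function postcodeArea(postcode){
 return postcode.replace(/^([A-Z]+).*$/, '$1');
}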

So now I have three spreadsheets on Google docs, which I can scan by eye to make sure they look okay, then easily copy and paste into separate tables (known as separate categories) in the same Dabble DB project, like this:

– the EPSRC data:


– HERO screenscrape data:

– and the postcode/region mapping data:

Now for the fun part; each of the above tables is a separate category, with separate column fields, in a Dabble DB project. It is possible to link a column with a similar column in another category, and consequently “pair” similar items in different tables. (So a column containing a particular URI, for example, in a row in one table/category can be related to a particular row in a particular column in another category/table, if the corresponding cell there contains the same URI; Dabble DB handles the actual pairings, you just have to link the columns.)

So playing blind, I linked the URI column in the EPSRC category with a new category, which I called Meta:

This created a new table/category – Meta – with a couple of columns: a “Name” column, containing the URIs, and a column that linked back to corresponding entries in the EPSRC project category.

And then I did the same linking for the URI column in the HEI Location table/category, which automatically added another column in the Meta table that linked across to rows in the corresponding HEI Location table:

In the Meta category view, I can now add additional columns that are derived from columns in the other, linked tables. So for example, I can add a derived column corresponding to the value of project grants that is pulled in from the linked EPSRC projects column:

So my Meta table/category now looks like this:

Which is pretty clever I think..? ;-)

But then it gets more so… Suppose I link the Postcode town column from the HEI location table with the Postcode/Regional mapping table:

If you’ve been keeping up, you might now expect the UK HEI to be linked to from the Postcode/Region table, which it is:

But the link is symmetrical… and if one category is linked to a second category that is in turn linked to a third category, the columns from the first category can be used as derived columns in the second and the third category…

…which means in the Meta category, I can pull in columns derived from the Postcode/Region category via the HEI location category, first by grabbing the postcode town column into Meta:

To give this:

Then pull in a further derived field from the postcode town column from the Postcode/Region category:

And so now we have a rather more complete Meta category view containing linked items from all three tables (one of which is actually linked indirectly via one of the others):

Clever, eh??? So now I know how to annotate data in one table using data from another table if the two tables each have a column that contains similar data :-)

Okay, so now I have a table that contains rows that contain both project funds and UK regions info – so now I’m in a position to calculate the total amount of funds flowing into each region and then plot them on the thematic map…
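(The summing itself is easy enough to sketch in Javascript – assuming the Meta table is exported as rows with hypothetical region and value fields – it’s the map plumbing that will take the time:)

// total the grant funding flowing into each region
function totalsByRegion(rows){
 var totals = {};
 for (var i = 0; i < rows.length; i++){
  var r = rows[i];
  totals[r.region] = (totals[r.region] || 0) + r.value;
 }
 return totals;
}

// e.g. totalsByRegion([{region:'South East', value:2500000},
//                      {region:'South East', value:1200000}])
// gives {'South East': 3700000}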

…but this post is already way too long, so that’ll have to be for another day…

(Plus I’m not totally sure how to do it yet… and Mission Impossible is just starting – this is a scheduled post…;-)

Using Dabble DB in an Online Mashup Context

So it seems no-one really saw why I got so excited by Dabble DB doing the linked data thing with data tables…

…so here’s an example…

First of all, importing some data via copy-and-paste:

…and we commit it:

All so simple, right…

So let’s pull some other data in from somewhere else; as CSV from a Google spreadsheet, perhaps?

(Note that the spreadsheet could have itself imported the data by scraping a table or list from an HTML page, or grabbing it via webservice with a RESTful API.)

So we import it:

…and commit it:

I’m not sure what the cacheing/refresh policy in Dabble DB is? For example, if the Google spreadsheet data changes, will Dabble keep up with the changes, and how often? (Maybe someone from Dabble DB could post a comment to clarify this?;-)

And finally, we grab data for the third table by screenscraping a table from an HTML page – this page:

Give it the URL:

Select the table:

…and commit it:

So now I have the three tables, obtained by different means, that I used in the previous demo.

If I do some table linking in a similar way to the previous demo, I can get a table that lists grants awarded to different HEIs, along with their postcodes. (This doesn’t actually use the HTML table scraped data, but another mashup could…I could have added the Government Office Region(-ish) data to the table, for example.)

So just to be clear, here: this table is made up from columns from two separate tables. The JISC project data comes from one table, the HEI postcode location from another. The HEI homepage URI is common to both original data tables and is used to key the combined table.
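(In programming terms, the linking Dabble DB is doing amounts to something like this Javascript sketch – the row objects and the homepage key are hypothetical:)

// join two tables (arrays of row objects) on a common key column
function joinOnKey(left, right, key){
 // index the right-hand table by the key column...
 var index = {}, i, j, k;
 for (i = 0; i < right.length; i++) index[right[i][key]] = right[i];
 // ...then annotate each left-hand row with any matching columns
 var joined = [];
 for (j = 0; j < left.length; j++){
  var row = left[j], match = index[row[key]] || {}, merged = {};
  for (k in row) merged[k] = row[k];
  for (k in match) merged[k] = match[k];
  joined.push(merged);
 }
 return joined;
}

// e.g. joinOnKey(jiscProjects, heiLocations, 'homepage') gives rows
// carrying both the grant data and the postcode for each HEI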

And then I can export the data…

…and shove it into a pipe – using CSV, I think?

Then we can filter on just the HEIs that have been awarded grants, and have been geocoded to somewhere in the UK:

And we can get a map:

…and the KML, geo-RSS etc…

… and maybe take the JSON output from the pipe and use it to drive a proportional symbol map, showing the number of projects awarded to each institution, for example…

In the same way that Yahoo Pipes lets you do crazy stuff with lists, so Dabble DB lets you get funky with data tables… What’s not to like (except the lack of regular expressions for data cleaning, maybe…?;-)

So there we have it:

  • some cut and pasted data in one table (HE location data), and a CSV imported table from a Google spreadsheet (the JISC project allocation data) – the HTML table scraped data is superfluous in this example;
  • linked tables in Dabble DB to reconcile the data in the two tables;
  • the mashed data table then gets exported from Dabble DB as CSV into a Yahoo pipe;
  • the pipe geocodes the postcode location data for each HEI and exports the geo-coded feed as JSON;
  • some Javascript in an HTML page pulls in the JSON, and plots proportional symbols on a Google map where the size of the symbol is proportional to the number of projects awarded to each HEI.
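(That last step – the Javascript in the HTML page – might look something like this minimal sketch; the pipe id, the projectCount field name and the plotSymbol() helper are all hypothetical stand-ins for the real pipe output and mapping API call:)

// pull the pipe output in as JSON-P: the _callback parameter asks
// Yahoo Pipes to wrap the JSON in the named function
var url = 'http://pipes.yahoo.com/pipes/pipe.run?_id=PIPE_ID'
        + '&_render=json&_callback=drawMap';
var s = document.createElement('script');
s.src = url;
document.getElementsByTagName('head')[0].appendChild(s);

function drawMap(data){
 var items = data.value.items;
 for (var i = 0; i < items.length; i++){
  // y:location is the lat/long element added by the pipe's geocoder
  var loc = items[i]['y:location'];
  // plotSymbol() stands in for whatever mapping API is used; scale
  // the symbol with the number of projects awarded to the HEI
  if (loc) plotSymbol(loc.lat, loc.lon, items[i].projectCount);
 }
}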

Job done, hopefully ;-)

PS I’ve started reflecting a little on how I pull these mashup things together, and I think it’s a bit like chess… I’ve completely assimilated various patterns or configurations of particular data representations and how they can be processed that let me “see several moves ahead”. And in messing around with Dabble DB, it’s like I’ve just learned a new configuration, like a pin or a fork, that I didn’t really appreciate before; but now it’s something I “get”, and something I can look for, something that may be “several moves ahead”, whenever I get an urge to have a tinker… And that’s why I think this post, and the previous one on the topic, are maybe gonna be important if you want to keep up over the coming months…;-) Does that make sense…?

PPS @dfflanders and Ross, if you’re reading… being able to table or list scrape from HTML (so no embedded tables), or grab *simple* XML feeds into Google spreadsheets is one way of making data available. Fixing on some canonical URIs in a standard format for HEI homepages would also be a start… (EPSRC uses different – valid, but maybe deprecated? – URIs for homepages compared to the URIs listed in the scraped HERO database, for example? I’m not sure what sort of HEI homepage URIs JISC uses… the one that the HEIs use themselves would be a start?)

Online Apps for Live Code Tutorials/Demos

With Dev8D coming up, here’s a quick round-up/reminder of some tools/techniques for hacking around with code via a browser, or running interactive coding presentations in a browser…

And if your presentation includes visits to websites, remember to share the URL via a SplashURL bookmarklet (developed at Dev8D last year; SplashURL screencast.)

PS if you know of any other apps in a similar vein, or links to videos showing really effective ways of presenting code, please add a comment below.

HTML5 presentation in HTML5

PPS On the notion of live docs/literate programming, see also:
dexy.it
– Wolfram computable document format (?)

PPPS seems someone is “monetising” interactive coding tutorials… Codecademy

PPPPS sort of related to CDF, the notion of ‘active readers and reactive documents‘ eg as implemented using Tangle Javascript Library

PPPPPS R in the cloud – eg RStudio runs as a cross platform desktop client but can also run as a web service; services such as CloudStat and Jeroen Ooms’ hosted ggplot app.

Starting to Think About a Yahoo Pipes Code Generator

Following a marathon session demoing Yahoo Pipes yesterday (the slides I didn’t really use but pretty much covered are available here) I thought I’d start to have a look at what would be involved in generating a Pipes2PHP, Pipes2Py, or Pipes2JS conversion tool as I’ve alluded to before (What Happens If Yahoo! Pipes Dies?)…

So how are pipes represented within the Yahoo Pipes environment? With a little bit of digging around using the Firebug extension to Firefox, we can inspect the Javascript object representation of a pipe (that is, the thing that is used to represent the pipework and that gets saved to the server whenever we save a pipe).

So to start, let’s look at the following simple pipe:

Simple pipe

Here’s a Firebug view showing the path (editor.pipe.definition should be: editor.pipe.working) to the representation of a pipe:

And here’s what we see being passed to the Yahoo pipes server when the pipe is saved…

Here’s how it looks as a Javascript object:

{"modules":[{"type":"fetch","id":"sw-502","conf":{"URL":{"value":"http://writetoreply.com/feed","type":"url"}}},{"type":"output","id":"_OUTPUT","conf":{}},{"type":"filter","id":"sw-513","conf":{"MODE":{"type":"text","value":"permit"},"COMBINE":{"type":"text","value":"and"},"RULE":[{"field":{"value":"description","type":"text"},"op":{"type":"text","value":"contains"},"value":{"value":"the","type":"text"}}]}}],"terminaldata":[],"wires":[{"id":"_w3","src":{"id":"_OUTPUT","moduleid":"sw-502"},"tgt":{"id":"_INPUT","moduleid":"sw-513"}},{"id":"_w6","src":{"id":"_OUTPUT","moduleid":"sw-513"},"tgt":{"id":"_INPUT","moduleid":"_OUTPUT"}}]}

Let’s try to pick that apart a little… firstly, all the modules are defined. Here’s the Fetch module:

{
 "type":"fetch",
 "id":"sw-502",
 "conf":{
  "URL":{
   "value":"http://writetoreply.com/feed",
   "type":"url"
  }
 }
}

The output module:

{
 "type":"output",
 "id":"_OUTPUT",
 "conf":{}
}

The filter module:

{
 "type":"filter",
 "id":"sw-513",
 "conf":{
  "MODE":{"type":"text","value":"permit"},
  "COMBINE":{"type":"text","value":"and"},
  "RULE":[{
   "field":{"value":"description","type":"text"},
   "op":{"type":"text","value":"contains"},
   "value":{"value":"the","type":"text"}
  }]
 }
}

Each of these blocks (that is, modules) has a unique id. The wires then specify how these modules are connected.

So here’s the wire that connects the output of the fetch block to the input of the filter module:

{
 "id":"_w3",
 "src":{
  "id":"_OUTPUT",
  "moduleid":"sw-502"
 },
 "tgt":{
  "id":"_INPUT",
  "moduleid":"sw-513"
 }
}

And here we connect the output of the filter to the input of the output block:

{
 "id":"_w6",
 "src":{
  "id":"_OUTPUT",
  "moduleid":"sw-513"
 },
 "tgt":{
  "id":"_INPUT",
  "moduleid":"_OUTPUT"
 }
}

***UPDATE – I’m not sure if we also need to look at the terminaldata information. I seem to have lost sight of where the multiple “RULES” that might appear inside a block are described…? Ah… editor.pipe.module_info? Hmm, no – that is more the UI side of things… so where are the actual pipe RULEs defined (e.g. the rules in a Regular Expression block)?***

*** UPDATE 2 – Found it… I should be using editor.pipe.working NOT editor.pipe.definition ***

So what would a code generator need to do? I’m guessing one way would be to do something like this…

  • for each module, create an equivalent function by populating a templated function with the appropriate arguments, e.g.
    f_sw502(){ return fetchURL("http://writetoreply.org/feed"); }
  • for each wire, do something along the lines of f_sw513(f_sw502()) – there’s a rough first sketch of this below. It’s been a long day, so I’m not sure how to deal with modules that have multiple inputs? But this is just the start, right…? (If anyone else is now intrigued enough to start thinking about building a code generator from a pipes representation, please let me know…;-)
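(Here’s that sketch – interpreting the modules/wires representation directly rather than generating source code, though the wiring problem is the same; fetchURL() and filterItems() are hypothetical stand-ins for real module implementations, and it assumes a simple linear pipe with the wires given in order:)

// hypothetical stand-ins for real module implementations
function fetchURL(url){ return ['dummy item from ' + url]; }
function filterItems(items, rules){ return items; /* apply the RULEs here */ }

function runPipe(pipe){
 // build one function per module, keyed by module id
 var fn = {};
 for (var i = 0; i < pipe.modules.length; i++){
  (function(m){
   if (m.type === 'fetch') fn[m.id] = function(){ return fetchURL(m.conf.URL.value); };
   else if (m.type === 'filter') fn[m.id] = function(input){ return filterItems(input, m.conf.RULE); };
   else fn[m.id] = function(input){ return input; }; // e.g. the output module
  })(pipe.modules[i]);
 }
 // follow the wires, feeding each module's output into the next one's input
 var result;
 for (var j = 0; j < pipe.wires.length; j++){
  var w = pipe.wires[j];
  result = (result === undefined) ? fn[w.tgt.moduleid](fn[w.src.moduleid]())
                                  : fn[w.tgt.moduleid](result);
 }
 return result;
}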

As to why this approach might be useful?
– saving a copy of the Javascript representation of a pipe gives us an archival copy of the algorithm, albeit in a javascripty objecty way…
– if we have a code generator, we can use Yahoo Pipes as a rapid prototyping tool to create code that can be locally hosted.

PS a question that was raised a couple of times in the session yesterday related to whether or not Yahoo pipes can be run behind a corporate firewall. I don’t think it can, but does anyone know for sure? Is there a commercial offering available, for example, so corporate folk can run their own instance of pipes in the privacy of their own network?

PPS here’s a handy trick… when in a Yahoo pipes page, pop up the description of the pipe with this javascript call in a Firefox location bar:
javascript:alert(editor.pipe.definition.toSource());

Some of My Dev8D Tinkerings – Yahoo Pipes Quick Start Guide, Cross-Domain JSON with JQuery and Council Committee Treemaps from OpenlyLocal

One of the goals I set myself for this year’s Dev8D was to get round to actually using some of the things I’ve been meaning to try out for ages, particularly Google App Engine and JQuery, and also to have a push on some of the many languishing “projects” I’ve started over the last year, tidying up the code, making the UIs a little more presentable, and so on…

Things never turn out that way, of course. Instead, I did a couple of presentations, only one of which I was aware of beforehand!;-) (A chance remark alerted me to the fact that I was down to do a lightning talk yesterday…)

I did start looking at JQuery, though, and did manage to revisit the Treemapping Council Committees Using OpenlyLocal Data idea I’d done a static proof of concept for some time ago…

On the JQuery front, I quickly picked up how easy it is to grab JSON feeds into a web page if you have access to JSON-P (that is, the ability to attach a callback function to a JSON URL so you can call a function in the web page with the object as soon as it loads), but I also ran into a couple of issues. Firstly, if I want to load more than one JSON feed into a page, and then run foo(json1, json2, json3, json4, json5), how do I do it? That is, how do I do a “meta-callback” that fires when all the separate JSON calls have loaded content into the page? (Hmm – I just got a payoff from writing this para and then looking at it – it strikes me I could do a daisy chain – use the callback from the first JSON call to call the second JSON object, use the callback from that to call the third, and so on; but that’s not very elegant…?) And secondly, how do I get a JSON object into a page if there is no callback function available (i.e. no JSON-P support)?

I’m still stuck on the first issue (other than the daisy chain/bucket brigade hack), but I found a workaround for the second – use a Yahoo pipe as a JSON-P proxy. I’ll be writing more about this in a later post, but in the meantime, I popped a code snippet up on github.
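(Hmm – thinking about it again as I write this up, a counter based variant of the daisy chain might do the trick for the first issue: fire the meta-callback when the last feed arrives. A minimal sketch, assuming JQuery, hypothetical feed URLs, and a hypothetical foo() to receive the lot:)

// call foo() only once every feed has loaded
var feeds = ['http://example.com/1.json?callback=?',
             'http://example.com/2.json?callback=?'];
var results = [], loaded = 0;

$.each(feeds, function(i, url){
 $.getJSON(url, function(json){
  results[i] = json;   // keep the feeds in their original order
  loaded++;
  if (loaded === feeds.length) foo(results);  // they're all in
 });
});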

On the Openlylocal/council treemap front, I’d grabbed some sample JSON files from the Openlylocal site as I left Dev8D last night for the train home, and managed to hack the resulting objects into a state that could be used to generate the treemap from them.

A couple of hours fighting with getting the Openlylocal JSON into the page (solved as shown above with the Pipes hack) and I now have a live demo – e.g. http://ouseful.open.ac.uk/test/ccl/index-dyn.php?id=111. The id is the openlylocal identifier used to identify a particular council on the Openlylocal site.

If you’re visiting Openlylocal council pages, the following bookmarklet will (sometimes*;-) display the corresponding council committee treemap:

javascript:var s=window.location.href;s=s.replace(/.*=/,"");window.location.href="http://ouseful.open.ac.uk/test/ccl/index-dyn.php?id="+s;

(It works for pages with URLs that end =NNN;-)
Council committee treemap

The code is still a bit tatty, and I need to tidy up the UI, (and maybe also update to a newer JIT visualisation library), so whilst the URI shown above will persist, I’ll be posting an updated version to somewhere else (along with a longer post about how it all works) when I get round to making the next set of tweaks… Hopefully, this will be before Dev8D next year!;-)

PS I also had a huge win in discovering a javascript function that works at least on Firefox: .toSource(). Apply it to a javascript object (e.g. myobj.toSource()) and then if you do things like alert(myobj.toSource()) you can get a quick preview of the contents of that object without having to resort to a debugger or developer plugin tool:-)

PPS can you tell my debugging expertise is limited to: alert(“here”); all over the place ;-) Heh heh…

Grabbing JSON Data from One Web Page and Displaying it in Another

Lots of web pages represent data within the page as a javascript object. But if you want to make use of that data in another page, how can you do that?

A case in point is Yahoo Pipes. The only place I’m currently aware of where we can look at how a particular Yahoo pipe is constructed is the Yahoo Pipes editor. The pipe is represented as a Javascript object within the page (as described in Starting to Think About a Yahoo Pipes Code Generator), but it’s effectively locked into the page.

So here’s a trick for liberating that representation…

Firstly, we need to know what the name of the object is. In the case of Yahoo Pipes, the pipe’s definition is contained in the editor.pipe.definition [NO: it’s in editor.pipe.working] object.

In order to send the object to another page on the web, the first thing we need to do is generate a text string view of it that we can POST to another web page. This serialised representation of the object can be obtained by calling the .toSource() function on it.

The following bookmarklets show what that representation looks like.

*** UPDATE: I originally thought the following bookmarklets didn’t provide a complete description of the pipe – the actual pipe RULE data (e.g. the rules in a Regular Expression block) seemed to be missing. It isn’t in the terminaldata, and editor.pipe.module_info is more the UI side of things; it turns out I should simply be using editor.pipe.working NOT editor.pipe.definition, as the bookmarklets below now do. ***

Firstly, we can display the serialised representation in a browser alert box:

javascript:(function(){alert(editor.pipe.working.toSource())})()

Alternatively, we can view it in the browser console (for example, in Firefox, we might do this via the Firebug plugin):

javascript:(function(){console.log(editor.pipe.working.toSource())})()

The object actually contains several other objects, not all of which are directly relevant to the logical definition of the pipe (e.g. they are more to do with layout), so we can modify the console logging bookmarklet to make it easier to see the two objects we are interested in – the definitions of each of the pipe blocks (that is, editor.pipe.working.modules), and the connections that exist between the modules (editor.pipe.working.wires; [UPDATE: we also need the terminaldata]):

javascript:(function(){var c=console.log;var p=editor.pipe.working;c('MODULES: '+p.modules.toSource());c('WIRES: '+p.wires.toSource());c('TERMINALS: '+p.terminaldata.toSource())})()


[terminaldata not shown]

To actually send the representation to another web page, we can use a bookmarklet to dynamically create a form element, attach the serialised object to it as a form argument, append the form to the page and then submit it:

javascript:(function(){var ouseful={};ouseful=editor.pipe.working;ouseful=ouseful.toSource(); var oi=document.createElement('form');oi.setAttribute('method','post');oi.setAttribute('name','oif');oi.setAttribute('action','http://ouseful.open.ac.uk/ypdp/jsonpost.php');var oie=document.createElement('input');oie.setAttribute('type','text');oie.setAttribute('name','data');oie.setAttribute('value',ouseful);oi.appendChild(oie);document.body.appendChild(oi);document.oif.submit();})()

In this case, the page I am submitting the form to is a PHP page. The code that accepts the POSTed serialised object and then republishes it as a javascript object wrapped in a callback function (i.e. packages it so it can be copied and then used within a web page) looks like this:

<?php
$str= $_POST['data'];
$str = substr($str, 1, strlen($str) - 2); // remove outer ( and )
$str=stripslashes($str);
echo "ypdp(".$str.")";
?>

[Note that I did try to parse the object using PHP, but I kept hitting all sorts of errors with the parsing of it… The simplest approach was just to retransmit the object as Javascript so it could be handled by a browser.]

If we want to display the serialised version of the object in another page, rather than in an alert box or the browser console, we need to pass the serialised object within the URI using an HTTP GET to the other page, so we can generate a link to it. For long pipes, this might break…*

*(Anyone know of an equivalent to a URL shortening service that will accept HTTP POST arguments and give you a short URL that will do a POST on your behalf? As well as the POST payload, we’d need to pass the target URL – i.e. the address to which the POST data is to be sent – to the shortener; it would then give you a short URL, such that when you click on it, it POSTs the data to the desired target URL. I suppose another approach would be a service that stores the POST data for you, gives you a short URI in return, and then you call the short URI with the address of the page you want the data posted to as a key?)

PS If you do run the bookmarklet to generate a URI that contains the serialised version of the pipe (that is, use a GET method in the form and a $_GET handler in the PHP script), you can load the object (wrapped in the ypdp() callback function) into your own page via a <script> element in the normal way, by setting the src attribute of the script to the URI that includes the serialised version of the pipe description.
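(A minimal sketch of that last step – the data argument is a placeholder for the serialised pipe produced by the GET variant of the form bookmarklet, and the alert() is just a stand-in payload:)

// define the callback the republished object gets wrapped in...
function ypdp(pipe){
 alert(pipe.modules.length + ' modules in this pipe');
}
// ...then pull the serialised pipe in via a dynamically added script element
var s = document.createElement('script');
s.src = 'http://ouseful.open.ac.uk/ypdp/jsonpost.php?data=...'; // serialised pipe here
document.getElementsByTagName('head')[0].appendChild(s);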

As Time Goes By, It Makes a World of Diff

Prompted by a DevCSI Developer Focus Group conference call just now, I had a quick look through the list of Bounty competition entries (and the winners) to see whether there was any code that might be fun to play with.

One quite fun entry is a simple app by Chris Gutteridge (Wayback/Memento Animation) that animates the history of a website using archived copies of the site from the Wayback Machine. So for example, here’s the animated history of the OU home page.

And here are links to the history of the current Labour Party and Conservative Party domains: The animated history of: http://www.labour.org.uk/ and The animated history of: http://www.conservatives.com/.

The app will also animate changes from a MediaWiki wiki as this link demonstrates: Dev8D wiki changes over time.

(I can’t help thinking it needs: a) a pause button, so at least you can scroll up and down a page, if not explore the site; and b) a bookmarklet, to make it easier to get a site into the replayer;-)

The Dev8D pages also suggest a “Web Diff” app was entered in one of the challenges, but I couldn’t see a link to the app anywhere?

Diffs have been on my mind lately in a slightly different context, in particular relating to the changes made to the Digital Economy Bill at the various stages it went through as it passed through the Lords, but here again a developer challenge event turned up the goods, in this case the Rewired State: dotgovlabs event held last Saturday, and @1jh’s Parliamentary Bill analyser:

So for example, if we compare the Digital Economy Bill as introduced to the Lords:
http://www.publications.parliament.uk/pa/ld200910/ldbills/001/10001.i-ii.html
and the version that was passed to the Commons:
http://www.publications.parliament.uk/pa/cm200910/cmbills/089/10089.i-iii.html
here’s what we get:

Luvverly stuff :-)

PS @cogdog beats me to it again in a comment to Reversible, Reverse History and Side-by-Side Storytelling, specifically: “maybe this is like watching Memento backwards?” Which is to say, maybe the Wayback/Memento Animation should have a “play backwards” switch? And of course, this being a Chris Gutteridge production, it has. So for example, going back in time with the JISC home page

(Sob, I have no original ideas any more, and can’t even think of them before other people do, let alone implement them…;-(

Ba dum… Education for the Open Web Fellowship: Uncourse Edu

A couple of weeks ago, I started getting tweets and emails linking to a call for an Education for the Open Web Fellowship from the Mozilla and Shuttleworth Foundations.

The way I read the call was that the fellowship provides an opportunity for an advocate of open ed on the web to do their thing with the backing of a programme that sees value in that approach…

…and so, I’ve popped an (un)application in, though not helped by having spent the weekend in a sick bed… bleurrrgh… man flu ;-) It’s not as polished as it should be, and it could be argued that it’s unfinished, but that is, erm, part of the point… After all, my take on the Fellowship is that the funders are seeking to act as a patron to a person, helping them achieve as much as they can, howsoever they can, as much as it is supporting a very specific project? (And if I’m wrong, then it’s right that my application is wrong, right?!;-)

The proposal – Uncourse Edu – is just an extension of what it is I spend much of my time doing anyway, as well as an attempt to advocate the approach through living it: trying to see what some of the future consequences of emerging tech might be, and demonstrating them (albeit often in a way that feels too technical to most) in a loosely educational context. As well as being my personal notebook, an intended spin-off of this blog is to try to help drive down barriers to the use of web technologies, or demonstrate how technologies that are currently only available to skilled developers are becoming more widely usable, and access to them as building blocks is being “democratised”. As to what the barriers to adoption are, I see them as being at least two-fold: one is ease of use (how easy the technology is to actually use); the second is attitude: many people just aren’t, or don’t feel they’re allowed to be, playful. This stops them innovating in the workplace, as well as learning for themselves. (So for example, I’m not an auto-didact, I’m a free player…;-)

The Fellowship applications are templated (loosely) and submitted via the Drumbeat project pitching platform. This platform allows folk to pitch projects and hopefully gather support around a project idea, as well as soliciting (small amounts of) funding to help run a project. (It’d be interesting if in any future rounds of JISC Rapid Innovation Funding, projects were solicited this way and one of the marking criteria was the amount of support a pitched proposal received?)

I’m not sure if my application is allowed to change, but if it doesn’t get locked by the Drumbeat platform it may well do so… (Hopefully I’ll get to do at least another iteration of the text today…) In particular, I really need to post my own video about the project – that was my undone weekend task :-(

Of course, if you want to help out producing the video, and maybe even helping shape the project description, then why not join the project? Here’s the link again: Uncourse Edu.

PS I think there’s a package on this week’s OU co-produced episode of Digital Planet on BBC World Service (see also: Digital Planet on open2) that includes an interview with Mark Shuttleworth and a discussion about some of the work the Shuttleworth Foundation gets up to… (first broadcast is tomorrow, with repeats throughout the week).

DISCLAIMER: I’m the OU academic contact for the Digital Planet.

A Few More Thoughts on GetTheData.org

As we come up to a week in on GetTheData.org, there’s already an interesting collection of questions – and answers – starting to appear on the site, along with a fledgling community (thanks for chipping in, folks:-), so how can we maintain – and hopefully grow – interest in the site?

A couple of things strike me as the most likely things to make the site attractive to folk:

– the ability to find an appropriate – and useful – answer to your question without having to ask it, for example because someone has already asked the same, or a similar, question;
– timely responses to questions once asked (which leads to a sense of community, as well as utility).

I think it’s also worth bearing in mind the context that GetTheData sits in. Many of the questions result in answers that point to data resources that are listed in other directories. (The links may go to either the data home page or its directory page on a data directory site.)

Data Recommendations
One thing I think is worth exploring is the extent to which GetTheData can both receive and offer recommendations to other websites. Within a couple of days of releasing the site, Rufus had added a recommendation widget that could recommend datasets hosted on CKAN that seem to be related to a particular question.

GetTheData.org - related datasets on CKAN

What this means is that even before you get a reply, a recommendation might be made to you of a dataset that meets your requirements.

(As with many other Q&A sites, GetTheData also tries to suggest related questions to you when you enter your question, to prompt you to consider whether or not your question has already been asked – and answered.)

I think the recommendation context is something we might be able to explore further, both in terms of linking to recommendations of related data on other websites, but also in the sense of reverse links from GetTheData to those sites.

For example:

– would it be possible to have a recommendation widget on GetTheData that links to related datasets from the Guardian datastore, or National Statistics?
– are there other data directory sites that can take one or more search terms and return a list of related datasets?
– could a getTheData widget be located on CKAN data package pages to alert package owners/maintainers that a question possibly related to the dataset had been posted on GetTheData? This might encourage the data package maintainer to answer the question on the getTheData site with a link back to the CKAN data package page.

As well as recommendations, would it be useful for GetTheData to syndicate new questions asked on the site? For example, I wonder if the Guardian Datastore blog would be willing to add the new questions feed to the other datablogs they syndicate?;-) (Disclosure: data tagged posts from OUseful.info get syndicated in that way.)

Although I don’t have any good examples of this to hand from GetTheData, it strikes me that we might start to see questions that relate to obtaining data which is actually a view over a particular data set. This view might be best obtained via a particular query onto a particular data set, such as a specific SPARQL query on a Linked Data set, or a particular Google query language request to the visualisation API against a particular Google spreadsheet.

If we do start to see such queries, then it would be useful to aggregate these around the datastores they relate to, though I’m not sure how we could best do this at the moment other than by tagging?

News announcements
There are a wide variety of sites publishing data independently, and a fractured network of data directories and data catalogues. Would it make sense for GetTheData to aggregate news announcements relating to the release of new data sets, and somehow use these to provide additional recommendations around data sets?

Hackdays and Data Fridays
As suggested in Bootstrapping GetTheData.org for All Your Public Open Data Questions and Answers:

If you’re running a hackday, why not use GetTheData.org to post questions arising in scoping the hacks, tweet a link to the question to your event backchannel and give the remote participants a chance to contribute back, at the same time adding to the online legacy of your event.

Alternatively, how about “Data Fridays”, on the first Friday in the month, where folk agree to check GetTheData two or three times that day and engage in something of a distributed data related Question and Answer sprint, helping answer unanswered questions, and maybe pitching in a few new ones?

Aggregated Search
It would be easy enough to put together a Google custom search engine that searches over the domains of data aggregation sites, and possibly also offer filetype search limits?
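(So, for example, a custom search engine restricted to the domains of the main data directories could be nudged towards raw data files with a query along the lines of unemployment filetype:csv – the filetype: operator limiting results to CSV files.)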

So What Next?
Err, that’s it for now…;-) Unless you fancy seeing if there’s a question you can help out on right now at GetTheData.org