Online Apps for Live Code Tutorials/Demos

With Dev8D coming up, here’s a quick round-up/reminder of some tools/techniques for hacking around with code via a browser, or running interactive coding presentations in a browser…

And if your presentation includes visits to websites, remember to share the URL via a SplashURL bookmarklet (developed at Dev8D last year; SplashURL screencast.)

PS if you know of any other apps in a similar vein, or links to videos showing really effective ways of presenting code, please add a comment below.

HTML5 presentation in HTML5

PPS On the notion of live docs/literate programming, see also:
– dexy.it
– Wolfram computable document format (?)

PPPS seems someone is “monetising” interactive coding tutorials… Codecademy

PPPPS sort of related to CDF, the notion of ‘active readers and reactive documents’, eg as implemented using the Tangle JavaScript library

PPPPPS R in the cloud – eg RStudio runs as a cross platform desktop client but can also run as a web service; see also services such as CloudStat and Jeroen Ooms’ hosted ggplot app.

Using Dabble DB in an Online Mashup Context

So it seems no-one really saw why I got so excited by Dabble DB doing the linked data thing with data tables…

…so here’s an example…

First of all, importing some data via copy-and-paste:

…and we commit it:

All so simple, right…

So let’s pull some other data in from somewhere else; as CSV from a Google spreadsheet, perhaps?

(Note that the spreadsheet could have itself imported the data by scraping a table or list from an HTML page, or grabbing it via webservice with a RESTful API.)

So we import it:

…and commit it:

I’m not sure what the caching/refresh policy in Dabble DB is. For example, if the Google spreadsheet data changes, will Dabble keep up with the changes, and how often? (Maybe someone from Dabble DB could post a comment to clarify this?;-)

And finally, we grab data for the third table by screenscraping a table from an HTML page – this page:

Give it the URL:

Select the table:

…and commit it:

So now I have the tables that I used in the previous demo, this time loaded by different means.

If I do some table linking in a similar way to the previous demo, I can get a table that lists grants awarded to different HEIs, along with their postcodes. (This doesn’t actually use the HTML table scraped data, but another mashup could…I could have added the Government Office Region(-ish) data to the table, for example.)

So just to be clear, here: this table is made up from columns from two separate tables. The JISC project data comes from one table, the HEI postcode location from another. The HEI homepage URI is common to both original data tables and is used to key the combined table.
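For anyone who prefers to see the keying idea as code, here is a minimal sketch of the same join done by hand. It only illustrates the principle; it is not how Dabble DB does it internally, and the function name, field names and sample rows are all made up for the example.

```python
# Hypothetical sample rows standing in for the two Dabble DB tables.
projects = [
    {"project": "Project A", "homepage": "http://www.example-a.ac.uk/"},
    {"project": "Project B", "homepage": "http://www.example-a.ac.uk/"},
    {"project": "Project C", "homepage": "http://www.example-b.ac.uk/"},
    {"project": "Project D", "homepage": "http://www.example-c.ac.uk/"},
]
locations = [
    {"hei": "Example A University", "homepage": "http://www.example-a.ac.uk/", "postcode": "AB1 2CD"},
    {"hei": "Example B University", "homepage": "http://www.example-b.ac.uk/", "postcode": "EF3 4GH"},
]


def join_on_homepage(projects, locations):
    """Attach HEI name and postcode to each project via the shared homepage URI."""
    # Index the location table by its homepage URI (the common key).
    by_uri = {row["homepage"]: row for row in locations}
    combined = []
    for project in projects:
        location = by_uri.get(project["homepage"])
        if location is None:
            continue  # no exact URI match, so this row cannot be keyed
        combined.append({
            "project": project["project"],
            "hei": location["hei"],
            "postcode": location["postcode"],
        })
    return combined


combined = join_on_homepage(projects, locations)
for row in combined:
    print(row["project"], "|", row["hei"], "|", row["postcode"])
```

Note that the match is an exact string comparison, so a project whose homepage URI has no identical twin in the location table simply drops out of the combined table.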

And then I can export the data…

…and shove it into a pipe – using CSV, I think?

Then we can filter on just the HEIs that have been awarded grants, and have been geocoded to somewhere in the UK:
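The filter logic is simple enough to sketch outside the pipe too. This is only an approximation of the idea: the field names, the sample rows and the rough UK bounding box are all assumptions for illustration, not values taken from the pipe itself.

```python
# Rough bounding box for the UK; these limits are an assumption, good enough
# to catch postcodes that a geocoder has placed on the wrong continent.
UK_LAT = (49.8, 60.9)
UK_LON = (-8.7, 1.8)


def in_uk(lat, lon):
    """True if the point falls inside the rough UK bounding box."""
    return UK_LAT[0] <= lat <= UK_LAT[1] and UK_LON[0] <= lon <= UK_LON[1]


def keep_funded_uk_heis(rows):
    """Keep rows with at least one project and a geocode inside the UK box."""
    kept = []
    for row in rows:
        if row.get("projects", 0) < 1:
            continue  # no grants awarded
        lat, lon = row.get("lat"), row.get("lon")
        if lat is None or lon is None:
            continue  # the geocoder returned nothing
        if in_uk(lat, lon):
            kept.append(row)
    return kept


# Hypothetical rows, as they might look after geocoding.
rows = [
    {"hei": "Example A", "projects": 3, "lat": 52.0, "lon": -0.7},
    {"hei": "Example B", "projects": 0, "lat": 52.2, "lon": 0.1},
    {"hei": "Example C", "projects": 2, "lat": 40.7, "lon": -74.0},
    {"hei": "Example D", "projects": 1, "lat": None, "lon": None},
]
filtered = keep_funded_uk_heis(rows)
print([row["hei"] for row in filtered])
```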

And we can get a map:

…and the KML, geo-RSS etc…

… and maybe take the JSON output from the pipe and use it to drive a proportional symbol map, showing the number of projects awarded to each institution, for example…
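As a rough sketch of the sizing step for such a map: count the projects per institution, then turn each count into a symbol radius. The function names and the maximum radius are hypothetical, and scaling by the square root (so that the symbol's area, rather than its radius, tracks the count) is my assumption about what looks fairest, not something the pipe does.

```python
import math
from collections import Counter


def project_counts(rows):
    """Count how many project rows belong to each HEI."""
    return Counter(row["hei"] for row in rows)


def symbol_radius(count, max_count, max_radius=30.0):
    """Scale the radius so that the symbol's area is proportional to count."""
    if max_count <= 0:
        return 0.0
    return max_radius * math.sqrt(count / max_count)


# Hypothetical project rows, one per awarded project.
rows = [
    {"hei": "Example A"}, {"hei": "Example A"},
    {"hei": "Example A"}, {"hei": "Example A"},
    {"hei": "Example B"},
]
counts = project_counts(rows)
biggest = max(counts.values())
for hei, count in counts.items():
    print(hei, count, symbol_radius(count, biggest))
```

With area scaling, an institution with a quarter of the projects gets a circle with half the radius, which is a quarter of the area.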

In the same way that Yahoo Pipes lets you do crazy stuff with lists, so Dabble DB lets you get funky with data tables… What’s not to like (except the lack of regular expressions for data cleaning, maybe…?;-)

So there we have it:

  • some cut and pasted data in one table (HE location data), and a CSV imported table from a Google spreadsheet (the JISC project allocation data); the HTML table scraped data is superfluous in this example;
  • linked tables in Dabble DB to reconcile the data in the two tables;
  • the mashed data table then gets exported from Dabble DB as CSV into a Yahoo pipe;
  • the pipe geocodes the postcode location data for each HEI and exports the geo-coded feed as JSON;
  • some JavaScript in an HTML page pulls in the JSON, and plots proportional symbols on a Google map where the size of the symbol is proportional to the number of projects awarded to each HEI.

Job done, hopefully ;-)

PS I’ve started reflecting a little on how I pull these mashup things together, and I think it’s a bit like chess… I’ve completely assimilated various patterns or configurations of particular data representations and how they can be processed that let me “see several moves ahead”. And in messing around with Dabble DB, it’s like I’ve just learned a new configuration, like a pin or a fork, that I didn’t really appreciate before; but now it’s something I “get”, and something I can look for, something that may be “several moves ahead”, whenever I get an urge to have a tinker… And that’s why I think this post, and the previous one on the topic, are maybe gonna be important if you want to keep up over the coming months…;-) Does that make sense…?

PPS @dfflanders and Ross, if you’re reading… being able to table or list scrape from HTML (so no embedded tables), or grab *simple* XML feeds into Google spreadsheets is one way of making data available. Fixing on some canonical URIs in a standard format for HEI homepages would also be a start… (EPSRC uses different – valid, but maybe deprecated? – URIs for homepages compared to the URIs listed in the scraped HERO database, for example? I’m not sure what sort of HEI homepage URIs JISC uses… the one that the HEIs use themselves would be a start?)

Mash/Combining Data from Three Separate Sources Using Dabble DB

Over dinner with friends a couple of nights ago, I was asked how I typically approach problem solving tasks. Thinking about it, it’s a bottom-up AND top-down approach where I attack both ends of the problem (the “what I’ve got now” end and the “ultimate vision”) at the same time, in the hope that the tiny steps taken from each end meet up somewhere in the middle…

So for example, in the dev8D Dragon’s Den I mentioned the desire to put together a thematic choropleth map depicting the funding that’s going into different UK Government office regions as a result of JISC or EPSRC project awards. Here’s how I’ve started to work out how to do that…

(What follows gets a little involved at times, so the main trick to look out for is how to create a single data table by mashing together data from three separate data tables.)

At one end is the output surface. A quick scout around turned up no Flash components or KML overlays I could use on Google Maps or ThematicMapping (ffs why can’t National Statistics make some free warez available???) so I opted for the amMap interactive map instead.

To plot the map, I need to be able to sum the value of project grants over lead HEIs within particular GORs (got that?;-) So where’s the data?

All over the place, that’s where…

  • EPSRC Support By Organisation shows the total amount of current project funding awarded to each HEI by EPSRC;

    Hmm, no GOR, no geolocation data… Which means I need a mapping from HEI to GOR…

  • …but the closest I can find is a listing of the postcodes of each HEI: HERO screenscraper, and even that’s a scrape of another service…

    (Thanks @lesteph;-)

  • and finally, here’s a mapping from postcode areas to GORs: postcode area lookup table.

    There’s a warning though: ‘please note “regions” were recorded for my own visual aid and are NOT an attempt to tie in with current UK Administrative Regions.’ Hmm – okay – add that one to the caveats/risk assessment. If the map turns out very wrong, that’s EPSRC’s problem, right, for not making the data available in a clean way?!;-)

Okay, so those are the data sources: one contains HEI names and project funding data, one contains HEI names, location data (well, postcodes) and homepage URIs, and one contains mappings from postcode towns to UK regions (which loosely relate, possibly, to GORs).

Now at this point I’ve already decided that I want to try to use Dabble DB to somehow conflate the data from these three separate sources (though I’m not totally sure how… it’s just something I seem to remember from somewhere and somewhen a long time ago that Dabble DB supports if there are common fields – and matching strings – across different data tables).

Getting the data into Dabble DB is a copy and paste operation, but I’m going to take an intermediate step, highlighting and copying the tables from the separate web pages and pasting them into a Google spreadsheet. Why? Because I already know that this works and it’ll also let me cast an eye over the data to make sure it looks about right.

Looking at the HEI names from EPSRC and the HERO screenscrape, they don’t really match though, which means that Dabble DB won’t be able to use HEI names to identify common rows in the HE location and EPSRC project tables. However, the HERO screenscrape page does have the HEI homepage URI, and a look beneath the “Go to Site” link on the EPSRC page shows that those links point to the HEI homepage…

…which means I should be able to link items in the EPSRC projects listing to items in the HEI location table by virtue of common homepage URIs.

A quick JavaScript hack using this bookmarklet:

javascript:(function(){var a=document.getElementsByTagName('a');for(var i=0;i<a.length;i++){if(a[i].firstChild){var n=a[i].firstChild.nodeValue;if(n&&n.match(/site/i))a[i].innerHTML=a[i].href;}}})()

and the URIs are exposed, so I can copy and paste the table and drop it into a spreadsheet, with the HERO data and postcode/region data in separate sheets.
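If you’d rather do the same trick offline on a saved copy of the page, something along these lines should work. It’s a sketch only: the class and function names are mine, the sample HTML is invented, and I’m assuming the links of interest are the ones whose text mentions “site”.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs for every anchor that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def expose_site_links(html):
    """Return the hrefs hidden behind links whose text mentions 'site'."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [href for text, href in parser.links if "site" in text.lower()]


# Invented sample markup, loosely shaped like a listing with "Go to Site" links.
sample = (
    '<table><tr><td>Example A</td>'
    '<td><a href="http://www.example-a.ac.uk">Go to Site</a></td></tr>'
    '<tr><td><a href="/about">About</a></td></tr></table>'
)
print(expose_site_links(sample))
```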

A quick look over the URIs from both sources in the spreadsheets shows minor differences though – some URIs end with a “/” and others don’t (there are also a few broken scrapes that I tidy by hand); now if Dabble DB uses strict string matching to relate data in one table to data in another table (which I’d guess is likely) then missed matches will presumably occur?

So just to be safe, we need a data cleaning stage. To do this, I copy the data from the URI column in each spreadsheet, drop it into my TextWrangler text editor, and just clean up all the URIs so they end with a trailing / by searching for \.uk$ and replacing it with .uk/
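The same search and replace can be scripted if there are too many URIs to push through a text editor. A minimal sketch, using the same pattern as above; the function name and the sample URIs are made up:

```python
import re


def add_trailing_slash(uri):
    """Mirror the editor search/replace: a final '.uk' becomes '.uk/'."""
    # strip() drops stray whitespace picked up when copying from a spreadsheet.
    return re.sub(r"\.uk$", ".uk/", uri.strip())


# Hypothetical URIs, with and without the trailing slash.
uris = [
    "http://www.example-a.ac.uk",
    "http://www.example-b.ac.uk/",
    " http://www.example-c.ac.uk ",
]
cleaned = [add_trailing_slash(uri) for uri in uris]
print(cleaned)
```

As in the editor version, only URIs that end in .uk are touched, so anything with a path after the domain is left alone.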

Then I copy the URIs from the text editor and paste them back into the appropriate column in the appropriate spreadsheet.

Looking at the postcode/GOR table, I need to get one or two letter postal town identifiers from the HEI postcodes, so to do this I copy the postcode column from the spreadsheet, and paste it into my text editor. This time I do a regular expression powered search and replace using this regexp: ([A-Z]+).* and replacing with \1
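Here is that regular expression as a small script, in case it helps to see it run. The function name and the sample postcodes are invented; the pattern and the replacement are the ones given above.

```python
import re


def postcode_area(postcode):
    """Keep only the leading letters of a postcode, e.g. 'AB1 2CD' -> 'AB'."""
    # Same pattern as the editor search: capture the first run of capital
    # letters, swallow the rest of the line, and keep only the capture.
    return re.sub(r"([A-Z]+).*", r"\1", postcode.strip())


# Hypothetical postcodes with one- and two-letter areas.
for postcode in ["AB1 2CD", "B15 2XY", "W1A 9ZZ"]:
    print(postcode, "->", postcode_area(postcode))
```

One thing to watch: the pattern only knows about capital letters, so a postcode typed in lower case would pass through unchanged.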

So now I have three spreadsheets on Google docs, which I can scan by eye to make sure they look okay, then easily copy and paste into separate tables (known as separate categories) in the same Dabble DB project, like this:

– the EPSRC data:

– HERO screenscrape data:

– and the postcode/region mapping data:

Now for the fun part: each of the above tables is a separate category, with separate column fields, in a Dabble DB project. It is possible to link a column with a similar column in another category, and consequently “pair” similar items in different tables. (So a column containing a particular URI, for example, in a row in one table/category can be related to a particular row in another category/table if the corresponding cell there contains the same URI. Dabble DB handles the actual pairings; you just have to link the columns.)

So playing blind, I linked the URI column in the EPSRC category with a new category, which I called Meta:

This created a new table/category – Meta – with a couple of columns: a ‘Name’ column, containing the URIs, and a column that linked back to corresponding entries in the EPSRC project category.

And then I did the same linking for the URI column in the HEI Location table/category, which automatically added another column in the Meta table that linked across to rows in the corresponding HEI Location table:

In the Meta category view, I can now add additional columns that are derived from columns in the other, linked tables. So for example, I can add a derived column corresponding to the value of project grants that is pulled in from the linked EPSRC projects column:

So my Meta table/category now looks like this:

Which is pretty clever I think..? ;-)

But then it gets more so… Suppose I link the Postcode town column from the HEI location table with the Postcode/Regional mapping table:

If you’ve been keeping up, you might now expect the UK HEI to be linked to from the Postcode/Region table, which it is:

But the link is symmetrical… and if one category is linked to a second category that is in turn linked to a third category, the columns from the first category can be used as derived columns in the second and the third category…

…which means in the Meta category, I can pull in columns derived from the Postcode/Region category via the HEI location category, first by grabbing the postcode town column into Meta:

To give this:

Then pull in a further derived field from the postcode town column from the Postcode/Region category:

And so now we have a rather more complete Meta category view containing linked items from all three tables (one of which is actually linked indirectly via one of the others):

Clever, eh??? So now I know how to annotate data in one table using data from another table if the two tables each have a column that contains similar data :-)
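To make the chain of links concrete, here is a rough sketch of the same three-table annotation done by hand. It is not Dabble DB’s mechanism, just the idea behind it: the function name, field names, region labels and figures are all invented for the example.

```python
# Hypothetical rows standing in for the three categories.
epsrc = [
    {"uri": "http://www.example-a.ac.uk/", "grants": 1200000},
    {"uri": "http://www.example-b.ac.uk/", "grants": 450000},
    {"uri": "http://www.example-c.ac.uk/", "grants": 90000},
]
hei_location = [
    {"uri": "http://www.example-a.ac.uk/", "hei": "Example A", "postcode_area": "AB"},
    {"uri": "http://www.example-b.ac.uk/", "hei": "Example B", "postcode_area": "EF"},
]
regions = [
    {"postcode_area": "AB", "region": "Region One"},
    {"postcode_area": "EF", "region": "Region Two"},
]


def build_meta(epsrc, hei_location, regions):
    """Build one row per URI, pulling in columns from all three tables."""
    location_by_uri = {row["uri"]: row for row in hei_location}
    region_by_area = {row["postcode_area"]: row["region"] for row in regions}
    meta = []
    for row in epsrc:
        # First hop: URI -> HEI location row (may be missing).
        location = location_by_uri.get(row["uri"], {})
        area = location.get("postcode_area")
        # Second hop: postcode area -> region, reached via the location table.
        meta.append({
            "uri": row["uri"],
            "grants": row["grants"],
            "postcode_area": area,
            "region": region_by_area.get(area),
        })
    return meta


meta = build_meta(epsrc, hei_location, regions)
for row in meta:
    print(row)
```

Rows that fail either hop keep their grant value but end up with an empty region, which makes the unmatched URIs easy to spot.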

Okay, so now I have a table that contains rows that contain both project funds and UK regions info – so now I’m in a position to calculate the total amount of funds flowing into each region and then plot them on the thematic map…

…but this post is already way too long, so that’ll have to be for another day…

(Plus I’m not totally sure how to do it yet… and Mission Impossible is just starting; this is a scheduled post…;-)
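For what it’s worth, the totalling step itself is probably just a group-and-sum over rows that carry both a grant value and a region. A minimal sketch under that assumption, with a made-up function name, made-up field names and made-up figures; how the totals then get onto the map is a separate question.

```python
from collections import defaultdict


def funds_by_region(rows):
    """Sum grant values per region, skipping rows with no region."""
    totals = defaultdict(int)
    for row in rows:
        if row["region"] is None:
            continue  # could not be placed in a region
        totals[row["region"]] += row["grants"]
    return dict(totals)


# Hypothetical rows of the kind the combined table would hold.
rows = [
    {"hei": "Example A", "grants": 1200000, "region": "Region One"},
    {"hei": "Example B", "grants": 450000, "region": "Region Two"},
    {"hei": "Example C", "grants": 300000, "region": "Region One"},
    {"hei": "Example D", "grants": 90000, "region": None},
]
totals = funds_by_region(rows)
print(totals)
```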

What Are JISC’s Funding Priorities?

I’ve just got back home from a rather wonderful week away at the JISC Developer Happiness Days (dev8D), getting a life (of a sort?!;-) so now it’s time to get back to the blog…

My head’s still full of things newly learned from the last few days, so while I digest it, here’s a quick taster of something I hope to dabble a little more with over the next week for the developer decathlon, along with the SplashURL.net idea (which reminds me of my to do list…oops…)

A glimpse of shiny things to do with JISC project data (scraped from Ross’s Simal site… [updated simal url]; see also: Prod).

Firstly, a Many Eyes tag cloud showing staffing on projects by theme:

Secondly, a Many Eyes pie chart showing the relative number of projects by theme:

As ever, the data may not be that reliable/complete, because I believe it’s a best effort scrape of the JISC website. Now if only they made their data available in a nice way???;-)

Following a session in the “Dragon’s Den”, where I was told by Rachel Bruce that these charts might be used for good as well as, err, heckling, I guess, and by Mark van Harmalen that I should probably pay lip service to who potential users might be, and following Jim Downing’s suggestion that I could do something similar for research council projects, I also started having a play with data pulled from the JISC website.

So for example, here’s a treemap showing current EPSRC Chemistry programme area grants >2M UKP by subprogramme area:

And if you were wondering who got the cash in the Chemistry area, here’s a bubble chart showing projects held by named PIs, along with their relative value:

If you try out the interactive visualisation on Many Eyes, you can hover over each person bubble to see what projects they hold and how much they’re worth:

PS thanks to Dave Flanders and all at JISC for putting the dev8D event on and managing to keep everything running so smoothly over the week:-) Happiness 11/10…