So it seems no-one really saw why I got so excited by Dabble DB doing the linked data thing with data tables…
…so here’s an example…
First of all, importing some data via copy-and-paste:
…and we commit it:
All so simple, right…
So let’s pull some other data in from somewhere else; as CSV from a Google spreadsheet, perhaps?
(Note that the spreadsheet could have itself imported the data by scraping a table or list from an HTML page, or grabbing it via webservice with a RESTful API.)
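Under the hood, that CSV import is just an HTTP GET against the spreadsheet's published CSV export URL. Here's a minimal sketch in Python – the URL shape is the one Google was using for published sheets at the time (it may change), and the spreadsheet key is a placeholder, not a real sheet:

```python
import csv
import io
from urllib.request import urlopen

# Published-spreadsheet CSV export URL (shape may change over time).
CSV_EXPORT = "https://spreadsheets.google.com/pub?key=%s&output=csv"

def parse_csv(text):
    """Turn raw CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def fetch_sheet(key):
    """Grab a published sheet's CSV export; 'key' is the spreadsheet
    key from its published URL (placeholder here, not a real sheet)."""
    with urlopen(CSV_EXPORT % key) as resp:
        return parse_csv(resp.read().decode("utf-8"))
```

Tools like Dabble DB (and Yahoo Pipes) do exactly this fetch-and-parse step for you behind the import dialog.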
So we import it:
…and commit it:
I’m not sure what the caching/refresh policy in Dabble DB is, though. For example, if the Google spreadsheet data changes, will Dabble keep up with the changes, and how often? (Maybe someone from Dabble DB could post a comment to clarify this?;-)
And finally, we grab data for the third table by screenscraping a table from an HTML page – this page:
Give it the URL:
Select the table:
…and commit it:
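For what it's worth, the "give it a URL, select the table" step is a simple scrape to replicate in code. A rough sketch with Python's built-in `html.parser` (good enough for a single, flat table; a real page might need more care):

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the cell text of each <tr> row in an HTML page."""
    def __init__(self):
        super().__init__()
        self.rows = []       # completed rows
        self._row = None     # cells of the row being built
        self._cell = None    # text fragments of the cell being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def scrape_table(html):
    parser = TableScraper()
    parser.feed(html)
    return parser.rows
```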
So now I have the tables, pulled in by different means, that I used in the previous demo.
If I do some table linking in a similar way to the previous demo, I can get a table that lists grants awarded to different HEIs, along with their postcodes. (This doesn’t actually use the HTML table scraped data, but another mashup could…I could have added the Government Office Region(-ish) data to the table, for example.)
So just to be clear, here: this table is made up from columns from two separate tables. The JISC project data comes from one table, the HEI postcode location from another. The HEI homepage URI is common to both original data tables and is used to key the combined table.
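Dabble DB does this linking through the GUI, but expressed in code the combined table is just a left join on the shared URI column – a dictionary lookup. (The column names below are illustrative, not Dabble DB's own.)

```python
def link_tables(projects, locations, key="homepage"):
    """Left-join two lists of dicts on a shared key column, the way
    the linked Dabble DB tables reconcile on the HEI homepage URI."""
    # Index the location table by the shared key...
    lookup = {row[key]: row for row in locations}
    combined = []
    for proj in projects:
        # ...then pull the matching location columns into each project row.
        merged = dict(lookup.get(proj[key], {}))
        merged.update(proj)
        combined.append(merged)
    return combined
```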
And then I can export the data…
…and shove it into a Yahoo pipe – importing it as CSV, I think?
Then we can filter on just the HEIs that have been awarded grants, and have been geocoded to somewhere in the UK:
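That filter step is easy to sketch too – keep only rows with a grant whose geocode lands in a rough UK bounding box. (The field names and the bounding box here are my assumptions, not what the pipe actually uses internally.)

```python
def uk_geocoded_grant_holders(rows):
    """Keep rows that have a grant and a geocode inside a rough UK
    bounding box (lat/lon field names are assumptions)."""
    def in_uk(row):
        try:
            lat, lon = float(row["lat"]), float(row["lon"])
        except (KeyError, ValueError):
            return False  # missing or failed geocode
        return 49.0 <= lat <= 61.0 and -8.5 <= lon <= 2.0
    return [r for r in rows if r.get("grant") and in_uk(r)]
```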
And we can get a map:
…and the KML, geo-RSS etc…
… and maybe take the JSON output from the pipe and use it to drive a proportional symbol map, showing the number of projects awarded to each institution, for example…
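Counting projects per institution from the pipe's JSON items would be the first step towards those proportional symbols – something like this (the `hei` field name is an assumption about the feed's item structure):

```python
import json
from collections import Counter

def project_counts(rows, key="hei"):
    """Count projects per institution, as JSON – the sort of thing
    that could size a proportional symbol per HEI on a map."""
    counts = Counter(row[key] for row in rows)
    return json.dumps(counts, sort_keys=True)
```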
In the same way that Yahoo Pipes lets you do crazy stuff with lists, so Dabble DB lets you get funky with data tables… What’s not to like (except the lack of regular expressions for data cleaning, maybe…?;-)
So there we have it:
- some cut and pasted data in one table (HE location data), and a CSV imported table from a Google spreadsheet (the JISC project allocation data); (the HTML table scraped data is superfluous in this example);
- linked tables in Dabble DB to reconcile the data in the two tables;
- the mashed data table then gets exported from Dabble DB as CSV into a Yahoo pipe;
- the pipe geocodes the postcode location data for each HEI and exports the geo-coded feed as JSON;
Job done, hopefully ;-)
PS I’ve started reflecting a little on how I pull these mashup things together, and I think it’s a bit like chess… I’ve completely assimilated various patterns or configurations of particular data representations and how they can be processed that let me “see several moves ahead”. And in messing around with Dabble DB, it’s like I’ve just learned a new configuration, like a pin or a fork, that I didn’t really appreciate before; but now it’s something I “get”, and something I can look for, something that may be “several moves ahead”, whenever I get an urge to have a tinker… And that’s why I think this post, and the previous one on the topic, are maybe gonna be important if you want to keep up over the coming months…;-) Does that make sense…?
PPS @dfflanders and Ross, if you’re reading… being able to table or list scrape from HTML (so no embedded tables), or grab *simple* XML feeds into Google spreadsheets, is one way of making data available. Fixing on some canonical URIs in a standard format for HEI homepages would also be a start… (EPSRC uses different – valid, but maybe deprecated? – URIs for homepages compared to the URIs listed in the scraped HERO database, for example? I’m not sure what sort of HEI homepage URIs JISC uses… the ones that the HEIs use themselves would be a start?)