Many of the data sets I quickly looked at are being made available as CSV and XML data feeds, which is very handy :-)
Anyway, in preparation for having some new recipes to drop into conversation at News:Rewired next week, I thought I’d have a quick play with visualising some of the timeseries data in Timetric to see what sorts of “issues” it might throw up.
So how does Timetric like to import data? There are three main ways – copy’n’paste, import a spreadsheet (CSV or XLS) from your desktop, or grab the data from a URL.
Obviously, the online route appeals to me:-)
Secondly, how does Timetric expect the data to be formatted? At the moment, quite rigidly, it seems:
To publish data in a format Timetric can understand, you should expose it in one of two formats — either CSV or Excel (.xls) format. Dates/times must be in the first column of the file, and values in the second.
For importing CSV data, the date/time should be in W3C datetime format (like 2009-09-14 17:34:00) or, optionally, Unix timestamps as floating point numbers.
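To make those two date options concrete, here's a minimal Python sketch that produces both for the same moment. The function name is my own, the string layout simply copies the example in the Timetric guidance quoted above, and I'm assuming UTC for the timestamp, which the guidance doesn't actually specify.

```python
from datetime import datetime, timezone


def timetric_date_formats(year):
    """Return (datetime string, Unix timestamp) for midnight on 1 January of `year`.

    Hypothetical helper: UTC is an assumption, not something Timetric states.
    """
    moment = datetime(year, 1, 1, tzinfo=timezone.utc)
    # Same layout as the "2009-09-14 17:34:00" example in the guidance
    as_text = moment.strftime("%Y-%m-%d %H:%M:%S")
    # Seconds since 1970-01-01 00:00:00 UTC, as a floating point number
    as_timestamp = moment.timestamp()
    return as_text, as_timestamp


text, stamp = timetric_date_formats(2009)
print(text)   # 2009-01-01 00:00:00
print(stamp)  # 1230768000.0
```

Either value could then go in the first column of the CSV file, with the data value in the second.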
Hmmm, so at the moment, I can only import one time series at a time, unless I’m a geeky hacker type and know how to “write a programme” to upload multiple sets of data from a multi-column file via the API… But OUseful.info isn’t about that, right?!;-)
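For the geeky hacker types, the first half of such a programme might look something like the sketch below: it splits a multi-column CSV file (dates in the first column, one series per remaining column) into separate two-column files of the sort Timetric wants. The function name, column labels and numbers are all made up for illustration, and the upload step is left out because I haven't looked at what the Timetric API actually expects.

```python
import csv
import io


def split_into_series(text):
    """Map each series name to two-column CSV text (date, value).

    Assumes the first row is a header and the first column holds the dates.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    series = {}
    for column, name in enumerate(header[1:], start=1):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        for row in body:
            # Keep the date plus this one series' value
            writer.writerow([row[0], row[column]])
        series[name] = out.getvalue()
    return series


# Illustrative values only, not real census figures
sample = (
    "Date,Borough A,Borough B\n"
    "1801-01-01 00:00:00,100,200\n"
    "1811-01-01 00:00:00,120,210\n"
)
for name, two_column_csv in split_into_series(sample).items():
    print(name)
    print(two_column_csv)
```

Each value in the returned dictionary is a complete date/value file for one series, ready to be saved or passed on to whatever does the uploading.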
Let’s look at some of the London datastore data, anyway. How about this – “Historic Census Population” data.
Let’s preview the data in a Google spreadsheet – use the formula:
Ok – so we have data for different London Boroughs, for every decade since 1801. But is the data in the format that Timetric wants?
- first no: the dates run across columns rather than down rows.
So we need to swap rows with columns somehow. We can do this in a Google spreadsheet with the TRANSPOSE formula. While we’re doing the transposition, we might as well drop the Area Code column and just use the area/borough names. In a new sheet, use the formula:
=TRANSPOSE('Original Data'!A1:W)
(Note, I’d renamed the sheet containing the imported data as Original Data; typically it would be Sheet1, by default.)
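If you'd rather do the row/column swap outside the spreadsheet, the same idea can be sketched in a few lines of Python. The function name, the `skip_columns` parameter and the sample values are mine, and I'm assuming the Area Code column comes first and that every row has the same number of cells (`zip` silently truncates to the shortest row otherwise).

```python
import csv
import io


def transpose_csv(text, skip_columns=0):
    """Swap rows and columns of CSV text, optionally dropping leading columns."""
    rows = [row[skip_columns:] for row in csv.reader(io.StringIO(text))]
    out = io.StringIO()
    # zip(*rows) turns the i-th cell of every row into the i-th output row
    csv.writer(out, lineterminator="\n").writerows(zip(*rows))
    return out.getvalue()


# Illustrative values only, not real census figures
sample = (
    "Area Code,Area Name,1801,1811\n"
    "X1,Borough A,100,120\n"
    "X2,Borough B,200,210\n"
)
print(transpose_csv(sample, skip_columns=1))
# Area Name,Borough A,Borough B
# 1801,100,200
# 1811,120,210
```

The years now run down the first column, with one column per borough alongside.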
NB It seems I could have combined the import and transpose formulae:
Now we hit the second no: the dates are in the wrong format.
Remember, for Timetric “the date/time should be in W3C datetime format (like 2009-09-14 17:34:00) or, optionally, Unix timestamps as floating point numbers.”
My fudge here was to copy all the data except the time data to a new sheet, and just add the time data by hand, using a default day/month/time of midnight on the first of January of the appropriate year. Note that this is not good practice – the data in this sheet is now not just a representation of the original data: it’s been messed around with, and the date field is not the original one, nor even derived from the original one. (I don’t think Google spreadsheets has a regular expression search/replace formula that would allow me to do this?)
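Outside the spreadsheet, the dates could at least be derived from the original year labels rather than typed in by hand. Here's a hedged sketch using a regular expression; I'm assuming each label contains a four-digit year somewhere in it (I haven't checked every label in the original file), and the helper name and the second example label are hypothetical.

```python
import re

# Four-digit years from 1000 to 2099, standing alone as a "word"
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def year_label_to_datetime(label):
    """Turn a label containing a year into midnight on 1 January of that year."""
    match = YEAR_PATTERN.search(label)
    if match is None:
        raise ValueError(f"no four-digit year found in {label!r}")
    return f"{match.group(1)}-01-01 00:00:00"


print(year_label_to_datetime("1801"))          # 1801-01-01 00:00:00
print(year_label_to_datetime("Persons 1811"))  # 1811-01-01 00:00:00
```

That way the date column stays traceable to the original data, even though the day, month and time are still defaults.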
Anyway, that’s as may be;-). To keep the correct number format (Google spreadsheets will try to force a different representation of the date), the format of the date cells needs to be set explicitly:
So now we have the data in rows, with the correct date format, the dates having been added by hand. Remembering that Timetric can only import one time series at a time, let’s try with the first data set. We can grab the CSV for the first two columns as follows – from the Share Menu, “Publish as Web Page” option, choose the following settings:
(The ‘for timetric’ sheet is the sheet with the tidied date field.)
Here’s the CSV URI, that we can use to get the data in Timetric:
The upload took a good couple of minutes, with no reassuring user notifications (just the browser appearing to hang waiting for a new Timetric page to load), but eventually it got there…
(And yes, that drop in population is what the data says – though for all the other boroughs you get a curve shaped more as you’d expect;-)
To import other data sets, we need to insert a new Date column, along with the date data (I copied it from the first Date column I’d created), and then grab the CSV URI for the appropriate columns:
Anyway, there we have it – a recipe (albeit a slightly messy one) for getting CSV data out of the London datastore, into a Google spreadsheet, transposing its rows and columns, and then generating date information formatted just how Timetric likes it, before grabbing a new CSV data feed out of the spreadsheet and using it to import data into Timetric.