And a Tweet Later – Querying Shadow Cabinet Expenses on Google Spreadsheets with the Google Query Language

6 minutes past 5 this evening:

(There was a problem with the publishing/privacy settings – maybe the tech team would like to post a comment saying what, exactly – that meant while the spreadsheet was viewable in preview form and as CSV, it was impossible to run Google visualisation API query language queries against it.)

Half an hour later:

And ten minutes after that:

So what? So this:

(That’s a test query using the Google visualisation API query language on the Shadow Cabinet expenses spreadsheet (via Google Docs: the Shadow Cabinet’s expenses).)

And with a little tweaking, we can get a summary of the expenses by Shadow Cabinet member – run a select A,C,sum(E) group by A,C query on the spreadsheet:

(If you prefer to see the full total by member, use this query: select A,sum(E) group by A)
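For reference, these queries are just URL-encoded and appended to the spreadsheet’s visualisation API endpoint. The general pattern looks something like this (with SPREADSHEET_KEY standing in for the key of the sheet you want to query):

http://spreadsheets.google.com/tq?tqx=out:html&tq=select%20A%2CC%2Csum(E)%20group%20by%20A%2CC&key=SPREADSHEET_KEY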

Now take the CSV output version of the query, pipe it into a Many Eyes Wikified data page and plot it as an interactive tree map:

Or if you prefer a bubble chart?

Or maybe a matrix chart?

(I think that Many Eyes Wikified updates its data pages from live data feeds, so hopefully the linked-to visualisations should remain pretty much up to date? [All the visualisations can be reached from this Many Eyes Wikified page.])

Isn’t this fun? :-) So why don’t you have a go?

PS for a more comprehensive review of what’s possible with the query language, I’ve posted a wide selection of examples here: Using Google Spreadsheets as a Database with the Google Visualisation API Query Language.

Making it a Little Easier to Use Google Spreadsheets as a Database (Hopefully!)

It was nice to see a couple of people picking up on my post about using Google Spreadsheets as a database (e.g. Using Google Spreadsheets as a database (no, it really is very interesting, honest) over at the Online Journalism blog), but it struck me that the URL hacking involved might still deter some people.

(Btw, the only list of keywords I’ve found to date for the query language is on the official documentation pages, and even then it isn’t complete…)

So – I spent an hour or so last night putting together a first attempt at a form based interface for writing the queries and constructing the URLs.

The form starts with a slot for the key of the spreadsheet you want to query – clicking on the preview button should display the column headings:

This preview can be used to help you select the columns you want to interrogate or return in your query, counting left-to-right: A, B, C and so on.

Next up are some hints on example queries:

and then there is the query form itself:

I’ve made a start on separating out the different bits of the query, but there’s clearly lots more that could be done. For example, an optional “order by” slot could be provided (with a post-qualifying asc or desc selection), or the select entry box could be turned into a multiple selection listbox displaying the column headers (but I only gave myself an hour, right?;-) [Note to self: lots of handy functions here – Google Visualization API Reference]

Anyway, once you make the query, links to the URIs of the HTML preview and CSV versions of the query are automatically generated, and the HTML table of results is displayed:

The CSV URI can then be used to import the data into a Many Eyes Wikified data page, for example.
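For the curious, here’s a minimal Javascript sketch of the sort of thing the form does behind the scenes (the function name is my own invention; key and query stand in for whatever was typed into the form):

function queryURIs(key, query) {
  // Build the Google Visualization API endpoint for the spreadsheet,
  // URL-encoding the query so spaces, commas etc are handled for us.
  var base = "http://spreadsheets.google.com/tq?key=" + key
           + "&tq=" + encodeURIComponent(query);
  return {
    html: base + "&tqx=out:html", // HTML table preview
    csv: base + "&tqx=out:csv"    // CSV output
  };
}

var uris = queryURIs("SPREADSHEET_KEY", "select A,sum(E) group by A");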

Anyway, hopefully this makes it a little easier to get people started with these queries. A good place to start looking for spreadsheets is on the Guardian DataBlog.

Note that this “work” also ties in strongly to the idea of “data journalism” (hashtag: #datajourn), which I’d be interested in hearing your thoughts about…

Visualising Where the Money Goes: Westminster Quangos, Part 1

My tweetwatching has been a little sporadic of late, so I haven’t really been keeping up with the data sets that keep being posted to the Guardian Datablog, but today I had a quick skim through some of the recent uploads and had my eye caught by a post about funding of UK government quangos (Every quango in Britain [spreadsheet]).

What I’ll show in this post is how to construct a query on one of the quangos data sheets that can then be visualised as a change treemap, showing at a single glance how funding over all the quangos (arranged by sponsor department) has changed over the period 2006-2007.

The quangos spreadsheet is split into several different areas and at face value is quite hard to make sense of (what’s the difference in status (and provenance of the data) between the NHS and Health quangos, for example?).

But I take nothing if not a pragmatic view about all this data stuff, and whilst there may be, err, issues with doing “proper” data journalism with this spreadsheet, I think we can still get value from just seeing what sorts of technology-enhanced questions we might ask of this data, such as it is (as well as identifying any problems with the data as it is presented), and as a result maybe identifying various issues with how to present and engage with this data were it to be republished.

As ever, my doodles don’t properly acknowledge the provenance or source of the data, nor do I try to make any sense out of the data or look for any specific ‘meaning’ within it – I’m still at the stage of sharpening my technology pencil and seeing what sorts of marks it can make – but this is something I know I don’t do, and will be something I start to look at somewhen, honest!

So let’s make a start. To provide a bit of context, the questions I set out with when doodling through this post were:

1) is the data clean enough to run summarising queries on (that is, queries that sum totals for different departments)?
2) is the data clean enough not to break Many Eyes Wikified if I pass it to that visualisation tool via a CSV feed?

And a matter arising:

3) how do I write a query that specifies column headings (the headings in the spreadsheet leave something to be desired, at times….)?

The spreadsheet I chose to play with was the Westminster sheet, which you can find from here: UK Quangos [Guardian Datastore] (you’ll need to select the appropriate tab).

Just by looking at the data in the spreadsheet we notice a couple of things, things that suggest certain queries we might run on the data. (I probably need to unpack that phrase (“run certain queries”) at some point, but the essence of it is this: if we treat the spreadsheet as a database, what sorts of reports can we generate from it? Typically, in a database environment, reports are generated by running queries using some sort of database query language, which in the case of Google spreadsheets is the SQL-like Google Visualisation API Query Language.)

So, the first thing I noticed was the two columns on the left – government departments and, presumably, the quangos sponsored by those departments. And what these suggested to me was that I should be able to generate reports that summarise expenditure over all quangos in each department. (Whether or not this is interesting, I don’t know; but it’s something that we should be able to do easily enough, and it may spark off other questions in our minds.)

The second thing I noticed was that lots of the data straddled two years (2006 and 2007).

And finally, gross expenditure seemed like a meaningful quantity and maybe least open to contention, so I decided to pick on that as the quantity to sharpen my pencil with:

To start playing with the data, I bookmarked the spreadsheet so that I could play with it in my Data Store explorer (note that I needed to specify the sheet number, where the first sheet is sheet 0, the second is sheet 1, and so on; so the Westminster sheet (the third sheet from the left in the spreadsheet) is sheet 2):

When we preview the column headings (which the API assumes are in the top row, I think?), we get – not a lot…

If we scroll down in the data store explorer, we get at least the spreadsheet column labels:

Anyway, let’s try to run a query that summarises the overall gross expenditure for 2006 (column R) and 2007 (column S) for each department (column C):

The query is encoded as:
select%20C%2Csum(R)%2Csum(S)%20group%20by%20C%20order%20by%20sum(R)%20desc

That is:
select C,sum(R),sum(S) group by C order by sum(R) desc

So we select three columns, group the rows according to department (column C), display the summed values over those grouped rows for columns R and S, and present them in descending (desc) order of sum(R):

If we click on the HTML preview link, we can view the table in its own page:

select C,sum(R),sum(S) group by C order by sum(R) desc

(A link to the CSV version is also generated.)

The data explorer doesn’t support forms for all the queries we can write yet, so the next step requires hacking the HTML table URI directly to add labels to the columns:

select C,sum(R),sum(S) group by C order by sum(R) desc label C 'Dept', sum(R) '2006 Gross Expenditure', sum(S) '2007 Expenditure'

If you’re hacking the URI in a recent browser address/location bar, you don’t need to encode things like spaces as %20, or single quotes as %27, because the browser will take care of it for you:

If you then copy this URI and paste it back into the location bar, the encoded version will be generated for you, e.g. so that you can use it as a link in a web page:

So we have our basic summary report, and can now use the CSV output of the spreadsheet so that it can be transported elsewhere. There are two things we need to do now.

The first is to just change the output format of the data from an HTML table to CSV. Take this part of the URI:

http://spreadsheets.google.com/tq?tqx=out:html&tq=select C

and change it to this:

http://spreadsheets.google.com/tq?tqx=out:csv&tq=select C

If you preview the CSV, you’ll notice there’s a problem with it though:

There are rogue commas everywhere, appearing within the ‘numerical’ data, and this file is supposed to use commas to separate out different elements. To get proper numbers out, we need to set their format, which means adding something to the end of the URI:

format sum(R) '#',sum(S) '#'

(Note that you do need to encode the #s by hand, as %23.)

That is, present sum(R) and sum(S) in a proper numerical format:
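Putting all the pieces together, the full (unencoded) query then looks something like this:

select C,sum(R),sum(S) group by C order by sum(R) desc label C 'Dept', sum(R) '2006 Gross Expenditure', sum(S) '2007 Expenditure' format sum(R) '#',sum(S) '#'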

So there we have summary tables showing the expenditure for each government department. Many Eyes Wikified isn’t letting me import that data directly via CSV at the moment, but it’s easy enough to download the CSV and copy the data into a Many Eyes Wikified data page:

That is:

(Casting an eye over the CSV data, there are also a couple of glitches in it that upset the grouping – e.g. “Home” and “Home ” (trailing space) are treated differently.)

We can now use this page to create some visualisations – so put some placeholders in place in a visualisation page:

And then generate some visualisations:

One of the potentially most informative summary views for this data is a change treemap, that shows the relative gross expenditure for each department, along with whether it has increased or decreased between 2006 and 2007:

Blue indicates a decrease from 2006 to 2007, orange an increase.

The next step is now to create a change treemap for each quango within a department, but that’s a post for another day… [UPDATE: see it here – Visualising Where the Money Goes: Westminster Quangos, Part 2]

Where Next With The Hashtagging Twitterers List?

This post is a holding position, so it’s probably gonna be even more cryptic than usual…

In Who’s Tweeting Our Hashtag?, I described a recipe for generating a list of people who had been tweeting, twittering or whatever, using a particular hashtag.

So what’s next on my to do list with this info?

Well, first of all I thought it’d be interesting to try to plot a graph of connections between the followers of everyone on the list, to see how large the hashtag audience might be.

Using a list of about 60 or so twitterers, captured yesterday, I called the Twitter API http://twitter.com/followers/ids/USERNAME.xml function for each one to pull down an XML list of each of their followers by ID number, and topped it up with the user info (http://twitter.com/users/show/USERNAME.xml) for each person on the original list; this info meant I could in turn spot the ID of each of the hashtagging twitterers amongst the followers lists.

It’s easy enough to transform these lists into the dot format that can be plotted by GraphViz, but the 10,000 or so edges that the followers lists generated were too much for my version of GraphViz to cope with.
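By way of illustration, here’s a minimal Javascript sketch of the sort of mapping involved (not my actual code). It assumes hashtaggers is a lookup from Twitter ID to screen name for the people on the hashtag list, and that followers maps each hashtagger’s screen name to the array of follower IDs pulled from the API:

function toDot(hashtaggers, followers) {
  var lines = ["digraph twitterNet {"];
  for (var name in followers) {
    followers[name].forEach(function (id) {
      // only plot follower edges between people on the hashtag list
      if (hashtaggers[id]) {
        lines.push('  "' + hashtaggers[id] + '" -> "' + name + '";');
      }
    });
  }
  lines.push("}");
  return lines.join("\n");
}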

So instead, I thought I’d just try to plot a subgraph, such as the graph of people who were following a minimum specified number of people in the original hashtag twittering list. So for example, the graph of people who were following at least five of the people who’d used the particular hashtag.

I hacked a piece of code to do this, but it’s far from ideal and I’m not totally convinced it works properly… Ideally what I want is a simple (efficient) utility that will accept a .dot file and prune it, removing nodes that are less than a specified degree. (If you know of such a tool, please post a link to it in the comments:-)
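In the meantime, here’s a rough sketch of the sort of utility I have in mind. Note that it only makes a single pass – removing edges lowers the degrees of the remaining nodes, so a proper version would iterate until the graph stops changing:

function pruneDot(dot, minDegree) {
  // pull out edges written as "A" -> "B" and tally node degrees
  var edgeRe = /"([^"]+)"\s*->\s*"([^"]+)"/g;
  var degree = {}, edges = [], m;
  while ((m = edgeRe.exec(dot)) !== null) {
    edges.push([m[1], m[2]]);
    degree[m[1]] = (degree[m[1]] || 0) + 1;
    degree[m[2]] = (degree[m[2]] || 0) + 1;
  }
  // keep only edges whose endpoints both meet the degree threshold
  var kept = edges.filter(function (e) {
    return degree[e[0]] >= minDegree && degree[e[1]] >= minDegree;
  });
  return "digraph pruned {\n" + kept.map(function (e) {
    return '  "' + e[0] + '" -> "' + e[1] + '";';
  }).join("\n") + "\n}";
}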

Here’s the first graph I managed to plot:

If my code is working, an edge points to a person if that person is following at least, err, lots of the other people [that is: lots of other people who used the hashtag]. So under the assumption that the code is working, this graph shows one person at the centre of the graph who is following lots of people who have tweeted the hashtag. Any guesses who that person might be? People who have edges directed towards them in this sort of plot are people who are heavily following the people using a particular hashtag. If you’re a conference organiser, I’m guessing that you’d probably want to appear in this sort of graph?

(If the code isn’t working, I’m not sure what the hell it is doing, or what the graph shows?!;-)

One other thing I thought I’d look at was the people who are following lots of people on the hashtagging list who haven’t themselves used the hashtag. These are the people to whom the event is being heavily amplified.

So for example, here we have a chart that is constructed as follows. The hashtag twitterers list is constructed from a sample of the most recent 500 opened09 hashtagged tweets around about the timestamp of this post, and contains people who appear in that list at least 3 times.

The edges on the chart are directed towards people who are not on the hashtag list but who are following more than 13 of the people who are on the list.
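For what it’s worth, here’s a hypothetical sketch of how that ‘amplification’ audience might be computed – count how many people on the hashtag list each follower ID is following, then keep the IDs that pass a threshold and aren’t themselves on the list (followers takes the same form as in the sketch above; hashtagIDs is a lookup of the Twitter IDs of the people on the list):

function amplifiers(followers, hashtagIDs, minCount) {
  // tally, for each follower ID, how many of the hashtaggers they follow
  var counts = {};
  for (var name in followers) {
    followers[name].forEach(function (id) {
      counts[id] = (counts[id] || 0) + 1;
    });
  }
  // keep IDs over the threshold that aren't on the hashtag list themselves
  var result = [];
  for (var id in counts) {
    if (counts[id] >= minCount && !hashtagIDs[id]) result.push(id);
  }
  return result;
}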

Hmmmm… anyway, that’s more than enough confusion for now… I’m going to try not to tinker with this any more for a bit, because a holiday beckons and this could turn into a mindf**k project… However, when I do return to it, I think I’m going to have a go at attacking it with a graph/network toolkit, such as NetworkX, and see if I can do a proper bit of network analysis on the resulting graphs.

Preliminary Thoughts on Visualising the OpenEd09 Twitter Network

The wealth of activity on the #opened09 Twitterstream keeps stealing my attention, and has got me thinking about how we might start to analyse the structure of the network around the hashtag, in part so we can understand information flow through that part of the open education network better.

So what’s come to mind? Here are some early sketches that I’m using as a foil to see what sort of questions they bring to my mind about the structure of the network. Note that this is a ‘first principles’/grass roots/bare bones approach – I’m not going to use any formal social network analysis techniques because I don’t know anything about them (yet….;-) But by starting with some doodles, the questions that arise may lead me to appreciating why the formal SNA tools and approaches are so useful in practical, easy to understand, human terms, and thus provide me with the motivation to learn how to use them…

First up, is there any value in understanding the structure of the network of people who are twittering with the #opened09 hashtag? Partly because of rate limiting on the Twitter API, last night I grabbed a sample of the most recent #opened09 hashtagged tweets, and then filtered them down to the most active twitterers over that period (people using the hashtag more than three times in the sample, I think it turned out to be). Then I pulled down their follower lists from the Twitter API, and constructed a graph of who followed who in that set. Here’s the Graphviz plot of that graph:

opened09 - follower relations between active twitterers

So what questions does this bring to mind? Well, off the top of my head:

– who is the most connected person in this graph? Does that tell us anything useful? Would we expect the event organiser to be the most connected?
– is there anyone in the network who isn’t very connected to other people? Why? Are they a different ‘sort’ of user? Are they new to the network?
– does the connectedness of people within the graph change over the course of the event? (I think that the Twitter API returns a list of followers in reverse chronological order; so we could approximate the growth by comparing the above graph with one that ignored the most recent 50(?!) followers of each person?)
– are all Twitterers equal? Should we treat users who only ever use the hashtag as part of an RT differently when constructing this graph?
– is there any value in representing the number of followers each person has within the above graph? Or the number of people they follow? Or some function of the two? What about the number of times they used the hashtag in the sample period (or the number of times they RT the hashtag, or a function of the two) – should that be reflected too?

What’s the POINT of asking these questions? How about this – as individuals, can we identify members of the community who we don’t know? (I.e. is this sort of graph a good basis for a friend recommender? Would a big poster of this sort of graph be a good thing to post in the event coffee area? What would people look for in it if we did?)

Okay, next up: there’s always talk of things like twitter being used to amplify an event to ‘virtual participants’. How big might that audience be? And who comprises it? Are there people not at the event who effectively amplify it further?

How about a plot of the simple reach of the twitterers, as a treemap?

Hmm… I’m not sure about that… Just because you have a big following doesn’t mean it’s a big relevant or interested following? (The number of people RTing the hashtag in your audience might?)

Instead, how about this sort of graphic to help frame some questions:

opened11-amplificationNet

This one shows, for the most active opened09 hashtag twitterers, the people who follow more than 12 (maybe?; or more than 11?) of them. The named individuals are heavy opened09 twitterers; the numbers are the Twitter IDs of the people who are seeing the event amplified to them. (Note that these people may also have tweeted the hashtag, only not so heavily. Maybe I need a stop list that removes people from this ‘amplification graph’ who have used the hashtag? That way, we could identify ‘leaves’ on the #opened09 tree – that is, people who received some number of #opened09 tweets but who never used the hashtag.)

So what questions does it bring to mind?

– are there people receiving large numbers of opened09 tweets who are unknown to the community?
– do the opened twitterers fall into cliques or reasonably well clustered groups around sets of followers who aren’t tweeting? (Would a cluster analysis be an interesting thing to do here?)
– if we lower the sampling threshold that specifies the minimum number of heavy twitterers that a ‘listener’ is following, how does the size of the listening audience grow? Is this interesting? Does the number of people that a listener follows influence how likely they are to see opened09 tweets? (E.g. if I follow 20 opened09 heavy twitterers, and only 50 people in all, my traffic may be dominated by opened09 folk; if I follow 500, or 1000, or 2000, then that traffic is likely to be diluted?)

And the POINT? Can we get a feeling for the audience the event is being amplified to? Are there members of that audience who seem to be members of the community but aren’t really known to the community? Can we find the lurkers and pull them in with a personal invite (and is this even ethical?)

Just by the by, looking at RT networks could also be interesting – that is, looking at patterns of RTing across the network. Maybe a graph showing people who RTd hashtagged tweets, as well as the path back to the original tweet? (This brings to mind some of @mediaczar’s work looking at Twitter in a PR context – which is exactly what event amplification is, right?)
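Extracting the RT edges from a sample of tweets is straightforward enough. Here’s a hypothetical sketch, assuming each tweet is an object of the form {from: screenName, text: tweetText}; edges run FROM the retweeter TO the person they RTd:

function rtEdges(tweets) {
  var edges = [];
  tweets.forEach(function (t) {
    // a crude match on the conventional "RT @username" pattern
    var m = t.text.match(/RT @(\w+)/);
    if (m) edges.push([t.from, m[1]]);
  });
  return edges;
}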

So having got some questions in mind (if you have more, please add them as comments below), I’ve got some sort of rationale for having a look at some formal graph theory and social network analysis stuff. This looks like it could be a good place to start: M.E.J. Newman – The mathematics of networks [PDF].

[UPDATE: I guess the heuristic I have in mind with respect to the charts and SNA is this: are there features from the visualisation that jump out at me that the SNA tools can also pick out?]

A Quick Peek at the IWMW2009 Twitter Network

This is just a quickie following up Preliminary Thoughts on Visualising the OpenEd09 Twitter Network, Where Next With The Hashtagging Twitterers List? and Who’s Tweeting Our Hashtag? with a couple of images showing some of the ‘Twitter action’ going on around IWMW2009. (I’ll post the how to when I’m not on holiday…the recipe is a little different to the approach I used in the previous posts…;-)

First up, a graph of who sent a tweet to whom whilst using the #iwmw2009 hashtag. The arrows go FROM the person sending the tweet TO the person they sent it to (I think…):

iwmw09fromtoCropped

Secondly, a graph of the RT/retweet network. The arrows go FROM a person who RTd a post TO the person who sent the original tweet (so if A RTd B, the arrow goes FROM A TO B).

iwmw2009RT

Note that if a link goes from A TO B TO C, it doesn’t necessarily mean that A RTd B’s RT of tweet from C. It just means that at some point A RTd B, and at some point B RTd C. This sort of graph might help to identify amplification networks? So for example, we see iwmw and briankelly both being amplified/RTd quite a bit, as well as jaducms. Were folks from the latter engaged in a little bit of social media PR, I wonder?!;-)

Okay – that’s enough of a teaser for now; back to being on holiday… :-)

PS I actually tweeted links to a couple of similar graphs for opened09 yesterday – but there was a big oops and those graphs were actually wrong, so I’ve deleted them; I’ll repost them via a blog post in a few days…

Scripted Diagrams Getting Easier

A quick heads-up on another tool (diagrammr) that makes it easy to create network/graph diagrams like this:

Just type in a description of the graph and the diagram will be generated at the same time [video]:

[Infoskills note to self: when making a screencast with Jing, after clicking in a text box area, remember to move the mouse cursor out of the way…]

(Regular readers will know I’ve been interested in this sort of thing for some time; for example, see Scripting Charts With GraphViz – Hierarchies; and a Question of Attitude, Writing Diagrams, RESTful Image Generation – When Text Just Won’t Do or Visual Gadgets: Scripting Diagrams.)

As well as creating diagrams, Diagrammr allows you to embed them, providing an image/PNG URI for your diagram; you can also edit the image (that is, edit the script that generates the image) after the fact via a shareable URI.

The URI for the editor page can be generated from the image URI, though, so without the ability to set a password on the editor page when you first create a new image, this means that any time you embed a Diagrammr image, someone else could go and edit it?

In an educational context, tools like this make it much easier for students to create their own diagrams (typing in a graph description is far quicker than trying to lay it out by hand in a drawing package). As you script the diagram, your attention is focussed on the local structural components/relations that define the graph, whilst at the same time the automatically generated diagram visualises the overall structure and brings alive its complexity at the network level.

(I’m not sure how the graph layouts are generated – maybe using Graphviz on the server to generate the image and return it to the browser? If so, an improved version of diagrammr might be able to return the compiled xdot version of the graph back to an interactive canviz component running in the browser?)

If you’re working in an institutional VLE context, where the powers that be are still trying to retain control of everything, the Canviz component might offer one solution – an HTML 5 canvas library for displaying ‘compiled’ Graphviz network descriptions.

Although I haven’t tried it out, there is apparently a recipe for integrating Graphviz with Drupal (Graphviz Filter) and a suggestion for including Canviz into the mix (GraphMapping Framework (graphviz_api + graphviz_fields + graphviz_views + graphviz_filter) – has this been implemented yet by anyone, I wonder?). I’ve no idea if anyone has tried to do something similar in a Moodle environment…

PS here’s another one – a UML editor: YUML

More Thinkses Around Twitter Hashtag Networks: #JISCRI

A brief next step on from Preliminary Thoughts on Visualising the OpenEd09 Twitter Network and A Quick Peek at the IWMW2009 Twitter Network with a couple of graphs that look at the hashtag network around the JISCRI event that’s going on this week.

The sample was taken from a search of recent #jiscri hashtagged tweets captured last night using the Hashtag Twitterers pipe.

The first chart looks at the people who the hashtag twitterers were following in large numbers but who weren’t themselves using the hashtag (I think… my experimental protocol was a bit ropey last night… oops).

The graphs were plotted using Graphviz – firstly a radial plot:

jiscrinetExtGurus

And then a circular one:

jiscrinetExtGurus2

The circular one is quite fun, I think? :-) At a glance, it shows who the “external gurus” are, as well as the differences in their influence.

The second thing I looked at was the network graph of the JISCRI hashtaggers, showing who friended whom:

jiscriTwitterNet

Here’s the circular view:

jiscriTwitterNetCircular

For a large event, I think this sort of graph could be quite fun to generate at both the start and the end of the event, to show how connections can be formed during it.

For conferences that publish lists of attendees, popping up a poster of the delegates’ twitter network might provide an interesting discussion thing for people to chat around.

PS See also Meet @HelloApp, Making Conferences More Fun.

Treemapping Council Committees Using OpenlyLocal Data

Some time ago, I started exploring how treemap representations might be used to provide a quick overview of the make-up of the membership of local council committees (Glanceable Committee Memberships with Treemaps).

Following a lazyweb request to @countculture about expressing complete council committee membership data from Openly Local (Topical Versions of Local Council Websites… Give Us the Data and We’ll Work It For You), and the rapid fulfilment of that request ;-), here’s a proof of concept about how to use that data to power a treemap from the Javascript InfoViz Toolkit (JIT) to provide a glanceable display of the make-up of Isle of Wight Council committees, colour coded by party:

Council cttee treemap

Each committee is represented as follows:

{children: [
  {children: [],
   data: {$area: "3", $pcolor: "purple", members: "Jonathan Francis Bacon, Paul Fuller, Heather Humby, "},
   id: "ccl12_0",
   name: "Independent"},
  {children: [],
   data: {$area: "5", $pcolor: "blue", members: "Ivan Bulwer, Susan Jane Scoccia, Albert Taylor, Jerry White, David G Williams, "},
   id: "ccl12_1",
   name: "Conservative"},
  {children: [],
   data: {$area: "1", $pcolor: "darkorange", members: "Colin Richards, "},
   id: "ccl12_2",
   name: "Liberal-Democrat Group"}
 ],
 data: {$area: 9},
 id: "ccl12",
 name: "Licensing and General Purposes Committee"}

With another quick bash at the code, I should be able to generate these views on the fly from the JSON data feeds provided on the OpenlyLocal site. At the moment, the route I take to generate the Javascript object that powers the treemap is a really clunky one :-(

What never ceases to amaze me, though, is how a tweak from one representation of a dataset (that is, the JSON data published by OpenlyLocal) to another (the JIT treemap representation) allows the creation of interactive visualisations as if by magic :-)

If you want to play with your own treemaps in the meantime, this bit of Javascript will produce a simple representation of committee, member and party data that can be visualised within Many Eyes Wikified:

// Loop over each committee in the OpenlyLocal JSON object (c), writing out
// one quoted, comma-separated row per member: committee title, member name, party.
for (var i = 0; i < c.committees.length; i++) {
  for (var j = 0; j < c.committees[i].members.length; j++) {
    var member = c.committees[i].members[j];
    document.write('"' + c.committees[i].title.replace("&", "and") + '",'
      + '"' + member["first_name"] + " " + member["last_name"] + '",'
      + '"' + member["party"] + '"<br/>');
  }
}

(where c is the javascript object that is published as the JSON feed from a committee page on OpenlyLocal, such as this one for the Isle of Wight).

Here’s an example:

Whilst this treemap doesn’t allow you to fix the colours in the way that the JIT component does:

TM.Squarified.implement({  
   'setColor': function(json) {  
     return json.data.$pcolor; 
   }  
 });

($pcolor is a variable I set for each committee member saying what colour should be displayed for them…), Many Eyes Wikified does allow you to manipulate the tree representation that powers the treemap, e.g. by reordering the way in which the different elements are displayed:

What I really need now is a way of creating the hierarchical JIT objects on the fly from a table based representation… Hmmm….
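As a first stab at what I mean, here’s a hypothetical sketch: take rows of [committee, member, party] data – like the rows generated by the snippet above – and fold them into the sort of nested JIT structure shown earlier, with $area sized by member count (mapping party names onto $pcolor values is left as an exercise):

function rowsToJIT(rows) {
  // group members by committee, then by party within each committee
  var byCommittee = {};
  rows.forEach(function (row) {
    var cttee = row[0], member = row[1], party = row[2];
    byCommittee[cttee] = byCommittee[cttee] || {};
    byCommittee[cttee][party] = byCommittee[cttee][party] || [];
    byCommittee[cttee][party].push(member);
  });
  // build the nested node structure the JIT treemap expects
  var children = [], i = 0;
  for (var cttee in byCommittee) {
    var parties = [], total = 0, j = 0;
    for (var party in byCommittee[cttee]) {
      var members = byCommittee[cttee][party];
      total += members.length;
      parties.push({children: [],
        data: {$area: String(members.length), members: members.join(", ")},
        id: "ccl" + i + "_" + (j++),
        name: party});
    }
    children.push({children: parties, data: {$area: total},
      id: "ccl" + (i++), name: cttee});
  }
  return {children: children, data: {}, id: "root", name: "All committees"};
}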

Tinkering With Timetric – London Datastore Borough Population Data

Earlier this week, the Greater London Authority opened up a preview/prototype of the London Datastore (news reports).

Many of the data sets I quickly looked at are being made available as CSV and XML data feeds, which is very handy :-)

Anyway, in preparation for having some new recipes to drop into conversation at News:Rewired next week, I thought I’d have a quick play with visualising some of the timeseries data in Timetric to see what sorts of “issues” it might throw up.

So how does Timetric like to import data? There are three main ways – copy’n’paste, import a spreadsheet (CSV or XLS) from your desktop, or grab the data from a URL.

Obviously, the online route appeals to me:-)

Secondly, how does Timetric expect the data to be formatted? At the moment, quite rigidly, it seems:

To publish data in a format Timetric can understand, you should expose it in one of two formats — either CSV or Excel (.xls) format. Dates/times must be in the first column of the file, and values in the second.

For importing CSV data, the date/time should be in W3C datetime format (like 2009-09-14 17:34:00) or, optionally, Unix timestamps as floating point numbers.
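So, for example, an acceptable CSV file might look something like this (the numbers are purely illustrative):

1801-01-01 00:00:00,959
1811-01-01 00:00:00,1420
1821-01-01 00:00:00,1783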

Hmmm, so at the moment, I can only import one time series at a time, unless I’m a geeky hacker type and know how to “write a programme” to upload multiple sets of data from a multi-column file via the API… But OUseful.info isn’t about that, right?!;-)

Let’s look at some of the London datastore data, anyway. How about this – “Historic Census Population” data.

Let’s preview the data in a Google spreadsheet – use the formula:
=ImportData("http://data.london.gov.uk/datafiles/demographics/census-historic-population-borough.csv")

Ok – so we have data for different London Boroughs, for every decade since 1801. But is the data in the format that Timetric wants?

– first no: the dates run across columns rather than down rows.

So we need to swap rows with columns somehow. We can do this in a Google spreadsheet with the TRANSPOSE formula. While we’re doing the transposition, we might as well drop the Area Code column and just use the area/borough names. In a new sheet, use the formula:
=TRANSPOSE('Original Data'!A1:W)
(Note, I’d renamed the sheet containing the imported data as Original Data; typically it would be Sheet1, by default.)

NB It seems I could have combined the import and transpose formulae:
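Presumably something along the lines of:

=TRANSPOSE(ImportData("http://data.london.gov.uk/datafiles/demographics/census-historic-population-borough.csv"))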

Now we hit the second no: the dates are in the wrong format.

Remember, for Timetric “the date/time should be in W3C datetime format (like 2009-09-14 17:34:00) or, optionally, Unix timestamps as floating point numbers.”

My fudge here was to copy all the data except the time data to a new sheet, and just add the time data by hand, using a default day/month/time of midnight on the first of January of the appropriate year. Note that this is not good practice – the data in this sheet is now not just a representation of the original data; it’s been messed around with, and the date field is not the original one, nor even derived from the original one. (I don’t think Google spreadsheets has a regular expression search/replace formula that would allow me to do this?)
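(Thinking about it, simple string concatenation might have done the trick without any need for regular expressions – something like the following in a helper column, assuming the year values sit in column A of the transposed sheet:

=A2&"-01-01 00:00:00"

That said, I haven’t tested whether Timetric is happy with the result.)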

Anyway, that’s as may be;-). To keep the correct number format (Google spreadsheets will try to force a different representation of the date), the format of the date cells needs to be set explicitly:

So now we have the data in rows, with the correct date format, the dates having been added by hand. Remembering that Timetric can only import one time series at a time, let’s try it with the first data set. We can grab the CSV for the first two columns as follows – from the Share menu, “Publish as Web Page” option, choose the following settings:

(The ‘for timetric’ sheet is the sheet with the tidied date field.)

Here’s the CSV URI that we can use to get the data into Timetric:

http://spreadsheets.google.com/pub?key=tufGbjv41z8fDauvzyfY7ZA&single=true&gid=2&range=A1%3AB22&output=csv

The upload took a good couple of minutes, with no reassuring user notifications (just the browser appearing to hang waiting for a new Timetric page to load), but eventually it got there…

(And yes, that drop in population is what the data says – though for all the other boroughs you get a curve shaped more as you’d expect;-)

To import other data sets, we need to insert a new Date column, along with date data (I copied it from the first Date column I’d created) and then grab the CSV URI for the appropriate columns:

Anyway, there we have it – a recipe (albeit a slightly messy one) for getting CSV data out of the London datastore, into a Google spreadsheet, transposing its rows and columns, and then generating date information formatted just how Timetric likes it, before grabbing a new CSV data feed out of the spreadsheet and using it to import data into Timetric.