Archive for March 2010
Quick Viz – Australian Grand Prix
I seem to have no free time to do anything this week, or next, or the week after, so this is just another placeholder – a couple of quick views over some of Hamilton’s telemetry from the Australian Grand Prix.
First, a quick Google Earth to check the geodata looks okay – the number labels on the pins show the gear the car is in:
Next, a quick look over the telemetry video (hopefully I’ll have managed to animate this properly for the next race…)
And finally, a Google map shows the locations where the pBrakeF (brake pedal force?) is greater than 10%.
Oh to have some time to play with this more fully…;-)
PS for alternative views over the data, check out my other F1 telemetry data visualisations.
F1 Data Junkie – Visualising the Zone
Another scheduled eye candy tease of a post… this time, visualising the braking zone ( > 5% brake force) over several tours of the Bahrain circuit:
These markers really need colouring into bins (e.g. 5-10%, 10-20%, 21-40%, 41-60%, 61-80%, >80%) or treating via a heat map to show the brake going on/coming off, if we can assume that Hamilton is doing pretty much the same thing each lap… (What we’re doing is essentially trying to create a fine degree of resolution in space by taking samples over returns to the same space over multiple laps.)
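As a rough sketch of the binning idea, something like the following Python would do. The bin edges follow the ranges listed above; treating each upper edge as inclusive is my assumption (the listed ranges overlap at 10%), and the function name and marker colours are made up for illustration.

```python
from bisect import bisect_left

# Upper edges (inclusive) of the suggested bins, in percent.
# Assumption: a reading sitting exactly on an edge goes into the lower bin.
BIN_EDGES = [10, 20, 40, 60, 80]
BIN_LABELS = ["5-10%", "10-20%", "21-40%", "41-60%", "61-80%", ">80%"]
# Hypothetical marker colours, one per bin, from light to heavy braking.
BIN_COLOURS = ["green", "yellowgreen", "yellow", "orange", "orangered", "red"]


def brake_bin(brake_pct, threshold=5.0):
    """Return (label, colour) for a brake reading, or None outside the braking zone."""
    if brake_pct <= threshold:
        return None  # at or below the 5% cut-off used for the braking zone
    index = bisect_left(BIN_EDGES, brake_pct)
    return BIN_LABELS[index], BIN_COLOURS[index]
```

Each sample's marker could then be coloured by `brake_bin(sample_brake_pct)`, skipping samples that return `None`.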
By way of comparison, here’s where Hamilton is full on with the throttle (throttle at 100%):
Gulp… remember when the Hamster tried to do that?
Sheesh… so when you hear him talking about the braking forces, here’s what it looks like in the 4G (longitudinal, very heavy braking) and above bin:
More Ways of Looking At the McLaren F1 Telemetry Data
Okay – I know I said that the next post in this series would start looking at the stories the McLaren F1 telemetry data is telling us, but I’m away this weekend so I thought I’d schedule another eye-candy post…
So here you go – a couple of ways of looking at the data in Google Earth by popping the data into a Google spreadsheet, grabbing the CSV data out and pushing it through a Yahoo Pipe, which helpfully generates a KML file for us. Firstly, a simple tour with speed labels on the markers:
Then it struck me – if I do a couple of minor tweaks to the KML, I can produce some coloured markers. So for example, here we have a view where the marker colour represents the gear Hamilton’s car is in during a race-day single lap of the 2010 Bahrain Grand Prix circuit:
The tilted view of the first image is far more appealing, don’t you think?
(What I really want to do is a heat map, but I think that’ll take a couple of hours the first time round, which I just don’t have at the moment…)
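For anyone without a Yahoo Pipe to hand, the CSV-to-KML step described above could be approximated in a few lines of Python. This is only a sketch under assumptions: the column names (`lat`, `lon`, `speed`, `gear`) and the gear-to-colour mapping are invented for illustration, and it is not what the pipe itself does internally.

```python
import csv
import io
from xml.sax.saxutils import escape

# Assumed gear-to-colour mapping; KML colours are aabbggrr hex strings.
GEAR_COLOURS = {
    1: "ff0000ff",  # red
    2: "ff0080ff",  # orange
    3: "ff00ffff",  # yellow
    4: "ff00ff00",  # green
    5: "ffffff00",  # cyan
    6: "ffff0000",  # blue
    7: "ff800080",  # purple
}


def csv_to_kml(csv_text, label_column="speed"):
    """Turn CSV rows into KML placemarks labelled by one column, coloured by gear."""
    placemarks = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        colour = GEAR_COLOURS.get(int(row["gear"]), "ffffffff")  # white if unknown
        placemarks.append(
            "<Placemark>"
            f"<name>{escape(row[label_column])}</name>"
            f"<Style><IconStyle><color>{colour}</color></IconStyle></Style>"
            # KML coordinates are written longitude first, then latitude.
            f"<Point><coordinates>{row['lon']},{row['lat']}</coordinates></Point>"
            "</Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>\n'
        + "\n".join(placemarks)
        + "\n</Document></kml>"
    )
```

Passing `label_column="gear"` would put the gear number on each pin instead of the speed.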
Appreciating Games Through Learning How To Make Them
When I was an undergrad, I learned to juggle; ever since then, I’ve taken far more enjoyment in watching good juggling because I have an appreciation of just what’s involved.
And going by the evidence of student feedback from the last presentation of our Digital Worlds course, the same appears to be true of students who are also game players. That is, they have learned to see more in the games they play – and appreciate them far more – from studying how games are designed and developed, as well as marketed and sold.
Anyway, last week I gave a talk about some of the student activities in T151, which is how the course is referred to in OU-land… As ever, the slides don’t make a lot of sense without me there to talk over them, but for anyone who was at the presentation, they may serve as a reminder of some of the key points…
(Hmm – slideshare does appear to be having trouble with my title slides at the moment… There were also video clips of games created by some of the students as part of the course assessment, but I’ve removed those from the slideshare presentation.)
One of the things that seemed to go down particularly well was a video I showed of the interactive Freemind mindmap view of the course:
If you’re interested in what the course covers, peer closely at the above image! The course also includes a weekly activity session where students build their own arcade games, starting with a maze game and moving on to a platform game. The end of course assessment includes the required production of a game design document, and optionally the submission of screenshots and programming documentation for an implementation of the game. Students are also introduced to a variety of quick-to-get-started online tools, such as Xtranormal.
Long time readers may remember I’ve posted on the use of mindmaps for navigating online content before (e.g. MindMap Navigation for Online Courses) but from the talk I got the impression this idea was new to many, which is why I’m re-mentioning it now.
Another innovative characteristic of the course is the way it is structured. Each week is based around one or two topic explorations, plus a practical activity. The Topic Explorations have a regular structure based around resources that are linked to on the public web. This includes content from the Digital Worlds uncourse blog. I’ve been arguing for well over two years now that the uncourse structure is a very powerful one, but most people don’t see it (Canadians seem to be the exception!;-) Anyway, there was a good example of how the uncourse structure supports reuse over the last couple of days, inspired by a post on the Guardian Technology Games blog by Charles Arthur about tax breaks for game developers: OK, so you get tax breaks for video games – now define ‘video game’.
Hmmm, I thought… Hmmm…. There are some posts about that on Digital Worlds, tagged “what is a game”… and lo, a micro-course was born: What is a Game? microcourse, generated as a thread of posts tagged in a particular way.
(WordPress geeks will see the URL includes the WordPress trick I posted recently.)
This idea of a micro-course – a brief intro to a particular topic, or overview of an area, drawn from OU course content and activities – is one I first used years and years ago (Robotics minicourse), so it was nice to rediscover it as an idea. I also think that with a minor bit of tweaking, it could sit very nicely as a model for demonstrating how we might be able to make use of OERs in a really practical way?
Anyway, anyway, the fully blown T151 Digital Worlds course is presenting again for 10 weeks from the start of May, so if you’re interested, there’s still time to register…
So What Courses Should the OU be Offering?
So it seems as if the OU’s Platform site is now soliciting ideas for what sorts of courses we should be running: I would like to study…
So if you have an idea for a course you want to study, or other offering you think the OU should make, why not go and add your idea… or vote up one of the suggestions that has already been posted.
It’ll be interesting to see whether any of this, err, insight is passed on to folk tasked with developing the curriculum and, err, the OU’s “product offering”. One of the things I think I’ll do is set up an alert or two on courses in particular areas, using a limited search query of the form:
search terms site:www.open.ac.uk/platform/join-in/your-votes/question/
Now, has anyone asked for a Yahoo Pipes course I wonder?!
(By the by, Yahoo Pipes seems to be making an appearance in the assessment of an OU course (M887 Web systems integration maybe?), not that I’d know about that in any official capacity. So, if anyone from the appropriate CT in the Computing Dept reads this, it’d be good to chat about the sorts of submission you’re getting, and how you’re marking it. (I was wondering why I’d had so many “can you help me with my Yahoo Pipes” requests lately…;-))
PS There appears to be another way in to a page that looks remarkably like the one above via a URL of the form http://www.open.ac.uk/platform/category/question-category/i-would-study but I’m not sure how I managed to end up on that one?! Although this page gives you a feed, none of the links point outside the page… So best use the I would like to study… link instead…
PPS @mweller and I jokingly said that if the OU offered such a service, we’d write an open educational course on the most voted for idea in an open and public way… Err, it was a joke, wasn’t it?!
PPPS a note on the URI design, which looks like http://www.open.ac.uk/platform/join-in/your-votes/question-by-date/I%20would%20like%20to%20study… And yes, the … are part of the link. Which means if you share it, the many linkifiers and URL shorteners will ignore that part of the URL and you’ll end up on a page that seems to be very slightly broken (just enough to be annoying/confusing). Oops…
F1 Data Junkie – What Does This Data Point Refer To Again?
The count down is on to my first post unpicking some of the telemetry data grabbed from the McLaren F1 site during the Bahrain Grand Prix, and then maybe this weekend’s race, but first, here’s another tease…
One of the problems I’ve found from a data-based (groan…) storytelling perspective is relating what the data’s telling us to what we know the car is doing from where it is on the track. As I/we refine our data analysis skills we’ll be able just to look at the data and work out what the likely features of the track are at the point the data was collected; but as novice data engineers, we need all the cribs we can get. Which is why I had a little play with my Processing code and built an interactive data explorer that looks something like this:
The idea is that I can easily select a data trace, or a location on the track, and get a snapshot of the data collected at that point in the context of the other data points. That is, this data navigator allows me to expose the data collected in a single sample, in the context of the position of the car on the track, and given the state of the other data values at the same point in time, as well as immediately before and immediately after.
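The core lookup behind an explorer like this can be sketched quite simply. The snippet below is Python rather than Processing, and it is an illustration of the idea rather than the explorer's actual code: the sample layout (a list of dicts with `x` and `y` track coordinates plus data channels) and the function names are assumptions.

```python
import math


def nearest_sample(samples, x, y):
    """Index of the sample whose track position is closest to the point (x, y)."""
    return min(
        range(len(samples)),
        key=lambda i: math.hypot(samples[i]["x"] - x, samples[i]["y"] - y),
    )


def snapshot(samples, index, window=2):
    """The selected sample plus up to `window` samples immediately before and after."""
    start = max(0, index - window)
    return samples[start : index + window + 1]
```

Clicking on the track would call `nearest_sample` with the click position, and `snapshot` then gives the values to display for that sample and its neighbours in time.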
I’ll post a version of this data explorer somewhere when I post the first data analysis post proper, but for now, you’ll just have to make do with the video…;-)
PS As to where the data came from, that story is described here: F1 Data Junkie – Looking at What’s There
Viewing WordPress Posts in Chronological Order
A short and sweet blog post this one… if you want to share a list of posts by tag or category, or the results from a search on a WordPress blog in the order in which they were posted, just add ?orderby=ID&order=ASC to the end of the URL.
Like this:
http://digitalworlds.wordpress.com/category/what-is-a-game/?orderby=ID&order=ASC
What this means is that you can share tagged posts in a chronological view, rather than the default reverse chronological view. Which means your reader can read them in the right order without having to go through any grief…
[UPDATE: as Simon Dickson points out in a comment below, the above actually returns the order in which posts were created. For the order in which they were published, use ?orderby=date&order=ASC]
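If you want to generate these links programmatically, a small helper along the following lines should do it. This is a minimal Python sketch; the function name is made up, and it simply appends the `orderby` and `order` parameters described above while keeping any query string already on the URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def chronological_url(url, orderby="date"):
    """Add orderby/order=ASC to a WordPress listing URL, preserving other parameters."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update({"orderby": orderby, "order": "ASC"})
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Use `orderby="ID"` for creation order, or leave the default `date` for publication order.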
PS it works for feeds too…
PPS I just added this hack to my blog sidebar too – as a “View these posts in chronological order” link:
:-)
PPPS a whole host of other ordering parameters appear to be available too:
* orderby=author
* orderby=date
* orderby=title
* orderby=modified
* orderby=menu_order Note: Only works with Pages.
* orderby=parent
* orderby=ID
* orderby=rand
* orderby=meta_value Note: A meta_key=keyname must also be present in the query.
* orderby=none – no order (available with Version 2.8)
* orderby=comment_count – (available with Version 2.9)
On wordpress.com blogs at least, the pagination parameters other than order don’t appear to work though? (nopaging=true (i.e. display all corresponding posts), posts_per_page=, paged=)
Multi-Dimensional and Multiple-Perspective Storytelling
In Reversible, Reverse History and Side-by-Side Storytelling I linked to an example of side-by-side video stories where two videos are played next to each other. @cogdog then shared a link in the comments to the HBO video cube which adds a whole other dimension!
(I seem to remember the BBC experimenting with similar forms in the past – e.g. co-broadcasting different perspectives of the same story on different channels, or even different media, such as radio and television. Anyone got any links to write-ups of those?)
Anyway, today, via GigaOm, I came across something related - a virtual choir:
For some reason, that video also reminded me of this?!
Anyway, it struck me that backchannel commentary is another approach that demonstrates this multiple perspective idea. Which is why I think that the Twitter video captions idea has a long way to run yet. (By the by, Martin Hawksey has done a great job posting a version of Gordon Brown’s Building Britain’s Digital Future announcement with twitter subtitles. If anyone wants to volunteer or help fund us to develop this app a little further, we have the ideas but just need to secure the developer time…:-)
A Rosetta Stone for Guardian Datastore UK Higher Education Data
In Does Funding Equal Happiness in Higher Education?, I described a couple of interactive visualisations that are built up around a dataset that pools data from several of the Guardian datastore Higher Education datasets. In this post, I’ll show how that aggregated dataset was put together, and review some of the problems that the approach I took has.
The first thing to note is that the data I wanted to combine existed in several different spreadsheets, which we might think of as several different databases. Notwithstanding some of the “issues” I have with some of the more puritanical elements of the Linked Data world view, publishing data from different sources that is ostensibly about the same things (i.e. Higher Education institutions) in the seemingly arbitrary way that it has been published on the Guardian Datastore ires me even more… (I’ve written about this before (e.g. The Guardian OpenPlatform DataStore – Just a Toy, or a Trusted Resource?) so I’m not causing any new offence by saying this;-)
So what’s the problem? In short, this sort of thing (the column contents are taken from several of the HE related spreadsheets):
Given the name of a university from one spreadsheet, it’s all but impossible to match the data to a corresponding university in one of the other spreadsheets.
Ideally, each spreadsheet should use a common identifier for a particular institution, such as a UCAS institution code; but that hasn’t happened, which makes relating one data set to another – such as comparing drop out rates to student satisfaction scores – difficult.
One possible way around this is to use a “Rosetta Stone” spreadsheet (e.g. The Guardian rosetta: the Datablog reference guide to nearly everything) which contains synonyms for the same entity as used across several spreadsheets. (I kept meaning to demo how this could be used, but never got round to it, so I’ll give one possible take on it in a quick tutorial below…) The translations for the HEIs have barely been attempted in the current version of the Guardian Datastore Rosetta sheet, though, so I spent a couple of hours last night addressing that and creating my own Rosetta Stone spreadsheet for the HE data.
So how does it work? The spreadsheet essentially defines a set of sameAs relations within a row using the principle: one row, one object. The columns correspond to separate datasheets within the Guardian datastore. Each cell corresponds to the identifier used within a particular datasheet (that is, within a particular spreadsheet that we are using as a database) to describe a particular thing.
I was going to say that this contrasts to the Linked Data principle of “one thing, one identifier”, but that principle is not explicitly one of the four Linked Data rules, is it…?
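To make the “one row, one object” idea concrete, here is how the same structure might look outside a spreadsheet, as a minimal Python sketch. The sheet names (`ucas`, `funding`, `satisfaction`), the codes and most of the name variants are invented for illustration; they are not taken from the actual Rosetta sheet.

```python
# Each row describes one institution; each key names one datastore sheet and
# holds the identifier that sheet uses for the institution (a sameAs relation).
# All values here are illustrative placeholders.
ROSETTA = [
    {"ucas": "X01", "funding": "The City University", "satisfaction": "City University"},
    {"ucas": "X02", "funding": "University of Example", "satisfaction": "Example, University of"},
]


def translate(name, from_sheet, to_sheet, rosetta=ROSETTA):
    """Look up the identifier used in `to_sheet` for the thing called `name` in `from_sheet`."""
    for row in rosetta:
        if row.get(from_sheet) == name:
            return row.get(to_sheet)
    return None  # no row uses that name in that sheet
```

So `translate("The City University", "funding", "satisfaction")` steps across the row to find the name the other sheet uses for the same institution.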
So how can we use the Rosetta Table? One way is to use a =QUERY() formula, building on the ideas explored in Using Google Spreadsheets Like a Database – The QUERY Formula.
Recall that the QUERY formula has the form: =QUERY(RANGE,DATAQUERY). Now here comes the important bit:
if we know the name of (that is, the identifier for) an HEI in one particular datastore spreadsheet, we can look up the identifier used for the same institution in another datastore spreadsheet using the Rosetta sheet as a stepping stone.
For example, if we have the Rosetta data in a sheet called “Mapping”, column B contains the UCAS codes and column F contains the name of a university for a datasheet that we are currently interested in, (that is, one from which we have the name of the institution) we can use a query of the following form to grab the UCAS code (the formula will also return the name of the institution we are looking up the code for):
=query('Mapping'!B:F,"select B,F where F contains 'The City University'")
So for example:
gives:
If we want to look up the name of an institution in a spreadsheet whose identifiers are listed in column C of the Rosetta sheet, using the name of an institution as described using an identifier taken from a spreadsheet corresponding to Rosetta column F, we can use a formula of the form:
=query('Mapping'!C:F,"select C,F where F contains 'The City University'")
If we are pulling in the name of the institution we want to look up a UCAS code or synonym for from another cell (say, B1), we can use a formula of the form:
=query('Mapping'!C:F,CONCATENATE("select C,F where F contains '",B1,"'"))
(Note that in this case, C and F are columns C and F in the “Mapping” sheet, and B1 refers to cell B1 in the current sheet.)
So for example:
which gives:
Once we have looked up the identifier for an institution in one datastore sheet that corresponds to an institution mentioned in another datastore sheet, we can use that identifier to look up data from that sheet. In other words, we can create a spreadsheet whose rows contain data for a particular institution pulled from separate datastore spreadsheets. The method is a little clunky, as I’ll show below, but it works. (I’ll try to post a more efficient way in a few days.)
The recipe is as follows:
- populate a sheet with the names of universities as identified in one particular sheet. You might do this by using an =ImportRange() formula, like this one:
=ImportRange("tr8_2VPY0bfJQgf29KRz9sg","ALL UNIVERSITIES!A1:B132")
that pulls in data from the 2010 funding spreadsheet.
- look up the synonym for each institution as used in a different spreadsheet (e.g. the student satisfaction spreadsheet) using the Rosetta table loaded into a separate sheet (I called mine “Mapping”); e.g. =query('Mapping'!C:F,CONCATENATE("select C,E where E contains '",A2,"'"))
If we drag that cell formula down the column, we get the other synonyms too:
- now we can run a QUERY to pull in the student satisfaction data for the corresponding institution into the appropriate row, to give us rows that contain funding AND student satisfaction data. There is just one issue though. Whilst the spreadsheet documentation suggests that the RANGE for a QUERY() should be okay as a range of cells imported from another spreadsheet using an =importRange() formula, it doesn’t actually appear to work… Instead, we only seem to be able to run a QUERY over a range of cells contained in a sheet within the current spreadsheet. Which means we need to copy the student satisfaction data into another local sheet and then call on that sheet when we run our query, such as:
=Query('Satisfaction'!B:F,concatenate("select B,C,D,E,F where B matches '",C2,"'"),"")
Drag the cell down, and we get the satisfaction data, though we need to complete the column headings ourselves using the above formula:
(If you put:
=Query('Satisfaction'!B:F,concatenate("select B,C,D,E,F where B matches '",C2,"'"))
into cell E1 this will pull the headings into row 1 and the appropriate data into row 2. Why? Because we removed the no headers ("") argument from the end of the query.)
Okay – that’s more than enough of that, for now. Hopefully you should have a reasonable idea of: a) how to use a Rosetta sheet to look up the name of an HEI in the appropriate format for a particular datastore spreadsheet given its name as taken from another datastore spreadsheet; and b) how to use that name to look up data in a local copy of a datastore spreadsheet.
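For comparison, the whole recipe (names from one sheet, synonyms via the Rosetta table, data from the other sheet) boils down to a simple join. Here is a hedged Python sketch of that logic; the function name, the input shapes and the property names used with it are assumptions made for illustration, not a description of how the spreadsheet formulas work internally.

```python
def join_via_rosetta(funding, satisfaction, rosetta):
    """Combine two datasets whose rows are keyed by different names for the same institutions.

    funding, satisfaction: dicts mapping the name each sheet uses to a dict of values.
    rosetta: list of (funding_name, satisfaction_name) synonym pairs.
    """
    to_satisfaction_name = dict(rosetta)
    combined = []
    for name, funding_row in funding.items():
        # Step across the Rosetta table to find the other sheet's name, if any.
        other_name = to_satisfaction_name.get(name)
        satisfaction_row = satisfaction.get(other_name, {})
        combined.append({"institution": name, **funding_row, **satisfaction_row})
    return combined
```

Institutions with no synonym in the Rosetta table still appear in the output, just without the satisfaction values, which mirrors the blank cells you would get in the spreadsheet.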
Before I sign off, though, it’s worth reviewing some of the problems with this approach.
Firstly, there’s the matter of compiling the Rosetta Stone spreadsheet itself. I hand-crafted this spreadsheet for several reasons: firstly, to let my fingers get a feel for the sorts of process that I really should have tried to automate; secondly, to get a feel for just what sorts of differences there were in the way the same institutions were represented across different spreadsheets; and thirdly, to see whether those differences were regular in any way, because if they are, we might be able to use heuristics to guess with a reasonable degree of success the mapping from one identifier on to another. (A couple of pointers about how possibly to approach this are described by @kitwallace here: A data mashup to explore Brian Kelly’s Linked Data challenge.)
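One possible heuristic, sketched in Python using the standard library's `difflib`, is to normalise each name (lowercase, drop filler words, ignore word order) and then accept the closest candidate above a similarity cut-off. The stopword list and the 0.8 cut-off are guesses rather than tuned values, and this is not the approach from the linked post; any automatic match would still want checking by eye.

```python
import difflib
import re

# Words that vary between sheets without distinguishing institutions (an assumption).
STOPWORDS = {"the", "of", "university", "univ", "and"}


def normalise(name):
    """Lowercase a name, drop punctuation and filler words, and sort what is left."""
    words = re.findall(r"[a-z]+", name.lower())
    return " ".join(sorted(w for w in words if w not in STOPWORDS))


def guess_match(name, candidates, cutoff=0.8):
    """Return the candidate most similar to `name` after normalising, or None."""
    lookup = {normalise(c): c for c in candidates}
    hits = difflib.get_close_matches(normalise(name), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None
```

Note that two different institutions can normalise to very similar strings, so a high cut-off and a manual review of the guesses seem prudent.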
Another class of problems relate to knowing what the data in each row, column or cell is about. So for example:
- if you pull data in from one spreadsheet into another one without bringing in the column heading, you can lose track of what the data you have pulled in is;
- identifying what column to pull in from another spreadsheet is difficult; if you pull columns in by column letter (A, B, C) and for any reason the column ordering changes in the spreadsheet you’re pulling data from, you lose the desired linkage; ideally, what you want to do is pull in columnar data by at least column heading, i.e. some descriptor that is used to identify a column in a meaningful way rather than an arbitrary way like column letter;
- the link between the contents of a cell and what it refers to is only a weak one. If, by convention, we always use column 1 to hold the identifier of the thing being talked about, and the row 1 column headings as the identifiers that describe the properties of the thing, then the co-ordinates of a cell can be used to identify the particular property of the particular thing being talked about. But if the table is not situated regularly within a spreadsheet (e.g. it starts at cell D7), things get a little bit more arbitrary (unless we have another convention, such as having padding cells around the row/column headings containing what amounts to punctuation to syntactically identify them as such).
(There’s a whole range of other problems about whether we can sensibly compare data from one spreadsheet with data from another… e.g. comparing funding in 2010 with drop out rates from 2001.)
If we unpick this, we see we really want two sorts of identifier: a set of unique identifiers for the HEIs (e.g. UCAS number) that are used consistently across different spreadsheets; and a set of unique identifiers for the properties of the HEIs (e.g. Average_NSS_Student_Satisfaction_Score, or 2010_HEFCE_funding_change) that can be used to uniquely identify a set of properties in one sheet so that they can be referenced explicitly from another.
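A small Python sketch shows why the two sorts of identifier help: if every value is addressed by an (institution identifier, property identifier) pair rather than by cell co-ordinates, the column order in the source sheet stops mattering. The `ucas` column name and the function name are assumptions; the property names are the examples given above.

```python
import csv
import io


def load_properties(csv_text, id_column="ucas"):
    """Index a sheet as {(institution_id, property_name): value}, ignoring column order."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        institution = row[id_column]
        for heading, value in row.items():
            if heading != id_column:
                table[(institution, heading)] = value
    return table
```

A value is then fetched with something like `table[("X01", "2010_HEFCE_funding_change")]`, where `X01` stands in for a real institution code, and that reference survives any reshuffling of columns in the source.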
Any Linked Data folk reading this will probably, at this point, be yelling “We told you so”, but as a pragmatist I think we have to find a way to make data work in the real’n’messy world…;-)
As Time Goes By, It Makes a World of Diff
Prompted by a DevCSI Developer Focus Group conference call just now, I had a quick look through the list of Bounty competition entries (and the winners) to see whether there was any code that might be fun to play with.
One app that’s quite fun is a simple app by Chris Gutteridge (Wayback/Memento Animation) that animates the history of a website using archived copies of the site from the Wayback Machine. So for example, here’s the animated history of the OU home page
And here are links to the history of the current Labour Party and Conservative Party domains: The animated history of: http://www.labour.org.uk/ and The animated history of: http://www.conservatives.com/.
The app will also animate changes from a MediaWiki wiki as this link demonstrates: Dev8D wiki changes over time.
(I can’t help thinking it needs: a) a pause button, so at least you can scroll up and down a page, if not explore the site; and b) a bookmarklet, to make it easier to get a site into the replayer;-)
The Dev8D pages also suggest a “Web Diff” app was entered in one of the challenges, but I couldn’t see a link to the app anywhere?
Diffs have been on my mind lately in a slightly different context, in particular relating to the changes made to the Digital Economy Bill on the various stages it went through as it passed through the Lords, but here again a developer challenge event turned up the goods, in this case the Rewired State: dotgovlabs held last Saturday and @1jh’s Parliamentary Bill analyser:
So for example, if we compare the Digital Economy Bill as introduced to the Lords:
http://www.publications.parliament.uk/pa/ld200910/ldbills/001/10001.i-ii.html
and the version that was passed to the Commons:
http://www.publications.parliament.uk/pa/cm200910/cmbills/089/10089.i-iii.html
here’s what we get:
Luvverly stuff :-)
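If you just want a quick-and-dirty comparison of two versions of a text yourself, Python's standard `difflib` module will produce a line-by-line diff. This is a generic sketch, not how the Parliamentary Bill analyser works; the function name and the file labels are placeholders, and you would need to fetch and strip the HTML of the two bill versions first.

```python
import difflib


def bill_diff(old_text, new_text):
    """Return unified-diff lines showing what changed between two versions of a text."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="as introduced",
            tofile="as passed",
            lineterm="",
        )
    )
```

Lines starting with `-` appear only in the earlier version and lines starting with `+` only in the later one.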
PS @cogdog beats me to it again in a comment to Reversible, Reverse History and Side-by-Side Storytelling, specifically: “maybe this is like watching Memento backwards?” Which is to say, maybe the Wayback/Memento Animation should have a “play backwards” switch? And of course, this being a Chris Gutteridge production, it has. So for example, going back in time with the JISC home page
(Sob, I have no original ideas any more, and can’t even think of them before other people do, let alone implement them…;-(

























