A Tinkerer’s Toolbox…

A couple of days ago, I ran a sort of repeated, 3 hour, Digital Sandbox workshop session for students on the Goldsmiths’ MA/MSc in Creating Social Media (thanks to @danmcquillan for the invite and the #castlondon students for being so tolerant and engaged ;-)

I guess the main theme was how messy tinkering can be, and how simple ideas often don’t work as you expect them to, often requiring hacks, workarounds and alternative approaches to get things working at all, even if not reliably (which is to say: some of the demos borked;-)

Anyway… the topics covered were broadly:

1) getting data into a form where we can make it flow, as demonstrated by “my hit”, which shows how to screenscrape tabular data from a Wikipedia page using Google spreadsheets, republish it as CSV (eventually!), pull it into a Yahoo pipe and geocode it, then publish it as a KML feed that can be rendered in a Google map and embedded in an arbitrary web page.

2) getting started with Gephi as a tool for visualising and interactively having a conversation with a network represented data set.
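As a rough sketch of the last hop in that first pipeline (CSV in, KML out), here is how the conversion might look in Python rather than in a Yahoo pipe. The column names name, lat and lon, and the sample row, are assumptions made up for illustration; the geocoding step is assumed to have already happened.

```python
import csv
import io
from xml.sax.saxutils import escape


def csv_to_kml(csv_text):
    """Turn CSV rows with name, lat and lon columns into a minimal KML document."""
    placemarks = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # KML coordinates are written longitude first, then latitude.
        placemarks.append(
            "<Placemark><name>{}</name><Point><coordinates>{},{}</coordinates></Point></Placemark>".format(
                escape(row["name"]), float(row["lon"]), float(row["lat"])
            )
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>\n'
        + "\n".join(placemarks)
        + "\n</Document></kml>"
    )


# Invented sample row, not taken from the Wikipedia table used in the workshop.
sample = "name,lat,lon\nMilton Keynes,52.04,-0.76\n"
kml = csv_to_kml(sample)
```

The resulting string could be saved as a .kml file; whether a given map service will render it from a URL is a separate question.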

To support post hoc activities, I had a play with a Delicious stack as a way of aggregating a set of tutorial like blog posts I had lying around that were related to each of the activities:

Delicious stack

I’d been quite dismissive of Delicious stacks when they first launched (see, for example, Rediscovering playlists), but I’m starting to see how they might actually be quite handy as a way of bootstrapping my way into a set of uncourses and/or ebooks around particular apps and technologies. There’s nothing particularly new about being able to build ordered sets of resources, of course, but the interesting thing for me is that even if I don’t get as far as editing a set of posts into a coherent mini-guide, a well ordered stack may itself provide a useful guide to a particular application, tool, set of techniques or topic.

As to why a literal repackaging of blog posts around a particular tool or technology as an ebook may not be such a good idea in and of itself, see Martin Belam’s post describing his experiences editing a couple of Guardian Shorts*: “Who’s Who: The Resurrection of the Doctor”: Doctor Who ebook confidential and Editing the Guardian’s Facebook ebook

* One of the things I’ve been tracking lately is engagement by the news media in alternative ways of trying to sell their content. A good example of this is the Guardian, who have been repackaging edited collections of (medium and long form) articles on a particular theme as “Guardian Shorts“. So for example, there are e-book article collection wrappers around the breaking of the phone hacking story, or investigating last year’s UK riots. If you want a quick guide to jazz or an overview of the Guardian datastore approach to data journalism, they have those too. (Did I get enough affiliate links in there, do you think?!;-)

This rethinking of how to aggregate, reorder and repackage content into saleable items is something that may benefit content producing universities. This is particularly true in the case of the OU, of course, where we have been producing content for years, and recently making it publicly available through a variety of channels, such as OpenLearn, or, err, the other OpenLearn, via iTunesU, or YouTube, OU/BBC co-productions and so on. It’s also interesting to note how the OU is also providing content (under some sort of commercial agreement…?) to other publishers/publications, such as the New Scientist:

OU youtube ads being in New Scientist context

There are other opportunities too, of course, such as Martin Weller’s suggestion that it’s time for the rebirth of the university press, or, from another of Martin’s posts, the creation of “special issue open access journal collections” (Launching Meta EdTech Journal), as well as things like The University Expert Press Room which provides a channel for thematic content around a news area and which complements very well, in legacy terms, the sort of model being pursued via Guardian Shorts?

TV Critic/Reviewer, or TV Scheduler?

Having a tinker with a couple of Yahoo Pipes that pull down RDF and XML versions of BBC series and programme pages for programmes that the OU had a hand in co-producing, an earlier post on Some Thoughts on My Changing TV Habits came to mind, and in particular the thought that:

a lot of TV related PR activity (which we go in for at the OU because of our co-pro arrangement with the BBC) is aimed at getting previews of programmes into the press. But from my own viewing habits, a large part of my viewing (particularly over iPlayer content) is guided by post hoc reviews appearing in the weekend press of programmes broadcast over the previous seven days, as well as “last week’s” Radio Times, and (occasionally) social media comments from people I follow relating to programmes they have recently watched themselves. From a PR point of view, there may be an increasing benefit in considering “after-TX” PR opportunities to exploit the fact that content remains viewable over a 7 or 28 day period (or even longer for series linked content or content that is rebroadcast on other BBC channels).

In particular, as more and more content is available on a catchup basis, might we see media players who publish television review columns, and particularly those who do so on a weekly basis in the weekend papers, publishing “easy viewing” tools that make it easy to watch in one go the programmes reviewed in that week’s review column?

That is, might we see reviewers reviewing programmes from throughout the previous week becoming de facto schedulers of content for readers to watch one evening later in the next week? Read the review, then watch for yourself..

(Thinking back, maybe that thought was also influenced by this post on YouTube Leanback Brings Personalized Channels To Your TV, which I remember reading earlier today; in particular the “notion of leaning back and just watching is something that will take some getting used to. That said, YouTube reports that Leanback users are consuming 30 minutes at a time — twice as much as they do using the normal site — so obviously it’s working for some people.”)

PS as to that pipework:

BBC/OU co-pro programmes currently available on iPlayer
clips from OU/BBC co-pro programmes currently available on iPlayer

PPS I also did a cutdown demo pipe, especially for Liam… episodes of Coast currently available on iPlayer

Data Portability Policies in HEIs

It’s all very well academics in favour of openness ranting about the walled garden approaches of Facebook and Apple, and the paywall perimeters put up by embattled media organisations such as News International, but how do our own institutional systems fare on the openness and data portability front?

A recent initiative to encourage sites to publish a data portability policy has made available this handy Portability Policy Generator which covers issues such as:
Identity and Authentication: e.g. do people need to create a new identity for the local site, or can they use an existing one?
Working with Things Stored Somewhere Else: e.g. Must people import things into this product, or can the product refer to things stored someplace else?
Watching For Updates: e.g. Can this site watch for updates that people make on other sites?
Broadcasting Changes Made Here: e.g. If a person updates something here, is that change stored only locally or can it notify another product?
Access from Other Products: e.g. Can the person allow other sites to use the things they’ve created or updated?
Backing Up: e.g. Can the person download or remotely access a copy of everything they’ve provided to the service?
Public Data: e.g. Can the person download or remotely access information that others have provided to the product?
Closing an Account: e.g. Will the site delete an account and all associated data upon a user’s request?
Where things are: e.g. Do you disclose where personal data is being kept in the real world?

A quick look at the OU’s policies page links to the computing code of conduct, FOI and data protection policies for example, but not a data portability policy… (I’d be surprised if any UK HEIs have one yet?)

So with HEIs increasingly encouraging students to make use of university provided e-portfolios and contribute to online forums, and researchers to contribute personal information to researcher profiles and media directories, as well as depositing their papers into institutional repositories, is it time they started at least publishing a data portability policy, and maybe looking at ways of supporting data liberation?

See also: Time for data.ac.uk? Or a local data.open.ac.uk?

The University Expert Press Room – COP15

Chatting just now to @paulafeery, I learned about something that completely passed me by at the time – the OU COP15 Press Room (as you might expect, the site has disappeared… Internet Archive copy of sorts here).

Built on WordPress (yay!-) using the Studiopress Lifestyle theme, the site provided a single point of access to content and several OU academics with relevant expertise in the area in order to “support” journalists writing around issues raised over the course of the COP15 Climate Talks last year.

The site makes good use of categories to partition content into several areas (each, of course, with its own feed:-) So for example, there are categories for News, Research and Opinion, the latest items from which are also highlighted on the front page:

The site also syndicated a feed from an OU Audioboo site where OU academics were posting audio commentaries on related matters:

I don’t think there was a COP15 channel on the OU Boxee TV channel though, although there was an OU COP15 Youtube playlist:

(It strikes me that it might have been good to put a playlist player in an obvious or obviously linked to place on the COP15 press room front page? I also wonder how we might best guarantee OU exposure from any video material we publish and what sort of form it needs to be in, and under what sort of licensing conditions, in order for news outlets to run with it? e.g. How the Ian Tomlinson G20 video spread The Guardian brand across the media, Video Journalism and Interactive Documentaries and to a lesser extent The OU on iPlayer (Err? Sort of not…).)

Anyway, this thematic press room seems like a great idea to me – though I’d have also liked to see a place for 200-500 word CC (attribution) licensed explanatory posts of the sort that could be used to populate breakout factual explanation boxes (with attribution) in feature articles, for example.

Compared to the traditional press release site (which apparently serves as much as an OU timeline/corporate memory device as anything, something that hadn’t occurred to me before…) this topical press room offers another perspective on the whole “social media press release” thang (e.g. Social Media Releases and the University Press Office).

If you want to look back over the COP15 Press Room, you can find it here: OU COP15 Press Room [on Internet Archive]

PS If I was as diligent as Martin Belam at this sort of critique, I’d probably have done a comparison of the OU Press Room site and example output as appearing on the Guardian COP15 topic page:

or the BBC COP15 topic page:

in order to see what sorts of content fit there might be going from copy on the OU Press Room to the material that is typically published on news media sites. If the content doesn’t fit, no-one will re-use it, right?

Maybe next time?!;-) (If you know of such a comparative critique, please post a link back to here or add a comment below;-)

[See also: UK Nordic Baltic Summit 2011 discussion site.]

Related: Social Media Releases and the University Press Office

Scheduling Content Round the Edges – Supporting OU/BBC Co-Productions

Following the broadcast of the final episode of The Virtual Revolution, the OU/BBC co-produced history of the web, over the weekend, and the start today of the radio edit on BBC World Service, here are a few thoughts about how we might go about building further attention traps around the programme.

Firstly, additional content via Youtube playlists and a Boxee Channel – how about if we provide additional programming around the edges based on curating 3rd party content (including open educational video resources) as well as OU produced content?

Here’s a quick demo channel I set up, using the DeliTV way of doing things, and a trick I learned from @liamgh (How to build a basic RSS feed application for Boxee):

I opted for splitting up the content by programme:

Whilst the original programme is on iPlayer, we should be able to watch it on Boxee. I also created and bookmarked a Youtube playlist for each episode:

So for example, it’s easy to moderate or curate content that is posted on Youtube via a programme specific playlist.

Here’s the channel definition code:

<name>Virtual Revolution, Enhanced</name>
<description>Watch items related to the BBC/OU Virtual Revolution.</description>
<copyright>Tony Hirst</copyright>

[This needs to be saved as the file descriptor.xml in a folder named bbcRevolution in the location identified in Liam’s post… alternatively, I guess it should be possible to prescribe the content you want to appear in the channel literally, e.g. as a list of “hard coded” links to video packages? Or a safer middle way might be to host a custom defined and moderated RSS feed on the open.ac.uk domain somewhere?]

Anyway, here’s where much of the “programming” of the channel takes place in the DeliTV implementation:

(Note that the Youtube playlist content is curated on the Youtube site using Youtube playlists, partly because there appeared to be a few pipework problems with individual Youtube videos bookmarked to delicious as I was putting the demo together!;-)

Secondly, subtitle based annotations, as demonstrated by Martin Hawksey’s Twitter backchannel as iPlayer subtitles hack. The hack describes how to create an iPlayer subtitle feed (I describe some other ways we might view “timed text” here: Twitter Powered Subtitles for BBC iPlayer Content c/o the MASHe Blog).

With The Virtual Revolution also being broadcast in a radio form on the BBC World Service, it strikes me that it could be interesting to consider how we might use timed text to supplement radio broadcasts as well, with either commentary or links, or as Martin described, using a replay of a backchannel from the original broadcast, maybe using something like a SMILtext player alongside the radio player? (Hmmm, something to try out for the next co-pro of Digital Planet maybe..?;-)
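To make the timed text idea a little more concrete, here is a minimal sketch of turning timestamped backchannel messages into cues measured from the start of a broadcast. It does not reproduce the subtitle format used in the hack mentioned above; the HH:MM:SS cue layout, the five second display time and the sample data are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def format_clock(seconds):
    """Format a whole number of seconds as HH:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return "{:02d}:{:02d}:{:02d}".format(hours, minutes, secs)


def to_cues(messages, programme_start, duration=5):
    """Convert (timestamp, text) pairs into (begin, end, text) cues relative to programme start."""
    cues = []
    for sent_at, text in sorted(messages):
        offset = sent_at - programme_start
        if offset < timedelta(0):
            continue  # ignore anything sent before the broadcast began
        begin = int(offset.total_seconds())
        cues.append((format_clock(begin), format_clock(begin + duration), text))
    return cues


# Invented example data: one message before the start, one 90 seconds in.
start = datetime(2010, 1, 30, 20, 15, 0)
messages = [
    (datetime(2010, 1, 30, 20, 16, 30), "first comment"),
    (datetime(2010, 1, 30, 20, 10, 0), "about to start"),
]
cues = to_cues(messages, start)
```

The same list of cues could then be written out in whatever timed text format a particular player expects.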

Broadcast Support – Thinking About Virtual Revolution

Watching the OU/BBC co-produced Virtual Revolution programme over the weekend, with Twitter backchannel enabled around the #bbcrevolution hashtag, I started mulling over the support we give to OU/BBC co-produced broadcast material.

Although I went to one of the early planning meetings for the series, where I suggested OU academics participate with elevated rights and credentials on the discussion boards as well as blogging commentary and responses to the production team’s work in progress, I ended up not contributing at all because I took time out for the Arcadia Fellowship (although I have a scattergun approach to topics I cover, I tend to cover them obsessively – and so didn’t want to risk spending the Arcadia time chasing Virtual Revolution leads!).

Anyway, as I watched the broadcast on Saturday, I started wondering about ‘live annotation’ or enrichment of the material as it was broadcast via the backchannel. Although I hadn’t seen a preview of the programme, I have mulled over quite a few of the topics covered by the programme in previous times, so it was easy enough to drop resources in to the twitter feed. So for example, I tweeted a video link to Hal Varian, Google’s Chief Economist, explaining how Google ad auctions work, a tweet that was picked up by one of the production team who was annotating the programme with tweets in real time:

I’ve also written a few posts about privacy on this blog (e.g. Why Private Browsing Isn’t… and serendipitously earlier that day Just Because You Don’t Give Your Personal Data to Google Doesn’t Mean They Can’t Acquire It) so I shamelessly plugged those as well.

And when mention was made about the AOL release of (anonymised) search data, I dropped in to a post I’d written about that affair at the time, which included links to the original news stories about it (When Your Past Comes to Haunt You). Again, my original tweet got amplified:

It struck me that with an hour or so lead time, I could have written a better summary post about the AOL affair, and also woven in some discussion about the latest round of deanonymisation fears to do with browser history attacks and social profiling. I still could, of course… and probably should?!;-) That said, within an hour or so of the programme ending, I had popped up a post on Google Economics, but I’d obviously missed the sweet spot of tweeting it at the appropriate point in the programme. (To have got this post to appear on the open2.net blog would have taken a couple of days….)

Just as a further aside – I have no evidence that tweeting links is beneficial; it might be seen as a distraction from the programme, for example. The mechanic I imagine is folk see the tweet, open the link, skim it, and then maybe leave it open in a tab for a more detailed read later, after the programme has finished? It’d be good to know if anyone’s looked at this in more detail…?

Now I know that most people who read this blog know how Twitter works, and I also know that the Twitter audience is probably quite a small one; but social viewing in the form of live online communications is still evolving, and I suspect audience involvement with it reflects an elevated level of engagement compared to the person who’s just passively watching. (It may be that discussion was also going on in various Facebook groups, or isolated instant messaging or chat rooms, via SMS, and so on). And that elevated level of active, participatory engagement is one of the things we try to achieve, and capitalise on, with our web support of broadcast programming.

So how best can we engage that audience further? Or how do we make the most of that audience?

One of the tools I’ve been playing with on and off displays a list of people who have been using a hashtag, and their relationship to you (Personal Twitter Networks in Hashtag Communities). These people have demonstrated a high level of engagement with the programme, and to be blunt about it, may represent weakly qualified leads in a marketing sense?
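The core of that sort of tool is just set arithmetic. As a minimal sketch, assuming you already have three lists of usernames to hand (people who used the hashtag, people you follow, and people who follow you), the grouping might look like this. The function name and the sample names are hypothetical, and fetching the lists from Twitter is not shown.

```python
def classify_hashtaggers(hashtag_users, friends, followers):
    """Group hashtag users by their relationship to me."""
    hashtag_users = set(hashtag_users)
    friends = set(friends)      # people I follow
    followers = set(followers)  # people who follow me
    return {
        "mutual": sorted(hashtag_users & friends & followers),
        "i_follow_only": sorted((hashtag_users & friends) - followers),
        "follows_me_only": sorted((hashtag_users & followers) - friends),
        "strangers": sorted(hashtag_users - friends - followers),
    }


# Invented usernames for illustration.
groups = classify_hashtaggers(
    hashtag_users=["ann", "bob", "cat", "dan"],
    friends=["ann", "bob"],
    followers=["ann", "cat"],
)
```

The "strangers" group is arguably the interesting one from an outreach point of view, since those are engaged viewers with no existing connection.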

So as the dog looks up at me, hopeful of a walk, I’m going to ponder these two things: 1) how might we engage in realtime backchannel activity around broadcasts in order to maximise reach into an engaged population; and 2) how might that activity, and post hoc analysis of the engaged community, be used to drive sales of OU warez?

PS here’s another interesting possibility – caption based annotations to iPlayer replays of the programme via Twitter Powered Subtitles for BBC iPlayer Content c/o the MASHe Blog (also check out the comments…)

Using Google Spreadsheets as a Database with the Google Visualisation API Query Language

Wouldn’t it be handy if we could treat all the public spreadsheets uploaded to Google docs as queryable tables in a database? Well, it appears that you can do so, at least at an individual spreadsheet level: Introducing the Google Visualization API.

Over the weekend, I started exploring the Google Visualisation API Query Language, which is reminiscent of SQL (if that means anything to you!). This language provides a way of interrogating a data source such as a public online Google spreadsheet and pulling back the results of the query as JSON, CSV, or an HTML table.

Got that? I’ll say it again: the Google Visualisation API Query Language lets you use a Google spreadsheet like a database (in certain respects, at least).

Google query languages are defined on a spreadsheet in the following way:


Although defined, by default, to return JSON data from a query, wrapped in a pre-defined (and fixed?) callback function (google.visualization.Query.setResponse()), it is also possible to display the results of a query as an HTML table (which is “useful”, as the documentation says, “for debugging”). The trick here is to add another argument to the URL: tqx=out:html, so for example a query would now be defined along the lines of:

Using the Guardian datastore’s MPs expenses spreadsheet 2007-8 as an example, we can write quite a wide variety of queries, which I’ll show below in their ‘HTML preview’ form.

(In a ‘real’ situation, you are more likely to retrieve the data as JSON and then process it as an object. Or, as I will also demonstrate, take the results of the query as CSV output (tqx=out:csv rather than tqx=out:html) and pull it directly into a service such as Many Eyes Wikified.)
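If you do take the JSON route outside a browser, the callback wrapper has to come off before the payload can be parsed. Here is a minimal sketch of that step; the sample response body is invented and much shorter than a real one, and it assumes the payload inside the wrapper is strict JSON, which may not always hold.

```python
import json


def unwrap_response(body):
    """Strip a callback wrapper such as google.visualization.Query.setResponse(...) and parse the payload."""
    start = body.index("(") + 1   # first opening bracket belongs to the callback
    end = body.rindex(")")        # last closing bracket closes the callback
    return json.loads(body[start:end])


# Invented response body for illustration only.
sample_body = (
    'google.visualization.Query.setResponse('
    '{"status": "ok", "table": {"rows": [{"c": [{"v": "Red"}, {"v": 1000}]}]}});'
)
data = unwrap_response(sample_body)
first_row = [cell["v"] for cell in data["table"]["rows"][0]["c"]]
```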

The generic URL is of the form: http://spreadsheets.google.com/tq?tqx=out:html&tq=QUERY&key=phNtm3LmDZEObQ2itmSqHIA.

In the examples, I will just show the unencoded select statement, but the link will be the complete, well-formed link.
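For anyone who would rather generate the well-formed links than encode them by hand, here is a small sketch that builds a URL of the generic form given above using Python's standard library. The function name query_url is my own, and nothing here assumes the endpoint still responds.

```python
from urllib.parse import urlencode

BASE_URL = "http://spreadsheets.google.com/tq"
KEY = "phNtm3LmDZEObQ2itmSqHIA"  # spreadsheet key from the generic URL above


def query_url(select, out="html", key=KEY):
    """Build an encoded query URL from an unencoded select statement."""
    params = {"tqx": "out:" + out, "tq": select, "key": key}
    return BASE_URL + "?" + urlencode(params)


url = query_url("select B,C,I where I=23083")
csv_url = query_url("select B,C,I where I=23083", out="csv")
```

Swapping the out argument between html, csv and json is then a one word change.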

So here we go:

  • show everything – fetch the whole table: select * (in a lot of computer languages, ‘*’ often refers to ‘everything and anything’);
  • just show some particular columns, but again for everyone: fetch just columns B (surname), C (first name) and I (total additional costs allowance): select B,C,I
  • only show the names of people who have claimed the maximum additional costs allowance (£23,083): fetch just columns B, C and I where the value in column I is 23083: select B,C,I where I=23083 (column I is the additional costs allowance column);
  • How many people did claim the maximum additional costs allowance? Select the people who claimed the maximum amount (23083) and count them: select count(I) where I=23083
  • So which people did not claim the maximum additional costs allowance? Display the people who did not claim total additional allowances of 23083: select B,C,I where I!=23083 (using <> for ‘not equals’ also works); NB here’s a more refined take on that query: select B,C,I where (I!=23083 and I>=0) order by I
  • search for the name, party (column D) and constituency (column E) of people whose first name contains Joan or is recorded as John (rather than “Mr John”, or “Rt Hon John”): select B,C,D,E where (C contains 'Joan' or C matches 'John')
  • only show the people who have claimed less than £100,000 in total allowances : select * where F<100000
  • what is the total amount of expenses claimed? Fetch the summed total of entries in column I (i.e. the total expenses claimed by everyone): select sum(I)
  • So how many MPs are there? Count the number of rows in an arbitrary column: select count(I)
  • Find the average amount claimed by the MPs: select sum(I)/count(I)
  • Find out how much has been claimed by each party (column D): select D,sum(I) where I>=0 group by D (Setting I>=0 just ensures there is something in the column)
  • For each party, find out how much (on average) each party member claims: select D,sum(I)/count(I) where I>=0 group by D

To create your own queries, just hack around the URIs.
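If the SQL comparison helps, here is a sketch of two of the queries above run as ordinary SQL against a tiny in-memory SQLite table. The rows and party names are invented, the table name mps is hypothetical, and the column names simply mirror the spreadsheet column letters. SQLite is standing in for the spreadsheet here; it is not the Google query language itself.

```python
import sqlite3

# Invented rows: (surname, party, additional costs allowance). Not real data.
rows = [
    ("Smith", "Red", 23083),
    ("Jones", "Red", 20000),
    ("Brown", "Blue", 23083),
    ("Green", "Blue", None),  # stands in for a blank cell
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mps (B TEXT, D TEXT, I INTEGER)")
conn.executemany("INSERT INTO mps VALUES (?, ?, ?)", rows)

# Roughly equivalent to: select count(I) where I=23083
max_claimers = conn.execute("SELECT count(I) FROM mps WHERE I = 23083").fetchone()[0]

# Roughly equivalent to: select D,sum(I)/count(I) where I>=0 group by D
averages = dict(
    conn.execute(
        "SELECT D, sum(I) * 1.0 / count(I) FROM mps WHERE I >= 0 GROUP BY D ORDER BY D"
    ).fetchall()
)
```

Note how the I >= 0 condition drops the blank row, so it does not drag down the average for that party.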

Many Eyes Wikified is no more… One other trick is to grab a CSV output, rather than an HTML output, and pull it into Many Eyes Wikified, and then visualise it within that environment – so we grab the data (in this case, using select D,sum(I) where I>=0 group by D, i.e. the total amount of additional costs allowance claims by party):

to give this:

and then visualise it in an appropriate way:

So to recap this final case, then, we are running a query on the original spreadsheet that calculates the total additional costs allowance claims per party, and emits the results as CSV. These results are imported into Many Eyes Wikified, and displayed therein.
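With Many Eyes Wikified gone, the CSV output can still be picked up by pretty much anything. As a minimal sketch, here is the per-party CSV being read into a Python dictionary. The sample text, including its header row and the party names, is invented, and the exact header the service emits is an assumption.

```python
import csv
import io


def parse_party_totals(csv_text):
    """Read two-column CSV (party, total) into a dict, skipping the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    return {row[0]: float(row[1]) for row in reader if row}


# Invented CSV for illustration only.
sample_csv = '"D","sum I"\n"Red",43083\n"Blue",23083\n'
totals = parse_party_totals(sample_csv)
biggest = max(totals, key=totals.get)
```

From there the totals can be handed to whatever charting tool is to hand.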

Now I’m pretty sure that Many Eyes Wikified will continue (how often?) to synch data from a potentially changing data source, which means we should be able to use a similar approach to plot a running total of claims from the Shadow Cabinet Expenses spreadsheet

…but, at the time of writing at least, it seems as if the publication/privacy settings on that spreadsheet are set such that access via the query language is denied…:-(

Anyway – that was a quick intro to the Google Visualisation API Query Language – so go play… ;-)

PS so what other spreadsheets might make for some interesting queries?

PPS @adrianshort has made a valuable point about how easy it is for a publisher to change the order of rows in a spreadsheet, and hence make a nonsense of your query. (Also, I think the approach I’m taking sort of assumes a simple, regular spreadsheet where row 1 is for headers, then the data, and ideally no other text e.g. in cells below the table describing the data in the table.) So always check… ;-)

PPPS If the first row in the table defines column headings, but there are intervening lines (maybe spaces) before the data starts, putting offset N (where N is a number) will skip that many rows before displaying the data.

Something else I noticed about the order by setting: it can be of the form order by COL asc (to sort in ascending order, which is the default) or order by COL desc (to sort in descending order).
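Since the clauses have to appear in a particular order (where, then group by, then order by, then offset), a small helper can take care of the assembly. This is only a sketch, and build_query and its parameter names are my own invention rather than part of any Google library.

```python
def build_query(select, where=None, group_by=None, order_by=None, descending=False, offset=None):
    """Assemble a query string with the optional clauses in the expected order."""
    parts = ["select " + select]
    if where:
        parts.append("where " + where)
    if group_by:
        parts.append("group by " + group_by)
    if order_by:
        parts.append("order by " + order_by + (" desc" if descending else " asc"))
    if offset is not None:
        parts.append("offset " + str(offset))
    return " ".join(parts)


q = build_query("B,C,I", where="I>=0", order_by="I", descending=True, offset=2)
```

The resulting string still needs URL encoding before it goes into the tq argument.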