Archive for March 2009
Twitter Powered Subtitles for Conference Audio/Videos on Youtube
Last week, I wrote a post on How people Tweeted Through Carter on “Delivering Digital Britain” at NESTA, where I created a slideshow of tweets posted in response to the NESTA Delivering Digital Britain event/video stream and used them to illustrate the audio recording of Lord Carter’s presentation.
Chatting to @liamgh last week, I mentioned how I was stumped for an easy way to do this. He suggested creating a subtitles feed, and then uploading it to Youtube, along with the audio recording (doh!).
So – here’s a proof of concept result of doing just that:
(Actually, I’m not sure that subtitles/closed captions work in the embedded movies, so you may have to click through: proof of concept use of Tweets as subtitles: Carter at ‘Delivering Digital Britain’, part 1. NB it is possible to force embedded videos to show captions, as Learn More: Showing captions by default describes. Simply add &cc_load_policy=1 to the embed code; so e.g. in WordPress, you would use something like [youtube=http://www.youtube.com/watch?v=tBmFzF8szpo&cc_load_policy=1 ].)
And here’s how I did it…
The first thing to do was check out how Youtube handled subtitles: Getting Started: Adding / Editing captions.
The trick is to upload a textfile with lines that look something like this:
0:03:14.159
Text shown at 3 min 14.159 sec for an undefined length of time.
0:02:20.250,0:02:23.8
Text shown at 2 min 20.25 sec, until 2 min 23.8 sec
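As a rough sketch of that layout, here is a small helper that builds one caption entry as a string. The function name captionEntry and the blank line used to separate entries are my assumptions for illustration, not something taken from the Youtube help page:

```javascript
// Hypothetical helper: build one caption entry in the layout shown above.
// "end" is optional; without it the entry has a start time only.
function captionEntry(start, end, text) {
  var timing = end ? start + "," + end : start;
  return timing + "\n" + text + "\n";
}

// Join entries with a blank line between them (an assumed separator).
var captionFile = [
  captionEntry("0:03:14.159", null, "Shown from 3 min 14.159 sec."),
  captionEntry("0:02:20.250", "0:02:23.8", "Shown for about three and a half seconds.")
].join("\n");
```

Printing captionFile gives text you could paste straight into a caption file.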
Secondly – getting the list of tweets hashtagged with #carter over the period Lord Carter was speaking (i.e. the period covered by the video). For the original proof of concept, I used the tweets from the spreadsheet of scraped tweets that @benosteen grabbed for me, though it later occurred to me I could get the tweets direct from a Twitter search feed (as I’ll show in a minute).
The question now was how to get the timecode required for the subtitles file from the timestamp associated with each tweet. Note here that the timecode is the elapsed time from the start of the video. The solution I came up with was to convert the timestamps to universal time (i.e. seconds since midnight on January 1st 1970) and then find the universal time equivalent of the first tweet subtitle; subtracting this time from the universal time of all the other tweets would give the number of seconds elapsed from the first tweet, which I could convert to the timecode format.
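The subtraction step can be sketched in a few lines of JavaScript. The sample tweets and the function names here are made up for illustration, and I am assuming the timestamps arrive as ISO 8601 strings, which is not necessarily how the spreadsheet or feed presents them:

```javascript
// Hypothetical sample data (not the real #carter tweets), newest first.
var tweets = [
  { text: "Second tweet", time: "2009-02-24T08:31:05Z" },
  { text: "First tweet", time: "2009-02-24T08:30:00Z" }
];

// Seconds since midnight on 1 January 1970 (UTC) for a timestamp string.
function toEpochSeconds(isoString) {
  return Math.floor(Date.parse(isoString) / 1000);
}

// Elapsed seconds for each tweet, measured from the earliest tweet.
function elapsedSeconds(items) {
  var times = items.map(function (t) { return toEpochSeconds(t.time); });
  var start = Math.min.apply(null, times);
  return times.map(function (t) { return t - start; });
}

var offsets = elapsedSeconds(tweets);
```

Each offset is then just a number of seconds that can be formatted as a timecode.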
As to how I got the universal time values – I used a Yahoo pipe (Twitter CSV to timeingsd):
At this point, it’s probably worth pointing out that I didn’t actually need to call on @benosteen’s tweetscraper – I could just use the Twitter search API (i.e. the Twitter advanced search feed output) to grab the tweets. How so? Like this:
Looking at the results of this query, we see the timing is a little off – we actually need results from 8.30am, the actual time of the event:
Which is where this comes into play – searching for “older” results:
If you click on “Older” you’ll notice a new argument is introduced into the search results page URL – &page=:
…which means that by selecting appropriate values for rpp= and page= we can tunnel in on the results covering from a particular time by looking at “older” results pages, and grabbing the URL for the page of results covering the time period we want:
Then we can grab the feed:
which gives us something like this:
http://search.twitter.com/search.atom?q=+%23carter+since%3A2009-02-24+until%3A2009-02-24
which we hack a little to get to the right period:
http://search.twitter.com/search.atom?q=+%23carter+since%3A2009-02-24+until%3A2009-02-24&page=4&rpp=100
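If you would rather build that URL than hack it by hand, something like the following sketch would do it. The function name buildSearchFeedUrl is hypothetical, and the code only assembles a string of the same shape as the URL above; it does not call the search service:

```javascript
// Hypothetical helper: assemble a search feed URL shaped like the one above.
function buildSearchFeedUrl(hashtag, day, page, rpp) {
  var query = " #" + hashtag + " since:" + day + " until:" + day;
  // encodeURIComponent escapes "#" as %23 and ":" as %3A; spaces come out
  // as %20, so swap them for "+" to match the form used in the feed URL.
  var q = encodeURIComponent(query).replace(/%20/g, "+");
  return "http://search.twitter.com/search.atom?q=" + q +
    "&page=" + page + "&rpp=" + rpp;
}

var url = buildSearchFeedUrl("carter", "2009-02-24", 4, 100);
```

Stepping the page value up or down then moves you through older or newer pages of results.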
(For more info on the Twitter search API, see the Twitter Search API Documentation wiki.)
NB while I’m at it, note that there’s a corollary hack here that might come in useful somewhere, or somewhen, else – getting a Twitter search feed into a Google spreadsheet (so we can, for example, process it as a CSV file published from the spreadsheet):
=ImportFeed("http://search.twitter.com/search.atom?rpp=100&page=4&q=+%23carter+since%3A2009-02-24+until%3A2009-02-24")
That is:
Okay – back to the main thread – and a tweak to the pipe to let us ingest the feed, rather than the spreadsheet CSV:
Just by the by, we can add a search front end to the pipe if we want:
and construct the Twitter search API URI accordingly:
(The date formatter converts the search date to the format required by the Twitter search API; it was constructed according to PHP: strftime principles.)
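Outside of Pipes, the same date formatting can be sketched in plain JavaScript. I am assuming the target format is year-month-day, as seen in the since: and until: values of the search URL; the function names are made up for illustration:

```javascript
// Left-pad a number to two digits, e.g. 2 becomes "02".
function pad2(n) {
  return (n < 10 ? "0" : "") + n;
}

// Hypothetical helper: format a Date as YYYY-MM-DD using its UTC fields.
function toSearchDate(date) {
  return date.getUTCFullYear() + "-" +
    pad2(date.getUTCMonth() + 1) + "-" +
    pad2(date.getUTCDate());
}

// Months are zero-based in JavaScript, so 1 means February.
var searchDate = toSearchDate(new Date(Date.UTC(2009, 1, 24)));
```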
Ok – so let’s recap where we’re at – we’ve now got a pipe that will give us universal timecoded tweets (that’s not so far for such a long post to here, is it?!) If we take the JSON feed from the pipe into an HTML page, we can write a little handler that will produce the subtitle file from it:
Here’s the code to grab the pipe’s JSON output into an HTML file:
var pipeUrl="http://pipes.yahoo.com/pipes/pipe.run?_id=Dq_DpygL3hGV7mFEAVYZ7A&aqs";
function ousefulLoadPipe(url){
var d=document;
var s=d.createElement('script');
s.type='text/javascript';
var pipeJSON=url+"&_render=json&_callback=parseJSON";
s.src=pipeJSON;
d.body.appendChild(s);
}
ousefulLoadPipe(pipeUrl);
Here’s the JSON handler:
function parseJSON(json_data){
var caption; var timestamp=0; var mintime=json_data.value.items[0]['datebuilder'].utime;
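for (var i=0; i<json_data.value.items.length; i++) {
timestamp=1*json_data.value.items[i]['datebuilder'].utime;
if (mintime>timestamp) mintime=timestamp;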
}
for (var j=json_data.value.items.length-1; j>=0; j--) {
caption=json_data.value.items[j]['title'];
timestamp=1*json_data.value.items[j]['datebuilder'].utime;
if (j>0) timeEnd=(1*json_data.value.items[j-1]['datebuilder'].utime)-3; else timeEnd=10+1*json_data.value.items[j]['datebuilder'].utime;
if (timeEnd<timestamp) timeEnd=timestamp+2;
timecode=getTimeCode(timestamp-mintime);
timeEnd=getTimeCode(timeEnd-mintime);
var subtitle=timecode+","+timeEnd+"<br/>"+caption+"<br/><br/>";
document.write(subtitle);
}
}
Here’s the timecode formatter:
//String formatter from: http://forums.devshed.com/javascript-development-115/convert-seconds-to-minutes-seconds-386816.html
String.prototype.pad = function(l, s){
return (l -= this.length) > 0 ? (s = new Array(Math.ceil(l / s.length) + 1).join(s)).substr(0, s.length) + this + s.substr(0, l - s.length) : this; };
function getTimeCode(seconds){
var timecode="00:"+(Math.floor(seconds / 60)).toFixed().pad(2, "0") + ":" + (seconds % 60).toFixed().pad(2, "0")+".0";
return timecode;
}
(I generated the timecodes in part using a string formatter from http://forums.devshed.com/javascript-development-115/convert-seconds-to-minutes-seconds-386816.html.)
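The getTimeCode function above hard-codes the hours as “00:” and drops fractions of a second, which is fine for a ten minute clip. For longer recordings, a variant along these lines might be worth a try. This is a sketch only, with a hypothetical function name, and it has not been tested against a Youtube upload:

```javascript
// Hypothetical variant: format elapsed seconds as H:MM:SS.mmm.
function formatTimecode(totalSeconds) {
  // Work in whole milliseconds to avoid floating point drift.
  var ms = Math.round(totalSeconds * 1000);
  var hours = Math.floor(ms / 3600000);
  var minutes = Math.floor((ms % 3600000) / 60000);
  var seconds = Math.floor((ms % 60000) / 1000);
  var millis = ms % 1000;

  // Left-pad a number with zeros to the given width.
  function pad(n, width) {
    var s = String(n);
    while (s.length < width) s = "0" + s;
    return s;
  }

  return hours + ":" + pad(minutes, 2) + ":" + pad(seconds, 2) + "." + pad(millis, 3);
}
```

So formatTimecode(3725) gives 1:02:05.000, for example.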
Copy and paste the output into a text file and save it with the .sub suffix, to give a file which can then be uploaded to Youtube.
So that’s the subtitle file – how about getting the audio into Youtube? I’d already grabbed an audio recording of Carter’s presentation using Audacity (wiring the “headphones out” to the “microphone in” on my laptop and playing the recording from the NESTA site), so I just clipped the first 10 minutes (I think Youtube limits videos to 10 mins?) and saved the file as a wav file, then imported it into iMovie (thinking I might want to add some images, e.g. from photos of the event on flickr). This crib – iMovie Settings for Upload to YouTube – gave me the settings I needed to export the audio/video from my old copy of iMovie to a file format I could upload to Youtube (I think more recent versions of iMovie support a “Share to Youtube” option?).
I then uploaded this file, along with the subtitles file:
So there we have it: Twitter subtitle/annotations (pulled from a Twitter search feed) to the first part of Lord Carter’s presentation at Delivering Digital Britain…
PS Also on the Twitter front, O’Reilly have started watching Twitter for links to interesting stories, or into particular debates: Twitscan: The Debate over “Open Core”.
Chatting to @cheslincoln the other night, we got into a discussion about whether or not Twitter could be used to support a meaningful discussion or conversation, given the immediacy/short lived nature of tweets and the limited character count. I argued that by linking out to posts to support claims in tweets, “hyper-discussions” were possible. By mining the “attention trends” (a term I got from misreading a tweet of Paul Walk’s) that scaffold a conversation, maybe it’s possible to create a summary post of a conversation, or argument, like the O’Reilly one?
See also this post from Paul Walk: Anything you quote from Twitter is always out of context.
HEFCE Grant Funding, in Pictures
Somehow earlier today I managed to pop open a tab in my browser pointing to the HEFCE funding allocation spreadsheets for 2009/2010 (maybe from Twitter? It’s been one of those days where losing track has been the norm!): HEFCE Core funding/operation (Allocation of funds, Recurrent grants for 2009-10).
So I thought, like you do, how much nicer it would have been if they’d published the data in a visualisation environment… So here’s the HEI data, republished in some Many Eyes Wikified pages:
And here are some sample interactive visualisations you can use to explore the data (click through to get to the actual interactive demo):
There’s a full list of demo thumbnails available on this Wikified page: HEFCE Viz Test.
Feel free to create your own pages/discussion around the charts (it is a wiki, after all).
In order to pull the data in to your own wiki page, use the following “data include” commands in your wiki page (one for each visualisation; I think the visualisation page name mustn’t contain any spaces):
- Table 1: {{ousefulTestboard/HEFCE_Funding1test:MyVizPage1}} – creates a placeholder for a new visualisation page titled “MyVizPage1”, using data from Table 1, which lives at http://manyeyes.alphaworks.ibm.com/wikified/ousefulTestboard/HEFCE_Funding1test;
- {{ousefulTestboard/HEFCEData2:MyVizPage2}}
- {{ousefulTestboard/HEFCEDATA3:MyVizPage3}}
You’ll notice I was a little careless in naming the three data pages, which consequently have inconsistent URIs.
Enjoy! … and don’t forget, you can create your own wiki pages using the data, and add text/discussion into them too.
Tweetmash – How people Tweeted Through Carter on “Delivering Digital Britain” at NESTA
Last week, NESTA hosted Delivering Digital Britain, a panel presentation on the Digital Britain interim report. The event was live streamed and audio and video replays are available at http://www.nesta.org.uk/delivering-digital-britain.
Just after the event, I persuaded @benosteen to reuse the Twitter scraping tools he used to deconstruct dev8d (Tracking conferences (at Dev8D) with python, twitter and tags) and turn them on the #carter hashtag that was being used to track the event. The results can be found here: tweets through Delivering Digital Britain.
Here’s the obligatory word cloud, courtesy of Wordle:
As yet another displacement activity, I took time out last night to do a first pass attempt at adding a Twitter commentary to Lord Carter’s presentation at the event – you can see it here (First Draft: Carter – Delivering Digital Brtitain at Nesta, tweetmash):
The tweets were pasted into Keynote (which, I forgot, produces a ppt output that Slidenote despises), and the audio was grabbed using Audacity and a little piece of wire connecting the phones out to the microphone in on my laptop, as I played the Carter piece from the NESTA website…
I got the lead in and lead out timings wrong on the audio – there are quite a few slides dangling (and unseeable?) at the end of the presentation, but with 2am approaching as I finished off last night, I wasn’t minded to start over…
…although I did lay awake for a while wondering whether SMIL might be a way forward, and whether I could use the timestamp of each tweet (which is recorded in the spreadsheet) to synch the tweets against the audio, maybe even reusing a little pipework from Tweetshow/Twittershow along the way…?
Anyway, that’s for another day. What the above does show is that maybe there’s a way for people twittering to actually create post hoc “slides” for presentations that don’t have any (which may or may not be a good thing)? Twitter powered subtitles, maybe?!;-)
PS see also How to Present While People are Twittering, which got me thinking: just because the event is over doesn’t mean the legacy of the backchannel has to come to an end?
PPS here’s another in a similar vein to the previous PS – If You Are Doing An Event, Bring Twitter Into The Room.
The Fake Digital Britain Report
Jumping on the “Fake” bandwagon, we’ve decided to do a little experiment over on WriteToReply, by providing t’community who complained bitterly about the Digital Britain Interim report an opportunity to come up with something better…
And so, I’d like to announce The Fake Digital Britain Report wiki.
So if you think that we need 2Gbps rather than 2Mbps broadband access, then argue your case on the wiki pages…
The initial section headings are taken from the original WTR republication of the report (“Digital Britain Interim Report” on WriteToReply), although of course they are subject to change… (A lot of people were complaining that the UK games industry was not well represented in the interim report, so now they have an opportunity to add in the missing section…;-)
As ever, a feed is available from the fake report in the form of a changes feed for the wiki: Recent changes to “The Fake Digital Britain Report” feed.
Another thing we’re trying to do with the Fake Digital Britain report is find a way of supporting the wiki activity by pulling in comments made to the report on WriteToReply to the “Fake Digital Britain Report” discussion page:
This is achieved using the MediaWiki Extension:RSS:
The re-use of the original section headings in the wiki page means that there’s also a sensible mapping to the comments in the discussion page, which are pulled in at the section level from WTR.
PS We’re also going to have a look at the Wiki Article Feeds Extension to see if we can do anything interesting with that… In the meantime, we’ve already got a demonstration of how to pull a MediaWiki page into a WordPress page here: Guidelines for re-publishers (scraped from the wiki), which uses the Append Wiki page plugin (I think?).
Who knew that blikis could be so much fun…?;-)