The story so far… a long time ago now, I built a crude proof of concept showing how to annotate Youtube videos with captions extracted from hashtagged Twitter feeds. And now, every time I look at Martin Hawksey’s RSC MASHe blog, he’s pushed the idea on further…
A great attraction of this service is that it allows a viewer to watch the video at any time, and yet drop twitter captions into the video at the appropriate point. (The original demo grabbed captions from a live hashtag stream to add to video recordings of live presentations, and set the zero time to the start time of the event/recording.)
uTitle integrates with Twapperkeeper, a Twitter archiving service that I think has received some amount of support from JISC, so it’ll be interesting to see if the uTitle use case helps drive innovation on that front as well as in video annotation. So for example, at the moment uTitle uses a Youtube video ID hashtag, as well as a time stamp, to identify tweets that are captioning a particular video. As Twitter opens up its annotations service, it’ll be interesting to see if the identifier can be pushed down to the annotation layer (maybe replaced by a blanket #utitle hashtag in the main tweet?) and Twapperkeeper support extended to include annotations. (I’d also be keen to see Twapperkeeper supporting the archiving of timestamped friends/followers lists, to allow for visualisations and analysis of the growth of networks over time. This may go against the Twitter ToS of course – I haven’t checked…)
Playing with the service just now, it struck me that if I was “live tweeting” along to a video I was watching, the time stamp would have moved on by the time I had written a tweet. So by the time I post a tweet, it will appear as a caption maybe 10 or 20 seconds after the point in the video it refers to. A simple trick might be to have a setting that stops the timer when someone starts typing a new tweet, so that on playback the tweet appears at the point in the video where the commenter started to write it, rather than where it was finished and posted?
(Of course, it’s also possible to pause the video, and even move the playhead back to set the timestamp as required; but I think the above approach is more elegant?)
Another possibly useful tool might be something like the iPod “30s rewind” button, that just nudges the playhead back a few seconds (this might be useful for example if you’re typing a comment as the video plays, and you miss something you want to listen to again…)
There are probably lots of other “freeze time” options that make sense when capturing “live” comments against a recording, but none spring to mind just at the moment! ;-)
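For what it’s worth, the “stop the timer when typing starts” idea is simple enough to sketch. This is a hypothetical fragment, not uTitle code: the player object and handler names are made up, standing in for whatever video player API the captioning tool uses.

```javascript
// Sketch of the "freeze time" idea: remember the playhead position at the
// moment the commenter starts typing, rather than when the tweet is posted.
// The player object (with a currentTime property, in seconds) is a stand-in
// for a real video player API.
function CaptionTimer(player){
  this.player = player;
  this.frozenTime = null;
}
// Wire this to the first keypress in the tweet box
CaptionTimer.prototype.onTypingStarted = function(){
  if (this.frozenTime === null) this.frozenTime = this.player.currentTime;
};
// Called when the tweet is actually posted: use the frozen time if we have one
CaptionTimer.prototype.onTweetPosted = function(){
  var t = (this.frozenTime !== null) ? this.frozenTime : this.player.currentTime;
  this.frozenTime = null;
  return t;
};
```

On playback, the caption would then be dropped in at the frozen time, so it lines up with the point the commenter was actually responding to.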
PS As to where Martin might push uTitle next, I can’t wait to see… :-) Maybe Google will add the idea to Youtube along with Google Moderator and the new Youtube video editor? Or maybe Martin will find some API dangly bits around the Youtube Replay it service that’s just started rolling out as a Google live search feature, and which allows you to “zoom to any point in time and ‘replay’ what people were saying publicly about a topic on Twitter”.
Don’tcha just love it when complementary posts happen along within a day or two of each other? Earlier this week, Martin posted on the topic of Academic output as collateral damage, suggesting that “you can view higher education as a long tail content production system. And if you are producing this stuff as a by-product of what you do anyway then a host of new possibilities open up. You can embrace unpredictability”.
And then today, other Martin comes along with a post – Presentation: Twitter for in-class voting and more for ESTICT SIG – linking to a recording of a presentation he gave yesterday, but one that includes twitter backchannel captions from the presentation, tweets sent both by the presentation itself and by the (potentially extended/remote) audience.
Brilliant… I love it… I’m pretty much lost for words…
What we have here, then, is the opening salvo in a presentation capture and amplification strategy where the side effects of the presentation create a legacy in several different dimensions – an audio-visual record, for after the fact; a presentation that announces its own state to a potentially remote Twitter audience, and that in turn can drive backchannel activity; a recording of the backchannel, overlaid as captions on the video recording; and a search index that provides timecoded results from a search based on the backchannel and the tweets broadcast by the presentation itself. (If nothing else, capturing just the tweets from the presentation provides a way of deep searching in time into the presentation.)
The OU’s VC, Martin Bean, gave the opening keynote, and I have to admit it really did make me feel that the OU is the best place for me to be working at the moment :-)
… though maybe after embedding that, my days are numbered…? Err…
Anyway, I feel like I’ve not really been keeping up with other Martin’s efforts, so here’s a quick hack, a placemarker/waypoint in one of the directions I think the captioning could go – deep search linking into video streams (where deep linking is possible).
Rather than search the content, we’re going to filter captions for a particular video, in this case the twitter caption file from Martin (other, other Martin?!) Bean’s #JISC10 opening keynote. The pipework is simple – grab the URL of the caption file and a “search” term, parse the captions into a feed with one item per caption, then filter on the caption content. I added a little Regular Expression block just to give a hint as to how you might generate a deeplink into content based around the start time of the caption:
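By way of illustration, here’s roughly what the pipe is doing, redone in JavaScript. The caption item format and the example video URL are assumptions, and the regular expression stands in for the pipe’s Regex block:

```javascript
// Filter caption items for a search term, then turn each matching caption's
// start time (h:mm:ss.mmm) into a deep link with a seconds offset.
// The item structure and the #t= fragment convention are illustrative assumptions.
function filterCaptions(items, term, videoUrl){
  var results = [];
  for (var i=0; i<items.length; i++){
    if (items[i].text.toLowerCase().indexOf(term.toLowerCase()) === -1) continue;
    // Pull the timecode apart with a regular expression, as the pipe's Regex block does
    var m = items[i].start.match(/^(\d+):(\d+):(\d+(?:\.\d+)?)$/);
    var secs = 3600*parseInt(m[1],10) + 60*parseInt(m[2],10) + parseFloat(m[3]);
    results.push({caption: items[i].text, link: videoUrl + "#t=" + Math.floor(secs)});
  }
  return results;
}
```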
One thing to note is that it may take some time for someone to tweet what someone has said. If we had a transcript caption file (i.e. a timecoded transcript of the presentation) we might be able to work out the “mean time to tweet” for a particular event/twitterer, in which case we could backdate timestamps to guess the actual point in the video that a person was tweeting about. (I looked at using auto-generated transcript files from Youtube to trial this, but at the current time, they’re rubbish. That said, voice search on my phone was rubbish a year ago, but by Christmas it was working pretty well, so the Goog’s algorithms learn quickly, especially where error signals are available. So bear in mind that if you do post videos to Youtube and can upload a caption file, as well as helping viewers, you’ll also be helping train Google’s auto-transcription service, because it’ll be able to compare the result of auto-transcription with your captions file… If you’re the Goog, there are machine learning/supervised learning cribs everywhere!)
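A toy sketch of that “mean time to tweet” correction, assuming we had a handful of (time said, time tweeted) pairs to calibrate against; the numbers and function names are made up for illustration, and all times are in seconds:

```javascript
// Estimate the average lag between something being said and it being tweeted,
// from calibration pairs of {said, tweeted} times (seconds).
function meanLag(pairs){
  var total = 0;
  for (var i=0; i<pairs.length; i++) total += pairs[i].tweeted - pairs[i].said;
  return total / pairs.length;
}

// Backdate raw tweet timestamps by the estimated lag, clamping at zero
// so a caption can't land before the start of the video.
function backdate(tweetTimes, lag){
  return tweetTimes.map(function(t){ return Math.max(0, t - lag); });
}
```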
(Just by the by, I also wonder if we could colour code captions to identify in a different colour tweets that refer to the content of an earlier tweet/backchannel content, rather than the foreground content of the speaker?)
Unfortunately, caption files on Youtube, which does support deep time links into videos, only appear to be available to video owners (Youtube API: Captions), so I can’t do a demo with Youtube content… and I so should be doing other things that I don’t have the time right now to look at what would be required for deeplinking elsewhere… :-(
I don’t often do posts where I just link to or re-present content that appears elsewhere on the web, but I’m going to make an exception in this case, with an extended preview of a link on Martin Hawksey’s MASHe blog…
Anyway, whilst I was watching Virtual Revolution over the weekend (and pondering the question of Broadcast Support – Thinking About Virtual Revolution) I started thinking again about replaying twitter streams alongside BBC iPlayer content, and wondering whether this could form part of a content enrichment strategy for OU/BBC co-productions.
Which leads to a how-to post on Twitter powered subtitles for BBC iPlayer, in which Martin “come[s] up with a way to allow a user to replay a downloaded iPlayer episode subtitling it with the tweets made during the original broadcast.”
This builds on my Twitter powered subtitling pattern to create a captions file for downloaded iPlayer content using the W3C Timed Text Authoring Format. A video on Martin’s post shows the twitter subtitles overlaying the iPlayer content in action.
AWESOME :-)
This is exactly why it’s worth blogging half-baked ideas – because sometimes they come back better formed…
So anyway, the next step is to work out how to make full use of this… any ideas?
PS I couldn’t offhand find any iPlayer documentation about captions files, or the content packaging for stuff that gets downloaded to the iPlayer desktop – anyone got a pointer to some?
Chatting to @liamgh last week, I mentioned how I was stumped for an easy way to do this. He suggested creating a subtitles feed, and then uploading it to Youtube, along with the audio recording (doh!).
The trick is to upload a textfile with lines that look something like this:

0:03:14.159
Text shown at 3 min 14.159 sec for an undefined length of time.

0:02:20.250,0:02:23.8
Text shown at 2 min 20.25 sec, until 2 min 23.8 sec
Secondly – getting the list of tweets hashtagged with #carter over the period Lord Carter was speaking (i.e. the period covered by the video). For the original proof of concept, I used the tweets from the spreadsheet of scraped tweets that @benosteen grabbed for me, though it later occurred to me I could get the tweets direct from a Twitter search feed (as I’ll show in a minute).
The question now was how to get the timecode required for the subtitles file from the timestamp associated with each tweet. Note here that the timecode is the elapsed time from the start of the video. The solution I came up with was to convert the timestamps to universal time (i.e. seconds since midnight UTC on January 1st 1970) and then find the universal time equivalent of the first tweet subtitle; subtracting this time from the universal time of each of the other tweets would give the number of seconds elapsed from the first tweet, which I could convert to the timecode format.
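That conversion is easy enough to sketch. Assuming ISO-format timestamp strings for the sake of the example (the real pipe works from the feed’s own date fields), the elapsed-time calculation looks something like this:

```javascript
// Turn a list of tweet timestamps into elapsed-time offsets (seconds):
// parse each date to universal (Unix) time, then subtract the earliest one.
function elapsedSeconds(dates){
  var times = dates.map(function(d){ return Date.parse(d)/1000; });
  var start = Math.min.apply(null, times);
  return times.map(function(t){ return t - start; });
}
```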
At this point, it’s probably worth pointing out that I didn’t actually need to call on @benosteen’s tweetscraper – I could just use the Twitter search API (i.e. the Twitter advanced search feed output) to grab the tweets. How so? Like this:
Looking at the results of this query, we see the timing is a little off – we actually need results from 8.30am, the actual time of the event:
Which is where this comes into play – searching for “older” results:
If you click on “Older” you’ll notice a new argument is introduced into the search results page URL – &page=:
…which means that by selecting appropriate values for rpp= and page= we can tunnel in on the results covering from a particular time by looking at “older” results pages, and grabbing the URL for the page of results covering the time period we want:
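For the record, the search URL we’re tunnelling in with just needs q, rpp and page arguments tacked on to the (old) search.twitter.com endpoint; something like:

```javascript
// Build a Twitter search API results-page URL from a query, a results-per-page
// count (rpp) and a page number - the "older results" tunnelling trick.
// This targets the old search.twitter.com Atom endpoint discussed here.
function searchPageUrl(q, rpp, page){
  return "http://search.twitter.com/search.atom?q=" + encodeURIComponent(q)
    + "&rpp=" + rpp + "&page=" + page;
}
```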
NB while I’m at it, note that there’s a corollary hack here that might come in useful somewhere, or somewhen, else – getting a Twitter search feed into a Google spreadsheet (so we can, for example, process it as a CSV file published from the spreadsheet):
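(The spreadsheet side of that hack can presumably lean on the Google Spreadsheets importFeed formula; something along these lines, with the feed URL being whichever “older results” page you’ve tunnelled in on:)

```
=importFeed("http://search.twitter.com/search.atom?q=%23carter&rpp=100&page=2")
```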
Okay – back to the main thread – and a tweak to the pipe to let us ingest the feed, rather than the spreadsheet CSV:
Just by the by, we can add a search front end to the pipe if we want:
and construct the Twitter search API URI accordingly:
(The date formatter converts the search date to the format required by the Twitter search API; it was constructed according to PHP: strftime principles.)
Ok – so let’s recap where we’re at – we’ve now got a pipe that will give us universal timecoded tweets (that’s not so far for such a long post, is it?!) If we take the JSON feed from the pipe into an HTML page, we can write a little handler that will produce the subtitle file from it:
Here’s the code to grab the pipe’s JSON output into an HTML file:
```javascript
var pipeUrl="http://pipes.yahoo.com/pipes/pipe.run?_id=Dq_DpygL3hGV7mFEAVYZ7A&aqs";

// Inject a script element that loads the pipe's JSON output
// with a callback function (the JSON-P pattern)
function ousefulLoadPipe(url){
  var d=document;
  var s=d.createElement('script');
  s.type='text/javascript';
  var pipeJSON=url+"&_render=json&_callback=parseJSON";
  s.src=pipeJSON;
  d.body.appendChild(s);
}
ousefulLoadPipe(pipeUrl);
```
Here’s the JSON handler:
```javascript
function parseJSON(json_data){
  var caption; var timestamp=0;
  var mintime=json_data.value.items[0]['datebuilder'].utime;
  // Find the earliest tweet time; this sets the zero point for the timecodes
  for (var i=0; i<json_data.value.items.length; i++) {
    timestamp=1*json_data.value.items[i]['datebuilder'].utime;
    if (mintime>timestamp) mintime=timestamp;
  }
  // The feed lists newest tweets first, so walk backwards to emit captions in time order
  for (var j=json_data.value.items.length-1; j>=0; j--) {
    caption=json_data.value.items[j]['title'];
    timestamp=1*json_data.value.items[j]['datebuilder'].utime;
    // End each caption 3s before the next tweet starts, or 10s after the final one
    if (j>0) timeEnd=(1*json_data.value.items[j-1]['datebuilder'].utime)-3;
    else timeEnd=10+1*json_data.value.items[j]['datebuilder'].utime;
    if (timeEnd<timestamp) timeEnd=timestamp+2;
    timecode=getTimeCode(timestamp-mintime);
    timeEnd=getTimeCode(timeEnd-mintime);
    var subtitle=timecode+","+timeEnd+" "+caption+"<br/><br/>";
    document.write(subtitle);
  }
}
```
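The handler assumes a getTimeCode function that didn’t make it into the post; a minimal version, producing the h:mm:ss.mmm format used in the subtitle file above, might look like this (a hypothetical reconstruction, not the original helper):

```javascript
// Convert an elapsed time in seconds to the h:mm:ss.mmm subtitle timecode format.
// (A hypothetical reconstruction - the original helper isn't shown in the post.)
function getTimeCode(t){
  var h = Math.floor(t/3600);
  var m = Math.floor((t%3600)/60);
  var s = (t%60).toFixed(3);
  if (m<10) m="0"+m;
  if (1*s<10) s="0"+s;
  return h+":"+m+":"+s;
}
```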
Copy and paste the output into a text file and save it with the .sub suffix, to give a file which can then be uploaded to Youtube.
So that’s the subtitle file – how about getting the audio into Youtube? I’d already grabbed an audio recording of Carter’s presentation using Audacity (wiring the “headphones out” to the “microphone in” on my laptop and playing the recording from the NESTA site), so I just clipped the first 10 minutes (I think Youtube limits videos to 10 mins?) and saved the file as a wav file, then imported it into iMovie (thinking I might want to add some images, e.g. from photos of the event on flickr). This crib – iMovie Settings for Upload to YouTube – gave me the settings I needed to export the audio/video from my old copy of iMovie to a file format I could upload to Youtube (I think more recent versions of iMovie support a “Share to Youtube” option?).
I then uploaded this file, along with the subtitles file:
So there we have it: Twitter subtitle/annotations (pulled from a Twitter search feed) to the first part of Lord Carter’s presentation at Delivering Digital Britain…
PS Also on the Twitter front, O’Reilly have started watching Twitter for links to interesting stories, or into particular debates: Twitscan: The Debate over “Open Core”.
Chatting to @cheslincoln the other night, we got into a discussion about whether or not Twitter could be used to support a meaningful discussion or conversation, given the immediacy/short lived nature of tweets and the limited character count. I argued that by linking out to posts to support claims in tweets, “hyper-discussions” were possible. So by mining the “attention trends” (a term I got from misreading a tweet of Paul Walk’s) that scaffold a conversation, might it be possible to create a summary post of a conversation, or argument, like the O’Reilly one?