Crowd-Sourced Social Media Subtitling of the F1 Video Archive – Or Not…

As a team entry with Martin Hawksey, we put in an entry to the third Tata F1 Connectivity Prize Challenge, which was to catalogue the F1 video archive. We didn’t win (prizewinning entries here), but here’s the gist of our entry… You may recognise it as something we’d bounced ideas around with before…

Social Media Subtitling of F1 Races

Every race weekend, a multitude of F1 fans informally index live race coverage. Along with broadcasters and F1 teams, audiences use Twitter and other social media platforms to generate real-time metadata which could be used to index video footage. The same approach can be used to index the 60,000 hours of footage dating back to 1981.

We propose an approach that harnesses the collection of race commentaries with the promotion of the race being watched, an approach referred to as social media subtitling. Social media style updates collected while a race is being watched are harvested and used to provide commentary-like subtitles for each race. These subtitles can then be used to index and search into each race video.


Annotating Archival Footage

Audiences log in to an authenticated area using an account linked to one or more of their linked social media profiles. They select a video to watch and are presented with a DRM-enabled embedded streaming video player. An associated text editor can be used to create the social media subtitles. On starting to type into the text editor, a timestamp is grabbed from the video for a few seconds before the typing started (so a replay of the commented event can be seen) and associated with the text entry. On posting the subtitle, it is placed into timestamped comment database. Optionally, the comment can be published via a public social media account with an appropriate hashtag and a link to the timestamped part of the video. The link could lead to an authentication page to gain access to the video and the commentary client, or it may lead to a teaser video clip containing a second or two of the commented upon scene.


Examples of the evolution of the original iTitle Twitter subtitler by M. Hawksey, showing: timestamped social media subtitle editor linked to video player; searching into a video using timestamped social media updates; video transcript from social media harvested updates collected in realtime.

The subtitles can be searched and used to act as a timestamped text index describing the video. Subtitles can also be used to generate commentary transcripts.

If a fan watches a replay of a race and comments on it using their own social media account, a video start time tweet could be sent from the player when they start watching the video (“I’ve just started watching the XXXX #F1 race on [LINK]”). This tweet then acts to timestamp their social media updates relative the corresponding video timestamp as well as publicising the video.

Subtitle Quality Control

The quality of subtitles can be controlled in several ways:

  • a stream of subtitles can be played alongside the video and liked (positively scored) or disliked (negatively scored) by a viewer. (There is a further opportunity here for liked comments to be shared to public social media (along with a timestamped link into the video).) This feedback can also be used to generate trust ratings for commenters (someone whose comments are “liked” by a wide variety of people may be seen as providing trusted commentary);
  • text mining / topic modeling of aggregated comments around the same time can be used to identify crowd consensus topic or keywords.

If available, historical race timing data may be used to confirm certain sorts of information. For example, from timing sheets we can get data about pitstops, or the laps on which cars exited a race from an accident or mechanical failure. This information can be matched to the racetime timestamp of a comment; if comment topics match timing data identified events at about the right time, those comments can automatically be rated positively.

Making Use of Public Social Media Updates

For current and future races, logging social media updates around live races provides a way of bootstrapping the comment database. (Timestamps would be taken as realtime updates, although offsetting mechanisms to account for several second delays in digital TV feeds, for example, would need to be accounted for.) Feeds from known F1 journalists, race teams etc. would be taken as trusted feeds. Harvesting hashtagged feeds from the wider F1 audience would allow the collection of race comment social media updates more widely.


Social media updates can also be harvested in real time around live races or replayed races if we know the video start time.

For recent historical races, archived social media updates, as for example collected by Datasift, could be purchased and used to bootstrap the social media subtitle database.

Race Club

Social media subtitling provides a great opportunity for social activity. Groups of individuals can choose to watch a race at the same time, commenting to each other either through the bespoke subtitler client or by using public social media updates and an appropriate hashtag. If a user logs in to the video playback area, timestamps of concurrent updates from their linked public social media accounts can be reconciled with timestamps associated with the streamed video they are watching in the authenticated race video area.

In the off-season, or in the days leading up to a particular race “Historic Race Weekend” videos could be shown, perhaps according to a streamed broadcast model. That is, a race is streamed from within the authenticated area at a particular set time. Fans watch this scheduled event (under authentication) but comment on it in public using social media. These updates are harvested and the timestamps reconciled with the streamed video.


Social media subtitling draws on the idea that social media updates can be used to provide race commentary. Live social media comments collected around live events can be used to bootstrap a social media commentary database. Replayed streamed events can be annotated by associating social media update timestamps with known start/stop times of video replays. A custom client tied to a video player can be used to enter commentary directly to the database as well as issuing it as a social media update.

Team entry: Tony Hirst and Martin Hawksey

PS Rather than referring to social media subtitles and social media subtitling, I think social media captions and social media captioning is more generic?

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

2 thoughts on “Crowd-Sourced Social Media Subtitling of the F1 Video Archive – Or Not…”

Comments are closed.

%d bloggers like this: