Over the last couple of years, you’ve probably noticed that data has become a Big Thing in commerce (Big Data for business advantage) as well as in the openness/transparency community, with governments and the media joining the party, particularly on the openness side. But if you’re looking to develop data journalism skills, it’s probably also worth remembering the area of sports journalism, and the wealth of data produced around sporting events.
Part of the attraction of developing learning activities around sports data is that there’s a good chance that it’ll keep on delivering… If you develop a way of analysing or displaying a set of sports data that pulls out interesting features or story elements, you should be able to keep on reusing it as new data comes in… To set the scene, here’s an example: Driven By Data: Data Journalism in Sports. For a peek at my own fumblings, I’ve started exploring the automatic creation of F1DataJunkie Stats Graphics reports (still a lot to be done, but it’s a start…)
In the extreme case, you might even be able to generate story outlines, or canned prose… For example, in some sports-genre computer games you play along to a “live commentary” generated from the data the game itself produces. Automatic commentary generation is a form of sports journalism, and automated article generation is already here, as @RobbieAllen describes in How I automated my writing career, a brief overview of Automated Insights, a company that specialises in computer-generated visualisations and prose.
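To make the “canned prose” idea a little more concrete, here’s a minimal Python sketch of the simplest possible approach: filling in sentence templates from a stream of match events. The `MatchEvent` record, its event types and the template wording are all hypothetical, made up for illustration; real systems such as the ones mentioned above do a great deal more (deciding which events are newsworthy, varying the phrasing, building a narrative arc), so treat this as a toy rather than a description of how they work.

```python
from dataclasses import dataclass


@dataclass
class MatchEvent:
    minute: int
    kind: str    # hypothetical event types, e.g. "goal", "yellow_card"
    player: str
    team: str


# Hypothetical templates: one canned sentence per event type.
TEMPLATES = {
    "goal": "{minute}' GOAL! {player} scores for {team}.",
    "yellow_card": "{minute}' {player} ({team}) is shown a yellow card.",
}


def commentate(events):
    """Turn a list of MatchEvents into lines of canned commentary, in time order."""
    lines = []
    for ev in sorted(events, key=lambda e: e.minute):
        template = TEMPLATES.get(ev.kind)
        if template is None:
            continue  # no template for this event type, so say nothing
        lines.append(template.format(minute=ev.minute, player=ev.player, team=ev.team))
    return lines
```

Events with no template (a corner, say) are silently skipped, which is a crude stand-in for the editorial judgement about what is worth saying.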
See also: Automated Storytelling in Sports: A Rich Domain to Be Explored, Automated Event Recognition for Football Commentary Generation, Three RoboCup Simulation League Commentator Systems, and so on…
Getting hold of data is always an issue, of course, but I suspect that many larger newsrooms will take a subscription to the Press Association sports data feeds, for example…
Anyway, as an exercise, here’s some data to start with, from the Guardian datastore: Premier League’s top scorers: who is scoring the most goals? Is there a correlation with age, perhaps? (Where would you find the age data…?)
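If you do manage to find the age data and join it onto the scorers table, checking for a correlation takes only a few lines of code. The following Python sketch computes a Pearson correlation coefficient by hand; the CSV filename and the `age` and `goals` column names are assumptions, so adjust them to match however you merge the two datasets.

```python
import csv
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 items)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(sxx * syy)


def load_columns(path, x_col="age", y_col="goals"):
    """Read two numeric columns from a CSV file.

    The default column names are hypothetical: change them to match your merged data.
    """
    xs, ys = [], []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            xs.append(float(row[x_col]))
            ys.append(float(row[y_col]))
    return xs, ys
```

For example, `r = pearson(*load_columns("scorers_with_ages.csv"))`, where the filename is hypothetical. Bear in mind that a top-scorers list only includes players who already score a lot, and is likely to be short, so any correlation you find says little about footballers in general.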
As well as sports reporting, I think we’re likely to see an increase in what Head of Digital at Manchester City FC, Richard Ayers, refers to as datatainment: “where you use data as the primary source of entertainment. You might choose to make the visualisation of raw data entertaining or perhaps use data visualisation as part of the process of entertainment – but there’s definitely a strong editorial control which is focussed on entertaining the audience rather than exposing data.” (Data? Entertainment? You need Datatainment and Defining Data Visualisation, Data Journalism & Data Entertainment).
Devices such as FanVision already blend video and audio streams with data feeds, more and more sports have “live stats apps” associated with them, and it’s not hard to imagine the sort of data crunching that goes on under the hood in things like Optiplay making an appearance on sports analysis and review sites.
I also think that the “data as entertainment” line might work well as a second screen activity. Things like the F1 Live Timing app already demonstrate this.
On the other hand, there’s an opportunity for data focussed sites that go into deep analysis for the hardcore fan. Again looking at Formula One, the Intelligent F1 blog features a data-powered model developed by a rocket scientist that provides engagement around a particular race over an extended period, from predicting Sunday race behaviour based on Friday practice data and previous outings, through analysis of practice and qualifying data, to a detailed series of post-race analyses. (Complement this with the technical analyses of the cars on Scarbs F1, and you have the ultimate F1 geeks’ paradise!;-)
PS This also caught my eye: Gametime [Assistant]: Girls’ Lacrosse Game Data, which steps through the design of a “datatainment” app…
PPS As the Lacrosse app suggests, the data collection thing can also improve engagement with a live event. See, for example, my own doodlings around a motorsport lapcharting app (Thoughts on a Couple of Possible Lap Charting Apps, initial code experiment).
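A lap chart is essentially a record, lap by lap, of the order in which the cars cross the line. Here’s a minimal Python sketch (not the code from the experiment linked above) of turning such a record into each car’s position history, which is the sort of thing you might then plot. It assumes each lap is a list of car numbers in crossing order with no lapped traffic; the function name and data layout are hypothetical.

```python
from collections import defaultdict


def positions_by_car(lap_chart):
    """Convert a lap chart into per-car position histories.

    lap_chart: a list of laps, each a list of car numbers in the order they
    crossed the line (assumes no lapped cars are mixed into the order).
    Returns {car: [position on lap 1, position on lap 2, ...]}, with 1 = leader.
    """
    history = defaultdict(list)
    for crossing_order in lap_chart:
        for position, car in enumerate(crossing_order, start=1):
            history[car].append(position)
    return dict(history)
```

A car that retires simply stops appearing, so its history ends up shorter than the leader’s; handling lapped cars properly would also need each car’s lap count.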
PPPS Seems like the algorithmic story generation thing is itself a story that keeps coming round again… For example, here are a couple of pieces that both appeared around the same time in April 2012: Can the Computers at Narrative Science Replace Paid Writers? in The Atlantic, and Can an Algorithm Write a Better News Story Than a Human Reporter? in Wired.