Opportunities for Doing the Civic Thing With Open and Public Data

I’ve been following the data thing for a few years now, and it’s been interesting to see how data-related roles have evolved over that time.

For my own part, I’m really excited to have got the chance to work with the Parliamentary Digital Service [PDS] [blog] for a few days this coming year. Over the next few weeks, I hope to start nosing around Parliament and the Parliamentary libraries, getting a feel for The Life of Data there, as well as getting in touch with users of Parliamentary data more widely (if you are one, or aspire to be one, say hello in the comments to see if we can start to generate more opportunities for coffee…:-)

I’m also keen to see what the Bureau of Investigative Journalism’s Local Data Lab, headed up by Megan Lucero, starts to get up to. There’s still a chance to apply for a starting role there if you’re “a journalist who uses computational methods to find stories, an investigative or local journalist who regularly uses data, a tech or computer science person who is interested in local journalism or a civic tech person keen to get involved”, and the gig looks like it could be a fun one:

  • We will take on datasets that have yet to be broken down to a local level, investigate and reveal stories not yet told and bring this to local journalists.
  • We will be mobile, agile and innovative. The team will travel around the country to hear the ideas and challenges of regional reporters. We will listen, respond and learn so to provide evidence-based solutions.
  • We will participate in all parts of the process. Every member will contribute to story ideas, data wrangling, problem solving, building and storytelling.
  • We will participate in open journalism. We will publish public interest stories that throw light on local and national issues. We will open our data and code and document our shortcomings and learnings. We will push for greater transparency. We will foster collaboration between reporters and put power into regional journalism.

I’m really hoping they start a fellowship model too, so I can find some way of getting involved and maybe try to scale some of the data wrangling I will be doing around Isle of Wight data this year to wider use. (I wonder if they’d be interested in a slackbot/datawire experiment or two?!) After all, having split the data out for one local area, it’s often trivial to change the area code and do the same for another.
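As a sketch of what that area-code swap might look like: the dataset and column names below are made up for illustration, but the pattern — one filtering routine, parameterised by an ONS-style geography code — is the general one. (The codes shown are the GSS codes for the Isle of Wight and Birmingham, as I recall them.)

```python
import pandas as pd

# Hypothetical national dataset, one row per area/measure pair.
# Column names ("GeographyCode", "Measure", "Value") are illustrative only.
data = pd.DataFrame({
    "GeographyCode": ["E06000046", "E06000046", "E08000025"],
    "Measure": ["claimant_count", "population", "claimant_count"],
    "Value": [1200, 141000, 23000],
})

def local_slice(df, area_code):
    """Pull the rows for a single local area out of a national dataset."""
    return df[df["GeographyCode"] == area_code]

iw = local_slice(data, "E06000046")          # Isle of Wight
birmingham = local_slice(data, "E08000025")  # change the code, same routine
```

Having written the routine once for one area, rerunning it for every other local authority is just a loop over a list of codes.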

(It’ll also be interesting to see how the Local Data Lab might complement things like the BBC Local Reporting Scheme, or feed leads into the C4CJ-led “representative network for community journalism”.)

Data journalism job ads are still appearing, too. A recent call for a Senior Broadcast Journalist (Data) at BBC Look North suggests the applicant should be able:

  • To generate ideas for data-driven stories and for how they might be developed and visualized.
  • To explore those ideas using statistical tools – and present them to wider stakeholders from a non-statistical background.
  • To report on and analyse data in a way that contributes to telling compelling stories on an array of news platforms.
  • To collaborate with reporters, editors, designers and developers to bring those stories to publication.
  • To use statistical tools to identify significant data trends.

The ad suggests that required skills include good knowledge of Microsoft Excel, a strong grasp of how to clean, parse and query data as well as database management*, [and] demonstrable experience of visualising data and using visualisation tools such as SPSS, SAS, Tableau, Refine and Fusion Tables.

* I’m intrigued as to what this might mean. At an entry level, I like to think this means getting data into something like SQLite and then running SQL queries over it. It’s also worth remembering that Google Sheets exposes an SQL-like interface that you can query (example, about).
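For what it’s worth, here’s a minimal sketch of that “data into SQLite, then SQL over it” workflow using Python’s standard-library sqlite3 module — the table and its contents are made up, but the aggregation query is typical of the sort of thing a data journalist might run over a cleaned spending dataset:

```python
import sqlite3

# An in-memory database stands in for a file-backed one loaded from a CSV.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (supplier TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO spend VALUES (?, ?)",
    [("Acme Ltd", 1200.0), ("Acme Ltd", 800.0), ("Widgets Co", 150.0)],
)

# Total spend per supplier, biggest first.
rows = conn.execute(
    "SELECT supplier, SUM(amount) AS total FROM spend "
    "GROUP BY supplier ORDER BY total DESC"
).fetchall()
print(rows)  # [('Acme Ltd', 2000.0), ('Widgets Co', 150.0)]
```

The same query would run essentially unchanged against Google Sheets’ SQL-like query interface, modulo its slightly different dialect.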

When I started pottering around in the muddy shores of “data journalism” as it became a thing, Google Sheets, Fusion Tables and Open (then Google) Refine were the tools I tried to promote, because I saw them as a relatively easy way in to working with data. But particularly with the advent of accessible working environments like RStudio and Jupyter notebooks, I have moved much more towards the code side. This is perceived as a much harder sell – it requires learning to code – but it’s amazing what you can do with a single line of code, and in many cases someone has already written that line, so all you have to do is copy it. Environments like Jupyter notebooks also provide a nicer (simpler) environment for trying out code than scary IDEs (even the acronym is impenetrable;-). As a consequence of spending more time in code, it’s also made me think far more about reproducible and transparent research (indeed, “reproducible data journalism”), as well as the idea of literate programming, where code, text and, particularly in research workflows, code outputs together form a (linear) narrative that makes it easier to see and understand what’s going on…
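To illustrate the “single line of code” point: one (logical) line of pandas does a load-and-sort step that might otherwise take a whole spreadsheet session. The inline CSV here stands in for a real open-data download URL, and the column names are invented:

```python
import io

import pandas as pd

# Stand-in for a CSV fetched from an open-data portal.
csv = io.StringIO(
    "area,claimants\n"
    "Isle of Wight,1200\n"
    "Portsmouth,2300\n"
    "Southampton,2100\n"
)

# The one-liner: read, rank, and keep the top two areas by claimant count.
top = pd.read_csv(csv).sort_values("claimants", ascending=False).head(2)
print(top)
```

Run in a notebook, the table renders inline, which is much of the appeal: the code, its output, and any commentary sit together as a single narrative.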

Alongside the data journalism front, I’ve also kept half an eye on how academic libraries have been engaging with data issues, particularly from an “IT skills” development perspective. Generally, they haven’t, although libraries are often tasked with supporting research data management projects, as this job ad posted recently by the University of Michigan (via @lorcanD) for a data workflows specialist shows:

The Research Data Workflows Specialist will advance the library’s mission to create and sustain data services for the campus that support the mission of the University of Michigan researchers through Research Data Services (RDS), a new and growing initiative that will build the next generation of data curation systems. A key focus of this position will be to understand and connect with the various disciplinary workflows on campus in order to inform the development of our technical infrastructure and data services.

I suspect this is very much associated with research data management. It seems to me that there’s still a hole when it comes to helping people put together their own reproducible research toolchains and technology stacks (as well as working out what sort of toolchain/stack is actually required…).

Finally, I note that NotWestminster is back late next month in Huddersfield (I managed to grab a ticket last night). I have no idea what to expect from the event, but it may generate a few ideas for what I can usefully do with Island data this year…

PS Just spotted another job opportunity in a related open data area: Data Labs and Learning Manager, 360 Giving.