So WTF is Data Scholarship?

A tweet just passed me by from @andypowell at today’s Linked Data: The Future of Knowledge Organization on the Web event:

“need to introduce data literacy into education in order to create data literate citizens” closing remarks by nigel shadbolt at #isko

With the OU’s infoskills short course Beyond Google: working with information online in its last week of registration for its last presentation, it may be that there’ll be a slot open in the short course programme in a year or two for an OU course on data literacy (and visualisation…?!;-). In the meantime, to justify some of the things I’m getting up to, I suspect I’m going to have to try to persuade folk that there’s some merit in figuring out what sorts of tools make sense in the world of unlimited open data and data scholarship.

Now I have to admit that I’m not at all sure what data scholarship is, or might be (same with data literacy… sigh…) but here are a few possible starters for ten…

I first came across something close to the phrase whilst at the Repository Fringe, searching for papers relating to referencing data, in a preprint from Peter Murray-Rust – Open Data in Science: “Recent initiatives such as the JISC/NSF report on cyberscholarship have emphasized the critical importance of data-driven scholarship.” Digging around the phrase turned up one or two references to data citability as being a key requirement for data(-driven) scholarship, a point also touched on by Kevin Ashley in his closing keynote at the RepoFringe. In particular, Kevin referenced Peter Buneman’s work on that very topic, which in a roundabout way led me to a paper by Bruce Barkstrom on Digital libraries and data scholarship, which again looks at some of the issues involved in referencing data. (I’ll do a post or two on data referencing – something I need to improve in my own practice – at some point…)

So for example, the abstract to Barkstrom’s paper begins: “In addition to preserving and retrieving digital information, digital libraries need to allow data scholars to create post-publication references to objects within files and across collections of files” before going on to discuss referencing matters. So implicitly, data scholarship must be something to do with poring through other people’s old data…

I’m still not sure I know what a data scholar might actually do though, or why, although it seemingly requires the ability to reference data, so I took a sideways step to review what a digital scholar might be… Martin Weller has posted about this previously (e.g. Thoughts on digital scholarship), relating the idea to Boyer’s notions of scholarship (discovery, integration, application, teaching).

A short unit on Connexions (What Is Digital Scholarship?) by the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities & Social Sciences suggests:

In recent practice, “digital scholarship” has meant several related things:

(a) Building a digital collection of information for further study and analysis*
(b) Creating appropriate tools for collection-building
(c) Creating appropriate tools for the analysis and study of collections
(d) Using digital collections and analytical tools to generate new intellectual products
(e) Creating authoring tools for these new intellectual products, either in traditional forms or in digital form

* like this 500 page bibliography on digital scholarship [via @jfj24]. My response: “is the idea that i read those 500 pages of citations and from titles alone form a coherent view about what’s involved?!” Heh heh ;-)

The piece goes on:

It may seem odd to some that creating collections and the tools to use them should be counted as scholarship, but humanities and social science research has always required collections of appropriate information, and throughout history, scholars have often been the ones to assemble those collections, as part of their scholarship. Moreover, scholars have been building tools since the first index, the first concordance, the first scholarly edition. So, while it is reasonable to regard (d) as the core meaning and ultimate objective of “digital scholarship,” it is also important to recognize that in the early digital era, leadership may well consist of collection-building or tool-building. In addition, tool-building is dependent on the existence of collections, and both collections and tools get better and more general as there is more use of digital information. If we hope to see new intellectual products, we should give high priority to building tools and collections. Finally, it is worth noting that although (a), (b), (c), and (e) require a great deal of cooperation, it is still imaginable that (d) can be the work of a single individual.

Remember, I am in part looking for a definition of data scholarship to justify spending time on OUseful things, so maybe here we have something like it…? Because I think I can argue that my OUseful work identifies/discovers useful tools, integrates them within an information processing context that includes other tools and services, applies them to particular “real world” examples, and then teaches (sort of!) how to do the same (so that’s Boyer’s boxes ticked). In addition, some of the integrations I come up with could be classed as the development of new tools in their own right, and as far as collections go: I’ve always been keen on trying to make “discovered context” tangible, as with the discovered search engines I’ve blogged about recently.

PS quickly skimming the above, it seems to me that scholarship maybe has a couple of facets: firstly, the development and identification of tools and techniques that allow “scholars” to do what they do; secondly, the use of those tools and techniques to make sense and meaning of things produced by others beyond the sense and meaning that they themselves have extracted. Recalling the idea that the most interesting thing that will be done with your data will be done by someone else, maybe that’s what scholars do?