So WTF is Data Scholarship?

A tweet just passed me by from @andypowell at today’s Linked Data: The Future of Knowledge Organization on the Web event:

“need to introduce data literacy into education in order to create data literate citizens” closing remarks by nigel shadbolt at #isko

With the OU’s infoskills short course Beyond Google: working with information online in its last week of registration for its last presentation, it may be that there’ll be a slot open in the short course programme in a year or two for an OU course on data literacy (and visualisation…?!;-). In the meantime, to justify some of the things I’m getting up to, I suspect I’m going to have to try to persuade folk that there’s some merit in figuring out what sorts of tools make sense in the world of unlimited open data and data scholarship.

Now I have to admit that I’m not at all sure what data scholarship is, or might be (same with data literacy… sigh…), but here are a few possible starters for ten…

I first came across something close to the phrase whilst at the Repository Fringe, searching for papers relating to referencing data, in a preprint from Peter Murray-Rust – Open Data in Science: “Recent initiatives such as the JISC/NSF report on cyberscholarship have emphasized the critical importance of data-driven scholarship.” Digging around the phrase turned up one or two references to data citability as being a key requirement for data(-driven) scholarship, a point also touched on by Kevin Ashley in his closing keynote at the RepoFringe. In particular, Kevin referenced Peter Buneman’s work on that very topic, which in a roundabout way led me to a paper by Bruce Barkstrom on Digital libraries and data scholarship, which again looks at some of the issues involved in referencing data. (I’ll do a post or two on data referencing – something I need to improve in my own practice – at some point…)

So for example, the abstract to Barkstrom’s paper begins: “In addition to preserving and retrieving digital information, digital libraries need to allow data scholars to create post-publication references to objects within files and across collections of files” before going on to discuss referencing matters. So implicitly, data scholarship must be something to do with poring through other people’s old data…

I’m still not sure I know what a data scholar might actually do though, or why, although it seemingly requires the ability to reference data, so I took a sideways step to review what a digital scholar might be… Martin Weller has posted about this previously (e.g. Thoughts on digital scholarship), relating the idea to Boyer’s notions of scholarship (discovery, integration, application, teaching).

A short unit on Connexions (What Is Digital Scholarship?) by the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities & Social Sciences suggests:

In recent practice, “digital scholarship” has meant several related things:

(a) Building a digital collection of information for further study and analysis*
(b) Creating appropriate tools for collection-building
(c) Creating appropriate tools for the analysis and study of collections
(d) Using digital collections and analytical tools to generate new intellectual products
(e) Creating authoring tools for these new intellectual products, either in traditional forms or in digital form

* like this 500 page bibliography on digital scholarship [via @jfj24]. My response: “is the idea that i read those 500 pages of citations and from titles alone form a coherent view about what’s involved?!” Heh heh ;-)

The piece goes on:

It may seem odd to some that creating collections and the tools to use them should be counted as scholarship, but humanities and social science research has always required collections of appropriate information, and throughout history, scholars have often been the ones to assemble those collections, as part of their scholarship. Moreover, scholars have been building tools since the first index, the first concordance, the first scholarly edition. So, while it is reasonable to regard (d) as the core meaning and ultimate objective of “digital scholarship,” it is also important to recognize that in the early digital era, leadership may well consist of collection-building or tool-building. In addition, tool-building is dependent on the existence of collections, and both collections and tools get better and more general as there is more use of digital information. If we hope to see new intellectual products, we should give high priority to building tools and collections. Finally, it is worth noting that although (a), (b), (c), and (e) require a great deal of cooperation, it is still imaginable that (d) can be the work of a single individual.

Remember, I am in part looking for a definition of data scholarship to justify spending time on OUseful things, so maybe here we have something like it…? Because I think I can argue that the OUseful work identifies/discovers useful tools, integrates them within an information processing context that includes other tools and services, applies them to particular “real world” examples, and then teaches on (sort of!) how to do the same (so that’s Boyer’s boxes ticked). In addition, some of the integrations I come up with could be classed as the development of new tools in their own right, and as far as collections go: I’ve always been keen on trying to make “discovered context” tangible, as with the discovered search engines I’ve blogged about recently.

PS quickly skimming the above, it seems to me that scholarship maybe has a couple of facets: firstly, the development and identification of tools and techniques that allow “scholars” to do what they do; secondly, the use of those tools and techniques to make sense and meaning of things produced by others beyond the sense and meaning that they themselves have extracted. Recalling the idea that the most interesting thing that will be done with your data will be done by someone else, maybe that’s what scholars do?

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

7 thoughts on “So WTF is Data Scholarship?”

  1. I think you answered your question well in your final paragraph. As far as I’m aware most uses of the term focus on the second aspect – doing research without minting any new data, just reusing stuff that’s already there. But good toolsets, and good data collections, are key to making that happen.

  2. Interesting topic. I think the 5-point list you quoted from the AC_of_LSC_on_C_forthe_H&SS captures it nicely — especially:

    “- Using digital collections and analytical tools to generate new intellectual products”

    It’s not really a new idea (except for the online aspect). Johannes Kepler was a good example of a “data scholar”. The data were collected by Tycho Brahe, but only Kepler was capable of revealing their secrets.

    1. @lauriej “Lab-Coated Librarian” is new to me… if you find any written ramblings around this, I’d appreciate a link :-)

      @kc lc – ah, Kepler… good one:-)

      @kevin so finding a rich subset of a dataset, or combining two datasets to make a richer one doesn’t count as minting new data (though it would possibly result in minting of new identifiers for subsets/aggregates?)

    2. Weakly, there is:

      “Close the science library building and move to departments

      There is no need for science libraries – they may be nice quiet places to work but there’s nothing special in their design or management. Human librarians should wear white coats and sit next to scientists and becomes authors on their papers. ”


      but it might be what I’ve heard more from Peter in non-public videos or in person. Alas for video search, there’s no way for me to dig through video archives quickly and find this reference…

  3. “… so finding a rich subset of a dataset, or combining two datasets to make a richer one doesn’t count as minting new data ”

    I wasn’t really trying to say that – clearly it can do. It certainly counts as enriching the original data/metadata. There’s a continuum of some sorts from slightly improving something to creating something new.

    My intent was to draw a distinction between a model of research that always begins with experiments or surveys to gather new data, and one that begins by assuming that the data needed exists already. Astronomy is certainly full of historical examples of this; @kclc has mentioned Kepler, but many other Western theorists were dependent on large and accurate data gathered by earlier Arab and Persian astronomers. More recently, Hubble and similar instruments have transformed much of astronomy. Most astronomical research used to begin with making observations; now most begins with reuse of other people’s observations. It’s much easier to do that now than it used to be.

    Sometimes you don’t make new data, but you derive a new theory or equation from the data. Sometimes you definitely make new data – Galaxy Zoo being one well-known example which takes raw, unclassified data and turns it into something much richer.

    1. @kevin I need to think about this! Work “that begins by assuming that the data needed exists already” certainly has a nice feel to it, although it may need to be qualified with something like “although it may be hidden in one large dataset or spread across several datasets”;-)
