Idle Thoughts on “Data Literacy” in the Library…

In part for a possible OU Library workshop, in part trying to mull over possible ideas for an upcoming ILI2015 workshop with Brian Kelly, I’ve been pondering what sorts of “data literacy” skills are in-scope for a typical academic library.

As a starting point, I wonder if this slicing is useful, based on the ideas of data management, discovery, reporting and sensemaking.

[Slide: OU_Library_data_questions.pptx]

It identifies four different, though interconnected, sorts of activity, or concern:

  • Data curation questions – research focus – covering the management, archiving, and dissemination of research data. This is mainly about policy, but begs the question of who to go to for the technical “data engineering” issues, and assumes that the researcher can do the data analysis/data science bits.
  • Data resourcing – teaching focus – finding and perhaps helping identify processes to preserve data for use in teaching context.
  • Data reporting – internal process focus – capturing, making sense of/analysing, and communicating data relating to library related resources or activities; to what extent should each librarian be able to use and invoke data as evidence relating to day job activities? Could include giving data to course teams about resource utilisation, or to research teams to demonstrate impact in terms of tracking downloads and use of OU published resources.
  • Data sensemaking – info skills focus – PROMPT in a data context, but also begging the question of who to go to for “data computing” applications or skills support (cf. academic/scientific computing support, application training); also relates to ‘visual literacy’ in the sense of interpreting data visualisations, and to methods for engaging in data storytelling and academic communication.

Poking in to each of those areas a little further, here’s what comes to mind at first thought…

Data Curation

The library is often the nexus of activity around archiving and publishing research papers as part of an open access archive (in the OU, this is via ORO: Open Research Online). Increasingly, funders (and publishers) require that researchers make data available too, often under an open data license. Into this box I’m thinking of those activities related to supporting the organisation, management, archiving, and publication of data related to research. It probably makes sense to frame this in the context of a formal lifecycle of a research project and either the various touchpoints that the lifecycle might have with the library, or those areas of the lifecycle where particular data issues arise. I’m sure such things exist, but what follows is an off-the-top-of-my-head informal take on it…!

Initial questions might relate to putting together (and costing) a research data management plan (planning/bidding, data quality policies, metadata plans etc). There might also be requests for advice about sharing data across research partners (which might extend privacy or data protection issues over and above any immediate local ones). In many cases, there may be concerns about linking to other datasets (for example, in terms of licensing or permissions, or relating to linked or derived data use; mapping is often a big concern here), or other, more mundane, operational issues (how do I share large datafiles that are too big to email?). Increasingly, there are likely to be publication/dissemination issues (how/where/in what format do I publish my data so it can be reused, how should I license it?) and legacy data management issues (how/where can I archive my data? what file formats should I use?). A researcher might also need support in thinking through consequences – or requirements – of managing data in a particular way. For example, particular dissemination or archiving requirements might inform the choice of data management solution from the start: if you use an Access database, or directory full of spreadsheets, during the project with one set of indexing, search or analysis requirements, you might find a certain amount of re-engineering work needs to be done in the dissemination phase if there is a requirement that the data is published at record level on a public webpage with different search or organisational requirements.

What is probably out of scope for the library in general terms, although it may be in scope for more specialised support units working out of the library, is providing support in actual technology decisions (as opposed to raising technology specification concerns…) or operations: choice of DBMS, for example, or database schema design. That said, who does provide this support, or whom should the library suggest might be able to provide such support services?

(Note that these practical, technical issues are totally in scope for the forthcoming OU course TM351 – Data management and analysis…;-)

Data resourcing

For the reference librarian, requests are likely to come in from teaching staff, students, or researchers about where to locate or access different sources of data for a particular task. For teaching staff, this might include identifying datasets that can be used in the context of a particular course, possibly over several years. This might require continuity of access via a persistent URL to different sorts of dataset: a fixed (historical) dataset, for example, or a current, “live” dataset, reporting the most recent figures month on month or year on year. Note that there may be some overlap with data management issues, for example, ensuring that data is both persistent and provided in a format that will remain appropriate for student use over several years.

Researchers too might have third party data discovery or access requests, particularly with respect to accessing commercial or privately licensed data. Again, there may be overlaps with data management concerns, such as how to manage secondary/third party data appropriately so that it doesn’t taint the future licensing or distribution of first party or derived data, for example.

Students, like researchers, might have very specific data access requests – either for particular datasets, or for specific facts – or require more general support, such as advice in citing or referencing sources of secondary data they have accessed or used.

Data reporting

In the data reporting bin, I’m thinking of various data reporting tasks the library might be asked to perform by teaching staff or researchers, as well as data work that has to be done internally within the library, by librarians, for themselves. That is, tasks within the library that require librarians to employ their own data handling skills.

So for example, a course team might want to know which library managed resources referenced from course material are being used, when, and by how many students. Or learning analytics projects may request access to data to help build learner retention models.

A research team might be interested in the number of research paper or data downloads from the local repository, or in citation analyses or other sources of bibliometric data, such as journal metrics or altmetrics, for assessing the impact of a particular project.

And within the library, there may be a need to work with and analyse data to support the daily operations of the library – staffing requirements on the helpdesk based on an analysis of how and when students call on it, perhaps – or to feed into future planning. Looking at journal productivity, for example (how often journals are accessed, or cited, within the institution), when it comes to renewal (or subscription checking) time; or, at a more technical level, building recommendation systems on top of library usage data. Monitoring the performance of particular areas of the library website through website analytics, or even linking out to other datasets and looking at the impact of library resource utilisation by individual students on their performance.
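The “recommendation systems on top of library usage data” idea is less exotic than it sounds. Here’s a toy Python sketch – all the loan records and item names are invented for illustration – that counts which items get borrowed by the same user and suggests “people who borrowed X also borrowed Y”:

```python
from collections import defaultdict
from itertools import combinations

# Made-up (user, item) loan records for illustration
loans = [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (2, "C"), (3, "B"), (3, "C")]

# Group the items each user borrowed
by_user = defaultdict(set)
for user, item in loans:
    by_user[user].add(item)

# Count how often each pair of items appears in the same user's loans
cooccur = defaultdict(int)
for items in by_user.values():
    for a, b in combinations(sorted(items), 2):
        cooccur[(a, b)] += 1
        cooccur[(b, a)] += 1

def recommend(item, n=2):
    """Items most often borrowed alongside `item`, best first."""
    scores = {b: c for (a, b), c in cooccur.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(recommend("A"))  # → ['B', 'C']
```

A production system would need anonymisation, normalisation and much more data, of course, but the co-occurrence counting at its core really is this simple.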

Data sensemaking

In this category, I’m lumping together a range of practical tools and skills to complement the tools and skills that a library might nurture through information skills training activities (something that’s also in scope for TM351…). So for example, one area might be providing advice about how to visualise data as part of a communication or reporting activity, both in terms of general data literacy (use a bar chart, not a pie chart, for this sort of data; switch the misleading colours off; sort the data to better communicate this rather than that, etc.) as well as tool recommendations (try using this app to generate these sorts of charts, or this webservice to plot that sort of map). Another might be how to read, interpret, or critique a data visualisation (looking at crappy visualisations can help here!;-), or rate the quality of a dataset in much the same way you might rate the quality of an article.
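To make that chart advice concrete, here’s a minimal Python/matplotlib sketch (the usage counts are invented for illustration) of two of the tips above: sort the categories before plotting, and use one neutral colour rather than a rainbow palette:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical resource-usage counts, purely for illustration
usage = {"eBooks": 420, "Journals": 980, "Databases": 150, "DVDs": 35}

# Sort by value so the bars tell an ordered story rather than a jumble
items = sorted(usage.items(), key=lambda kv: kv[1])
labels, counts = zip(*items)

fig, ax = plt.subplots()
ax.barh(labels, counts, color="steelblue")  # single neutral colour
ax.set_xlabel("Accesses")
fig.savefig("usage.png")
```

The same comparison as a pie chart would force the reader to judge angles; sorted bars let the eye compare lengths directly, which is the point of the “bar chart, not pie chart” advice.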

At a more specialist level, there may be a need to service requests about what tools to use to work with a particular dataset – a digital humanities researcher looking for advice on a text mining project, for example.

I’m also not sure how far along the scale of search skills library support needs to go, or whether different levels of (specialist?) support need to be provided for undergrads, postgrads and researchers. Certainly, if your data is in a tabular format, even just as a Google spreadsheet, you become much more powerful as a user if you can frame complex data queries (pivot tables, anyone?) or start customising SQL queries. Being able to merge datasets, filter them (by row, or by column), facet them, cluster them or fuzzy join them are really powerful data skills to have – and they can conveniently be developed within a single application such as OpenRefine!;-)
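As a hint of what “merge and pivot” looks like outside a spreadsheet or OpenRefine, here’s a small, self-contained Python/pandas sketch (the loan and user tables are invented for illustration) that joins two datasets and then pivots the result – the same operations an SQL join plus GROUP BY, or a spreadsheet pivot table, would perform:

```python
import pandas as pd

# Two small, made-up tables of library loan data
loans = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "item": ["book", "dvd", "book", "book", "journal", "book"],
})
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "faculty": ["Arts", "Science", "Science"],
})

# Merge (join) the two tables on the shared user_id column...
merged = loans.merge(users, on="user_id")

# ...then pivot to count loans by faculty and item type
pivot = merged.pivot_table(index="faculty", columns="item",
                           values="user_id", aggfunc="count", fill_value=0)
print(pivot)
```

The equivalent SQL would be a `JOIN` on `user_id` followed by `GROUP BY faculty, item` – the underlying skill transfers directly between tools.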

Note that there is likely to be some cross-over here also between the resource discovery role described above and helping folk develop their own data discovery and criticism skills. And there may also be requirements for folk in the library to work on their own data sensemaking skills in order to do the data reporting stuff…

Summary

So, is that a useful way of carving up the world of data, as the library might see it?

The four different perspectives on data related activities within the library described above cover not only data related support services offered by the library to other units, but also suggest a need for data related skills within the library to service its own operations.

What I guess I need to do is flesh out each of the topics with particular questions that exemplify the sort of question that might be asked in each context by different sorts of patron (researcher, educator, learner). If you have any suggestions/examples, please feel free to chip them in to the comments below…;-)

4 comments

  1. libwebrarian

    Hi Tony
    That’s a good breakdown of the different facets. It gets away from viewing the roles from the point of view of services to different categories of user, e.g. what data services might a library offer to researchers. So it’s really useful to think of it from the roles. It is also good to think in terms of a curation lifecycle; there’s a lot of data creation going on without a lot of thought about what happens to it in the long term.

    I wonder how far librarians need to understand the tools to make use of data, should they be providing advice? There might be a parallel with supporting or not supporting reference management tools.

    Richard

    • Tony Hirst

      @richard
      The tools thing is a tricky one, I think…

      Is it the place of the librarian to be able to recommend tools or know how to use them as a power user, or is it just enough that they know what sorts of tools there are and what sorts of things you can do with them, or what sort of tool might be appropriate for a particular task?

      So for example – should I use Autocad or SolidWorks or Google SketchUp? Or is it enough to know that they are all 3D tools, know what sorts of drawing processes you can do with them, know what file formats they support and which are best to use, and perhaps have an idea about how they rank in the sense of being professional tools?

      In a teaching context, it may be that there are ‘recommended apps’ or the same app used across several courses in several faculties, and in that case it might make sense to have some level of training ability in that tool or application?

      Should a librarian know how to code, or just know that coding may be appropriate for particular sorts of thing? Should a librarian know that it’s possible to run a wide range of open source apps via a browser in an “easy-installation way” or using a cloud host ( eg https://blog.ouseful.info/2015/06/24/running-rstudio-on-digital-ocean-aws-etc-using-tutum-and-docker-containers/ ) even if they don’t know how to use the apps that can be run in that way?

      I’m not sure if this works as a corollary, but the legal librarian or medical librarian would be able to handle a certain class of questions but hold back from offering actual legal or medical advice (for example, “Handling Legal Questions at the Reference Desk and Beyond” http://southernlibrarianship.icaap.org/content/v06n03/barnes_n01.htm ).

  2. libwebrarian

    For a librarian it is generally information rather than advice or guidance. But we stray into that area with info and digital literacy skills stuff. And you could argue that we are recommending a resource to a course team, academic or student, based on a librarian’s view of its suitability.

    With tools there might be a similarity with librarians having to learn about CD-ROMs, or Dialog search strategies if you want to go back further. So you need to be able to use a tool to look at the dataset to evaluate whether it is suitable for the purpose. How much do you need to be able to conceptualise what an academic or student might need to do with the data – mash it up, visualise it? How far do you go to assess suitability?

    In an academic library I think there is a difference in that, unlike in a public library setting (and maybe that no longer exists now), you have intermediaries – the teacher or the learning technologist – who will assess the pedagogic suitability or the technology needed.

    • Tony Hirst

      @richard
      When it comes to tool recommendation, part of what the adviser needs to ascertain is what the person asking needs to do, but also their skill level and the context of what they’re doing. With the info literacy stuff, I’d argue that you are providing training?

      If we think about research student training, then one of the bits of training that the library might traditionally provide could be bibliography development and citation based research, which might include the recommendation of a particular bibliographic tool as well as practical training in how to use it.

      I suspect that research student training around reproducible research and open notebook practice is perhaps not currently in scope for the library – but why not? It’s good digital scholarly practice (eg as argued here: http://rstudio-pubs-static.s3.amazonaws.com/14247_4703e332c133404b9765f61082dd54cc.html#/ -ish/via @electricarchaeo). It can help folk think through a workflow that supports reproducibility and replicability and help them manage their information resources more effectively. Embedded in that training might be recommendations of examples of particular tools or strategies to operationalise the practice but the idea has value in itself as a framework for guiding how the researchers might more efficiently use their own preferred tools.

      When it comes to data, how far does the librarian’s skill or knowledge need to extend? If I want to examine road traffic accident data in Milton Keynes, what would you advise..? Yes, you!:-) If the dataset you find has UK coverage, comes in as a 1GB file with no example rows, how would you – yes, you again;-) – open it to preview some of the rows to check they contain the information required, particularly if there isn’t a separate metadata file describing the large file contents? How would you recommend I get just the rows I want (for MK) from the national file? Is that in scope? That’s essentially a query on a dataset. But it splits several ways: 1) given a flat file, for example, how do I query it to retrieve several thousand lines of data; 2) once I do figure out how to query it, how do I phrase the query to just return the data for MK? Suppose I now want to map that data – what would you suggest, both in terms of “what sort of map should I use?” (a simple marker map? or would a choropleth map make sense?) and what tool could I use to help me actually do that? Or suppose I want to try to recreate something like Snow’s cholera map and do the Voronoi diagram thing – what tool do you suggest for that? In scope, or not? And if not, who would you recommend I talk to?
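For what it’s worth, the flat-file parts of that question have a fairly simple answer in code: stream the file a row at a time rather than trying to open all 1GB at once. A Python sketch using just the standard library (the column name `local_authority` is hypothetical – the preview step is exactly how you’d discover the real header):

```python
import csv

def preview(path, n=5):
    """Print the header and the first n rows, to check what the file contains."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        for i, row in enumerate(reader):
            print(row)
            if i >= n:
                break

def filter_rows(src, dst, column, value):
    """Stream src row by row, copying only rows where `column` == `value` to dst.

    Memory use stays constant however big the source file is.
    """
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[column] == value:
                writer.writerow(row)
```

Usage would then be something like `filter_rows("accidents.csv", "mk_accidents.csv", "local_authority", "Milton Keynes")`, leaving a small local file that any spreadsheet or mapping tool can handle – which is one answer to the “several thousand lines out of a national file” part of the question, though it still leaves the mapping tool choice open.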