More Recognition/Identification Service APIs – Microsoft Cognitive Services

A couple of months ago, I posted A Quick Round-Up of Some *-Recognition Service APIs that described several off-the-shelf cloud hosted services from Google and IBM for processing text, audio and images.

Now it seems that Microsoft Cognitive Services (formerly Project Oxford, in part) brings Microsoft’s tools to the party with a range of free tier and paid/metered services:

[Screenshot: Microsoft Cognitive Services]

So what’s on offer?

Vision

  • Computer Vision API: extract semantic features from an image, identify famous people (for some definition of “famous” that I can’t fathom), and extract text from images; 5,000 free transactions per month;
  • Emotion API: extract emotion features from a photo of a person; photos – 30,000 free transactions per month;
  • Face API: extract face specific information from an image (location of facial features in an image); 30,000 free transactions per month;
  • Video API: 300 free transactions per month per feature.

Speech

Language

  • Bing Spell Check API: 5,000 free transactions per month
  • Language Understanding Intelligent Service (LUIS): language models for parsing texts; 100,000 free transactions per month;
  • Linguistic Analysis API: NLP sentence parser, I think… (tokenisation, part-of-speech tagging, etc.) It’s dog slow and, from the times I got it to sort of work, this seems to be about the limit of what it can cope with (and even then it takes forever):
    [Screenshot: Linguistic Analysis API demo]
    5,000 free transactions per month, 120 per minute (but you’d be lucky to get anything done in a minute…);
  • Text Analytics API: sentiment analysis, topic detection and key phrase detection, language extraction; 5,000 free transactions;
  • Web Language Model API: “wordsplitter” – put in a string of words as a single string with space characters removed, and it’ll try to split the words out; 100,000 free transactions per month.
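To make the “wordsplitter” idea concrete, here is a minimal sketch of dictionary-based word breaking. This is not how the Web Language Model API works internally (I assume it uses a statistical language model rather than a fixed word list); the function name `split_words` and the tiny vocabulary are made up for illustration.

```python
def split_words(text, vocabulary):
    """Split a space-free string into vocabulary words, or return None.

    Uses dynamic programming and prefers the split with the fewest words.
    """
    if not vocabulary:
        return None
    max_len = max(len(word) for word in vocabulary)
    # best[i] holds the best split found so far for text[:i]
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if best[start] is not None and word in vocabulary:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


# Hypothetical vocabulary, just big enough for the demo
vocab = {"free", "transactions", "trans", "actions", "per", "month"}
print(split_words("freetransactionspermonth", vocab))
```

A real service would presumably score candidate splits by word frequency rather than simply by word count, which is why it can cope with strings this toy version would get wrong.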

Knowledge

Search

There’s also a gallery of demo apps built around the APIs.

It seems then that we’ve moved into an era of commodity computing at the level of automated identification and metadata services, though many of them are still pretty ropey… The extent to which they will be developed and continue to improve will be the proof of just how useful they will be as utility services.

As far as the free usage caps on the Microsoft services go, there seems to be a reasonable amount of freedom built in for folk who might want to try out some of these services in a teaching or research context. (I’m not sure if there are blocks for these services that can be wired in to the experiment flows in the Azure Machine Learning studio?)

I also wonder whether these are just the sorts of service that libraries should be aware of, and perhaps even work with in an informationista context…?!;-)

PS from the face, emotion and vision APIs, and perhaps entity extraction and sentiment analysis applied to any text extracted from images, I wonder if you could generate a range of stories automagically from a set of images. Would that be “art”? Or just #ds106 style playfulness?!

PPS Nov 2016 for photo-tagging, see also Amazon Rekognition.

A New Role for the Library – Gonzo Librarian Informationista

At the OU’s Future of Academic Libraries a couple of weeks ago, Sheila Corrall introduced a term and newly(?!) emerging role I hadn’t heard before coming out of the medical/health library area: informationist (bleurghh..).

According to a recent job ad (h/t Lorcan Dempsey):

The Nursing Informationist cultivates partnerships between the Biomedical Library and UCLA Nursing community by providing a broad range of information services, including in-depth reference and consultation service, instruction, collection development, and outreach.

Hmm… sounds just like a librarian to me?

Writing in the Journal of the Medical Library Association, The librarian as research informationist: a case study (101(4): 298–302, October 2013), Lisa Federer described the role in the following terms:

“The term “informationist” was first coined in 2000 to describe what the authors considered a new health sciences profession that combined expertise in library and information studies with subject matter expertise… Though a single model of informationist services has not been clearly defined, most descriptions of the informationist role assume that (1) informationists are “embedded” at the site where patrons conduct their work or need access to information, such as in a hospital, clinic, or research laboratory; and (2) informationists have academic training or specialized knowledge of their patrons’ fields of practice or research.”

Federer started to tighten up the definition in relation to research in particular:

Whereas traditional library services have generally focused on the “last mile” or finished product of the research process—the peer-reviewed literature—librarians have expertise that can help researchers create better research output in the form of more useful data. … The need for better research data management has given rise to a new role for librarians: the “research informationist.” Research informationists work with research teams at each step of the research process, from project inception and grant seeking to final publication, providing expert guidance on data management and preservation, bibliometric analysis, expert searching, compliance with grant funder policies regarding data management and open access, and other information-related areas.

This view is perhaps shared in a presentation on The Informationist: Pushing the Boundaries by Director of Library Services, Elaine Martin, dated on Slideshare as October 2013:

[Slide: The Informationist: Pushing the Boundaries]

Associated with the role are some competencies you might not normally expect from a library staffer:

[Slide: The Informationist: Pushing the Boundaries, competencies]

So – maybe here is the inkling of the idea that there could be a role for librarians skilled in working with information technologies in a more techie way than you might normally expect. (You’d normally expect a librarian to be able to use Boolean search, search limits and advanced search forms. You might not expect them to write their own custom SQL queries, or even build and populate their own databases that they can then query? But perhaps you’d expect a really techie informationist to?) And maybe also the idea that the informationist is a participant in a teaching or research activity?

The embedded nature of the informationist also makes me think of gonzo journalism, a participatory style of narrative journalism written from a first person perspective, often including the reporter as part of the story. Hunter S. Thompson is often held up as some sort of benchmark character for this style of writing, and I’d probably class Louis Theroux as a latter-day exemplar. The reporter as naif participant, in which the journalist acts as a proxy for everyman’s (which is to say, our own) direct experience of the reported situation, is also in the gonzo style (see for example Feats of gonzo journalism have lost their lustre since George Plimpton’s pioneering days as a universal amateur).

So I’m wondering: isn’t the informationist actually a gonzo librarian, joining in with some activity and bringing the skills of a librarian, or wider information scientist (or information technologist/technician) to the party…?

Another term introduced by Sheila Corrall and, again, new to me, was “blended librarian”. According to Steven J. Bell and John Shank writing on The blended librarian in College and Research Libraries News, July/August 2004, pp. 372–375:

We define the “blended librarian” as an academic librarian who combines the traditional skill set of librarianship with the information technologist’s hardware/software skills, and the instructional or educational designer’s ability to apply technology appropriately in the teaching-learning process.

The focus of that paper was in part on defining a new role in which “the skills and knowledge of instructional design are wedded to our existing library and information technology skills”, but that doesn’t quite hit the spot for me. The paper also described six principles of blended librarianship, which are repeated on the LIS Wiki:

  1. Taking a leadership position as campus innovators and change agents is critical to the success of delivering library services in today’s “information society”.
  2. Committing to developing campus-wide information literacy initiatives on our campuses in order to facilitate our ongoing involvement in the teaching and learning process.
  3. Designing instructional and educational programs and classes to assist patrons in using library services and learning information literacy that is absolutely essential to gaining the necessary skills (trade) and knowledge (profession) for lifelong success.
  4. Collaborating and engaging in dialogue with instructional technologists and designers which is vital to the development of programs, services and resources needed to facilitate the instructional mission of academic libraries.
  5. Implementing adaptive, creative, proactive, and innovative change in library instruction can be enhanced by communicating and collaborating with newly created Instructional Technology/Design librarians and existing instructional designers and technologists.
  6. Transforming our relationship with faculty to emphasize our ability to assist them with integrating information technology and library resources into courses, but adding to that traditional role a new capacity to collaborate on enhancing student learning and outcome assessment in the area of information access, retrieval and integration.

Again, the emphasis on being able to work with current forms of instructional technology falls short of the mark for me.

But there is perhaps a glimmer of light in the principle associated with “assist[ing faculty] with integrating information technology and library resources into courses“, if we broaden that principle to include researchers as well as teachers, and then add in the idea that the informationist should also be helping explore, evaluate, advocate and teach on how to use emerging information technologies (including technologies associated with information and data processing, analysis and communication (that is, presentation; so things like data visualisation)).

So I propose a new take on the informationist, adopting the term proposed in a second take tweet from Lorcan Dempsey: the informationista (which is far more playful, if nothing else, than informationist).

The informationista is someone like me, who tries to share contemporary information skills (such as these), through participatory as well as teaching activities, blending techie skills with a library attitude. The informationista is also a hopeful and enthusiastic amateur (in the professional sense…) who explores ways in which new and emerging skills and technologies may be applied to the current situation.

At last, I have found my calling!;-)

See also: Infoskills for the Future – If You Can’t Handle Information, Get Out of the Library (this has dated a bit but there is still quite a bit that can be retrieved from that sort of take, I think…)

PS see also notes on embedded librarians in the comments below.

Full Text Is [Not] Available…

Whenever I go to a library conference, I come away re-motivated. At The Future of Academic Libraries Symposium held at the OU yesterday, which also hosted the official launch of the OU Digital Archive and a celebration of the career of ever-welcoming Librarian Nicky Whitsed as she heads off to pastures new, I noticed again that I’m happiest when thinking about the role of the Library and information professional, and what it means in a world where discovery of, access to, and processing of information is being expanded every day (and whether what’s newly possible is part of the Library remit).

I’ll post more thoughts on the day later, but for now, a bit of library baiting…! [Hmm, thinks.. maybe this is when I’m happiest?!;-)]

The OU Library recently opted in to a new discovery system. Aside from the fact that the authentication doesn’t always seem to work seamlessly, there seems to be a signalling issue with the search results:

[Screenshot: IEEE Xplore abstract page and OU Library Search result for “Exploring New Roles for Librarians: The Research Informationist”]

When is available not available? When does green mean go? If it said “Full text available” but had a red indicator, I might get the idea that the thing exists in a full text online version, but that I don’t have access to it. But with the green light? That’s like saying the book is on-shelf but it being on a shelf in a bookshop adjunct to the library.

Here’s another example, from the OU repository, where the formally published intellectual academic research outputs of members of the University are published:

[Screenshot: IGI Global journal article page and OU Library Search result for “I Scratch and Sense But Can I Program? An Investigation of Learning with a Block Based Programming Language”]

As you can see, this particular publication is not available via the repository, due to copyright restrictions and the publishing model of the particular journal involved, but neither does the Library subscribe to the journal. (Which got me wondering – if we did an audit of just the records in the repository and looked up the journal/conference publication details for each one, how many of those items would the OU Library have a subscription to?)

One of the ways I think Libraries have arguably let down their host institutions is in allowing the relationship with the publishers to get into the state it currently is. Time was when the departmental library would have copies of preprints or offprints of articles that had been published in journals (though I don’t recall them also being delivered to the central library?) As it is, we can still make a direct request of the author for a copy of a paper. But the Library – whilst supporting discovery of outputs from the OU academic community – is not able to deliver those actual outputs directly? Which seems odd to me…

Enjoy your retirement, Nicky!:-)

Unthinkable Thinking Made Real?

A few days ago was one of the highlights of my conference year, Internet Librarian International, which started on Monday when I joined up with Brian Kelly once again for a second (updated) outing of our Preparing for the Future workshop (Brian’s resources; my reference slides (unannotated for now, will be updated at some point)).

I hope to post some reflections on that over the next few days, but for now would like to mention one of the presentations on Tuesday – Thinking the unthinkable: a library without a catalogue by Johan Tilstra (@johantilstra) from Utrecht University Library. This project seems to have been in progress for some time, the main idea being that discovery happens elsewhere: the university library should not be focussing on providing discovery services, but instead should be servicing the delivery of content surfaced or discovered elsewhere. To support this, Utrecht are developing a browser extension – UU Easy Access – that will provide full text access to remote resources. As the blurb puts it, [the extension] detects when you are on a website which the Utrecht University Library has a subscription to. This makes it easy for you to get access through the Library Proxy.

This reminded me of an old experiment back from the days I hassled the library regularly, the OU Library Traveller extension (actually, a Greasemonkey script; remember Greasemonkey?;-)

It seems I only posted fragmentary posts about this doodle (OU Library Traveller – Title Lookup and OU Library Traveller – eBook Greedy, for example) but for those without long memories, here’s a brief recap: a long time ago, Jon Udell published a library lookup bookmarklet that would scan the URL of a web page you were on to see if it contained an ISBN (a universal book number), and if so, it would try to open up the page corresponding to that book on your local library catalogue.
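For anyone who never saw it in action, the lookup idea can be sketched in a few lines of Python rather than bookmarklet JavaScript. This is my own reconstruction of the general approach, not Jon Udell’s code: the function names are invented, only ISBN-10s are handled, and the catalogue URL template is a placeholder rather than any real library’s search URL.

```python
import re


def isbn10_is_valid(isbn):
    """Check the ISBN-10 checksum (weighted sum must be divisible by 11)."""
    if len(isbn) != 10:
        return False
    total = 0
    for position, char in enumerate(isbn):
        if char in "Xx" and position == 9:
            value = 10  # 'X' is only allowed as the final check character
        elif char.isdigit():
            value = int(char)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def find_isbn10(url):
    """Return the first valid ISBN-10 found in a URL, or None."""
    pattern = r"(?<![0-9A-Za-z])(\d{9}[\dXx])(?![0-9A-Za-z])"
    for candidate in re.findall(pattern, url):
        if isbn10_is_valid(candidate):
            return candidate.upper()
    return None


def catalogue_url(page_url, template):
    """Build a catalogue lookup URL for the ISBN in page_url, if there is one."""
    isbn = find_isbn10(page_url)
    return template.format(isbn=isbn) if isbn else None


# Placeholder catalogue search URL; swap in your own library's pattern
TEMPLATE = "https://library.example.org/search?isbn={isbn}"
print(catalogue_url("https://www.example.com/dp/0306406152/ref=x", TEMPLATE))
```

Checking the checksum matters because plenty of URLs contain ten-digit numbers that are not ISBNs at all.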

I forget the various iterations involved, or related projects in the area (such as scripts that looked for ISBNs or DOIs in a webpage and rewrote them as links to a search on that resource via a library catalogue, or dereferencing of the doi through a doi lookup and libezproxy service), but at some point I ended up with a Greasemonkey script that would pop up a floating panel on a page that contained an ISBN to show whether that book was in the OU Library, or available as a full text e-book. (Traffic light colour coded links also showed if the resource was available, owned by the library but currently unavailable, or not available.) I also had – still have, still use regularly – a bookmarklet that will rewrite the URL for subscription based content, such as an academic journal paper, so it goes via the OU library and (hopefully) provides me with full text access: OU libezproxy bookmarklet (see also Arcadia project: bookmarklets; I think some original, related “official-ish, not quite, yet, in testing” OU Library bookmarklets are still available here).
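The URL rewriting trick is simple enough to sketch. The version below assumes the proxy works by appending its own hostname to the publisher’s hostname, which is one common EZproxy-style pattern; I haven’t checked that this is exactly what the bookmarklet does, and the proxy hostname here is a placeholder, not the real OU one.

```python
from urllib.parse import urlsplit, urlunsplit

# Placeholder proxy hostname (an assumption, not a real service address)
PROXY_SUFFIX = "libezproxy.example.ac.uk"


def via_proxy(url, proxy_suffix=PROXY_SUFFIX):
    """Rewrite a URL so the request is routed through a library proxy host."""
    parts = urlsplit(url)
    host = parts.hostname
    # Leave the URL alone if it has no host or is already proxied
    if not host or host.endswith(proxy_suffix):
        return url
    netloc = f"{host}.{proxy_suffix}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(via_proxy("https://ieeexplore.ieee.org/document/123?x=1"))
```

Some proxy setups use a login URL with the target passed as a query parameter instead, in which case the rewrite would be a prefix rather than a hostname change.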

So the “Thinking the Unthinkable” presentation got me thinking that perhaps I had also been thinking along similar lines, as well as that perhaps I should revisit the code to provide an extension that would automatically enhance pages that contained somewhere about them an ISBN or DOI or web domain recognised by the OU’s libezproxy. (If any OU library devs are reading this, (Owen?!;-) it’d be really useful to have a service that could take a URL and then return a boolean flag to say whether or not the OU libezproxy service could do something useful with that URL… or provide me with a list of domains that the OU libezproxy service likes so I could locally decide whether to try to reroute a URL through it…) Hmm….
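The “list of domains” option would make the local decision almost trivial. Here is a rough sketch of the check I have in mind; the function name and the example domain list are hypothetical, since no such list is actually published as far as I know.

```python
from urllib.parse import urlsplit


def proxy_can_handle(url, proxied_domains):
    """Return True if the URL's host is, or is a subdomain of, a listed domain."""
    host = (urlsplit(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in proxied_domains
    )


# Hypothetical list of domains the proxy "likes"
DOMAINS = {"ieee.org", "example-journals.com"}
print(proxy_can_handle("https://ieeexplore.ieee.org/abstract/1", DOMAINS))
print(proxy_can_handle("https://notieee.org/", DOMAINS))
```

Matching on a leading dot avoids false positives from lookalike hostnames that merely end with the same characters.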

As I dug through old blog posts, I was also reminded of a couple of other things. Firstly, another competition hack that tried to associate courses with books using a service published by Dave Pattern at the University of Huddersfield. Hmm… Thinks… Related content… or alternative content, maybe… so if I’m on a journal page somewhere, maybe I could identify whether it’s OA available from a university repository..? (Which I guess is what Google Scholar often does when it links to a PDF copy of a paper?)

Secondly, I was reminded of another presentation I gave at ILI six years ago (the slides are indecipherable and without annotation) on “The Invisible Library” (which built on from a similarly titled internal OU presentation a few weeks earlier).

The original idea was that libraries could provide invisible helpdesk support through monitoring social media channels, but also included elements of providing locally mediated access to remotely discovered items in an invisible way through things like the OU Library Traveller. It also seems to refer to “contentless” libraries, (eg as picked up in this April Fool), and perhaps foreshadows the idea of an open access academic library.

So I wonder – time to revisit this properly, and try to recapture the (unthinkable?) thinking I was thinking back then?

PS I also notice that around that time I was experimenting with Google Custom search engines. This is the second time in as many months I’ve rediscovered my CSE doodles (previously with Creating a Google Custom Search Engine Over Hyperlocal Sites). Maybe it’s time I revisited them again, too…?

Idle Thoughts on “Data Literacy” in the Library…

In part for a possible OU Library workshop, in part trying to mull over possible ideas for an upcoming ILI2015 workshop with Brian Kelly, I’ve been pondering what sorts of “data literacy” skills are in-scope for a typical academic library.

As a starting point, I wonder if this slicing is useful, based on the ideas of data management, discovery, reporting and sensemaking.

OU_Library_data_questions_pptx

It identifies four different, though interconnected, sorts of activity, or concern:

  • Data curation questions – research focus – covering the management of research data, as well as dissemination issues. This is mainly about policy, but begs the question about who to go to for the technical “data engineering” issues, and assumes that the researcher can do the data analysis/data science bits.
  • Data resourcing – teaching focus – finding and perhaps helping identify processes to preserve data for use in a teaching context.
  • Data reporting – internal process focus – capturing, making sense of/analysing, and communicating data relating to library related resources or activities; to what extent should each librarian be able to use and invoke data as evidence relating to day job activities? Could include giving data to course teams about resource utilisation, or to research teams to demonstrate impact in terms of tracking downloads and use of OU published resources.
  • Data sensemaking – info skills focus – PROMPT in a data context, but also begging the question about who to go to for “data computing” applications or skills support (cf academic/scientific computing support, application training); also relates to ‘visual literacy’ in the sense of interpreting data visualisations, methods for engaging in data storytelling and academic communication.

Poking in to each of those areas a little further, here’s what comes to mind at first thought…

Data Curation

The library is often the nexus of activity around archiving and publishing research papers as part of an open access archive (in the OU, this is via ORO: Open Research Online). Increasingly, funders (and publishers) require that researchers make data available too, often under an open data license. Into this box I’m thinking of those activities related to supporting the organisation, management, archiving, and publication of data related to research. It probably makes sense to frame this in the context of a formal lifecycle of a research project and either the various touchpoints that the lifecycle might have with the library, or those areas of the lifecycle where particular data issues arise. I’m sure such things exist, but what follows is an off-the-top-of-my-head informal take on it…!

Initial questions might relate to putting together (and costing) a research data management plan (planning/bidding, data quality policies, metadata plans etc). There might also be requests for advice about sharing data across research partners (which might extend privacy or data protection issues over and above any immediate local ones). In many cases, there may be concerns about linking to other datasets (for example, in terms of licensing or permissions, or relating to linked or derived data use; mapping is often a big concern here), or other, more mundane, operational issues (how do I share large datafiles that are too big to email?). Increasingly, there are likely to be publication/dissemination issues (how/where/in what format do I publish my data so it can be reused, how should I license it?) and legacy data management issues (how/where can I archive my data? what file formats should I use?). A researcher might also need support in thinking through consequences – or requirements – of managing data in a particular way. For example, particular dissemination or archiving requirements might inform the choice of data management solution from the start: if you use an Access database, or directory full of spreadsheets, during the project with one set of indexing, search or analysis requirements, you might find a certain amount of re-engineering work needs to be done in the dissemination phase if there is a requirement that the data is published at record level on a public webpage with different search or organisational requirements.

What is probably out of scope for the library in general terms, although it may be in scope for more specialised support units working out of the library, is providing support in actual technology decisions (as opposed to raising technology specification concerns…) or operations: choice of DBMS, for example, or database schema design. That said, who does provide this support, or whom should the library suggest might be able to provide such support services?

(Note that these practical, technical issues are totally in scope for the forthcoming OU course TM351 – Data management and analysis…;-)

Data resourcing

For the reference librarian, requests are likely to come in from teaching staff, students, or researchers about where to locate or access different sources of data for a particular task. For teaching staff, this might include identifying datasets that can be used in the context of a particular course, possibly over several years. This might require continuity of access via a persistent URL to different sorts of dataset: a fixed (historical) dataset, for example, or a current, “live” dataset, reporting the most recent figures month on month or year on year. Note that there may be some overlap with data management issues, for example, ensuring that data is both persistent and provided in a format that will remain appropriate for student use over several years.

Researchers too might have third party data discovery or access requests, particularly with respect to accessing commercial or privately licensed data. Again, there may be overlaps with data management concerns, such as how to manage secondary data/third party data appropriately so it doesn’t taint the future licensing or distribution of first party or derived data, for example.

Students, like researchers, might have very specific data access requests – either for particular datasets, or for specific facts – or require more general support, such as advice in citing or referencing sources of secondary data they have accessed or used.

Data reporting

In the data reporting bin, I’m thinking of various data reporting tasks the library might be asked to perform by teaching staff or researchers, as well as data stuff that has to be done internally within the library, by librarians, for themselves. That is, tasks within the library that require librarians to employ their own data handling skills.

So for example, a course team might want to know which library managed resources referenced from course material are being used, when, and by how many students. Or learning analytics projects may request access to data to help build learner retention models.

A research team might be interested in the number of research paper or data downloads from the local repository, or citation analyses, or other sources of bibliometric data, such as journal metrics or altmetrics, for assessing the impact of a particular project.

And within the library, there may be a need for working with and analysing data to support the daily operations of the library – staffing requirements on the helpdesk based on an analysis of how and when students call on it, perhaps – or to feed into future planning. That might mean looking at journal productivity (how often journals are accessed, or cited, within the institution) when it comes to renewal (or subscription checking) time; or, at a more technical level, building recommendation systems on top of library usage data; or monitoring the performance of particular areas of the library website through website analytics; or even linking out to other datasets and looking at the impact of library resource utilisation by individual students on their performance.
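As a flavour of the sort of thing I mean by the helpdesk example, here is a minimal sketch that counts enquiries by hour of day. The timestamps are made up, and a real helpdesk log would need its own parsing; the function names are just for illustration.

```python
from collections import Counter
from datetime import datetime


def calls_by_hour(timestamps):
    """Count helpdesk calls per hour of day from ISO 8601 timestamp strings."""
    counts = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return dict(sorted(counts.items()))


def busiest_hours(timestamps, top=2):
    """Return the hours of day with the most calls, busiest first."""
    counts = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return [hour for hour, _ in counts.most_common(top)]


# Invented sample log entries
sample_log = [
    "2015-03-02T09:15:00",
    "2015-03-02T09:40:00",
    "2015-03-02T14:05:00",
    "2015-03-03T09:55:00",
    "2015-03-03T14:30:00",
    "2015-03-03T16:10:00",
]
print(calls_by_hour(sample_log))
print(busiest_hours(sample_log))
```

Nothing clever, but it is the kind of first-pass summary that could feed a staffing rota conversation.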

Data sensemaking

In this category, I’m lumping together a range of practical tools and skills to complement the tools and skills that a library might nurture through information skills training activities (something that’s also in scope for TM351…). So for example, one area might be providing advice about how to visualise data as part of a communication or reporting activity, both in terms of general data literacy (use a bar chart, not a pie chart for this sort of data; switch the misleading colours off; sort the data to better communicate this rather than that, etc) as well as tool recommendations (try using this app to generate these sorts of charts, or this webservice to plot that sort of map). Another might be how to read, interpret, or critique a data visualisation (looking at crappy visualisations can help here!;-), or rate the quality of a dataset in much the same way you might rate the quality of an article.

At a more specialist level, there may be a need to service requests about what tools to use to work with a particular dataset, for example, a digital humanities researcher looking for advice on a text mining project?

I’m also not sure how far along the scale of search skills library support needs to go, or whether different levels of (specialist?) support need to be provided for undergrads, postgrads and researchers? Certainly, if your data is in a tabular format, even just as a Google spreadsheet, you become much more powerful as a user if you can frame complex data queries (pivot tables, anyone?) or start customising SQL queries. Being able to merge datasets, filter them (by row, or by column), or facet them, cluster them or fuzzy join them are really powerful dataskills to have – and ones that can conveniently be developed within a single application such as OpenRefine!;-)
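If “fuzzy join” is an unfamiliar idea, here is a minimal sketch using nothing more than Python’s standard library. It is not how OpenRefine does its matching; the function name, the 0.8 similarity cutoff and the two tiny datasets are all assumptions chosen for illustration.

```python
from difflib import get_close_matches


def fuzzy_join(left_rows, right_rows, key, cutoff=0.8):
    """Join two lists of dicts on approximately matching values of key."""
    # Index the right-hand rows by a lower-cased version of the join key
    right_by_key = {row[key].lower(): row for row in right_rows}
    joined = []
    for row in left_rows:
        match = get_close_matches(
            row[key].lower(), list(right_by_key), n=1, cutoff=cutoff
        )
        if match:
            # Left-hand values win where both rows share a column name
            joined.append({**right_by_key[match[0]], **row})
    return joined


# Invented datasets: download counts and a subscription list
downloads = [
    {"journal": "Journal of the Medical Library Association", "downloads": 120},
    {"journal": "College & Research Libraries News", "downloads": 45},
]
subscriptions = [
    {"journal": "Journal of the Medical Library Assoc.", "subscribed": True},
    {"journal": "Nature", "subscribed": False},
]
print(fuzzy_join(downloads, subscriptions, "journal"))
```

The cutoff is the thing to play with: too low and unrelated titles get merged, too high and abbreviations stop matching.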

Note that there is likely to be some cross-over here also between the resource discovery role described above and helping folk develop their own data discovery and criticism skills. And there may also be requirements for folk in the library to work on their own data sensemaking skills in order to do the data reporting stuff…

Summary

So, is that a useful way of carving up the world of data, as the library might see it?

The four different perspectives on data related activities within the library described above cover not only data related support services offered by the library to other units, but also suggest a need for data related skills within the library to service its own operations.

What I guess I need to do is flesh out each of the topics with particular questions that exemplify the sort of question that might be asked in each context by different sorts of patron (researcher, educator, learner). If you have any suggestions/examples, please feel free to chip them in to the comments below…;-)

What Would An Open Access Academic Library Look Like, and What Would an Open Access Academic Librarian Do?

In the context of something else, I mooted whether a particular project required an “open access academic library” as a throwaway comment, but it’s a phrase that’s been niggling at me, along with the associated “open access academic librarian”, so I’ll let my fingers do the talking and see what words come out…

Traditional academic libraries provide a range of services: they’re a home to physical content, and an access point to online subscription content; they provide managed collections that support discovery and retrieval of “quality” content; they promote skills development that allow folk to discover and retrieve content, and rate its quality, as well as providing expert levels of support for discovery and retrieval. They support teaching by forcing reading lists out of academics and making sure corresponding items are available to students. They have a role to play in managing a university’s research knowledge outputs, maintaining repositories of published papers and, in previous years, operating university presses. They are looking to support the data management needs of researchers, particularly with respect to the data publication requirements being placed on researchers by their funders. If they were IT empire builders, they’d insist that all academics can only engage with publishers through a library system that would act as an intermediary with the academic publishers and could automate the capture of pre-prints and supporting data; but they’re too gentle for that, preferring to ask politely for a copy, if we may… And they do cake – at least, they do if you go to meetings with the librarians on a regular basis.

To a certain extent, libraries are already wide-open access institutions, subject to attack, offering few barriers to entry, at least to their members, though unlikely to turn anyone with a good reason away, providing free-at-the-point of use access to materials held, or subscribed to, and often a peaceful physical location conducive to exploring ideas.

But what if the library needed to support a fully open-access student body, such as students engaged in an open education course of study, or an open research project, for a strict, rather than openwashed, definition of open? Or perhaps the library serves a wider community of people with problems that access to appropriate “academic” knowledge might help them solve? What would – could – the role of the library be, and what of the role of the librarian?

First, the library would have to be open to everyone. An open course has soft boundaries. A truly open course has no boundaries.

Secondly, the library would need to ensure that all the resources it provided a gateway to were openly licensed. So collections would be built from items listed on the Directory of Open Access Journals (DOAJ), perhaps? Indeed, open access academic librarians could go further and curate “meta-journal” readers of interest to their patrons (for example, I seem to remember Martin Weller experimenting with just such a thing a few years ago: Launching Meta EdTech Journal).

Thirdly, the open access academic library should also offer a gateway to good quality open textbook shelves and other open educational resources. As I found to my cost only recently, searching for useful OERs is not a trivial matter. Many OERs come in the form of lecture or tutorial notes, and as such are decontextualised, piecemeal trinkets. If you’re already at that part of the learning journey, another take on the “Mech Eng Structures, week 7” lecture might help. If you want to know out of nowhere how to work out the deflection of a shaped beam, finding some basic lecture notes – and trying to make sense of them – only gets you so far; other pieces (such as the method of superposition) seem to be required. Which is to say, you also need the backstory and a sensible trail that can walk you up to that resource so that you can start to make sense of it. And you might also need other bits of knowledge to answer the question you have to hand. (Which is where textbooks come in again – they embed separate resources in a coherent knowledge structure.)

Fourthly, to guard against commercial constraints on its activities, the open access library should explore open sustainability, such as being built on, and contributing to, the development of open infrastructure (see also Principles for Open Scholarly Infrastructures; I don’t know whether things like Public Knowledge Project (PKP) would count as legitimate technology parts of such an infrastructure? Presumably things like CKAN and EPrints would?).

Fifthly, the open access librarian should offer open access librarian support, perhaps along the lines of invisible support or being an influential friend?

Sixthly, the open access digital library could provide access to online applications or online digital workbenches (of which, more in another post). For example, I noticed the other day that Bryn Mawr College provide student access to Jupyter (IPython) notebooks. Several years ago, the OU’s KMI made RStudio available online to researchers as part of KMI Crunch, and so on. You might argue that this is not really the role of the library – but physical academic libraries often provide computer access points to digital services and applications subscribed to by the university on behalf of the students, student desktops replete with the software tools and applications the student needs for their courses. If I’m an open access learner with a netbook or a tablet, I couldn’t install desktop software on my computer even if I wanted to.

Seventhly, there probably is a seventh, and eighth, and maybe even a ninth and tenth, but my time’s up for this post.. (If only there were room in the margin of my time to write this post properly…;-)

Whither the Library?

As I scanned my feeds this morning, a table in a blog post (Thoughts on KOS (Part 3): Trends in knowledge organization) summarising the results from a survey reported in a paywalled academic journal article – Saumure, Kristie, and Ali Shiri. “Knowledge organization trends in library and information studies: a preliminary comparison of the pre-and post-web eras.” Journal of information science 34.5 (2008): 651-666 [pay content] – really wound me up:

lib_trends

My immediate reaction to this was: so why isn’t cataloguing about metadata? (Or indexing, for that matter?)

In passing, I note that the actual paper presented the results in a couple of totally rubbish (technical term;-) pie charts:

wtf_infoProfessionalsmyRS

More recently (that was a report from 2008 on a lit review going back before then), JISC have just announced a job ad for a role as Head of scholarly and library futures to “provide leadership on medium and long-term trends in the digital scholarly communication process, and the digital library”. (They didn’t call… You going for it, Owen?!;-)

The brief includes “[k]eep[ing] a close watch on developments in the library and research support communities, and practices in digital scholarship, and also in digital technology, data, on-line resources and behavioural analytics” and providing:

Oversight and responsibility for practical projects and experimentation in that context in areas such as, but not limited to:

  • Digital scholarly communication and publishing
  • Digital preservation
  • Management of research data
  • Resource discovery infrastructure
  • Citation indices and other measures of impact
  • Digital library systems and services
  • Standards, protocols and techniques that allow on-line services to interface securely

So the provision of library services at a technical level, then (which presumably also covers things like intellectual property rights and tendering – making sure the libraries don’t give their data and organisation’s copyrights to the commercial publishers – but perhaps not providing a home for policy and information ethical issue considerations such as algorithmic accountability?), rather than identifying and meeting the information skills needs of upcoming generations (sensemaking, data management and all the other day to day chores that benefit from being a skilled worker with information).

It would be interesting to know what a new appointee to the role would make of the recently announced Hague Declaration on Knowledge Discovery in the Digital Age (possibly in terms of a wider “publishing data” complement to “management of research data”), which provides a call for opening up digitally represented content to the content miners.

I’d need to read it more carefully, but at the very briefest of first glances, it appears to call for some sort of de facto open licensing when it comes to making content available to machines for processing by machines:

Generally, licences and contract terms that regulate and restrict how individuals may analyse and use facts, data and ideas are unacceptable and inhibit innovation and the creation of new knowledge and, therefore, should not be adopted. Similarly, it is unacceptable that technical measures in digital rights management systems should inhibit the lawful right to perform content mining.

The declaration also seems to be quite dismissive of database rights. A well put-together database makes it easier – or harder – to ask particular sorts of question and to a certain extent reflects the amount of creative effort involved in determining a database schema, leaving aside the physical effort involved in compiling, cleaning and normalising the data that secures the database right.

Also, if I was Google, I think I’d be loving this… As ever, the promise of open is one thing, the reality may be different, as those who are geared up to work at scale, and concentrate power further, inevitably do so…

By the by, the declaration also got me thinking: who do I go to in the library to help me get content out of APIs so that I can start analysing it? That is, who do I go to get help with “resource discovery infrastructure” and perhaps more importantly in this context, “resource retrieval infrastructure”? The library developer (i.e. someone with programming skills who works with librarians;-)?

And that aside from the question I keep asking myself: who do I go to to ask for help in storing data, managing data, cleaning data, visualising data, making sense of data, putting data into a state where I can even start to make sense of it, etc etc… (Given those pie charts, I probably wouldn’t trust the library!;-) Though I keep thinking: that should be the place I’d go.)

The JISC Library Futures role appears silent on this (but then, JISC exists to make money from selling services and consultancy to institutions, right, not necessarily helping or representing the end academic or student user?)

But that’s a shame; because as things like the Stanford Center for Interdisciplinary Digital Research (CIDR) show, libraries can act as a hub and go to place for sharing – and developing – digital skills, which increasingly includes digital skills that extend out of the scientific and engineering disciplines, out of the social sciences, and into the (digital) humanities.

When I started going into academic libraries, the librarian was the guardian of “the databases” and the CD-ROMs. Slowly access to these information resources opened up to the end user – though librarian support was still available. Now I’m as likely to need help with textmining and making calendar maps: so which bit of the library do I go to?

What did you notice for the first time today?

A week late on posting this, catching up with Brian’s notes on the ILI 2013: Future Technologies and Their Applications Workshop we ran last week, and his follow up – What Have You Noticed Recently? – inspired by not properly paying attention to what I had to say, here are a few of my own reflections on what I heard myself saying at the event, along with additional (minor) comments around the set of ‘resource’ slides I’d prepped for the event, though I didn’t refer to many of them…

  • slides 2-6 – some thoughts on getting your eye into some tech trends: OU Innovating Pedagogy reports (2012, 2013), possible data-sources and reports;
  • slides 6-11 – what can we learn from Google Trends and related tools? A big thing: the importance of segmenting your stats; means are often meaningless. The Mothers’ Day example demonstrates two signal causes (in different territories – i.e. different segments) for the compound flowers trend. The Google Correlate example shows how one signal may lead – or lag – another. So the question: do you segment your library data? Do you look for leading or lagging indicators?
  • slides 12-18 – what role should/does/could the library play in developing the reputation of the organisation’s knowledge producers/knowledge outputs, not least as a way of making them more discoverable; this builds on the question of whose role it is to facilitate access to knowledge (along with the question: facilitate access for whom?)? – my take is this fits in the role librarians often take of organising an institution’s knowledge.
  • slides 19-27 – what is a library for? Supporting discovery (of what, by whom)? (Helping others) organise knowledge, and gain access to information? Do research?
  • slides 28-30 – the main focus of my own presentation during the main ILI2013 conference (I’ll post the slides/brief commentary in another post): if the information we want to discover is buried in data, who’s there to help us extract or discover the information from within the data?
  • slides 31-32 – sometimes reframing your perception of an organisation’s offerings can help you rethink the proposition, and sometimes using an analogy helps you switch into that frame of mind. So if energy utilities provide “warm house” and “clean, dry clothes” service, rather than gas or electricity, what shift might libraries adopt?
  • slides 33-39 – a few idle idea prompts around the question of just what is it that libraries do, what services do they provide?
  • slide 40 – one of the items from this slide caused a nightmare tangent! The riff started with a trivial observation – a telling off I received for trying to use the camera on my phone to take a photo of a sign saying “no cameras in the library”, with a photocopier as a backdrop (original story). The purpose of this story was two-fold: 1) to get folk into the idea of spotting anachronisms or situations where one technology is acceptable where an equivalent or alternative is not (and then wonder why/what fun can be had around that thought;-); 2) to get folk into wondering how users might appropriate technology they have to hand to make their lives easier, even if it “goes against the rules”.
  • slide 41 – a thought experiment that I still have high hopes for in the right workshop setting…! if you overheard someone answer a question you didn’t hear with the phrase “did you try the library?”, what might the question be? You can then also pivot the question to identify possible competitors; for example, if a sensible answer to the same question is “did you try Amazon?”, Amazon might be a competitor for the delivery of that service.
  • slide 42 – this can lead on from the previous slide, either directly (replace “library” with “Amazon” or “Google”), or as a way of generating ideas about how else a service might be delivered.
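
By way of illustration of the segmentation and lead/lag points from slides 6-11, here’s a minimal sketch in Python. The numbers are invented for illustration (they are not real Google Trends figures), the function names are my own, and the “similarity” measure is a deliberately crude one; the point is just to show how an unsegmented average can hide two separate peaks, and how you might look for one signal trailing another.

```python
from statistics import mean

# Hypothetical monthly interest scores (Jan..Jun) for a "flowers"-style
# search term in two territories. All values are made up for illustration.
interest = {
    "UK": [10, 12, 90, 11, 10, 9],   # peak in March
    "US": [10, 11, 10, 12, 95, 10],  # peak in May
}

def combined(series_by_segment):
    """Average the segments month by month, as an unsegmented report would."""
    return [mean(values) for values in zip(*series_by_segment.values())]

def peak_index(series):
    """Return the position of the largest value in a series."""
    return max(range(len(series)), key=lambda i: series[i])

def best_lag(leader, follower, max_lag=3):
    """Return the shift (in steps) at which follower best lines up with leader.

    Uses a crude dot-product score over the overlapping values; a real
    analysis would want a properly normalised correlation.
    """
    def score(lag):
        return sum(a * b for a, b in zip(leader, follower[lag:]))
    return max(range(max_lag + 1), key=score)

if __name__ == "__main__":
    print("combined:", combined(interest))
    for territory, series in interest.items():
        print(territory, "peaks at month index", peak_index(series))
    print("US lags UK by", best_lag(interest["UK"], interest["US"]), "months")
```

The combined series shows two bumps that only make sense once you split by territory; the same sort of split (by user type, site, or subject area, say) might be worth trying on library usage data.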

Slide not there – a riff on the question of: what did you notice for the first time today? This can be important for trend spotting – it may signify that something is becoming mainstream that you hadn’t appreciated before. To illustrate, I’ve started trying to capture the first time I spot tech in the wild with a photo, such as this one of an Amazon locker in a Co-Op in Cambridge, or a noticing from the first time I saw video screens on the Underground.

As with many idea generating techniques, things can be combined. For example, having introduced the notion of Amazon lockers, we might then ask: so what use might libraries make of such a system, or thing? Or if such things become commonplace, how might this affect or influence the expectations of our users??

via OER-DISCUSS – Notes on Copyright

I thought this was handy on the OER-DISCUSS mailing list:

Our copyright officer writes:

… US Copyright ‘Fair Use’ or S29 copying for non-commercial research and private study which allows copying but the key word here is ‘private’. i.e. the provisos are that you don’t make the work or copies available to anyone else.

Although there are UK Exceptions for education, they are very limited or obsolete.
S.32 (1) and (2A) do have the proviso “is not done by reprographic process” which basically means that any copying by any mechanical means is excluded, i.e. you may only copy by hand.

S36 educational provision in law for reprographic copying is
a) only applicable to passages in published works i.e. books journals etc and
b) negated because licences are now available S.36 (3)

S.32 (2) permits only students studying courses in making Films or Film soundtracks to copy Film, broadcasts or sound recordings.

The only educational exception students can rely on is s.32(3) for Examination although this also is potentially restrictive. For the exception to apply, the work must count towards their final grade/award and any further dealing with the work after the examination process, becomes infringement.

I’m not sure how they are using Voicethread, but if the presentations are part of their assessed coursework and only available to students, staff and examiners on the course, they may use any Copyright protected content, provided it’s all removed from availability after the assessment (not sure how this works with cloud applications though)

There is also exception s.30 for Criticism or Review, which is a general exception for all, and the copying is necessary for a genuine critique or review of it.

If the students can’t rely on the last 3 exceptions, using Copyright free or licenced material (e.g. Creative Commons), would be highly recommended.

Kate Vasili – Copyright Officer, Middlesex University, Sheppard Library

Open Research Data Processes: KMi Crunch – Hosted RStudio Analytics Environment

One of the possible barriers to widespread adoption of open notebook science is knowing where to start. Video reports of lab experiments hosted on YouTube can be easily embedded in a hosted WordPress blog; a MediaWiki wiki can be used to provide one page per experiment, with change tracking/history on each page and a shadow page for commentary and discussion; GitHub can be used to provide a version control environment for software code, results data, project pages and documentation. For tabulated data, Google Spreadsheets provides a hosting environment and an API that lets you treat the data as a database and also explore it dashboard style via a range of interactive visual filtering and charting components. Alternatively, a CKAN instance (such as is used to run thedatahub.org) offers data management and preview tools.

Keeping track of data analysis in an open way is also getting easier. In An R-chitecture for Reproducible Research/Reporting/Data Journalism, I briefly mentioned RPubs.com, a site that can be used to 1-click publish HTML reports of statistical analyses executed within the RStudio environment (I really need to do a proper post about this). But now there’s an example of another hosted solution from Fridolin Wild of the OU’s KMi: Crunch.

Crunch offers a hosted RStudio environment (so you can access RStudio via a browser) with public and private areas. The public areas allow you to post datasets, run scripts as a service, or publish results (Sweave generated PDFs, or knitr generated HTML reports, for example).

Crunch also incorporates a MySQL database for each user. (Scheduling and pipelining are also on the cards…)

Whilst developed as an application to support learning analytics (I think?), Crunch also provides a great demonstration of a more general open research data workbench. You can store – and publish – data sets, along with analysis scripts and reports generated by executing those scripts over your data set. Version control isn’t available at the moment (I think?) but RStudio does have git/GitHub support, so that may be coming. The provision of a MySQL database means that data collections can be managed within a database environment. (From a data journalism, rather than an open/reproducible research, perspective, I did wonder whether it would be possible to situate something like Scraperwiki on the same platform and replace its SQLite support with MySQL support, so a Scraperwiki scraper could be used to scrape data into a MySQL database that was then accessed from RStudio? Being able to wire MySQL read/write access into Google Refine on the same platform could also be interesting..;-)

I’m not sure about the extent to which the OU Library is taking an interest in the development of Crunch, but providing best practice support and advice in the orchestration of information and data handling tools seems to me to be in-scope for the academic research librarian, in much the same way as advising on the use of bibliography data management tools used to be…? (For a recent take on this, see Dorothea Salo’s recent Ariadne article Retooling Libraries for the Data Challenge.)