Earlier today, I came across BioShaDock: a community-driven, shared registry of Docker-based bioinformatics tools (the BioShaDock registry). It brings together a wide range of containerised applications and tools relevant to the bioinformatics community. Users can take one or more applications “off-the-shelf” and run them without having to go through any complicated software installation process themselves, even if that installation process is a nightmare confusion of dependency hell: the tools come preinstalled and ready to run…
The container images essentially serve as reference images that can be freely used by the community. The application containers come pre-installed and ready to run, exact replicas of their parent reference image. The images can be tagged with different versions or builds of the application, so you can cite the use of a particular version of an application and share exactly that version with other people.
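By way of a minimal sketch, pulling and running a particular tagged build is a couple of commands once Docker is installed (the registry path, image name and tag below are hypothetical placeholders, not actual BioShaDock entries):

```bash
# Pull a specific, tagged build of a tool from a registry; only Docker itself
# needs to be installed locally, not the tool or any of its dependencies.
docker pull registry.example.org/sometool:1.2.3

# Run that pinned version; --rm removes the container again once it exits.
docker run --rm registry.example.org/sometool:1.2.3 sometool --version
```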
So could we imagine this as a specialist reference shelf in a Digital Library? A specialist reference application shelf, stocked with off-the-shelf, ready-to-run tools, available anywhere, any time?
Another of the nice things about containers is that you can wire them together, using things like Docker Compose or Panamax templates, to provide a suite of integrated applications that can work with each other. Linked containers can pass information between each other in isolation from the rest of the world. One click can provision and launch all the applications, wired together. And everything can be versioned and archived. Containerised operations can also be sequenced (e.g. using DRAY docker pipelines or OpenAPI).
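As a rough sketch of how that wiring looks (the image names, ports and settings here are illustrative assumptions, not a real template): a docker-compose.yml describes the composition, and a single command provisions and launches the whole suite.

```bash
# Write a minimal docker-compose.yml that wires a (hypothetical) application
# container to a database container, then launch both with one command.
cat > docker-compose.yml <<'EOF'
version: "2"
services:
  webtool:
    image: example/webtool:1.0    # hypothetical application image, pinned to a version
    ports:
      - "8080:80"                 # expose the tool's UI to the host browser
    depends_on:
      - db
  db:
    image: postgres:9.5           # pinned database image; the webtool reaches it by name ("db")
    environment:
      POSTGRES_PASSWORD: example
EOF

docker-compose up -d              # one command starts the linked containers as a unit
```

Because the compose file itself is just text, it can be versioned and archived alongside the images it references.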
Sometimes, you might want to bundle a set of applications together in a single, shareable package as a virtual machine. These can be versioned, and shared, so everyone has access to the same tools installed in the same way within a single virtual machine. Things like the DH Box, “a push-button Digital Humanities laboratory” (DHBox on github); or the Data Science Toolbox. These could go on to another part of the digital library applications shelf – a more “general purpose toolbox” area, perhaps?
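A sketch of what sharing such a machine can look like in practice, assuming a Vagrant-packaged box (the box name below is a placeholder, not the actual DH Box or Data Science Toolbox distribution):

```bash
# Fetch and boot a shared, versioned virtual machine; everyone who runs these
# two commands gets the same toolbox, installed the same way.
vagrant init example/toolbox    # writes a Vagrantfile pointing at the named box
vagrant up                      # downloads the box (first time only) and starts the VM
```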
As a complement to the “computer area” in the physical library that provides access to software on particular desktops, the digital library could have “execution rooms” that actually let you run the applications taken off the shelf, and access them through a web browser UI, for example. Think runtime environments like mybinder or tmpnb. Go to the digital library execution room (which is just a web page, though you may have to authenticate to gain access to the server that will actually run the code for you..), say which container, container collection, or reference VM you want to run, and click “start”. Or take the images home with you (that is, run them on your own computer, or on a third party host).
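Taking an image “home” can be as simple as running it locally and pointing a browser at it. A minimal sketch, assuming a Jupyter-style notebook image (the image name and port are typical defaults, assumed here rather than prescribed):

```bash
# Run a notebook container locally in the background and access its UI in the browser.
docker run -d -p 8888:8888 jupyter/minimal-notebook
# then visit http://localhost:8888 (or the Docker machine's IP) in a web browser
```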
Some fragments relating to the above, to try to help me situate this idea of runnable, packaged application shelves within the context of the library in general…
- libraries have been, and still are, one of the places you go to access IT equipment and learn IT skills;
- libraries used to be, and still are, a place you could go to get advice on, and training in, advanced information skills, particularly discovery, critical reading and presentation;
- libraries used to be, and still are, a locus for collections of things that are often valuable to the community or communities associated with a particular library;
- libraries provide access to reference works or reference materials that provide a common “axiomatic” basis for particular activities;
- libraries are places that provide access to commercial databases;
- libraries provide archival and preservation services;
- libraries may be organisational units that support the data and information management needs of their host organisation.
Some more fragments:
- the creation of a particular piece of work may involve many different steps;
- one or more specific tools may be involved in the execution of each step;
- general purpose tools may support the actions required to perform a range of tasks to a “good enough” level of quality;
- specialist tools may provide a more powerful environment for performing a particular task to a higher level of quality.
Some questions:
- what tools are available for performing a particular information related task or set of tasks?
- what are the best tools for performing a particular information related task or set of tasks?
- where can I get access to the tools required for a particular task without having to install them myself?
- how can I effectively organise a workflow that requires the use of several different tools?
- how can I preserve, document or reference the workflow so that I can reuse it or share it with others?
Some observations:
- Docker containers provide a way of packaging an application or tool so that it can be “run anywhere”;
- Docker containers may be linked together in particular compositions so that they can interoperate with each other;
- Docker container images may be grouped together in collections within a subject-specific registry: for example, BioShaDock.