As well as offering digital application shelves, should libraries offer, or act as institutional sponsors of, digital workbenches?
I’ve previously blogged about things like SageMathCloud, an application based learning environment, and the IBM Data Scientist Workbench, and today came across another example: DHBox, CUNY’s digital humanities lab in the cloud (wiki), which looks like it may have been part of a Masters project?
If you select the demo option, a lab context is spawned for you, and provides access to a range of tools: staples, such as RStudio and Jupyter notebooks, a Linux terminal, and several website creation tools: Brackets, Omeka and WordPress (though the latter two didn’t work for me).
(The toolbar menu reminded me of Stringle / DockLE ;-)
There’s also a file browser, which provides a common space for organising – and uploading – your own files. Files created in one application are saved to the shared file area and available for use on other applications.
The applications are behind a (demo) password authentication scheme, which makes me wonder if persistent accounts are in the project timeline?
Once inside the application, you have full control over it. If you need additional packages in RStudio, for example, then just install them:
They work, too!
On the Jupyter notebook front, you get access to Python3 and R kernels:
In passing, I notice that RStudio’s RMarkdown now demonstrates some notebook like activity, demonstrating the convergence between document formats such as Rmd (and ipymd) and notebook style UIs [video].
Code for running your own DHBox installation is available on Github (DH-Box/dhbox), though I haven’t had a chance to give it a try yet. One thing it’d be nice to see is a simple tutorial showing how to add in another tool of your own (OpenRefine, for example?) If I get a chance to play with this – and can get it running – I’ll try to see if I can figure out such an example.
It also reminded me that I need to play with my own install of tmpnb, not least because of the claim that “tmpnb can run any Docker container”. Which means I should be able to set up my own tmpRStudio, or tmpOpenRefine environment?
If visionary C. Titus Brown gets his way with his pitch for a MyBinder hackathon, that might extend that project’s support for additional data science applications such as RStudio, as well as generalising the infrastructure on which MyBinder can run. Such as Reclaimed personal hosting environments, perhaps?!;-)
That such combinations are now popping up all over the web makes me think that they’ll be a commodity service anytime soon. I’d be happy to argue this sort of thing could be used to support a “technology enhanced learning environment”, as well as extending naturally into “technology enhanced research environments”, but from what I can tell, TEL means learning analytics and not practical digital tools used to develop digital skills? (You could probably track the hell out of people using such environments if you wanted to, though I still don’t see what benefits are supposed to accrue from such activity?)
It also means I need to start looking out for a new emerging trend to follow, not least because data2text is already being commoditised at the toy/play entry level. And it won’t be VR. (Pound to a penny the Second Life hipster, hypster, shysters will be chasing that. Any VR campuses out there yet?!) I’d like to think we might see inroads being made into AR, but suspect that too will always be niche, outside certain industry and marketing applications. So… hmmm… Allotments… that’s where the action’ll be… and not in a tech sense…
For the two new first year computing and IT courses in production (due out October 2017), I’ve been given the newly created slacker role of “course visionary” (or something like that?!). My original hope for this was that I might be able to chip in some ideas about current trends and possibilities for developing our technology enhanced learning that would have some legs when the courses start in October 2017, and remain viable for the several years of course presentation, but I suspect the reality will be something different…
However it turns out, I thought that one of the things I’d use fragments of the time for would be to explore different possible warp threads through the courses. For example, one thread might be to take a “View Source” stance towards various technologies that would show students something of the anatomy of the computing related stuff that populates our daily lives. This is very much in the spirit of the Relevant Knowledge short courses we used to run, where one of the motivating ideas was to help learners make sense of the technological world around them. (Relevant Knowledge courses typically also tried to explore the social, political and economic context of the technology under consideration.)
So as a quick starter for ten, here are some of the things that could be explored in a tech anatomy strand.
The Anatomy of a URL
Learning to read a URL is a really handy skill to have for several reasons. In the first place, it lets you hack the URL directly to find resources, rather than having to navigate or search the website through its HTML UI. In the second, it can make you a better web searcher: some understanding of URL structure allows you to make more effective use of advanced search limits (such as site:, inurl:, filetype:, and so on). Third, it can give you clues as to how the backend works, or what backend is in place (if you can recognise a WordPress installation as such, you can use knowledge about how the URLs are put together to interact with the installation more knowledgeably. For example, add ?feed=rss2&withoutcomments=1 to the end of a WordPress blog URL (such as this one) and you’ll get a single item RSS version of the page content.)
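As a rough sketch of what “reading” and “hacking” a URL can look like in code, Python’s standard urllib.parse module will split a URL into its parts and rebuild it with extra query arguments. The blog address here is a made-up placeholder rather than a real site, and add_query_args is just a helper name I’ve invented for the example; the feed arguments are the WordPress ones mentioned above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def add_query_args(url, extra):
    """Return url with the extra arguments merged into its query string."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update(extra)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical WordPress post URL, used purely for illustration
post_url = "https://blog.example.com/2016/03/15/some-post/"

parts = urlsplit(post_url)
print(parts.scheme)  # the protocol
print(parts.netloc)  # the host the request goes to
print(parts.path)    # the resource path on that host

# Hack the URL to ask for the single item RSS view of the page
feed_url = add_query_args(post_url, {"feed": "rss2", "withoutcomments": "1"})
print(feed_url)
```

The same splitting trick is a handy way of spotting which bits of a URL are likely to be tweakable by hand.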
The Anatomy of a Web Page
(If you click the mobile phone icon, you can see what the page looks like on a selected class of mobile device.)
I also often look at the resources that have been loaded into the page:
Again, additional tools allow you to set the bandwidth rate (so you can feel how the page loads on a slower network connection) as well as recording a series of screenshots that show what the page looks like at various stages of its loading.
The Anatomy of a Tweet
As well as looking at how something like a tweet is rendered in a webpage, it can also be instructive to see how a tweet is represented in machine terms by looking at what gets returned if you request the resource from the Twitter API. So for example, below is just part of what comes back when I ask the Twitter API for a single tweet:
You’ll see there’s quite a lot more information in there than just the tweet, including sender information.
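To give a feel for picking such a response apart, here’s a minimal sketch using Python’s json module. The JSON is a hand-written, heavily cut-down stand-in, not real API output: the values are invented, and only the field names are meant to follow the general shape of a tweet object (a text field, plus nested user and entities objects). The summarise_tweet helper is my own naming.

```python
import json

# Invented, cut-down stand-in for an API response (not real data)
raw = """
{
  "created_at": "Tue Mar 15 10:00:00 +0000 2016",
  "id_str": "1234567890",
  "text": "An example tweet #example",
  "user": {"screen_name": "example_user", "followers_count": 42},
  "entities": {"hashtags": [{"text": "example"}], "urls": []}
}
"""


def summarise_tweet(tweet):
    """Pull out a few fields beyond the visible text of the tweet."""
    return {
        "id": tweet["id_str"],
        "sender": tweet["user"]["screen_name"],
        "followers": tweet["user"]["followers_count"],
        "hashtags": [tag["text"] for tag in tweet["entities"]["hashtags"]],
    }


tweet = json.loads(raw)
print(sorted(tweet.keys()))    # the top level fields in the record
print(summarise_tweet(tweet))  # sender info travels along with the text
```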
The Anatomy of an Email Message
How does an email message get from the sender to the receiver? One thing you can do is to View Source on the header:
Again, part of the reason for looking at the actual email “data” is so you can see what your email client is revealing to you, and what it’s hiding…
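If you’d rather not read the raw header by eye, the standard Python email package can do some of the work. The sketch below assumes a made-up message (the hosts and addresses are placeholders) and relies on the convention that each mail server handling a message adds its own Received header at the top, so reading them bottom to top gives the route from sender to receiver. delivery_path is a name I’ve invented for the example.

```python
from email import message_from_string

# Invented message for illustration; hosts and addresses are placeholders
raw_message = """\
Received: from mail.example.org (mail.example.org [192.0.2.10])
\tby mx.example.com with ESMTP; Tue, 15 Mar 2016 10:00:05 +0000
Received: from laptop.example.org (laptop.example.org [192.0.2.20])
\tby mail.example.org with ESMTPSA; Tue, 15 Mar 2016 10:00:01 +0000
From: sender@example.org
To: receiver@example.com
Subject: Hello
Date: Tue, 15 Mar 2016 10:00:00 +0000

Body text.
"""


def delivery_path(raw):
    """Return the Received headers in sender-first order, whitespace tidied."""
    msg = message_from_string(raw)
    hops = msg.get_all("Received", [])
    # Servers prepend their header, so the last one listed is the first hop
    return [" ".join(hop.split()) for hop in reversed(hops)]


for number, hop in enumerate(delivery_path(raw_message), start=1):
    print(number, hop)
```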
The Anatomy of a Powerpoint File
Filetypes like .xlsx (Microsoft Excel file), .docx (Microsoft Word file) and .pptx (Microsoft Powerpoint file) are actually compressed zip files. Change the suffix (eg pptx to zip) and you can unzip it:
Once you’re inside, you can have access to individual image files, or other media resources, that are included in the document, as well as the rest of the “source” material for the document.
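You don’t even need to rename the file if you open it programmatically. Here’s a minimal sketch using Python’s zipfile module that lists the media files tucked away inside a presentation. So that it runs without a real slide deck to hand, it builds a tiny stand-in archive in memory first; the part names mimic the usual ppt/media and ppt/slides layout, but a real .pptx file contains many more parts. list_media is my own helper name.

```python
import io
import zipfile


def list_media(pptx_file):
    """Return names of embedded media in a .pptx (path or file-like object)."""
    with zipfile.ZipFile(pptx_file) as archive:
        return [name for name in archive.namelist()
                if name.startswith("ppt/media/")]


# Build a tiny stand-in archive in memory (not a valid presentation)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("ppt/slides/slide1.xml", "<slide/>")
    archive.writestr("ppt/media/image1.png", b"not really a png")
demo.seek(0)

print(list_media(demo))
```

Pointing list_media at the path of an actual .pptx file should work in the same way.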
The Anatomy of an Image File
Image files are packed with metadata, as this peek inside a photo on John Naughton’s blog shows:
We can also poke around with the actual image data, filtering the image in a variety of ways, changing the compression rate, and so on. We can even edit the image data directly…
Showing people how to poke around inside a resource has several benefits: it gives you a strategy for exploring your curiosity about what makes a particular resource work (and perhaps also demonstrates that you can be curious about such things); it shows you how to start looking inside a resource (how to go about dissecting it, doing the “View Source” thing); and it shows you how to start reading the entrails of the thing.
In so doing, it helps foster a sense of curiosity about how stuff works, as well as helping develop some of the skills that allow you to actually take things apart (and maybe put them back together again!) The detail also hooks you into the wider systemic considerations – why does a file need to record this or that field, for example, and how does the rest of the system make use of that information. (As MPs have recently been debating the Investigatory Powers Bill, I wonder how many of them have any clue about what sort of information can be gleaned from communications (meta)data, let alone what it looks like and how communications systems generate, collect and use it.)
PS Hmmm, thinks.. this could perhaps make sense as a series of OpenLearn posts?
Whilst looking around to see what sorts of graphical editors there are out there for teaching introductory python programming, I ran a search for blockly python. If you haven’t come across Blockly before, it’s a library for building browser based graphical programming interfaces, based on interlocking blocks, with a Scratch style aesthetic: blockly.
For a start, the environment is set up for working with small data sets, and can display small tabular datasets as well as plot them. (You may remember we also used data to motivate programming for the FutureLearn Learn To Code (a line at a time) course.) The language is a subset of Python 2.7 (the environment uses the Skulpt client side Python interpreter; I’m not sure if the turtle demo works!).
The environment also supports blocks-to-code as well as code-to-blocks translations, so you can paste a chunk of code into the text view, and then display the blocks equivalent. (I think this is done by parsing the Python into AST and then using that as the bridge to the blocks view?)
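I don’t know how BlockPy actually does it, but as an illustration of the general idea, the sketch below uses the ast module from standard Python 3 (not Skulpt, and not the Python 2.7 subset BlockPy supports) to parse a chunk of code and produce a nested outline of its statements. Each statement node is roughly the sort of thing you might imagine mapping onto a block, with nesting showing which blocks sit inside which. The outline function is my own invention for the example.

```python
import ast

source = """
total = 0
for n in [1, 2, 3]:
    total = total + n
print(total)
"""


def outline(node, depth=0):
    """Return an indented list of the statement node types under node."""
    lines = []
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.stmt):
            lines.append("  " * depth + type(child).__name__)
            lines.extend(outline(child, depth + 1))
    return lines


tree = ast.parse(source)
print("\n".join(outline(tree)))
```

Going the other way, from blocks back to code, would mean walking a similar tree and emitting text for each node.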
Alternatively, if you’re happier with the blocks, you can write a programme graphically and then grab the code version. Or you can flip between the two…
As well as the blocks/code view, there is a pseudo-code view that maps the code into more explanatory language. This feature is under active development, I think…
To aid debugging – and learning – the environment allows you to step through the code a line at a time, previewing the current state in the panels on the right hand side.
If you get an error, an error prompt appears. This seems to be quite friendly in some cases, though I suspect not every error or warning is trapped for (I need to explore this a bit more; I can’t help thinking that an “expert” view to show the actual error message might also be useful if the environment is being used as a stepping stone to text-based Python programming.)
The code is available on Github, and I made a start on putting it into a docker container until my build broke (Kitematic on my machine doesn’t seem to like Java at the moment – a known issue – which seems to be required as part of the build process)…
The environment is also wrapped up in a server side environment, and on the Virginia Tech site is wrapped in a login-if-you-want-to environment. I didn’t see any benefit from logging in, though I was hoping to be able to name and save my own programmes. (I wonder if it’s also possible to serialise and encode a programme into a URL so it can be shared?)
You can also embed the environment – prepopulated with code, if required, though I’m not sure how to do that? – inline in a web page, so we could embed it in course materials, for example. Being able to hook this into an auto-marking tool could also be interesting…
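On the URL sharing idea, here’s a minimal sketch of how a programme might be squeezed into a link: compress the source, base64 encode it in a URL-safe way, and pass it as a query argument. To be clear, this is not something I know BlockPy to support; the editor address and the code parameter name are hypothetical, as are the function names.

```python
import base64
import zlib
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://example.org/editor"  # hypothetical editor address


def encode_program(source):
    """Pack programme text into a shareable URL (hypothetical scheme)."""
    packed = zlib.compress(source.encode("utf-8"))
    token = base64.urlsafe_b64encode(packed).decode("ascii")
    return BASE_URL + "?" + urlencode({"code": token})


def decode_program(url):
    """Recover the programme text from a URL made by encode_program."""
    token = parse_qs(urlsplit(url).query)["code"][0]
    return zlib.decompress(base64.urlsafe_b64decode(token)).decode("utf-8")


program = "for i in range(3):\n    print(i)\n"
share_url = encode_program(program)
print(share_url)
print(decode_program(share_url) == program)
```

URLs have practical length limits, so this sort of approach would only suit short programmes, which is probably fine for an introductory course.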
All in all, a really nice environment, and one that I think we could explore for OUr own introductory computing courses.
I also started wondering about how BlockPy might be able to work with a Jupyter server/IPython kernel, or be morphed into an IPyWidget…
In the first case, BlockPy could be used to fire up an IPython process via a Jupyter server, and handle code execution and parsing (for AST-block conversion?) that way rather than using the in-browser Python Skulpt library. Having a BlockPy front end to complement Jupyter notebooks could be quite interesting, I think?
On the widget front, I can imagine running BlockPy within a Jupyter notebook, using it to generate code that could be exported into a code cell, for example, though I’m not really clear what benefit this would provide?
So – anyone know if there is any work anywhere looking at taking the BlockPy front-end and making it a standalone Jupyter client?! :-)
A few days ago I came across a project that has been looking at digital preservation, and in particular the long term archiving of “functional” digital objects, such as software applications: bwFLA — Emulation as a Service [EaaS]. (I wonder how long that site will remain there…?!)
The Emulation-as-a-Service architecture simplifies access to preserved digital assets allowing end users to interact with the original environments running on different emulators.
I’d come across the project in part via search for examples of Docker containers and other sorts of VM being used via portable “compute sticks”. It seems that the bwFLA folk have been exploring two ways of making emulated services available: EaaS using Docker and a boot to emulation route from machine images on bootable USBs, although they don’t seem (yet) to have described a delivery system that includes a compute stick. (See a presentation on their work here. )
One of the things that struck me about the digital preservation process was the way in which things need to be preserved so that they can be run in an arbitrary future, or at least, in an arbitrary computing environment. In the OU context, where we have just started a course that shipped a set of interlinked applications to students via a virtual machine that could be run across different platforms, we’re already finding issues arising from flaky combinations of VirtualBox and Windows; what we really need to do is be shipping something that is completely self-bootable (but then, that may in turn, turn up problems?). So this got me thinking that when we design, and distribute, software to students it might make sense to think of the distribution process as an exercise in distributing preserved digital objects? (This actually has implications in a couple of senses: firstly, in terms of simply running the software: how can we distribute it so that students can run it; secondly, in terms of how we contextualise the versioning of the software – OU courses can be a couple of years in planning and five years in preservation, which means that the software we ship may be several versions behind the latest release, if the software has continued to be updated).
So if students have problems running software in virtual machines because of problems running the virtual machine container, what other solutions are there?
One way is to host the software and make it available as a service accessed via a web browser or other universal client, although that introduces two complications: firstly, the need for network access; secondly, ensuring that the browser (which is to say, browser-O/S combination?) or universal client can properly service the service…
A second way is to ship the students something bootable. This could be something like a live USB, or it could be a compute stick that comes with preinstalled software on it. In essence, we distribute the service rather than the software. (On this note, things like unikernels look interesting: just enough O/S to run the service or application you’re interested in.) There are cost implications here, of course, although the costs might scale differently depending on who pays: does the OU cover the cost of distribution (“free” to students); does the student pay at-cost and buy from the OU; does the student pay a commercial rate (eg covering their own hosting fees on a cloud service); and so on?
The means students have at their disposal for running software is also an issue. The OU used to publish different computing specification guidelines for individual courses, but now I think a universal policy applies. From discussions I’ve had with various folk, I seem to be in a minority of one suggesting that students may in the future not have access to general purpose computers onto which they can install software applications, but instead may be using netbooks or even tablet computers to do their studies. (I increasingly use a cloud host to run services I want to make use of…)
I can see there is an argument for students needing access to a keyboard to make life easier when it comes to typing up assessment returns or hacking code, and also the need for access to screen real estate to make life easier reading course materials, but I also note that increasing numbers of students seem to have access to Kindles which provide a handy second screen way of accessing materials.
(The debate about whether we issue print materials or not continues… Some courses are delivered wholly online, others still use print materials. When discussions are held about how we deliver materials, the salient points for me are: 1) where the display surface is (print is a “second screen” display surface that can be mimicked by a Kindle – and hence an electronically distributed text; separate windows/tabs on a computer screen are display surfaces within a display surface); 2) whether the display surface supports annotations (print does, beautifully); 3) the search, navigation and memory affordances of the display surface (books open to where you were last reading them, page corners can be folded, you have a sense of place/where you are in the text, and (spatial) memory of where you read things (in the book as well as on the page)); 4) where you can access the display surface (eg in the bath?); 5) whether you can arrange the spatial location of the display surface to place it in proximity to another display surface).
Print material doesn’t come without its own support issues though…
“But (computing) students need a proper computer”, goes the cry, although never really unpacked…
From my netbook browser (keyboard, touchpad, screen, internet connection, but not the ability to install and run “traditional” applications), I can, with a network connection, fire up an arbitrary number of servers in London, or Amsterdam, or Dublin, or the US, and run a wide variety of services. (We require students to have access to the internet so they can access the VLE…)
From my browser, I could connect to a Raspberry Pi, or presumably a compute stick (retailing at about £100), that could be running EaaS applications for me.
So I can easily imagine an “OU Compute Stick” – or “FutureLearn Compute Stick” – that I can connect to over wifi, that runs a Kitematic like UI that can install applications from an OU/FutureLearn container/image repository or from an inserted (micro)SD card. (For students with a “proper” computer, they’d be able to grab the containers off the card and run them on their own computer.)
At the start of their degree, students would get the compute stick; when they need to run OU/FutureLearn provided apps, they grab them from the OU-hub, or receive them in the post on an SD card (in the new course, we’ve noticed in some situations, some problems in downloading large files reliably). The compute stick would have enough computational power to run the applications, which could be accessed over wifi via a browser on a “real” computer, or a netbook (which has a keyboard), or a tablet computer, or even a mobile device. The compute stick would essentially be a completely OU managed environment, bootable, and with its own compute power. The installation problems would be reduced to finding a way for the stick to connect to the internet (and even that may not be necessary), and the student to connect to the stick.
Developing such a solution might also be of interest to the digital preservation folk…even better if the compute stick had a small screen so you could get a glimpse at least of what the application looked like. Hmm..thinks… rather than a compute stick, would shipping students a smartphone rooted to run EaaS work?! Or do smartphones have the wrong sort of processor?
Listening to F1 technical pundit Gary Anderson on a 2014 panel (via Joe Saward) about lessons from F1 for business, I was struck by his comment that “motor racing is about going round in circles..racing drivers go round in circles all day long”, trying to improve lap on lap:
Each time round is another chance to improve, not just for the driver but for the teams, particularly during practice sessions, where real time telemetry allows the team to offer suggested changes as the car is on track, and pit stops allow physical (and computational?) changes to be made to the car.
Each lap is another iteration. Each stint is another iteration. Each session is another iteration. (If you only get 20 laps in a session, that could still give you fifty useful iterations, fifty chances to change something to see if it makes a useful difference.) Each race weekend is another iteration. Each season is another iteration.
Each iteration gives you a chance to try something new and compare it with what you’ve done before.
Who else iterates? Google does. Google (apparently) runs experiments all the time. Potentially, every page impression is another iteration to test the efficacy of their search engine results in terms of converting searchers to revenue generating clickers.
But the thing about iteration is that changes might have negative effects too, which is one reason why you need to iterate fast and often.
But business processes often appear to act as a brake on such opportunities.
Which is why I’ve learned to be very careful writing anything down… because organisations that have had time to build up an administration and a bureaucracy seem tempted to treat things that are written down as somehow fixed (even if those things are written down in socially editable documents (woe betide anyone who changes what you added to the document…)); things that are written down become STOPs in the iteration process. Things that are written down become cast in stone… become things that force you to go round in circles, rather than iterating…
Having been given a “visioning” role for a new level 1 course in production, I’ve started trying to make sense of what an online, or at least, virtual, computing and IT lab might look like for use in an OU context.
One of the ways I’ve tried to carve up the problem is in terms of support tools (the sorts of things a classroom management system might offer – chat rooms, screen sharing, collaborative working, etc) and end-user, task related applications. Another is to try to get a feel for how ecosystems might develop around particular technologies or communities.
It probably won’t surprise regular readers that one of the communities I’ve been looking at is the one growing up around Jupyter notebooks. So here’s a quick summary of some of the Jupyter related projects currently under development that have caught my eye.
Dashboards and Alternative Browser Based UIs
Although I’ve still to start playing with Jupyter widgets, the Jupyter incubator dashboards project seems to be offering support for a structured way of using them in the form of grid-based dashboards generated directly from notebooks. (I guess this is a variant of creating interactive slide decks, eg using nbconvert --to slides, from notebooks?)
It seems as if the dashboard project came out of the IBM Cloud Emerging Technology group (Dynamic Dashboards from Jupyter Notebooks) which suggests that as a tool Jupyter notebooks might have some appeal for business, as well as education and research…
Another company that seems to have bought into the Jupyter ecosystem is technical book publisher O’Reilly. Their thebe code library claims to provide “an easy way to let users on a web page run code examples on a server”, such as a simple HTML UI for a Jupyter served process, as this thebe demo illustrates.
One thing I’ve been wondering about for a rewrite of our level 1 residential school robotics activity is whether we might be able to produce a browser or electron app based desktop or tablet based editor, inspired by the look and feel of the RobotLab drag’n’drop text based editor we’ve used in the course previously, to connect to a Jupyter server running on a Lego EV3 brick; and the thebe demo suggests to me that we might…
Collaboration around Jupyter notebooks comes in two forms: realtime collaborative editing within the same notebook (where two users have a copy of the same notebook open in separate editors and see each other’s updates in realtime), and collaboration around documents in a shared/social repository.
SageMathCloud already offers realtime collaboration within Jupyter notebooks, but official Jupyter support for this sort of feature is still on the official Jupyter project roadmap (using Google Drive as the backbone).
Realtime collaboration within notebooks is also available in the form of Livebook [code], which lives outside the main Jupyter project; the live demo site allows you to create – and collaborate around – temporary notebooks (pandas included): try opening a couple of copies of the same notebook (same URL) in a couple of browsers to get a feel for how it works…
In terms of asynchronous collaboration, this independent Commit-and-Push to GitHub from Jupyter Notebooks notebook extension looks interesting in terms of its ability to save the current notebook as a git commit (related issue here). The original nbdiff project [code] appears to have stalled, but there again, the SageMathCloud environment provides a history slider that lets you play through a whole series of (regular) saves of a notebook to show how it evolved and get access to “interim” versions of it.
There seems to be an independent NotebookDiff extension for comparing the state of notebook checkins, though I haven’t used it. I’m guessing the GitCheckpoints extension from the same developers (which I also haven’t tried) saves checkpoints as a git commit?
Jupyter on the Desktop
One of the “problems” of current Jupyter notebook usage is that the application does not run as a standalone app; instead, a server is started and then notebooks are accessed via a browser.
The nteract/composition app is a desktop based electron app, currently under development (I couldn’t get it to build with my node.js installation).
See also: this earlier, independently produced, proof of concept IPython Desktop project that offers a cleaner experience; the independent, proof-of-concept Jupyter sidecar electron app, that displays rich Jupyter kernel output from commands issued in a command line shell in an HTML presenting side display; and the Atom Hydrogen extension, which allows code to be executed against Jupyter kernels, Light Table style.
A quick scout around Jupyter related projects in progress shows much promise in the development of end-user tools that will make Jupyter notebooks easier to use, as well as tools that support collaborative working around particular notebooks.
The Jupyter project has an active community around it and recently advertised for a full time project manager.
Jupyter notebooks feature in the IBM Data Scientist Workbench (as well as things like Wakari and Domino Data Lab) and IBM also seemed to bootstrap the dashboard components. Technical book publisher O’Reilly use Jupyter notebooks as a first-class authoring environment for the O’Reilly publishing program and Github recognises the .ipynb file format as a first class document type, rendering HTML previews of .ipynb files uploaded to Github or as Github gists.
In a university context, Jupyter notebooks offer much potential for both teaching and research. It will be interesting to see how university IT departments react to this style of computing, and whether they try to find ways of supporting their community in the use of such systems, or whether their users will simply decide to go elsewhere.
PS I think this is probably going to become a living post…
- nbpresent: next generation slideshows from notebooks, apparently…
- nbbrowserpdf: “LaTeX-free PDF generation for Jupyter Notebooks”
Both of those come from the Anaconda developers, so it seems like Continuum are buying into the Jupyter ecosystem…
And some more from IBM: Jupyter Notebooks as RESTful Microservices that “turn notebooks into RESTful web APIs”. Hmm, literate API definitions that can be consumed by literate API consumer notebooks?
PS [March 2016] For a more recent round-up, see the IBM Emerging Technologies blog post: Powered By Jupyter: A Survey of the Project Ecosystem.
One of the many things on my “to do” list is to put together a blogged script that wires together RStudio, Jupyter notebook server, Shiny server, OpenRefine, PostgreSQL and MongoDB containers, and perhaps data extraction services like Apache Tika or Tabula and a few OpenRefine style reconciliation services, along with a common shared data container, so the whole lot can be launched on Digital Ocean at a single click to provide a data wrangling playspace with all sorts of application goodness to hand.
(Actually, I think I had a script that was more or less there for chunks of that when I was looking at a docker solution for the databases courses, but that fell by the wayside and I suspect the Jupyter container (IPython notebook server, as was) probably needs a fair bit of updating by now. And I’ve no time or mental energy to look at it right now… :-( )
Anyway, the IBM Data Scientist Workbench now sits alongside things like KMi’s longstanding KMi Crunch Learning Analytics Environment (RStudio + MySQL), and the Australian ResBaz Cloud – Containerised Research Apps Service in my list of why the heck can’t we get our act together to offer this sort of SaaS thing to learners? And yes I know there are cost implications…. but, erm, sponsorship, cough… get-started tokens then PAYG, cough…
It currently offers access to personal persistent storage and the ability to launch OpenRefine, RStudio and Jupyter notebooks:
The toolbar also suggests that the ability to “discover” pre-identified data sources and run pre-configured modeling tools is on the cards.
The applications themselves run off a subdomain tied to your account – and of course, they’re all available through the browser…
So what’s next? I’d quite like to see ‘data import packs’ that would allow me to easily pull in data from particular sources, such as the CDRC, and quickly get started working with the data. (And again: yes, I know, I could start doing that anyway… maybe when I get round to actually doing something with isleofdata.com ?!;-)
See also these recipes for running app containers on Digital Ocean via Tutum: RStudio, Shiny server, OpenRefine and OpenRefine reconciliation services, and these Seven Ways of Running IPython / Jupyter Notebooks.