Deconstructing the TM351 Virtual Computing Environment via VS Code

For 2020J, which is to say the October 2020 presentation of our TM351 Data Management and Analysis course, we’ve deprecated the original VirtualBox packaged virtual machine and moved to a monolithic Docker container that packages all the required software applications and services (a Jupyter notebook server, PostgreSQL and MongoDB database servers, and OpenRefine).

As with the VM, the container is headless and exposes applications over HTTP via browser-based user interfaces. We also rebranded away from “TM351 VM” to “TM351 VCE”, where VCE stands for Virtual Computing Environment.

Once Docker is installed, the environment is installed and launched purely from the command line using a docker run command. Students early into the forums have suggested moving to docker compose, which simplifies the command line command significantly, but at the cost of having to supply a docker-compose.yaml file. With OU workflows, it can take weeks, if not months, to get files onto the VLE for the first time, and days to weeks to post updates (along with a host of news announcements and internal strife about the possibility of tutors/ALs and students having different versions of the file). As we need to support cross-platform operation, and as the startup command specifies file paths for volume mounts, we’d need different docker-compose files (I think?) because file paths on Mac/Linux hosts and on Windows hosts use different syntax (forward vs back slashes as path delimiters). [If anyone can tell me how to write a docker-compose.yaml file with arbitrary paths on the volume mounts, please let me know via the comments…]
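On the path syntax point, one way to avoid hand-writing a different command for each platform is to build the docker run arguments programmatically and let the path object render itself in the host's own syntax. The following is only a minimal sketch under stated assumptions: the image name (example/tm351vce), the container directory (/home/jovyan/notebooks), the container port (8888) and the container name are hypothetical placeholders, not the values we actually ship; only the host port, 35180, is taken from the workspace settings used later in this post.

```python
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath


def docker_run_args(
    host_dir: PurePath,
    image: str = "example/tm351vce:latest",          # hypothetical image name
    container_dir: str = "/home/jovyan/notebooks",   # hypothetical mount point
    host_port: int = 35180,
    container_port: int = 8888,                      # assumed Jupyter port in the container
    name: str = "tm351vce",                          # hypothetical container name
) -> list:
    """Build the argument list for a `docker run` call with one volume mount."""
    # str(host_dir) uses the separator of the platform the path object represents,
    # so the same code yields backslashes on Windows and forward slashes elsewhere.
    volume = f"{host_dir}:{container_dir}"
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--publish", f"{host_port}:{container_port}",
        "--volume", volume,
        image,
    ]


if __name__ == "__main__":
    # Path.cwd() resolves to the right flavour for whatever host this runs on.
    print(" ".join(docker_run_args(Path.cwd() / "TM351VCE")))
    # The pure path classes let us preview both syntaxes from any machine.
    print(" ".join(docker_run_args(PureWindowsPath(r"C:\Users\me\TM351VCE"))))
    print(" ".join(docker_run_args(PurePosixPath("/Users/me/TM351VCE"))))
```

The list form can be passed straight to subprocess.run(), which avoids shell quoting problems with paths that contain spaces. It does not solve the distribution problem, of course: a script still has to get onto the VLE somehow.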

Something else that has cropped up early in the forums is mention of VS Code, which presents a way to personalise the way in which the course materials are used.

By default, the course materials we provide for practical activities are all based on Jupyter notebooks, delivered via the Jupyter notebook server in the VCE (or via an OU hosted notebook server we are also exploring this year). The activities are essentially inlined using notebook code cells within a notebook that presents a linear teaching text narrative.

Students access the notebooks via their web browser, wherever the notebook server is situated. For students running the Docker VCE, notebook files (and OpenRefine project files) exist in a directory on the student’s own computer that is then mounted into the container; make changes to the notebooks in the container and those changes are saved in the notebooks mounted from the host. Delete the container, and the notebooks are still on your desktop. For students using the online hosted notebook server, there is no way of synchronising files back to the student desktop, as far as I am aware; there was an opportunity to explore how we might allow students to use something like private GitHub repositories to persist their files in a space they control, but to my knowledge that has not been explored (a missed opportunity, to my mind…).

Using the VS Code Python extension, students installing VS Code on their own computer can connect to the Jupyter server running in the containerised VCE (I don’t know if the permissions allow this on the hosted server).

The following tm351vce.code-workspace file describes the required settings:

{
    "folders": [
        {
            "path": "."
        }
    ],
    "settings": {
        "python.dataScience.jupyterServerURI": "http://localhost:35180/?token=letmein"
    }
}
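If the port or token differs from the defaults shown above, the workspace file could be generated rather than edited by hand. This is a minimal sketch, not something we distribute: the function names are made up for illustration, and the only things taken from the file above are the settings key, the folder path, the port and the token.

```python
import json
from pathlib import Path

# Settings key copied from the workspace file shown above.
SETTING_KEY = "python.dataScience.jupyterServerURI"


def build_workspace(host="localhost", port=35180, token="letmein", folder="."):
    """Return the workspace settings as a plain dictionary."""
    uri = f"http://{host}:{port}/?token={token}"
    return {
        "folders": [{"path": folder}],
        "settings": {SETTING_KEY: uri},
    }


def write_workspace(target_dir, filename="tm351vce.code-workspace", **kwargs):
    """Write the workspace file into target_dir and return its path."""
    path = Path(target_dir) / filename
    text = json.dumps(build_workspace(**kwargs), indent=4) + "\n"
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    # Preview the default settings without writing anything to disk.
    print(json.dumps(build_workspace(), indent=4))
```

Calling write_workspace(".") in the directory that holds the notebooks would produce a file equivalent to the one above, which can then be opened from VS Code as a workspace.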

The VS Code Python extension renders notebooks, so students can open local copies of files from their own desktop and execute code cells against the containerised kernel. If permissions on the hosted Jupyter service allow remote/external connections, this would provide a workaround for synchronising notebook files: students would work with notebook files saved on their own computer but executed against the hosted server kernel.
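Before pointing VS Code at a server, it can be handy to check that the server accepts token-authenticated connections from outside the browser. A minimal sketch, using only the standard library and the Jupyter server REST API's kernels endpoint; the base URL and token are the ones from the workspace file above, and the function names are mine. Whether a hosted server would answer such a request is exactly the permissions question I can't answer.

```python
import json
import urllib.request


def api_request(base_url, token, endpoint="api/kernels"):
    """Build a token-authenticated request against the Jupyter REST API."""
    url = f"{base_url.rstrip('/')}/{endpoint}"
    # Jupyter accepts the token in an Authorization header of this form.
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})


def list_kernels(base_url="http://localhost:35180", token="letmein", timeout=5):
    """Return the kernels the server reports as running.

    This needs a running server; it raises urllib.error.URLError if the
    server cannot be reached and urllib.error.HTTPError if the token is refused.
    """
    request = api_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        for kernel in list_kernels():
            print(kernel.get("id"), kernel.get("name"))
    except OSError as err:  # URLError and HTTPError are both OSError subclasses
        print(f"Could not reach the Jupyter server: {err}")
```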

Queries can be run against the database servers via the code cells in the normal way (we use some magic to support this for the postgres database).

If we make some minor tweaks to the config files for the PostgreSQL and MongoDB database servers, we can use the VS Code PostgreSQL extension and MongoDB extension to run queries from VS Code directly against the databases.

For example, the postgres database:

[image: the postgres database queried from VS Code]

and the mongo database:

[image: the mongo database queried from VS Code]

Note that this is now outside the narrative context of the notebooks, although it strikes me that we could generate .sql and .json text files from notebooks that show code literally and comment out the narrative text (the markdown text in the notebooks).
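As a rough illustration of that idea, the following sketch turns a notebook into the text of a .sql file: markdown cells become SQL comments and SQL code cells are emitted literally. It is a sketch under assumptions, not a tool we have: it assumes SQL cells are marked with a %%sql cell magic on their first line (I only said above that we use "some magic"), it skips any other code cell, and the function names and the demo table name are invented for the example.

```python
import json
from pathlib import Path


def cell_text(cell):
    """Return a cell's source as one string (nbformat allows str or list of str)."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def notebook_to_sql(notebook, magic="%%sql", comment_prefix="-- "):
    """Render a notebook dict as SQL text with the narrative commented out."""
    chunks = []
    for cell in notebook.get("cells", []):
        lines = cell_text(cell).splitlines()
        kind = cell.get("cell_type")
        if kind == "markdown":
            # Narrative text is kept, but as SQL line comments.
            chunks.append("\n".join((comment_prefix + line).rstrip() for line in lines))
        elif kind == "code" and lines and lines[0].startswith(magic):
            # Drop the magic line and keep the query itself verbatim.
            chunks.append("\n".join(lines[1:]))
    return "\n\n".join(chunk for chunk in chunks if chunk) + "\n"


def convert_file(ipynb_path):
    """Write a .sql file next to the given .ipynb file and return its path."""
    ipynb_path = Path(ipynb_path)
    notebook = json.loads(ipynb_path.read_text(encoding="utf-8"))
    sql_path = ipynb_path.with_suffix(".sql")
    sql_path.write_text(notebook_to_sql(notebook), encoding="utf-8")
    return sql_path


if __name__ == "__main__":
    demo = {
        "cells": [
            {"cell_type": "markdown", "source": ["# Counting rows\n", "\n", "How many rows?"]},
            {"cell_type": "code", "source": "%%sql\nSELECT COUNT(*) FROM quickdemo;"},
            {"cell_type": "code", "source": ["import pandas as pd\n"]},
        ]
    }
    print(notebook_to_sql(demo))
```

A similar pass could emit the mongo queries, although since the notebooks use a Python API for those, the queries would first have to be rewritten in the shell's own syntax.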

However, we wouldn’t be able to work directly with the data returned from the database via Python/pandas dataframes, as we do in the notebook case. (Note also that in the notebooks we use a Python API for querying the mongo database, rather than directly issuing JavaScript-based queries.)

At this point you might ask why we would want to deconstruct / decompose the original structured notebook+notebook UI environment and allow students to use VS Code to access the computational environment, not least when we are in the process of updating the notebooks and the notebook environment to use extensions that add additional style and features to the user environment. Several reasons come to my mind that are motivated by finding ways in which we can essentially lose control, as educators, of the user interface whilst still being reasonably confident that the computational environment will continue to perform as we intend (this stance will probably make many of my colleagues shudder; I call it supporting personalisation…):

  • we want students to take ownership of their computational environment; this includes being able to access it from their own clients that may be better suited to their needs, eg in terms of familiarity, accessibility, productivity, etc;
  • a lot of our students are already working in software development and already have toolchains they are working with. Whilst we see benefits of using the notebook UI from a teaching and learning perspective, the fact remains that students can also complete the activities in other user environments. We should not hinder them from using their own environments — the code should still continue to run in the same way — as long as we explain how the experience may not be the same as the one we are providing, and also noting that some of the graphics / extensions we use in the notebooks may not work in the same way, or may not even work at all, in the VS Code environment.

If students encounter issues when using their own environment, rather than the one we provide, we can’t offer support. If the personalised learning environment is not as supportive for teaching and learning as the environment we provide, it is the student’s choice to use it. As with the Jupyter environment, the VS Code environment sits at the centre of a wide ecosystem of third party extensions. If we can make our materials available in that environment, particularly for students already familiar with it, they may be able to help us by identifying and demonstrating new ways, perhaps even more effective ways, of using the VS Code tooling to support their learning than the environment we provide. (One example might be the better support VS Code has for code linting and debugging, which are things we don’t teach, and which our chosen environment perhaps even prevents students who know how to use such tools from using. Of course, you could argue we are doing students a service by grounding them back in the basics where they have to do their own linting and print() statement debugging… Another might be the Live Share/collaboration service that lets two or more users work collaboratively in the same notebook, which might be useful for personal tutorial sessions etc.)

From my perspective, I believe that, over time, we should try to create materials that continue to work effectively to support both teaching and learning in environments that students may already be working in, and not just the user interface environments we provide, not least because we potentially increase the number of ways in which students can see how they might make use of those tools / environments.

PS I do note that there may be licensing related issues with VS Code and the VS Code extensions store, which are not as open as they could be; VSCodium perhaps provides a way around that.