Fragment: A Personal Take on Anaconda

Anaconda is a cross-platform (Windows / Mac OS X / Linux) Python distribution developed commercially by Continuum Analytics, Inc. (now Anaconda, Inc.) and built around the conda package management system. It is available as a free, open source Individual Edition, as well as in various commercial editions.

The distribution includes:

  • a comprehensive scientific computing stack (by default including things like Pandas, IPython, Jupyter tools, NumPy, SciPy, Matplotlib etc.);
  • a desktop development application (the Spyder editor);
  • a cross-platform package manager that is capable of bundling operating system packages as well as language-specific packages (Python, R etc.);
  • support for separate conda environments (user namespaces).

The Anaconda distribution also has several drawbacks:

  • the download size and footprint on disk are large;
  • installing it can have side-effects:
    • Anaconda may rewrite paths in ways that clobber already installed software;
    • there may be unwanted elements in the distribution (e.g. the Spyder application, unrequired packages etc.);
  • creating and testing new Anaconda packages, which often need customising for each operating system, can be complex;
  • package releases may lag behind official releases, and not all release versions may be available;
  • dependency reconciliation and conflicts often cause issues when installing packages;
  • finally, the installation process is not always smooth…

Building custom distributions on top of Miniconda can reduce the overhead of unused or unwanted packages and components, but this adds its own maintenance and testing overheads.

In terms of flexibility, conda environments can be configured for various languages such as Python and R, which means that a different environment could be configured for each module. However, the cost to any given module of teaching students how to use environments often means that students work in a single default environment, which can cause issues if they are required to install Anaconda for several different modules. The obvious solution is to teach students how to use conda environments, but this adds weight to an individual module, does not obviously add value to that specific module, and may cause issues of its own (such as students being confused about which environment they are working in, how to move between environments, and so on).
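As an illustration of the "which environment am I in?" problem, here is a minimal sketch, using only the Python standard library, of a check a student could run in a script or notebook cell. It assumes that `conda activate` has set the `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` variables in the launching shell; those describe the shell, whereas `sys.prefix` describes the interpreter actually running, and the two can disagree (for example, when a Jupyter kernel comes from a different environment). The helper names `describe_environment` and `kernel_matches_shell` are hypothetical, not part of conda.

```python
import os
import sys


def describe_environment() -> dict:
    """Report which Python installation and (if any) conda environment is in use."""
    return {
        # The interpreter actually running this code.
        "executable": sys.executable,
        # Root directory of the installation or environment that interpreter belongs to.
        "prefix": sys.prefix,
        # True inside a venv/virtualenv; conda environments generally report False here.
        "is_venv": sys.prefix != sys.base_prefix,
        # Name of the conda environment activated in the launching shell, if any.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Path of that activated conda environment, if any.
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
    }


def kernel_matches_shell(info: dict) -> bool:
    """True if no conda environment is active, or the interpreter lives in the active one."""
    if info["conda_prefix"] is None:
        return True
    return os.path.realpath(info["prefix"]) == os.path.realpath(info["conda_prefix"])


if __name__ == "__main__":
    info = describe_environment()
    for key, value in info.items():
        print(f"{key:>13}: {value}")
    if not kernel_matches_shell(info):
        print("Warning: this interpreter is not the one from the activated conda environment.")
```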

There are good arguments for teaching students in general about namespaces and environments in computational settings at level 1 (Python environments, Jupyter notebook kernels, conda environments), and for developing their skills in using them. But this is more of a qualification-level, cross-module, generalist skill and benefit, so no individual module is willing to spend its teaching time budget developing a skill that is not immediately or obviously relevant to it.

As a scientific computing stack, Anaconda is arguably limited in terms of the sorts of environment or application it can distribute. Anaconda can be used to deploy environments for developing or running software, but deploying and running an application may require installing particular packages (perhaps in a particular conda environment, to keep them isolated from other applications) and then starting an application server.
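The two deployment steps described above can be sketched as conda command lines. This hedged Python sketch only builds and prints them: `conda create --yes --name` installs packages into an isolated environment, and `conda run --name` starts an application server (here, a Jupyter notebook server) inside that environment. The environment name `myapp`, the package list and the port are hypothetical placeholders, and the helper functions are illustrative rather than part of any tool.

```python
import shlex

# Hypothetical environment name and packages, for illustration only.
ENV_NAME = "myapp"
PACKAGES = ["python=3.11", "notebook"]


def create_env_command(name: str, packages: list[str]) -> list[str]:
    """Command to create an isolated conda environment containing the given packages."""
    return ["conda", "create", "--yes", "--name", name, *packages]


def run_server_command(name: str, port: int = 8888) -> list[str]:
    """Command to start a Jupyter notebook server inside the named environment."""
    return ["conda", "run", "--name", name,
            "jupyter", "notebook", "--no-browser", f"--port={port}"]


if __name__ == "__main__":
    # Print the commands rather than running them; each list could be passed
    # to subprocess.run(..., check=True) to execute it for real.
    print(shlex.join(create_env_command(ENV_NAME, PACKAGES)))
    print(shlex.join(run_server_command(ENV_NAME)))
```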

When it comes to providing a consistent environment across local installs and hosted solutions provided by the institution or by third parties, the need to run Anaconda on top of some host operating system means that we cannot guarantee that students are working in exactly the same environment, with the same user interface or performance, across platforms or across local and hosted deployments.

For deploying simple applications, some languages may have support for creating self-contained executable packages capable of running on various platforms (for example, in the case of Python, ) but as these are not generalisable, they are of limited interest.

A more general approach is to deploy software environments and applications using a virtual machine of some sort: either a “full” virtual machine, such as one provided by VirtualBox or VMware, or containerised software such as Docker containers. Virtualisation is more general because, in producing the deployment, we have full control over the contents of the environment (such as the operating system and a conda environment installed into it), and the student will run exactly the same software wherever it is deployed (locally, institutionally hosted, or third party hosted).

One disadvantage of the virtualised approach is that it makes it harder to deploy desktop applications: the virtualised environment does not have access to the host desktop. Running desktop applications inside a virtual machine and rendering them on the host desktop is possible, but may require complex configuration and the installation of additional window management components or some sort of bridge. That said, virtual machine applications such as VirtualBox do provide a visual desktop environment if required, and VMs and containers alike can also publish desktops to remote desktop clients, such as the cross-platform Microsoft Remote Desktop (RDP) client, or to a browser using tools such as noVNC or XPRA. However, handling video or audio may be problematic with such clients.

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
