Running Legacy Windows Desktop Applications Under Wine Directly in the Browser Via XPRA Containers

Okay, so here’s another way of trying to run legacy Windows applications under Wine in a Docker container via the browser.

This variant improves on Running RobotLab (Legacy Windows App) Remotely or Locally in a Docker Container Under Wine Via RDP by not requiring the user to update Wine and by launching directly into either the RobotLab application or the Neural application.

If you have Docker running, you should be able to just type one of the following, then view the application desktop in your browser via the mapped port (for example, http://localhost:3395 for the first container):

#Run RobotLab (default)
docker run --name tm129robotlab --shm-size 1g -p 3395:10000 -d ousefuldemos/tm129robotics-xpra-html5

#Run RobotLab (explicitly)
docker run --name tm129robotlabx --shm-size 1g -p 3396:10000 -e start=robotlab -d ousefuldemos/tm129robotics-xpra-html5

#Run Neural (explicitly)
docker run --name tm129neuralx --shm-size 1g -p 3397:10000 -e start=neural -d ousefuldemos/tm129robotics-xpra-html5

Here’s the Dockerfile (also see the repo):

#This container has been removed
#and the original repo archived (it used an old Linux base container)
#FROM lanrat/docker-xpra-html5

#I forked the lanrat/docker-xpra-html5 and rebuilt it using ubuntu:bionic
#https://github.com/ouseful-backup/docker-xpra-html5
FROM ousefuldemos/docker-xpra-html5

USER root

#Required to add repo
RUN apt-get update && apt-get install -y software-properties-common wget

#Install wine
RUN dpkg --add-architecture i386

RUN wget -qO- https://dl.winehq.org/wine-builds/winehq.key | apt-key add -
RUN apt-add-repository 'deb https://dl.winehq.org/wine-builds/ubuntu/ bionic main'
RUN apt update && apt-get install -y --install-recommends winehq-stable

#Install the wine packages wine wants to load if they aren't already there
#There are lots of warnings in the install but they seem to work in use?
RUN mkdir -p /home/user/.cache/wine
RUN wget http://dl.winehq.org/wine/wine-mono/4.8.1/wine-mono-4.8.1.msi -O /home/user/.cache/wine/wine-mono-4.8.1.msi
RUN wget http://dl.winehq.org/wine/wine-gecko/2.47/wine_gecko-2.47-x86.msi -O /home/user/.cache/wine/wine_gecko-2.47-x86.msi
RUN wget http://dl.winehq.org/wine/wine-gecko/2.47/wine_gecko-2.47-x86_64.msi -O /home/user/.cache/wine/wine_gecko-2.47-x86_64.msi

USER user
RUN wine msiexec /i /home/user/.cache/wine/wine_gecko-2.47-x86_64.msi
RUN wine msiexec /i /home/user/.cache/wine/wine_gecko-2.47-x86.msi
RUN wine msiexec /i /home/user/.cache/wine/wine-mono-4.8.1.msi

USER root

#Use the recipe in https://blog.ouseful.info/2019/03/11/running-microsoft-vs-code-remotely-xpra-and-rdp/
#for starting with RobotLab

#Copy over Win application folders
COPY Apps/  /opt/Apps


#Add some start commands

ADD robotlab.sh /usr/local/bin/robotlab
RUN chmod +x /usr/local/bin/robotlab

ADD neural.sh /usr/local/bin/neural
RUN chmod +x /usr/local/bin/neural



#Pulseaudio also has a switch in cmd
#Can't get this working atm...
#Does it even make sense to try?
#i.e. can XPRA HTML be used to play audio in a browser anyway?
#RUN apt-get install -y pulseaudio

#Go back to user...
USER user


ENV start robotlab

#Start with robotlab
CMD xpra start --bind-tcp=0.0.0.0:10000 --html=on  --exit-with-children --daemon=no --xvfb="/usr/bin/Xvfb +extension Composite -screen 0 1920x1080x24+32 -nolisten tcp -noreset" --pulseaudio=no --notifications=no --bell=no --start-child=${start}

#Example image pushed as 
#docker build -t ousefuldemos/tm129robotics-xpra-html5 .
#Default runs robotlab
#docker run --name tm129x --shm-size 1g -p 3395:10000 -d ousefuldemos/tm129robotics-xpra-html5
#docker run --name tm129x --shm-size 1g -p 3395:10000 -e start=robotlab -d ousefuldemos/tm129robotics-xpra-html5

One thing I’ve started wondering about now is: could we run this via a Jupyter notebook UI using jupyter-server-proxy (I tried, and it doesn’t seem to work atm: the proxy just goes into an infinite redirect loop); or could we launch it as a standalone container using the JupyterHub Dockerspawner and a Dockerfile shim to change the start CMD so things start on port 8888 (a naive attempt at that didn’t seem to work either)?
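For reference, here’s a minimal sketch of the sort of jupyter-server-proxy configuration I was trying, dropped into jupyter_notebook_config.py; treat it as an untested assumption rather than a working recipe, particularly around how the xpra HTML5 client’s paths interact with the proxied URL prefix (which may well be where the redirect loop comes from):

#jupyter_notebook_config.py - hedged sketch, not a working recipe
c.ServerProxy.servers = {
    'xpra': {
        #Launch the xpra HTML5 server on the port the proxy allocates
        #({port} is templated in by jupyter-server-proxy)
        'command': ['xpra', 'start',
                    '--bind-tcp=0.0.0.0:{port}', '--html=on',
                    '--daemon=no', '--start-child=robotlab'],
        #Serve relative to the proxied path rather than the server root
        'absolute_url': False
    }
}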

At the very least, this seems to offer a reasonably natural way of launching a containerised desktop application directly into the browser?

It would be useful to know if PulseAudio can be used to play sound from a container launched on something like Digital Ocean through the XPRA HTML5 desktop viewed in a browser before I waste any more time chasing that; and if so, it would be really handy to see a minimal working example Dockerfile ;-)

I’m guessing it should also work in the Google Cloud Run serverless container context [howto]? I still need to try this out…

The Long Road From Proof of Concept / Quick Demo Through Reference Architecture to Production System

I tinker at the level of proof of concept, playful demo and half hour hack. When I try things out, my intention is that I should be able to make some good progress and get something running in half an hour. It may end up taking an hour, a couple of hours, half a day, even a couple of days if I get really obsessed/frustrated and think it’s worth spending that extra time (?!) on, but the initial intention typically is: could I get something working to proof of concept level quickly?

As I’ve written before, one reason is funnels: if it takes 3 weeks to try something out, not many people will get to try it out. If you see something new in a tweet and it takes ten minutes to try, you might. And from that, whatever the thing is might get traction more widely if, within those 10 minutes, you see enough promise to want to spend more time on it. Or it might just help you with a temporary problem, and you can use it, move on, drop it, perhaps remembering it as yet another of those weirdly shaped screwdrivers that only fits very peculiarly headed screws, but is useful for them nonetheless.

Through trying lots of things out you also get a feel for what’s new, what’s interesting, what’s more of the same, what’s actually different. Downes knows this too…

So, playful demos. I spent a chunk of time last night trying to launch an OpenRefine container directly from Jupyterhub using a Dockerspawner. (It didn’t work.) My thinking is that being able to launch arbitrary containers from behind Jupyterhub means that have-a-go educators could co-opt Jupyterhub as a multi-user front end to launch anything in a container that returns something on port 8888. (I’m still not sure what the Jupyterhub Dockerspawner requires of a container it launches (is it just an HTTP response on port 8888?) or what it sends to the container when it tries to launch it (does it send a command to append to an ENTRYPOINT? does it pass environment variables in?). If you can point me to docs, or transparent debug examples/logs, that’d be much appreciated.)
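As far as I can make out (and treat this as an assumption, not a statement of fact), the spawner passes connection details into the container via JUPYTERHUB_* environment variables and then expects an HTTP server to answer on the configured port. Here’s a minimal sketch of the sort of jupyterhub_config.py I was experimenting with; the image name is a hypothetical placeholder:

#jupyterhub_config.py - minimal DockerSpawner sketch
c = get_config()

#Use DockerSpawner to launch a container per user
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'

#The container image to launch for each user (hypothetical name)
c.DockerSpawner.image = 'ousefuldemos/my-openrefine-shim'

#The port inside the container the spawner expects an HTTP response on
c.DockerSpawner.port = 8888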

I’ve not really used Jupyterhub before, and didn’t want to use The Littlest JupyterHub (although I guess you can change that to use Dockerspawner? Hmmm… Bah…), so this also provided an opportunity for me to find a (quick) way of firing up Jupyterhub servers.

I ended up following the recipe for the simple example in the jupyterhub/dockerspawner repo. It comes with these caveats:

This is a simple example of running jupyterhub in a docker container.

This shows the very basics of running the Hub in a docker container (mainly setting up the network). To run for real, you will want to:

– …

jupyterhub-deploy-docker does all of these things.

So: enough to get up and running, no more than that… That’s the level I tend to work at.

One of the nice things about the Jupyter ecosystem is that I can get started at this quick level, and produce containers that can be launched by production systems as well as working easily with my quick local demo. I might even be able to tinker around with Jupyterhub customisation, tweaking style templates and so on to explore different ways of customising the presentation, which might also be relevant to the final production system.

The jupyterhub/jupyterhub-deploy-docker setup, which provides a [r]eference deployment of JupyterHub with docker, goes a bit further than I need for simple personal testing / proof of concept, and requires more investment in setup time.

As a reference deployment, the README suggests use cases include (but are not necessarily limited to):

  • creating a JupyterHub demo environment that you can spin up relatively quickly.
  • providing a multi-user Jupyter Notebook environment for small classes, teams, or departments.

The reference deployment is useful for me because it provides a logical diagram / architectural example showing what other things need to be considered for a production system rather than a plaything, even if the reference deployment does not demonstrate them at production strength.

(Note to self: it would be useful to annotate the reference deployment with commentary about why each piece is there and what sorts of criteria you might bring to bear when deciding one way of implementing it versus another.)

It also comes with a disclaimer:

This deployment is NOT intended for a production environment. It is a reference implementation that does not meet traditional requirements in terms of availability nor scalability.

If you are looking for a more robust solution to host JupyterHub, or you require scaling beyond a single host, please check out the excellent zero-to-jupyterhub-k8s project.

(It might also be worth noting that for a small scale production use-case, The Littlest JupyterHub (TLJH) [jupyterhub/the-littlest-jupyterhub], a “[s]imple JupyterHub distribution for 1-100 users on a single server” might also be appropriate?)

The Zero to JupyterHub with Kubernetes [jupyterhub/zero-to-jupyterhub-k8s] deployment adds further complexity, providing a comprehensive set of “[r]esources for deploying JupyterHub to a Kubernetes Cluster” (the docs are actually targeted at Google Kubernetes Engine, but we (well, not me, obvs..;-) managed to use them to bootstrap an Azure install). This is moving into production territory now (we use this for our TM112 disposable notebook optional activity), although by following the instructions, if you have a couple of hours, or perhaps half a day, to start with (rather than half an hour), plus access to a Kubernetes cluster, you can still give it a spin. (I tried last year to get it running with a local k8s cluster running via Docker on my local machine, but couldn’t get it to work at the time. It may be worth trying this again now, and finding, or posting, a recipe for doing this…)

A large part of my frustration in working at the OU arises from not being able to explore technology ideas more rapidly. It’s easy to be quick at the proof of concept level, harder to get things into production. I know that. But things like the Jupyter ecosystem provide an opportunity for end-user-development in one part of the ecosystem (eg within a container launched by Dockerspawner, or within a notebook via notebook extensions) whilst another part gets the production side right. Or even just facilitates the playfulness.

For example, yesterday I spotted this spacy-course [repo]. If you haven’t come across it, spacy is a really powerful, easy to use natural language processing library.

The course is split into chapters, with sections in chapters and pages in sections.

Some of the sections are slide displays, with central teaching points and commentary on the side. (Methinks it should be easy enough to add an audio player to read the script on the side, which could be quite interesting?)

Other sections, containing practical activities, are arranged as collapsible elements.

The course supports code execution using MyBinder (it looks like it uses juniper.js to manage this; I wonder how easy it would be to use Voila instead?):

From looking at the repo, the course seems to have been around for some time, so now I’m wondering why it took me so long to find it?!

[Prompted by @betatim, it seems there’s a backstory: the course was on DataCamp, but the course developer, @_inesmontani, got frustrated with that provider and instead “wanted to make a free version of my spaCy course so you don’t have to sign up for their service – and ended up building my own interactive app. Powered by the awesome @mybinderteam & @gatsbyjs” What’s more, “[t]he app and framework are 100% open-source and based on Markdown + custom elements. I built it for my content, but if you want to use it to publish your own DIY online course…” By the by, for a course revision, we’re looking at ways we can take all the course content out of the VLE and deliver it via our Jupyter fronted VM… There are three main reasons for this: 1) students should be allowed to take away a copy of the course materials, not just be given access to them for the duration of the course and a couple of years after; 2) getting errata addressed is a nightmare with the current document workflow — the version controlled, issue tracked, workflow we’re trying to work to improves this; 3) we’re interested in exploring how to present the course material in a more structured, searchable and interactive / interesting way. I really take heart from this spacy course example…]

I’m not sure how the content was created. If there’s a transform from Jupyter notebooks into this course format (perhaps using Jupytext, or a Jupyter Book style production route?), that could be really interesting… (At least, to me…. [REDACTED SNARK].)

If you want to try it yourself, Ines has put together this forkable [s]tarter repo for building interactive Python courses.

When it comes to production systems, end user development like this is perhaps part of the problem, though? Production systems folk don’t want end users producing things…?

PS Yes and no to that…paraphrasing something else I saw yesterday, I tend to assume excellence, and tend to only provide negative feedback. A lot of my commentary tends to be more neutral — X does this; I had to do Y then Z to get that to work; etc. As a rule of thumb, I only comment on public activities that I come across and I don’t comment on things that are only discoverable behind authentication.

On the other hand, Tracking Jupyter is a personal experiment into finding a way of providing synoptic feedback about an open system. That that community is open, and that a large number of the activities carried out within it are transparent and discoverable, makes such feedback possible.

Sometimes, my commentary comes with added snark in my personal comms channels (social media, this blog). Which is part of the point. That, and the f****g swearing, are deliberately used to limit the readership, and the willingness of people to link to the content (it’s inappropriate; not properAcademic). And they’re channels where I vent frustration.

I know how to maintain Chinese Walls. Contrary to what folk may think, I don’t blog everything. A lot of stuff that appears in this blog is only here because I can’t find anyone to engage in discussion about it internally, despite trying… And a lot of stuff doesn’t appear. (Not as much as didn’t used to appear, though, back when folk did used to talk to me…)

PPS This sort of personal comment is also, in part, a device to limit linking. Plus the blog is my personal notebook, and as such, is what it is…

;-)

Running RobotLab (Legacy Windows App) Remotely or Locally in a Docker Container Under Wine Via RDP

One of our longer running courses (TM129 — Technologies in practice) distributes a Windows desktop application (RobotLab) developed internally 15+ years ago that implements a simple 2D robot simulator.

For the last few years, we’ve been supposed to make software available on a cross platform basis. For Windows users, I think the application is recompiled every so often to cope with Windows OS upgrades, for Linux it’s distributed using Wine (I think?) and for Macs it’s bundled under PlayOnMac.

A recent issue with the Mac version prompted me to revisit my earlier attempt at producing a DIT4C Inspired RobotLab Container with a simple RDP container that a student could connect to via the Microsoft RDP (remote desktop protocol) client. (One advantage of RDP is that sound sort of works, and the RobotLab activities include one that involves sound…)

Here’s a minimal Dockerfile, derived from danielguerra69/ubuntu-xrdp:

FROM danielguerra/ubuntu-xrdp

#Required to add repo
RUN apt-get update && apt-get install -y software-properties-common

RUN dpkg --add-architecture i386

RUN wget -nc https://dl.winehq.org/wine-builds/winehq.key

RUN apt-key add winehq.key

RUN apt-add-repository 'deb https://dl.winehq.org/wine-builds/ubuntu/ xenial main'

RUN apt update && apt-get install -y --install-recommends winehq-stable

COPY Apps/  /opt/

To build the Docker image, I put the original Apps/ folder, containing the RobotLab and Neural folders, in the same directory as the Dockerfile and then ran:

docker build -t myimagetagname  .

The container can then be run from that as:

docker run  --name mycontainername  --shm-size 1g -p MYMAPPEDPORT:3389 -d  myimagetagname

For reference, a version of the container can be found here — ousefulcoursecontainers/tm129rdp — and if you need an RDP client, they can be found here. There's a repo here, but there are various experiments scattered across various branches and it's not very well documented / clear what's where, what's working yet, and how…

If you have docker locally or remotely, the demo container can be run using:

docker run --name tm129 --hostname tm129demo --shm-size 1g -p 3391:3389 -d ousefulcoursecontainers/tm129rdp

(You should be able to run the container remotely on Digital Ocean. See here for a crib.)

In the RDP client application, create a new connection on port 3391 (or whatever you mapped in the docker run command) as per:

Login with user: ubuntu

The password seems optional but is also: ubuntu

If you need to sudo using the terminal on the remote desktop, the password is: ubuntu

The RobotLab and Neural apps are in the /opt directory (a more recent build uses /opt/Apps, I think?).

When you first run the applications, wine wants to install several packages (gecko twice (?), mono once). (I made a start on trying to run the associated installers in the Dockerfile, but the approach I’ve been taking so far doesn’t seem to work…)

#Use a base XRDP container
FROM danielguerra/ubuntu-xrdp

#Required to add a repo
RUN apt-get update && apt-get install -y software-properties-common

#Add additional repo required for wine install
RUN dpkg --add-architecture i386
RUN wget -nc https://dl.winehq.org/wine-builds/winehq.key
RUN apt-key add winehq.key
RUN apt-add-repository 'deb https://dl.winehq.org/wine-builds/ubuntu/ xenial main'
RUN apt update && apt-get install -y --install-recommends winehq-stable

#The first time wine is used it wants to download some bits...
#I've tried to add these via the Dockerfile but can't get it to work (yet?!)
#Not sure these are the correct versions, either?
#RUN wget http://dl.winehq.org/wine/wine-mono/4.8.1/wine-mono-4.8.1.msi
#RUN wget http://dl.winehq.org/wine/wine-gecko/2.47/wine_gecko-2.47-x86.msi
#RUN wget http://dl.winehq.org/wine/wine-gecko/2.47/wine_gecko-2.47-x86_64.msi
#RUN wine msiexec /i wine_gecko-2.47-x86_64.msi
#RUN wine msiexec /i wine_gecko-2.47-x86.msi
#RUN wine msiexec /i wine-mono-4.8.1.msi

#Copy over the Windows applications we want to run under wine
COPY Apps/  /opt

#I can't seem to create a user or copy the files to that user home directory?
#If I do, I just get a black screen when I try to connect using RDP client on a Mac.

Files can be saved in RobotLab but you need to select My Documents within the wine context, the files ending up down a path:

I don’t know if there’s a way of configuring this so we can save files more directly into the host filesystem rather than down the wine path? Would a symlink work? (As I understand it, wine’s user folders, My Documents included, are often just symlinks under ~/.wine/drive_c/users/, so repointing one of those at a mounted volume might be the trick; I haven’t tested that.)

If the container is halted and then restarted, any updates made to it will be immortalised in the container. So you don’t need to keep installing wine updates each time you want to use the container.

docker stop tm129
docker start tm129

I tried to make a cleaner build with a tm129 user and the apps in a more convenient location, and then used a docker commit to make a new image, but the containers don’t seem to start up correctly from the new image. I also tried tidying up where the Apps folder was copied to, creating a tm129 user etc. in the Dockerfile, but RDP didn’t seem to work thereafter (black screen on connect).

As far as to do items go, it would be nice to:

  1. create a tm129 user via the Dockerfile and install the applications to that user’s home directory;
  2. install the additional required wine packages via the Dockerfile;
  3. use a symlinked file path, or something, to make saving and loading files in the windows/wine apps a bit simpler.

If we can’t do the above as part of the build using the Dockerfile, the fallback is to find a way to update a container manually and then export a customised Docker image from it.

Nice to haves would be desktop icons pointing to the course applications. A demo of how to start a container that launches the RobotLab application on start (or an image that launches the remote desktop into one of the applications) could also be handy as a reference.

Motorsport Stats — A Comprehensive Free Source of Motor Racing Sports Results?

In passing, I note the launch of Motorsport Stats, “the sport’s pre-eminent provider of motorsport results, live data and visualised racing analytics for media owners, rights-holders, bookmakers and broadcasters”, apparently… It looks like it’s free, but I’m not sure what the license terms are yet…

A part of Autosport Media, it looks as if it wraps several motorsport results databases, including Forix.

F1 stats looks like a scrape of the FIA timing sheets:

and whilst there are some graphics, I’ve no idea how you find them other than by luck…

The WRC results go back a good few years, but the presentation is limited to stage / overall classifications presented in a not totally useful way, to my mind:

By way of reference, here’s how I’m currently looking at stage results:

and overall classification:

Part of the business model seems to be selling things like widget displays. For example, here’s some marketing blurb from last year for the Motorsport Stats Formula One widget suite.


They’re also into the production of dataviz products, from social media infographics to video graphics using Vizrt.

Hmmm… thinks… it’s been ages since I did any TV sports graphics round-ups… looks to be at least a couple of years, going by these examples: Augmented TV Sports Coverage & Live TV Graphics and Behind the Scenes of Sports Broadcasting – Virtual Sets, Virtual Signage and Virtual Advertising, etc.

I’ve seen a couple of mentions in reports about an API, but haven’t found any docs around that yet. A peek at browser developer tools suggests JSON API calls, which is handy:

It shouldn’t be too much of a chore (?!) to create a wrapper around the API if it plays nice and uses structured URLs, with obvious ways of picking record IDs / keys out of the JSON to use with the structured URLs:
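To illustrate the pattern, here’s a speculative sketch of the shape such a wrapper might take; the base URL and endpoint path are hypothetical placeholders (I haven’t found documented routes), so treat this as the pattern rather than working code against the real API:

#Speculative sketch of a thin wrapper around a JSON-over-HTTP API
#with structured URLs; the URLs below are HYPOTHETICAL placeholders
import requests

class JSONAPIWrapper:
    """Minimal helper for calling structured JSON API endpoints."""
    def __init__(self, base_url):
        self.base_url = base_url.rstrip('/')

    def get(self, path, **params):
        """Call an endpoint path and return the decoded JSON."""
        url = '{}/{}'.format(self.base_url, path.lstrip('/'))
        r = requests.get(url, params=params)
        r.raise_for_status()
        return r.json()

#Hypothetical usage - the actual paths would need to be read out of
#the requests the web app makes, via the browser developer tools:
#api = JSONAPIWrapper('https://api.example.com')
#results = api.get('series/f1/seasons/2019/results')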

If you’ve spotted docs for the API, found a Python wrapper for it, or even created your own, please let me know:-)

PS I keep meaning to get restarted with my Wrangling F1 Data With R book, or perhaps in a “More Data Wrangling…” version, or perhaps “…With Python”. But there never seems to be enough hours in the day…

Simple Script for Grabbing BBC Programme Info

It’s been a long time since I used to play with BBC data (old examples here), but whilst trying to grab some top level programme names / identifiers / descriptions of BBC programme content that we might be able to make use of in an upcoming course revision, I thought I’d have a quick play to see if any of the JSON feeds were still there.

It seems they are, so I popped together a quick throwaway thing for grabbing some programme info by series and programme ID, with the code available via this gist.
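By way of example, here’s a minimal sketch of the sort of thing I mean, assuming the JSON feeds hanging off /programmes still behave the way they used to; the key names are from memory, so treat them as assumptions (the gist has the fuller version):

#Minimal sketch - grab the JSON feed for a programme ID and pull out
#some top level info; key names are assumptions from memory
import requests

def programme_info(pid):
    """Return basic metadata for a BBC programme ID from its JSON feed."""
    url = 'https://www.bbc.co.uk/programmes/{}.json'.format(pid)
    programme = requests.get(url).json().get('programme', {})
    return {'pid': programme.get('pid'),
            'title': programme.get('title'),
            'synopsis': programme.get('short_synopsis')}

#Example call (the programme ID is a placeholder):
#programme_info('SOME_PID')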

(Folk don’t necessarily believe that I write code every day. But I do, most days. Things like this… “disposable helper scripts”, developed over a coffee break rather than doing a Killer Sudoku or trying a crossword, treated as a simple throwaway coding puzzle with the off chance that they’re temporarily OUseful.)

PS I think I’ve posted noticings like this before, but it’s always useful to remark on them every so often… Using contextual natural / organic advertising for recruitment…


I wonder what percentage of computing academics are aware that such things go on?!

API Secrets Mean the End of Standalone Single Page Web Apps and Why Serverless Isn’t Exactly What it Says on the Tin

The web used to be such a simple place: write some HTML+CSS, perhaps add in a little bit of javascript, all of it readable via View Source, upload it to a webserver somewhere, and job done.

It soon got a bit more complicated. If you run a company that’s actually doing something with information retrieved from the web page, you need scripts on the backend:

But the web page code was often still quite straightforward to look through.

The arrival of web APIs meant that you could call on other people’s webservices and add them into your own. Want to add a Google map to your web page for example? That could all be done, via Javascript, in the browser client:

What this meant was that you could build quite rich applications with just HTML, CSS and Javascript; you didn’t necessarily need to run server scripts, and you could get away with very simple web hosting: nothing more than a place to upload your files. Any execution of (Javascript) code would happen on the client side, in the browser.

In recent years, however, there has been a tendency for web APIs to require an API token to access the API. The token allows the API publisher to track, and limit, use of the API by a particular user; tokens are also supposed to be kept secret.

So, for example, you can’t just call the YouTube API to search for videos on a particular topic. Instead, you have to make a request using your key as part of the search query. You can still list the results and add video players to your page, which will play without the need for a token, but the initial API request needs a key.

What this does is place a burden on anyone wanting to call the API. No longer can they publish a single page web app, where everything is contained in a single web page. Instead, they have to do some code execution on the server side so they can keep their token secret:

A single page, self-contained web app is still possible, but you give your token away so that anyone else can use it, and run up (mis)usage on your account…

This is a real pain, and represents another way in which powerful features (the ability to easily call third party APIs) are denied to novices, making the on-ramp harder. You can’t just hack together some HTML’n’JS and post it somewhere, letting the browser run the Javascript code and call the third party APIs. You need to solve another problem first: finding somewhere to run chunks of your code in private so you can call the API services using your private key, build the page based on the response from the API, and then serve that in public.
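By way of illustration, here’s a minimal sketch of the sort of server-side shim this implies: a tiny Flask app that keeps a YouTube API key on the server and proxies search requests on behalf of the page. The endpoint and parameters follow the documented YouTube Data API v3 search call, but treat the details as an illustration rather than tested code:

#Minimal Flask shim - the page's Javascript calls /search on this
#server and never sees the secret API key
import os

import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

#Keep the secret out of the page - read it from the server environment
API_KEY = os.environ['YOUTUBE_API_KEY']

@app.route('/search')
def search():
    """Proxy a search query to the YouTube API, adding the secret key."""
    r = requests.get('https://www.googleapis.com/youtube/v3/search',
                     params={'part': 'snippet',
                             'q': request.args.get('q', ''),
                             'key': API_KEY})
    #Return just the API response to the browser
    return jsonify(r.json())

if __name__ == '__main__':
    app.run()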

Just by the by, web pages have also got harder to read when you View Source. Many of them dynamically create HTML elements from Javascript, which means if you View Source, all you see is the JS code.

(You can still get to see the HTML, but it means you have to go into the browser’s developer tools area so you can inspect what the browser is actually rendering.)

Another shift that is happening is the shift to so called “serverless” models for publishing web applications. In this case, the idea is not that you upload your website code (HTML, Javascript, CSS, server scripts) to a web server somewhere, but you upload it to a potential webserver. When a request to view the website comes in, a server is launched dynamically to serve the application.

If your application is on a server that was created for a previous visitor, and that server is still running, it will be used to serve the application rather than a new one being launched. Unless, that is, your application is receiving lots of visitors at the same time, in which case the serverless host may autoscale it for you, launching additional servers running your web application to cope with the load. (Autoscaling can also be applied to “traditional” web servers.)

What’s important to realise about serverless web application architectures is that they aren’t serverless: a web server is still required to serve the application. The serverless property means that there is no webserver running unless someone makes a request to view the site.
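For a flavour of that model, a serverless function is typically just a handler that the platform invokes on demand; a minimal Python handler in the AWS Lambda / API Gateway style looks something like the following sketch (the handler shape and return value follow that convention as far as I recall it):

#Minimal sketch of a serverless function handler - the platform spins
#up a runtime to invoke this; nothing runs between requests
import json

def handler(event, context):
    """Invoked on demand by the platform with the request details in event."""
    return {'statusCode': 200,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps({'message': 'Hello from a function'})}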

For some examples of serverless approaches via lambda functions, see Implementing Slack Slash Commands Using Amazon Lambda Functions – Getting Started and Searching the UK Parliament API from Slack Slash Commands Using a Python Microservice via Hook.io Webhooks.

One of the early, easy to use serverless providers was Zeit.Now (see for example: Publish Static Websites, Docker Containers or Node.js Apps Just by Typing: now). An attractive feature of the Zeit offering was that it could launch web applications packaged in Docker containers in a serverless way, but this seems to have been deprecated recently (for example, see this post from Simon Willison).

However, it seems that Google have just announced a new service that launches Docker containers in a serverless way: Google Cloud Run. For a third party review, see here.


Fragment: Components for Rolling Your Own GIS Inside Jupyter Notebooks

I’ve been tinkering with maps again… I think we really do need to start paying more attention to them. In particular, how we can help students to create them in concert with data that has geospatial relevance.

Last night, preparing for a Skyped 9am meeting this morning, I was looking for a recipe for allowing students to draw a shape on a map, get polygon data describing the shape back from the map, and then use that polygon data to do a lookup for crime figures or OSM streets within that boundary.

My recipe was, as ever, a bit hacky, but it was a first step, and it was gone 1am…

The recipe took the following form:

#gdf is a (geo)pandas dataframe
#Get the lat/lon of the first point (i.e. an arbitrary point)
example_coords = gdf[['lat','lon']].iloc[0].values
#Generate a URL to open a geojson.net at that location and zoom into it
print('https://geojson.net/#12/{}/{}'.format(example_coords[0],example_coords[1]))

The printed out link is actually clickable…

Draw the shape and grab the geoJSON:

#Paste the JSON data below so we can assign it to the jsondata variable
jsondata='''
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [-0.960474, 51.165091],
            [-0.988426, 51.154801],
            [-1.007309, 51.141879],
            [-0.971955, 51.135133],
            [-0.945854, 51.151786],
            [-0.960474, 51.165091]
          ]
        ]
      }
    }
  ]
}
'''

We can then parse that into a Shapely object:

#If the json data is a FeatureCollection with a single features list item
# we can parse it as follows...
import json

#Load the json string as a python dict and extract the geometry
gj = json.loads(jsondata)['features'][0]['geometry']

from shapely.geometry import shape
#Generate and preview the shape
shape( gj )

#Alternatively, we can let geopandas handle the conversion
#(from_features expects the features list, not a bare geometry)
import geopandas
gpd = geopandas.GeoDataFrame.from_features( json.loads(jsondata)['features'] )

#Preview the first geometry item
gpd['geometry'].iloc[0]

So that… works… but it’s a bit clunky….

Let’s recap what we’re doing: go to geojson.net, draw a shape, get the JSON, turn the JSON into a Shapely shape, derive boundary data from the shape that we can use for searches within the shape.

But can we do better?

The ipyleaflet Jupyter notebook extension has been coming on leaps and bounds recently, and now includes the ability to embed ipywidgets within the map.

It also includes a draw control that lets us draw shapes on the map and get the corresponding geojson back into the notebook (code) environment.

Via StackOverflow, I found a handy recipe for getting the data back into the notebook from the shape drawn on the map. I also used the new ipyleaflet ipywidget integration to add a button to the map that lets us preview the shape:

#via https://gis.stackexchange.com/a/312462/119781
from ipyleaflet import Map, basemaps, basemap_to_tiles, DrawControl, WidgetControl
from ipywidgets import Button

from shapely.geometry import shape

watercolor = basemap_to_tiles(basemaps.Stamen.Watercolor)
m = Map(layers=(watercolor, ), center=(50, -1), zoom=5)
draw_control = DrawControl()

draw_control.circle = {
    "shapeOptions": {
        "fillColor": "#efed69",
        "color": "#efed69",
        "fillOpacity": 1.0
    }
}

feature_collection = {
    'type': 'FeatureCollection',
    'features': []
}

def handle_draw(self, action, geo_json):
    """Do something with the GeoJSON when it's drawn on the map"""    
    feature_collection['features'].append(geo_json)

draw_control.on_draw(handle_draw)

m.add_control(draw_control)

button = Button(description="Shape co-ords")

def on_button_clicked(b):
    #Generate and preview the shape
    display(shape(feature_collection['features'][0]['geometry']))
    #print(feature_collection)

button.on_click(on_button_clicked)

widget_control = WidgetControl(widget=button, position='bottomright')
m.add_control(widget_control)
m

#Draw a shape on the map below... then press the button...
#The data is available inside the notebook via the variable: feature_collection

This is the result:

The green shape is the rendering of a shapely object generated from the geoJSON extracted from the drawn shape on the map. It looks slightly different to the one on the map because “projections, innit?”. But it is the same. Just displayed in a differently projected co-ordinate grid.

So that’s maybe an hour last night working on my own hacky recipe and finding the SO question, half an hour today trying out the SO recipe, and another half hour on this blog post.

And the result is a recipe that would allow us to easily draw a shape on a map and grab some data back from it….

Like this for example…

Start off by getting the boundary into the form required by the Police API:

drawn_area = shape(feature_collection['features'][0]['geometry'])

#Get the co-ords
lon, lat = drawn_area.exterior.coords.xy
drawn_boundary = list(zip(lat, lon))

Then we can look-up crimes in that area and pop the data into a geopandas dataframe:

import pandas as pd
import geopandas
from shapely.geometry import Point

#get crime in area
from police_api import PoliceAPI
api = PoliceAPI()


def setCrimesAsGeoDataFrame(crimes, df=None):
    ''' Convert crimes result to geodataframe. '''
    if df is None:
        df=pd.DataFrame(columns = ['cid','type', 'month','name','lid','location','lat','lon'])
        #[int, object, object, int, object, float, float]
    for c in crimes:
        df = df.append({'cid':c.id,'type':c.category.id, 'month':c.month, 'name':c.category.name,
                            'lat':c.location.latitude,'lon':c.location.longitude, 
                        'lid':c.location.id,'location':c.location.name }, ignore_index=True)

    df['lat']=df['lat'].astype(float)
    df['lon']=df['lon'].astype(float)
    
    df['Coordinates'] = list(zip(df['lon'], df['lat']))
    df['Coordinates'] = df['Coordinates'].apply(Point)
    
    return geopandas.GeoDataFrame(df, geometry='Coordinates')

crimes_df = setCrimesAsGeoDataFrame( api.get_crimes_area(drawn_boundary, date='2019-01') )
crimes_df.head()

Then we can plot the data on a map:

from ipyleaflet import Marker, MarkerCluster
import geopandas

m = Map( zoom=2)
m.add_layer(MarkerCluster(
    markers=[Marker(location=geolocation.coords[0][::-1]) for geolocation in crimes_df.geometry])
    )
m

(Cue another 90 mins that should only have been 20 mins because I made a stupid mistake of centering the map on (51, 359), as cut and pasted from code-on-the-web, rather than a sensible (51, -1) and then spent ages trying to debug the wrong thing…)

We can then do another iteration and create a map with a draw control that lets us draw a shape on the map, and a button that will take the shape, look up crimes within it, and plot corresponding markers on the same map…

m = Map(layers=(watercolor, ), center=(50.9,-1), zoom=9)

#Reuse the draw control and handler defined in the earlier cell
draw_control.on_draw(handle_draw)

m.add_control(draw_control)

button = Button(description="Shape co-ords")

feature_collection = {
    'type': 'FeatureCollection',
    'features': []
}

def on_button_clicked2(b):
    ''' Look up crimes within the drawn location and add them to the map. '''
    #Rebuild the boundary from the most recently drawn shape,
    #rather than relying on the stale drawn_boundary from earlier
    drawn_area = shape(feature_collection['features'][-1]['geometry'])
    lon, lat = drawn_area.exterior.coords.xy
    drawn_boundary = list(zip(lat, lon))
    crimes_df = setCrimesAsGeoDataFrame( api.get_crimes_area(drawn_boundary, date='2019-01') )
    m.add_layer(MarkerCluster(
        markers=[Marker(location=geolocation.coords[0][::-1]) for geolocation in crimes_df.geometry])
    )

button.on_click(on_button_clicked2)

widget_control = WidgetControl(widget=button, position='bottomright')
m.add_control(widget_control)
m

Here’s how it looks:

For some reason it also seems to be pulling in reports from just outside the boundary area? But it’s close enough… and we could always post-filter the data returned from the Police API to select just the crimes reported within the drawn boundary.
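For example, a post-filter along the lines of the following sketch (assuming the drawn_area polygon and crimes_df geodataframe from above) should keep just the reports that fall inside the drawn shape:

#Post-filter: keep only the crime reports whose point geometry
#falls within the drawn boundary polygon
crimes_in_boundary_df = crimes_df[ crimes_df.within(drawn_area) ]
crimes_in_boundary_df.head()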

Related notebooks are in this repo: https://github.com/psychemedia/crime-data-demo. The above is demoed in the Initial Look At the Data notebook, I think. The repo should be fully Binderised (I think).

PS Getting this far is easy, and demonstrates one of the differences between making “functional” code available to students and providing “polished” apps. If you draw lots of shapes, or the wrong sort of shape, on the map, things will go wrong. The code will have to get more defensive and more complex to handle that, either telling students that the shape’s profile is not acceptable, or finding ways of handling multiple shapes properly. At the moment, if you want to use the same map to clear the shape and get some new data in, any crime markers on the canvas are not cleared. So that’s a usability thing. And fixing it adds complexity to the code and (possibly) to the UI. Or you could just rerun the cell and start with a fresh map canvas.