Some Catching Up To Do On Data, AI and Robotics Policy Readings…

I don’t know if it’s an age thing, or the fact I try to kill the laptop after 9pm, but I feel as if time for tinkering and reading and playing with tech is beyond me at the moment…

So for example, the stuff I’m linking to here are things I haven’t read (but should…) and that I hope to come back to… (You may also notice a new blog tag for this post: TO DO :-(

  • From last month, Royal Society policy report on Machine Learning [PDF]; (incl. how data is collected and processed – i.e. why you should care about it); by the by, I suspect this post from Dale Lane will complement the Royal Society report nicely: Introducing Machine Learning to kids; it’s a lovely blend of policy ideas and accessibly explained tech principles.
  • And another recent report, from the Responsible Robotics lobbying group, on Our Sexual Future with Robots [PDF]. One comment I saw around this was the notion of regarding “sex robots as pornography”; it’d be quite interesting to spin round an appropriate part of something like Pimp State, critiquing the various transfers that arise from treating people as objects vs. objects as people, and how that might in turn influence social relationships, in the context of robots.
  • I’ve completely slipped behind in reports on data brokers (often these are of interest from the way they pick up on – and get quoted in – other reports; seeing which lobbying groups and consulting forms are involved in producing the reports can also be illuminating…) So for example, from George Soros’ Open Society Foundations we have Data Brokers in an Open Society [PDF. The US Federal Trade Commission (FTC) have a couple of reports I don’t recall reading before: Data Brokers: A Call For Transparency and Accountability (May 2014) [PDF] and Big Data: A Tool for Inclusion or Exclusion? Understanding the Issues (January, 2016) [PDF]. The UK Competition and Markets Authority (CMA) also had a related consultation report from 2015 on the Commercial use of Consumer Data [PDF];
  • I don’t tend to follow security topics much, but as I try to tell OU students in the rolling general interest robotics talk I tend to give at residential school, following codes of conduct, upcoming legislation and Health and Safety Exec announcements can often provide indicators about emerging technologies based on any legislative changes required to enable their widespread adoption and roll-out. So from the HSE’s New operational guidance page, I discovered this report on Cyber Security for Industrial Automation and Control Systems (IACS) [PDF]. The HSE Foresight Report (2016) [PDF] is probably also worth a read.

By the by, the only way I’m going to get to read these is probably to pop them into a physical reader… I don’t tend to print stuff out on a printer much any more, but I do use Lulu… So here are a couple of possible useful tricks: Joining PDF files in OS X from the command line (which refers you to /System/Library/Automator/Combine\ PDF\ Pages.action/Contents/Resources/; or maybe this python package? PyPDF2 [docs].

Here’s a recipe for downloading and bundling the docs into a single PDF:


!mkdir -p toread

for pdf in pdfs:
    !wget {pdf} -P toread/ 

!pip3 install pypdf2
!rm bigRead.pdf

from PyPDF2 import PdfFileMerger, PdfFileReader
merger = PdfFileMerger()
for pdf in pdfs:


I guess I could have also bookmarked the PDF URLs and then pulled down the corresponding bookmark feed to access the PDF URLs.

Having got a single monolithic PDF, I can now upload it to Lulu as a private/preview book and get a copy of all the docs in a handy paperback book reader…

But Google’s Email Scanning, in General, Isn’t Going to Stop, Is it?

A blog post, last week, from the Goog – As G Suite gains traction in the enterprise, G Suite’s Gmail and consumer Gmail to more closely align:

G Suite’s Gmail is already not used as input for ads personalization, and Google has decided to follow suit later this year in our free consumer Gmail service. Consumer Gmail content will not be used or scanned for any ads personalization after this change.

So presumably, then, Google was doing something functionally similar to the below, where emails are scanned and the results of the scanning are used directly to support personalised ad delivery:

But even though scanning “for ad personalisation” will stop, scanning of email will presumably continue – for spam detection, in support of antivirus/malware protection, to extract structured information that can be passed to other Google services…

As an example of the latter, I just booked a hotel room and the email confirmation went to my Gmail account. As if by magic, the booking details are extracted and placed in my Google calendar, which then synchs to my phone calendar: book a room online, and booking details appear the calendar on my phone.

Really useful…


Does that mean my emails can still be used to power ad personalisation, albeit at one step remove?

For example, will Google still consider events in Google calendar as fair game for generating personalised ads?

Here, the emails aren’t being used directly to drive ad personalistion, although derived versions of the data scraped from the emails are…

PS The above diagrams were generated using actdiag, a “simple activity-diagram image generator”, which runs as an online service and is also available as a python package (though it didn’t work for me in Jupyter notebooks last time I tried). To create the diagram, you write it, and the rendering is done by machine.

Here’s the script I used to create the first diagram:

  Email -> EmailScanner -> AdPersonalise -> AdDeliver -> PersonalisedAds;

  lane Google {
  lane Gmail {
  lane Scanner {
  lane AdSense {
    AdPersonalise; AdDeliver;

And here’s the second.

   Email -> EmailScanner -> Calendar  -> EventScan -> AdPersonalise -> AdDeliver -> PersonalisedAds;

  lane Google {
  lane Gmail {
  lane GoogleCalendar {
  lane Scanner {
    EmailScanner; EventScan;
  lane AdSense {
    AdPersonalise; AdDeliver;

The same service supports other chart types too.

HexJSON HTMLWidget for R, Part 3

In HexJSON HTMLWidget for R, Part 1 I described a basic HTMLwidget for rendering hexJSON maps using d3-hexJSON, and HexJSON HTMLWidget for R, Part 2 described updates for supporting colour.

Having booked off today for emergency family cover that turned out not to be required, I had another stab at the package, so it now supports the following additional features…

Firstly, I had a go at popping some “base” hexjson files into a location within the package from which I could load them (checkin). Based on a crib from here, which suggests putting datafiles into an extdata folder in the package inst/ folder, from where devtools::build() makes them available in the built package root directory.

hexjsonbasefiles <- function(){
  list.files(system.file("extdata", package = "hexjsonwidget"))

With the files in place, we can use any base hexjson files included included in the package as the basis for hexmaps.

I also added the ability to switch off labels although later in the day I simplified this process…

One thing that was close to the top of my list was the ability to merge the contents of a dataframe into a hexJSON object. In particular, for a row identified by a particular key value associated with a hex key value, I wanted to map columns onto hex attributes. The hexjson object is represented as a list, so this required a couple of things: firstly, getting the dataframe data into an appropriate list form, secondly merging this into the hexjson list using the rlist::merge() function from the rlist package. Here’s the gist of the trick I ended up with, which was to construct a list split() from each row in a dataframe, with the rowname as the list name, using lapply(.., as.list):

ll=lapply(split(customdata, rownames(customdata)), as.list)
jsondata$hexes = list.merge(jsondata$hexes, ll)

A hexjsondatamerge(hexjson,df) function takes a hexjson file and merges the contents of the dataframe into the hexes:

The contents of a dataframe can also be merged in directly when creating a hexjsonwidget:

Having started to work with dataframes, it also seemed like it might be sensible to support the creation of a hexjson object directly from a dataframe. This uses a similar trick to the one used in creating the nested list for the merge function:

hexjsonfromdataframe <- function(df,layout="odd-r", keyid='id',
                                 q='q', r='r'){

  rownames(df) = df[[keyid]]
  df[[keyid]] = NULL
  colnames(df)[colnames(df) == q] = 'q'
  colnames(df)[colnames(df) == r] = 'r'

       hexes=lapply(split(df, rownames(df)), as.list))


As you might expect, we can then use the hexjson object to create a hexjsonwidget:

A hexjsonwidget can also be created directly from a dataframe:

If we wanted to save the hexjson to a file, we could do something like: write( toJSON( jjx ), "test_out.hexjson" ).

In creating the hexjson-from-dataframe, I also refactored some of the other bits of code to simplify the number of parameters I’d started putting into the hexjsonwidget() function, in effect overloading them so the same named parameter could be used in different supporting functions.

I think that’s pretty much it from the developments I had in mind for the package. Now all I need to do is put it into practice… testing for which will, no doubt, throw up issues!)

PS Not quite it… I just added some simple file handlers too: to save the hexjson to a file, use hexjsonwrite(df, filename) and to read a json/hexjson file into a hexjson object use hexjsonread(filename).

HexJSON HTMLWidget for R, Part 2

In my previous post – HexJSON HTMLWidget for R, Part 1 – I described a first attempt at an HTMLwidget for displaying hexJSON maps using d3-hexJSON.

I had another play today and added a few extra features, including the ability to:

  • add a grid (as demonstrated in the original d3-hexJSON examples),
  • modify the default colour of data and grid hexes,
  • set the data hex colour via a col attribute defined on a hexJSON hex, and
  • set the data hex label via a label attribute defined on a hexJSON hex.

We can now also pass in the path to a hexJSON file, rather than just the hexJSON object:

Here’s the example hexJSON file:

And here’s an example of the default grid colour and a custom text colour :

I’ve also tried to separate out the code changes as separate commits for each feature update: code checkins. For example, here’s where I added the original colour handling.

I’ve also had a go at putting some docs in place, generated using roxygen2 called from inside the widget code folder with devtools::document(). (The widget itself gets rebuilt by running the command devtools::install().)

Next up – some routines to annotate a base hexJSON file with data to colour and label the hexes. (I’m also wondering if I should support the ability to specify arbitrary hexJSON hex attribute names for label text (label) and hex colour (col), or whether to keep those names as a fixed requirement?) See what I came up with here: HexJSON HTMLWidget for R, Part 3.

HexJSON HTMLWidget for R, Part 1

In advance of the recent UK general election, ODI Leeds published an interactive hexmap of constituencies to provide a navigation surface over various datasets relating to Westminster constituencies:

As well as the interactive front end, ODI Leeds published a simple JSON format for sharing the hex data – hexjson that allows you to specify an identifier for each, some data values associated with it, and relative row (r) and column (q) co-ordinates:

It’s not hard to imagine the publication of default, or base, hexJSON documents that include standard identifier codes and appropriate co-ordinates, e.g. for Westminster constituencies, wards, local authorities, and so on being developed around such a standard.

So that’s one thing the standard affords – a common way of representing lots of different datasets.

Tooling can then be developed to inject particular data-values into an appropriate hexJSON file. For example, a hexJSON representation of UK HEIs could add a data attribute identifying whether an HEI received a Gold, Silver or Bronze TEF rating. That’s a second thing the availability of a standard supports.

By building a UI that reads data in from a hexJSON file, ODI Leeds have developed an application that can presumably render other people’s hexJSON files, again, another benefit of a standard representation.

But the availability of the standard also means other people can build other visualisation tools around the standard. Which is exactly what Oli Hawkins did with his d3-hexjson Javascript library, “a D3 module for generating [SVG] hexmaps from data in HexJSON format” as announced here. So that’s another thing the standard allows.

You can see an example here, created by Henry Lau:

You maybe start to get a feel for how this works… Data in a standard form, standard library that renders the data. For example, Giuseppe Sollazzo (aka @puntofisso), had a play looking at voter swing:

So… one of the things I was wondering was how easy it would be for folk in the House of Commons Library, for example, to make use of the d3-hexjson maps without having to do the Javascript or HTML thing.

Step in HTMLwidgets, a (standardised) format for publishing interactive HTML widgets from Rmarkdown (Rmd). The idea is that you should be able to say something like:

hexjsonwidget( hexjson )

and embed a rendering of a d3-hexjson map in HTML output from a knitred Rmd document.

(Creating the hexjson as a JSON file from a base (hexJSON) file with custom data values added to it is the next step, and the next thing on my to do list.)

So following the HTMLwidgets tutorial, and copying Henry Lau’s example (which maybe drew on Oli’s README?) I came up with a minimal take on a hexJSON HTMLwidget.


It’s little more than a wrapping of the demo template, and I’ve only tested it with a single example hexJSON file, but it does generate d3.js hexmaps:




It also needs documenting. And support for creating data-populated base hexJSON files. But it’s a start. And another thing the hexJSON has unintentionally provided supported for.

But it does let you create HTML documents with embedded hexmaps if you have the hexJSON file handy:

By the by, it’s also worth noting that we can also publish an image snapshot of the SVG hexjson map in a knitr rendering of the document to a PDF or Microsoft Word output document format:

At first I thought this wasn’t possible, and via @timelyportfolio found a workaround to generate an image from the SVG:


, export_widget( )
), viewer=NULL) %>%
webshot( delay = 3 )

But then noticed that the PDF rendering was suddenly working – it seems that if you have the webshot and htmltools packages installed, then the PDF and Word rendering of the HTMLwidget SVG as an image works automagically. (I’m not sure I’ve seen that documented – the related HTMLwidget Github issue still looks to be open?)

See also: HexJSON HTMLWidget for R, Part 2, in which support for custom hex colour and labeling is added, and HexJSON HTMLWidget for R, Part 3, where I add in the ability to merge an R dataframe into a hexjson object, and create a hexjsonwidget directly from a dataframe.

BYOA (Bring Your Own Application) – Running Containerised Applications on the Desktop

Imagine this: you’re on a Mac, though at times you’re on a Windows box, and at other times you’ve just got a browser to hand. You’re happy using applications that get stuff done, even if it means using different tools for different purposes, but you get fed up having to find applications that you can install across all the different platforms you work on. Not least because sometimes the things don’t seem to want to install with all sorts of other stuff being installed first.

Or maybe this: you want to run some data analysis using Python, or R, in Jupyter notebooks, or RStudio, and the dataset you want to work with is quite large; which means you also need to hook up to a database, or a query engine, and get the data into a form that the database or query engine can cope with, as well as hooking up your analysis environment (RStudio, or Jupyter notebooks) to what is now a queryable datasource.

So what do you do?

Here’s what I do: I use a technology that’s been developed for another purpose, and I make it personal.

The technology is – are – Docker containers. Docker is a brand, but in part it’s the same way that Hoover, and more recently, Dyson, are brands. Say: Hoover, or Dyson, and you quite possibly mean: vacuum cleaner. Say “Docker”, and you quite possibly mean lightweight virtual machine using Linux Containers, or Open Container Initiative (OCI) containers. But Docker is easier to say/remember. And it also refers to a complete technology stack – and toolset – for working with those containers. (Read more….)

(You can skip this paragraph – it’s not how I use Docker at all…) Docker is most widely used by the people who develop web applications, and who keep web applications running at scale. It’s a devops tech (devops, short for development/operations). The devs hack the code, the ops folk get that code running for users. Traditionally, these were separate teams; now, there is some overlap, particularly if you’re running on agile and continuous integration and/or continuous deployment are your thing. Docker, in short, is something your more curious IT folk may know something about. Or not.

But I don’t care about that. Not least, because they probably don’t think about Docker in the way I do…

What I care about are some of the features that Docker has, and how I can use those features to make my own life easier, irrespective of how the folk who develop Docker, or the folk in IT, think it should be used; which is probably more along the lines of helping enterprise IT deliver enterprise across the whole organisation, rather than supporting personal, DIY, BYOA (“bring your own app”) IT that works at an individual level in the form of end-user applications, or personal digital workbenches (I’ll come on to those in a moment…)

Take the first example mentioned at the start – wanting to run the same application pretty much anywhere, without the installation grief. For example, suppose you wanted to run the OpenRefine, a rather handy browser based tool for cleaning and filtering datasets. Installing OpenRefine requires that you download and install an appropriate version of Java, as well as downloading and installing OpenRefine. What do you do?

Here’s what I do: I use Docker (which yes, does require me to download and install Docker. But I’ll pick up on that later…) And I use Kitematic (which, admittedly, I have to install from the Docker toolbar menu; note I could also launch containers from the command line…).

But first a bit of jargon…

Docker Images versus Docker Containers

One way off thinking of a docker image is as an installer file. It’s the thing you download that you can then use to install your application. In the case of docker, the image file identifies the operating system requirements of the application, as well as the application code itself. The docker application uses this information to launch a docker container, which is an instance of your application. Note that you can run multiple instances of the same image at the same time (that is, multiple containers launched from the same image). This is different to the model of having multiple documents open in the same application, such as multiple spreadsheets or word documents open in Microsoft Excel or Word. Instead, it would be akin to running several different instances of Excel or Word, with the documents open in one copy of Word isolated from and invisible to another running copy of Word.

The containers themselves can also be running or stopped; a stopped container is one that is essentially hibernating, and whose operation can be resumed. Any changes made to the contents of a container are retained if the container is stopped and restarted. Containers can also be destroyed, which is like uninstalling the application and removing all its install files. When a new container is launched, it’s a bit like reinstalling and starting a fresh installation of the application from the original installer (the docker image).


Kitematic is a desktop application (for Mac and Windows) that looks a bit like an ap store.

Using Kitematic, I can search for an OpenRefine image – in this case, ou-tm351-openrefine-test, which happens to be an image I created myself – that has been published via the Dockerhub public docker image registry.

I create the container – which downloads a Docker image that contains a self-contained operating system and all the files that OpenRefine needs to run as well as the application itself – and starts it running.

I copy the URL – in this case localhost:32771, and paste it into my browser location bar:

You may notice the Kitematic page also had a folder highlighted – this is a folder shared between the container and my desktop. Project files will be save to that folder. When I have finished with the application, I can stop it, and then restart it later. Or I can destroy it. This kills the container and removes all its files, but the base image is still on my computer. If I want to run OpenRefine again, I won’t need to download any files. But – and this is important – the application container that is started up will be completely fresh, run as-if for the the first time from the image I downloaded. Like a reinstalled application. Which is effectively what it is. But the files stored on my desktop will be shared back with it.

Stopping/starting or restarting the container is like switching it off and on again. Deleting the container in Kitematic, then searching for it again and creating a new container uses the previously downloaded files and essentially reinstalls the application.

If I want to run another application, I can  – there are lots of images available. For example, if you want to run a local version of the Dillinger markdown editor, you can find the original of that:

Note, however, that here the lack of love Kitematic has received in the last few years – I imagine because Docker are more interested in enterprise use of Docker, not personal use – because if you just try to create the container, it won’t work (you’ll get an error message). Instead, you need to specify the version of the editor you want to use explicitly. Click the dots and in order to select a tag (that is, a version):

Choose the tag/version, and close the selection dialogues – then click on the Create button:


This time, the homescreen is a little more helpful, providing a link that will launch the app in the browser. You may also get a preview of the screen (sometimes this doesn’t appear; click the Settings tab, then the Home tab, and it may refresh. As I mentioned above, Kitematic doesn’t seem to have had a lot of love since Docker bought it…)

Click the link button, and you should be redirected to the app running in your browser:

So that’s part of the story…

Viewing Application UIs and Launching Applications from Shortcuts

As well as using Kitematic to launch containerised applications, we can also use desktop shortcuts. I mentioned how we can use docker commands on the command like to launch an application – what Kitematic does is simply generate and execute those commands, and then display status and configuration information about the running container – so we can just pop equivalent commands into a shortcut, click on the shortcut and launch the application.

This is what Jessie Frazelle does with her Docker Containers on the [Mac] Desktop. The corresponding desktop-container library shelf is here: jessfraz/dockerfiles and associated shortcut definitions here: desktop-container commands. The applications are packaged to use X11 to expose the graphical user interface, so if you want to run RStudio, or Libre Office or Audacity in a desktop-container on a Mac, that could be a great place to start. Bits of guidance on getting up and running with X11 applications on OS/X can be found here: Docker for Mac and GUI applications and here: Bring Linux apps to the Mac Desktop with Docker, although this recipe seems most complete: Containerize GUI Applications on Mac.

Note: in testing, I had to install XQuartz (brew install xquartz), enabling the Security tab preference to Allow connections from network client, and socat (brew install socat), reboot, and then run socat TCP-LISTEN:6000,reuseaddr,fork UNIX-CLIENT:\"$DISPLAY\" (which is blocking) in a terminal on its own; in another terminal, I found the local IP address as ip=$(ifconfig en0 | grep inet | awk '$1=="inet" {print $2}') and could then run a docker command of the form docker run --name firefox -e DISPLAY=$ip:0 -v /tmp/.X11-unix:/tmp/.X11-unix jess/firefox which starts up XQuartz if it’s not already running; note also that on occasion I kept getting situations with the error socat[1739] E bind(5, {LEN=0 AF=2}, 16): Address already in use; the netstat command shows port 60000 waiting, but after a few minutes there’s a time out and the port is freed up; closing the docker app window also leaves the old, named container around, which needs removing with something of the form docker rm firefox before the app can be run again. Note that there is no --device /dev/snd mountpoint for docker in OS/X so no easy way to pass through audio? I’m not sure how well this would work on a Windows machine, but presumably there are X11 clients for Windows that aren’t too painful to install? I’m not sure what socat equivalent would be if the client was only listening privately and needed a tunnel from the open X11 port 6000 to the socket where the client listens for connections? (Nothing needed with xming e.g. as suggested here?

(Hmmm, thinks: a version of something like Kitematic that also bundles an X11 client could be really handy? )

Whereas Jessie Frazelle’s desktop-containers make use of the X11 windowing system to all her host operating system to display the UI of the applications running inside a container, I prefer to use applications that expose their UI via http using a browser. Where this is not possible – as for example in the case of the Audacity audio editor, it is often possible to run the application within the container and then render the desktop running in the container via a browser window using an HTML remote desktop viewer such as Apache Guacamole. Here’s an old proof of concept: Accessing GUI Apps Via a Browser from a Container Using Guacamole.

Creating a Docker Library Shelf

Another part of the story is that you can build your own images and either share them publicly via the Dockerhub registry, keep them locally on your own computer, post them to a private Dockerhub repository (you  get a single private repository as part of the Dockerhub free plan, or can pay for more…), or run your own image registry.

Using this latter option – running your own Docker registry – means you can essentially run your own digital application shelf, although I don’t think you’ll be able to use Kitematic with it without forking your own version of it to use your registry (issue)? Instead, you have to use the command line. Again, the lack of support for “personal users” of Docker is an issue, but that’s another reason to try to blog more recipes and find more workarounds, and maybe get the occasional feature request adopted if we can define a use case well enough…

Alternatively, you can just make the shelf available as a set of Dockerfiles and shortcuts, as Jessie Frazelle has done with her jessfraz/dockerfiles Github repo.

There are a couple more posts to come in this series: one looking at how we can run Docker containers on a remote host (“in the cloud”); another looking at how we can link multiple applications to provide our own linked application, digital workbenches, within which linked applications can easily share files and connections between themselves.

Easy Web Publishing With Github

A quick note I’ve been meaning to post for ages… If you need some simple web hosting, you can use Github: simply create a top level docs folder in your repo and pop the files you want to serve in that directory.

And the easiest way to create that folder? In a repo web page, click the Create New File button, and use the filename docs/ to create the docs folder and add a web homepage to it as a markdown file.

(And to create a Github repo, if it’s your first time? Get a Github account, click on the “Create New Repository” or “Add New Repository” or a big “+” button somewhere and create a new repo, checking the box to automatically create an empty README file automatically to get the repo going. (If you forget to do that that, don’t panic – you should still be able to create a new file somewhere from the repo webpage to get the repo going…)

To serve the contents of the docs folder as a mini-website, click on the repo Settings button:

then scroll down to the Github Pages area, and select as a source master branch/docs folder.

When you save it, wait a minute or two for it to spin up, and then you should be able to see your website published at

For example, the files in the docs folder of  the default master branch of my Github psychemedia/parlihacks repository are rendered here:

If you have files on you  desktop you want to publish to the web, click on the docs folder on the Github repo webpage to list the contents of that directory, and simply drag the files from your desktop onto the page, then click the Commit changes button to upload them. If there are any files with the same name in the directory, the new file will be checked in as an updated version of the file, and you should be able to compare the differences to the previous version…

PS Github can be scary at times, but the web UI makes a lot of simple git interactions easy. See here for some examples – A Quick Look at Github Classroom and a Note on How Easy Github on the Web Is To Use… – such as editing files directly in Github viewed in your browser, or dragging and dropping files from your desktop onto the a Github repo web page to check in an update to a file.