Category: Rstats

Programming, meh… Let’s Teach How to Write Computational Essays Instead

From Stephen Wolfram, a nice phrase to describe the sorts of thing you can create using tools like Jupyter notebooks, Rmd and Mathematica notebooks: computational essays that complements the “computational narrative” phrase that is also used to describe such documents.

Wolfram’s recent blog post What Is a Computational Essay?, part essay, part computational essay,  is primarily a pitch for using Mathematica notebooks and the Wolfram Language. (The Wolfram Language provides computational support plus access to a “fact engine” database that ca be used to pull factual information into the coding environment.)

But it also describes nicely some of the generic features of other “generative document” media (Jupyter notebooks, Rmd/knitr) and how to start using them.

There are basically three kinds of things [in a computational essay]. First, ordinary text (here in English). Second, computer input. And third, computer output. And the crucial point is that these three kinds of these all work together to express what’s being communicated.

In Mathematica, the view is something like this:


In Jupyter notebooks:

In its raw form, an RStudio Rmd document source looks something like this:

A computational essay is in effect an intellectual story told through a collaboration between a human author and a computer. …

The ordinary text gives context and motivation. The computer input gives a precise specification of what’s being talked about. And then the computer output delivers facts and results, often in graphical form. It’s a powerful form of exposition that combines computational thinking on the part of the human author with computational knowledge and computational processing from the computer.

When we originally drafted the OU/FutureLearn course Learn to Code for Data Analysis (also available on OpenLearn), we wrote the explanatory text – delivered as HTML but including static code fragments and code outputs – as a notebook, and then ‘ran” the notebook to generate static HTML (or markdown) that provided the static course content. These notebooks were complemented by actual notebooks that students could work with interactively themselves.

(Actually, we prototyped authoring both the static text, and the elements to be used in the student notebooks, in a single document, from which the static HTML and “live” notebook documents could be generated: Authoring Multiple Docs from a Single IPython Notebook. )

Whilst the notion of the computational essay as a form is really powerful, I think the added distinction between between generative and generated documents is also useful. For example, a raw Rmd document of Jupyter notebook is a generative document that can be used to create a document containing text, code, and the output generated from executing the code. A generated document is an HTML, Word, or PDF export from an executed generative document.

Note that the generating code can be omitted from the generated output document, leaving just the text and code generated outputs. Code cells can also be collapsed so the code itself is hidden from view but still available for inspection at any time:

Notebooks also allow “reverse closing” of cells—allowing an output cell to be immediately visible, even though the input cell that generated it is initially closed. This kind of hiding of code should generally be avoided in the body of a computational essay, but it’s sometimes useful at the beginning or end of an essay, either to give an indication of what’s coming, or to include something more advanced where you don’t want to go through in detail how it’s made.

Even if notebooks are not used interactively, they can be used to create correct static texts where outputs that are supposed to relate to some fragment of code in the main text actually do so because they are created by the code, rather than being cut and pasted from some other environment.

However, making the generative – as well as generated – documents available means readers can learn by doing, as well as reading:

One feature of the Wolfram Language is that—like with human languages—it’s typically easier to read than to write. And that means that a good way for people to learn what they need to be able to write computational essays is for them first to read a bunch of essays. Perhaps then they can start to modify those essays. Or they can start creating “notes essays”, based on code generated in livecoding or other classroom sessions.

In terms of our own learnings to date about how to use notebooks most effectively as part of a teaching communication (i.e. as learning materials), Wolfram seems to have come to many similar conclusions. For example, try to limit the amount of code in any particular code cell:

In a typical computational essay, each piece of input will usually be quite short (often not more than a line or two). But the point is that such input can communicate a high-level computational thought, in a form that can readily be understood both by the computer and by a human reading the essay.

...

So what can go wrong? Well, like English prose, can be unnecessarily complicated, and hard to understand. In a good computational essay, both the ordinary text, and the code, should be as simple and clean as possible. I try to enforce this for myself by saying that each piece of input should be at most one or perhaps two lines long—and that the caption for the input should always be just one line long. If I’m trying to do something where the core of it (perhaps excluding things like display options) takes more than a line of code, then I break it up, explaining each line separately.

It can also be useful to "preview" the output of a particular operation that populates a variable for use in the following expression to help the reader understand what sort of thing that expression is evaluating:

Another important principle as far as I’m concerned is: be explicit. Don’t have some variable that, say, implicitly stores a list of words. Actually show at least part of the list, so people can explicitly see what it’s like.

In many respects, the computational narrative format forces you to construct an argument in a particular way: if a piece of code operates on a particular thing, you need to access, or create, the thing before you can operate on it.

[A]nother thing that helps is that the nature of a computational essay is that it must have a “computational narrative”—a sequence of pieces of code that the computer can execute to do what’s being discussed in the essay. And while one might be able to write an ordinary essay that doesn’t make much sense but still sounds good, one can’t ultimately do something like that in a computational essay. Because in the end the code is the code, and actually has to run and do things.

One of the arguments I've been trying to develop in an attempt to persuade some of my colleagues to consider the use of notebooks to support teaching is the notebook nature of them. Several years ago, one of the en vogue ideas being pushed in our learning design discussions was to try to find ways of supporting and encouraging the use of "learning diaries", where students could reflect on their learning, recording not only things they'd learned but also ways they'd come to learn them. Slightly later, portfolio style assessment became "a thing" to consider.

Wolfram notes something similar from way back when...

The idea of students producing computational essays is something new for modern times, made possible by a whole stack of current technology. But there’s a curious resonance with something from the distant past. You see, if you’d learned a subject like math in the US a couple of hundred years ago, a big thing you’d have done is to create a so-called ciphering book—in which over the course of several years you carefully wrote out the solutions to a range of problems, mixing explanations with calculations. And the idea then was that you kept your ciphering book for the rest of your life, referring to it whenever you needed to solve problems like the ones it included.

Well, now, with computational essays you can do very much the same thing. The problems you can address are vastly more sophisticated and wide-ranging than you could reach with hand calculation. But like with ciphering books, you can write computational essays so they’ll be useful to you in the future—though now you won’t have to imitate calculations by hand; instead you’ll just edit your computational essay notebook and immediately rerun the Wolfram Language inputs in it.

One of the advantages that notebooks have over some other environments in which students learn to code is that structure of the notebook can encourage you to develop a solution to a problem whilst retaining your earlier working.

The earlier working is where you can engage in the minutiae of trying to figure out how to apply particular programming concepts, creating small, playful, test examples of the sort of the thing you need to use in the task you have actually been set. (I think of this as a "trial driven" software approach rather than a "test driven* one; in a trial,  you play with a bit of code in the margins to check that it does the sort of thing you want, or expect, it to do before using it in the main flow of a coding task.)

One of the advantages for students using notebooks is that they can doodle with code fragments to try things out, and keep a record of the history of their own learning, as well as producing working bits of code that might be used for formative or summative assessment, for example.

Another advantage is that by creating notebooks, which may include recorded fragments of dead ends when trying to solve a particular problem, is that you can refer back to them. And reuse what you learned, or discovered how to do, in them.

And this is one of the great general features of computational essays. When students write them, they’re in effect creating a custom library of computational tools for themselves—that they’ll be in a position to immediately use at any time in the future. It’s far too common for students to write notes in a class, then never refer to them again. Yes, they might run across some situation where the notes would be helpful. But it’s often hard to motivate going back and reading the notes—not least because that’s only the beginning; there’s still the matter of implementing whatever’s in the notes.

Looking at many of the notebooks students have created from scratch to support assessment activities in TM351, it's evident that many of them are not using them other than as an interactive code editor with history. The documents contain code cells and outputs, with little if any commentary (what comments there are are often just simple inline code comments in a code cell). They are barely computational narratives, let alone computational essays; they're more of a computational scratchpad containing small code fragments, without context.

This possibly reflects the prior history in terms of code education that students have received, working "out of context" in an interactive Python command line editor, or a traditional IDE, where the idea is to produce standalone files containing complete programmes or applications. Not pieces of code, written a line at a time, in a narrative form, with example output to show the development of a computational argument.

(One argument I've heard made against notebooks is that they aren't appropriate as an environment for writing "real programmes" or "applications". But that's not strictly true: Jupyter notebooks can be used to define and run microservices/APIs as well as GUI driven applications.)

However, if you start to see computational narratives as a form of narrative documentation that can be used to support a form of literate programming, then once again the notebook format can come in to its own, and draw on styling more common in a text document editor than a programming environment.

(By default, Jupyter notebooks expect you to write text content in markdown or markdown+HTML, but WYSIWYG editors can be added as an extension.)

Use the structured nature of notebooks. Break up computational essays with section headings, again helping to make them easy to skim. I follow the style of having a “caption line” before each input. Don’t worry if this somewhat repeats what a paragraph of text has said; consider the caption something that someone who’s just “looking at the pictures” might read to understand what a picture is of, before they actually dive into the full textual narrative.

As well as allowing you to create documents in which the content is generated interactively - code cells can be changed and re-run, for example - it is also possible to embed interactive components in both generative and generated documents.

On the one hand, it's quite possible to generate and embed an interactive map or interactive chart that supports popups or zooming in a generated HTML output document.

On the other, Mathematica and Jupyter both support the dynamic creation of interactive widget controls in generative documents that give you control over code elements in the document, such as sliders to change numerical parameters or list boxes to select categorical text items. (In the R world, there is support for embedded shiny apps in Rmd documents.)

These can be useful when creating narratives that encourage exploration (for example, in the sense of  explorable explantations, though I seem to recall Michael Blastland expressing concern several years ago about how ineffective interactives could be in data journalism stories.

The technology of Wolfram Notebooks makes it straightforward to put in interactive elements, like Manipulate, [interact/interactive in Jupyter notebooks] into computational essays. And sometimes this is very helpful, and perhaps even essential. But interactive elements shouldn’t be overused. Because whenever there’s an element that requires interaction, this reduces the ability to skim the essay."

I've also thought previously that interactive functions are a useful way of motivating the use of functions in general when teaching introductory programming. For example, An Alternative Way of Motivating the Use of Functions?.

One of the issues in trying to set up student notebooks is how to handle boilerplate code that is required before the student can create, or run, the code you actually want them to explore. In TM351, we preload notebooks with various packages and bits of magic; in my own tinkerings, I'm starting to try to package stuff up so that it can be imported into a notebook in a single line.

Sometimes there’s a fair amount of data—or code—that’s needed to set up a particular computational essay. The cloud is very useful for handling this. Just deploy the data (or code) to the Wolfram Cloud, and set appropriate permissions so it can automatically be read whenever the code in your essay is executed.

As far as opportunities for making increasing use of notebooks as a kind of technology goes, I came to a similar conclusion some time ago to Stephen Wolfram when he writes:

[I]t’s only very recently that I’ve realized just how central computational essays can be to both the way people learn, and the way they communicate facts and ideas. Professionals of the future will routinely deliver results and reports as computational essays. Educators will routinely explain concepts using computational essays. Students will routinely produce computational essays as homework for their classes.

Regarding his final conclusion, I'm a little bit more circumspect:

The modern world of the web has brought us a few new formats for communication—like blogs, and social media, and things like Wikipedia. But all of these still follow the basic concept of text + pictures that’s existed since the beginning of the age of literacy. With computational essays we finally have something new.

In many respects, HTML+Javascript pages have been capable of delivering, and actually delivering, computationally generated documents for some time. Whether computational notebooks offer some sort of step-change away from that, or actually represent a return to the original read/write imaginings of the web with portable and computed facts accessed using Linked Data?

Fragment – DIT4C – Docker Base Containers for Edu Remote Computing Labs

What’s an effective way of helping a student run a desktop application when their own computer won’t run the application, for whatever reason, locally? Virtualised software, running remotely, provides one solution. So here’s an example of a project that looks at doing just that:  DIT4C (“Data Intensive Tools for the Cloud”)a platform for hosting data analysis tools “in the cloud” using containers [repo].

Prepackaged, standalone containers are defined for a range of applications, including RStudio, Jupyter notebooks, Jupyter+R and OpenRefine

Standalone Containers With Branded Landing Page

The application containers are built on top of a base container that includes an nginx webser/proxy, a GoTTY shell and a file uploader. The individual containers then have a “homepage” that links to the particular application:

So what do we have at this point?

  • a branded landing page;
  • browser accessed shell:
  • a browser accessed file uploader:

These services are all running within a single container. I don’t know if there’s a way of linking multiple containers using docker-compose? This would require finding some way of announcing the services provided by each container, to a central nginx server which could then link to each from a single homepage. But this would mean separate terminals and file loaders into each one (though maybe the shared files could be handled as a single mounted volume shared across all the linked containers?

Once again, I’m coming round to the idea that using a single container to run multiple services, rather than several linked containers each running a single service, is simpler, even if it does go against the (ideal?) model of using containers as part of a small pieces, loosely joined architecture? I think I need to post a simple recipe (or recipes) somewhere that show different ways of running multiple services within a single container. The docker docs  – Run multiple services in a container – provide a crib in to this at the moment.

X11 Applications

Skimming the docs, I notice reference to a base X11 desktop container. Interesting… I have a PhD student looking for an easy way to host a Qt widget running application in the cloud for evaluation purposes. To this end I’ve just started looking around for X11/noVNC web client containers that would allow us to package the app in a simple container then access it from something like Digital Ocean (given there’s no internal OU docker container hosting service that I’m allowed to access (or am aware of… Maybe on the Faculty cluster?)).

So things like this show the way – a container that offers a link to a containerised “desktop” application, in this case QGIS (dit4c/dockerfile-dit4c-container-qgis); (does the background colour mean anything, I wonder? How could we make use of background colour in OU containers?):


Following the X11 Session link, we get to a desktop:

There’s an icon in the toolbar to the application we want – QGIS:

What I’m thinking now is this could be handy for running the V-REP robot simulator, and maybe Gephi…

It also makes me think that things could be simplified a little further by offering a link to QGIS, rather than X11 Application, and opening the application in full screen mode (on the virtualised desktop) on start-up. (See Distributing Virtual Machines That Include a Virtual Desktop To Students – V-REP + Jupyter Notebooks for some thoughts on how to use VMs to distribute a single pre-launched on startup desktop application to try to simplify the student experience.)

It also makes me even more concerned about the apparent lack of interest in the OU, and even awareness of, the possibilities of virtualised software offerings. For example, at a recent SIG group on (interactive) maps/mapping, brief mention was made of using QGIS, and problems arising therefrom (though I forget the context of the problems). Here we have a solution – out there for all to see and anyone to find – that demonstrates the use of QGIS in a prebuilt container. But who, internally, would think to mention that? I don’t think any of the Tech Enhanced Learning folk I’ve spoken to would even consider it, if they are even aware of it as an option?

(Of course, in testing, it might be rubbish… how much bandwidth is required for a responsive experience when creating detailed maps? See also one of my earlier related experiments: Accessing GUI Apps Via a Browser from a Container Using Guacamole, which remotely accessing the Audacity audio editor using a cloud hosted container.)

The Platform Offering

Skimming through the repos, I (mistakenly, as it happens) thought I saw a reference to resbaz (ResBaz Cloud – Containerised Research Apps as a Service). I was mistaken in thinking I had seen a reference in the code I skimmed though, but not, it seems, in the fact that there is a relationship:

And so it seems that perhaps more interestingly than the standalone containers is that DIT4C is a platform offering (architecture docs), providing authenticated access to users, file persistence (presumably?) and the ability to launch prebuilt docker images as required.

That said, looking at the Github repository commits for the project, there appears to have been little activity since March 2017 and the gitter channel appears to have gone silent at the end of 2016. In addition, the docs for getting an instance of the platform up and running are a little bit too sparse for me to follow easily… [UPDATE: it seems as if the funding did run out/get pulled:-(]

So maybe as a project, DIT4C is perhaps now “of historical interest” only, rather than being a live project we might have been able to jump on the back of to get an OU hosted remote computing lab up and running? :-( That said, the ResBaz (Research Bazaar) initiative, “worldwide festival promoting the digital literacy emerging at the center of modern research”, still seems to be around…

Oh How I Have Failed Thee, Jupyter Notebooks…

Although I first came across Jupyter – then IPython – notebooks in October 2012 (I think…), it took me another six months or so before I started playing regularly with them and pitched them for the then nascent TM351 course (geeknotes/history). We decided to explore the notebooks when the course/module team first met around about October 2013. Four years ago. Notebooks were also adopted for the Learn to Code for Data Analysis FutureLearn course (H/T to Michel Wermelinger for driving that) and only get the briefest of look-ins in the new level 1 course TM112 (even after I showed we could probably get turtle running in them…).

But to my shame I haven’t lobbied more on campus, and haven’t done the rounds giving talks and workshops and putting together meaningful demos.

Which is possibly the sort of activity that this newly advertised, and hugely attractive, role at the University of Edinburgh (h/t @PhilBarker) is designed to support – eLearning Officer Computational Notebooks.

Do you have a sound knowledge of technology and an enthusiasm for evaluating new approaches in education? We are looking for a learning technologist with a passion for communication and relationship management to lead a pilot of Jupyter notebooks for learning and teaching at the University of Edinburgh.

Jupyter notebooks are open-source web applications that enable learners to create, share and reuse computational narratives. Based within the central Information Services you will work closely with academic colleagues in Science and Engineering. You will analyse user requirements, advise on and support the use of Jupyter and evaluate the success of the pilot.

After clicking on the Apply button, we get to some more detail. Part of the purpose of the job is to “scope, assess demand and support requirements for a computational notebook (Jupyter Notebook) service”, something we’re trying to push through in a very limited form in the OU in order to support the TM112 notebook activity.

Here’s how the responsibilities unpick:

  1. To help academic and support staff make best use of learning technology services (in this case Jupyter Computational Notebook Service) and where required supporting and managing service change. Documenting use cases and sharing good practice and innovative solutions to improve the user experience. (Approx % of time 40%)
  2. To work with the user community and project partners in academic departments in order to continually improve the services and range of tools on offer. To maintain an up-to-date knowledge of the broader e-learning landscape in order to influence strategic direction and to develop innovative and appropriate use of learning technologies. (Approx % of time 30%)
  3. To participate and lead user and partner engagement events, in order to promote collaboration, knowledge sharing and greater awareness of services. To organise testing, training and workshops to support users. To represent the University and its interests both internally and externally. (Approx % of time 20%)
  4. Contribute to process improvement within both ISG and the wider University. Liaise and negotiate within members of University committees, user forums and working groups to formulate policy in accordance with the university strategic aims for learning and teaching. (Approx % of time 10%)

(On process improvement, I think Jupyter notebooks can provide a useful authoring environment (along with things like “written diagrams“) for “reproducible”  (which is to say, maintainable) course materials in the OU context. An approach I have had a total lack of success in promoting.)

I couldn’t help but try out a quick search for other notebooks related job ads, and turned up a handful of research posts, including one for a Bioinformatics Training Developer at the University of Cambridge – Cancer Research UK Cambridge Institute. The job duties and requirements provide an interesting complement to the skills required of a data journalist:

The training courses and summer schools already established are very popular and have gained a strong reputation. In this role, you will further develop the existing courses to reflect new advances. You will also create and deliver new training courses and materials in scientific data analysis and visualization, … . You will be responsible for assessing the training needs of research scientists and shaping a programme to meet those needs. This is an excellent opportunity to develop and apply new training approaches, making use of technologies such as R/Python Notebooks, Shiny web applications and Docker.

The successful candidate will have a degree in a scientific or computational discipline and preferably a postgraduate degree (MSc or PhD) and/or significant experience in Bioinformatics or Computational Biology, including the analysis of omics datasets using R. The role requires a high level of interpersonal and organizational skills and previous experience in preparation and delivery of training courses is essential. Strong practical skills in R and/or Python are highly desirable, including the use of version control systems, e.g. GitHub. …

[My emphasis.]

It’s maybe also worth mentioning here the current consultation around the draft Data Scientist Integrated Degree Apprenticeship (level 6) standard. Please comment if you can…

PS I popped together a feed for a search for “notebooks” on jobs.ac.uk using fetchrss.com to try to keep track of future upcoming academic job ads making mention of notebooks.

ergastR – R Wrapper for ergast F1 Results Data API

By the by, I’ve posted a first attempt at an R package – ergastR to wrap the ergast developer API, which is where I get chunks of data from for my f1datajunkie tinkerings.

You can find it on Github: psychemedia/ergastR.

The function names are the ones used in the Wrangling F1 Data With R book.

The R package needs a bit of tidying up and also needs work on the following: cacheing, so that we don’t keep hitting the ergast API unnecessarily; paged results handling (I fudge this a bit at the moment by explicitly setting a large results limit); and dual handling of ergast API versus downloaed ergast database requests (if a database connection string is passed, use that rather than make a call to the ergast API). But it’s a start… Feel free to raise issues via the repo:-)

In related news, Will Vaughan tipped me off to a Python package he’s started putting together to wrap the ergast API: ergast-python. He’s also making a start on some Wrangling F1 Data Jupyter notebooks that make use of the Python wrapper: wranglingf1data.

HexJSON HTMLWidget for R, Part 3

In HexJSON HTMLWidget for R, Part 1 I described a basic HTMLwidget for rendering hexJSON maps using d3-hexJSON, and HexJSON HTMLWidget for R, Part 2 described updates for supporting colour.

Having booked off today for emergency family cover that turned out not to be required, I had another stab at the package, so it now supports the following additional features…

Firstly, I had a go at popping some “base” hexjson files into a location within the package from which I could load them (checkin). Based on a crib from here, which suggests putting datafiles into an extdata folder in the package inst/ folder, from where devtools::build() makes them available in the built package root directory.

hexjsonbasefiles <- function(){
  list.files(system.file("extdata", package = "hexjsonwidget"))
}

With the files in place, we can use any base hexjson files included included in the package as the basis for hexmaps.

I also added the ability to switch off labels although later in the day I simplified this process…

One thing that was close to the top of my list was the ability to merge the contents of a dataframe into a hexJSON object. In particular, for a row identified by a particular key value associated with a hex key value, I wanted to map columns onto hex attributes. The hexjson object is represented as a list, so this required a couple of things: firstly, getting the dataframe data into an appropriate list form, secondly merging this into the hexjson list using the rlist::merge() function from the rlist package. Here’s the gist of the trick I ended up with, which was to construct a list split() from each row in a dataframe, with the rowname as the list name, using lapply(.., as.list):

ll=lapply(split(customdata, rownames(customdata)), as.list)
jsondata$hexes = list.merge(jsondata$hexes, ll)

A hexjsondatamerge(hexjson,df) function takes a hexjson file and merges the contents of the dataframe into the hexes:

The contents of a dataframe can also be merged in directly when creating a hexjsonwidget:

Having started to work with dataframes, it also seemed like it might be sensible to support the creation of a hexjson object directly from a dataframe. This uses a similar trick to the one used in creating the nested list for the merge function:

hexjsonfromdataframe <- function(df,layout="odd-r", keyid='id',
                                 q='q', r='r'){

  rownames(df) = df[[keyid]]
  df[[keyid]] = NULL
  colnames(df)[colnames(df) == q] = 'q'
  colnames(df)[colnames(df) == r] = 'r'

  list(layout=layout,
       hexes=lapply(split(df, rownames(df)), as.list))
}

hexjsonpart3_6

As you might expect, we can then use the hexjson object to create a hexjsonwidget:

A hexjsonwidget can also be created directly from a dataframe:

If we wanted to save the hexjson to a file, we could do something like: write( toJSON( jjx ), "test_out.hexjson" ).

In creating the hexjson-from-dataframe, I also refactored some of the other bits of code to simplify the number of parameters I’d started putting into the hexjsonwidget() function, in effect overloading them so the same named parameter could be used in different supporting functions.

I think that’s pretty much it from the developments I had in mind for the package. Now all I need to do is put it into practice… testing for which will, no doubt, throw up issues!)

PS Not quite it… I just added some simple file handlers too: to save the hexjson to a file, use hexjsonwrite(df, filename) and to read a json/hexjson file into a hexjson object use hexjsonread(filename).

HexJSON HTMLWidget for R, Part 2

In my previous post – HexJSON HTMLWidget for R, Part 1 – I described a first attempt at an HTMLwidget for displaying hexJSON maps using d3-hexJSON.

I had another play today and added a few extra features, including the ability to:

  • add a grid (as demonstrated in the original d3-hexJSON examples),
  • modify the default colour of data and grid hexes,
  • set the data hex colour via a col attribute defined on a hexJSON hex, and
  • set the data hex label via a label attribute defined on a hexJSON hex.

We can now also pass in the path to a hexJSON file, rather than just the hexJSON object:

Here’s the example hexJSON file:

And here’s an example of the default grid colour and a custom text colour :

I’ve also tried to separate out the code changes as separate commits for each feature update: code checkins. For example, here’s where I added the original colour handling.

I’ve also had a go at putting some docs in place, generated using roxygen2 called from inside the widget code folder with devtools::document(). (The widget itself gets rebuilt by running the command devtools::install().)

Next up – some routines to annotate a base hexJSON file with data to colour and label the hexes. (I’m also wondering if I should support the ability to specify arbitrary hexJSON hex attribute names for label text (label) and hex colour (col), or whether to keep those names as a fixed requirement?) See what I came up with here: HexJSON HTMLWidget for R, Part 3.

HexJSON HTMLWidget for R, Part 1

In advance of the recent UK general election, ODI Leeds published an interactive hexmap of constituencies to provide a navigation surface over various datasets relating to Westminster constituencies:

As well as the interactive front end, ODI Leeds published a simple JSON format for sharing the hex data – hexjson that allows you to specify an identifier for each, some data values associated with it, and relative row (r) and column (q) co-ordinates:

It’s not hard to imagine the publication of default, or base, hexJSON documents that include standard identifier codes and appropriate co-ordinates, e.g. for Westminster constituencies, wards, local authorities, and so on being developed around such a standard.

So that’s one thing the standard affords – a common way of representing lots of different datasets.

Tooling can then be developed to inject particular data-values into an appropriate hexJSON file. For example, a hexJSON representation of UK HEIs could add a data attribute identifying whether an HEI received a Gold, Silver or Bronze TEF rating. That’s a second thing the availability of a standard supports.

By building a UI that reads data in from a hexJSON file, ODI Leeds have developed an application that can presumably render other people’s hexJSON files, again, another benefit of a standard representation.

But the availability of the standard also means other people can build other visualisation tools around the standard. Which is exactly what Oli Hawkins did with his d3-hexjson Javascript library, “a D3 module for generating [SVG] hexmaps from data in HexJSON format” as announced here. So that’s another thing the standard allows.

You can see an example here, created by Henry Lau:

You maybe start to get a feel for how this works… Data in a standard form, standard library that renders the data. For example, Giuseppe Sollazzo (aka @puntofisso), had a play looking at voter swing:

So… one of the things I was wondering was how easy it would be for folk in the House of Commons Library, for example, to make use of the d3-hexjson maps without having to do the Javascript or HTML thing.

Step in HTMLwidgets, a (standardised) format for publishing interactive HTML widgets from Rmarkdown (Rmd). The idea is that you should be able to say something like:

hexjsonwidget( hexjson )

and embed a rendering of a d3-hexjson map in HTML output from a knitred Rmd document.

(Creating the hexjson as a JSON file from a base (hexJSON) file with custom data values added to it is the next step, and the next thing on my to do list.)

So following the HTMLwidgets tutorial, and copying Henry Lau’s example (which maybe drew on Oli’s README?) I came up with a minimal take on a hexJSON HTMLwidget.

library(devtools)
install_github("psychemedia/htmlwidget-hexjson")

It’s little more than a wrapping of the demo template, and I’ve only tested it with a single example hexJSON file, but it does generate d3.js hexmaps:

rstudio_hexjson

library(jsonlite)
library(hexjsonwidget)

jj=fromJSON('./example.hexjson')
hexjsonwidget(jj)

It also needs documenting. And support for creating data-populated base hexJSON files. But it’s a start. And another thing the hexJSON has unintentionally provided supported for.

But it does let you create HTML documents with embedded hexmaps if you have the hexJSON file handy:

By the by, it’s also worth noting that we can also publish an image snapshot of the SVG hexjson map in a knitr rendering of the document to a PDF or Microsoft Word output document format:

At first I thought this wasn’t possible, and via @timelyportfolio found a workaround to generate an image from the SVG:

library(exportwidget)
library(webshot)
library(htmltools)
library(magrittr)

html_print(tagList(
hexjsonwidget(jj)
, export_widget( )
), viewer=NULL) %>%
webshot( delay = 3 )

But then noticed that the PDF rendering was suddenly working – it seems that if you have the webshot and htmltools packages installed, then the PDF and Word rendering of the HTMLwidget SVG as an image works automagically. (I’m not sure I’ve seen that documented – the related HTMLwidget Github issue still looks to be open?)

See also: HexJSON HTMLWidget for R, Part 2, in which support for custom hex colour and labeling is added, and HexJSON HTMLWidget for R, Part 3, where I add in the ability to merge an R dataframe into a hexjson object, and create a hexjsonwidget directly from a dataframe.