
Fragment – Wizards From Jupyter RISE Slideshows

Note to self, as much as anything…

At the moment I’m tinkering with a couple of OU hacks that require:

  • a login prompt to log in to OU auth
  • a prompt that requests what service you require
  • a screen that shows a dialogue relating to the desired service, as well as a response from that service.

I’m building these up in Jupyter notebooks, and it struck me that I could create a simple, multi-step wizard to mediate the interaction using the Jupyter RISE slideshow extension.

For example, the first slide is the login, the second slide the service prompt, the third screen the service dialogue, maybe with child slides relating to that?
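As a minimal sketch of how those wizard steps might be tagged programmatically (the notebook filename is hypothetical), something like the following uses nbformat to set the slideshow metadata that RISE picks up on:

import nbformat

nb = nbformat.read("wizard.ipynb", as_version=4)
for cell in nb.cells[:3]:                          # login, service prompt, service dialogue
    cell.metadata["slideshow"] = {"slide_type": "slide"}
nbformat.write(nb, "wizard.ipynb")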

(Hmm, thinks – would be interesting if RISE supported even more non-linear actions over and above its 1.5D nature? For example, branched logic, choosing which of N slides to go to next?)

Anyway, just noting that as an idea: RISE live notebook slideshows as multi-step wizards.

If Only I’d Been More Focussed… National-Local Data Robot Media Wire

And so it came to pass that Urbs Media started putting out their Arria NLG generated local data stories, customised from national data sets, on the PA news wire, as reported by the Press Gazette – First robot-written stories from Press Association make it into print in ‘world-first’ for journalism industry – and Hold the Front Page: Regional publishers trial new PA robot reporting project.

Ever keen to try new approaches out, my local hyperlocal, OnTheWight, have already run a couple of the stories. Here’s an example: Few disadvantaged Isle of Wight children go to university, figures show.

Long term readers might remember that this approach is one that OnTheWight have explored before, of course, as described in OnTheWight: Back at the forefront of next wave of automated article creation.

Back in 2015, I teamed up with them to explore some ideas around “robot journalism”, reusing some of my tinkerings to automate the production of a monthly data story OnTheWight run around local jobless statistics. You can see a brief review from the time here and an example story from June 2015 here. The code was actually developed a bit further to include some automatically generated maps (example), but the experiment had petered out by then (“musical differences”, as I recall it!;-) (I think we’re talking again now.. ;-) I’d half imagined actually making a go of some sort of offering around this, but hey ho… I still have some related domains I bought on spec at the time…

At the time, we’d been discussing what to do next. The “Big Idea”, as I saw it, was that doing the work of churning through a national dataset, with data at the local level, once (for OnTheWight), meant that the work was already done for everywhere else.

[Image: robot_intermediatePR]

To this end, I imagined a “datawire” – you can track the evolution of that phrase through OUseful.info posts here – that could be used to distribute localised press releases automatically generated from national datasets. One of the important things for OnTheWight was getting data reports out quickly once a data set had been released. (I seem to remember we raced each other – the manual route versus the robot one.) My tools weren’t fully automated – I had to keep hitting reload to fetch the data rather than having a cron job start pinging the Nomis website around the time of the official release, but that was as much because I didn’t run any servers as anything. One thing we did do was automatically push the robot generated story into the OnTheWight WordPress blog draft queue, from where it could be checked and published by a human editor. The images were handled circuitously (I don’t think I had a key to push image assets to the OnTheWight image server?)

The data wire idea was actually sketched out a couple of years ago at a community journalism conference (Time for a Local Data Wire?), and that was perhaps where our musical differences about the way forward started to surface? :-(

One thing you may note is the focus on producing press releases, with the intention that a journalist could build a story around the data product, rather than the data product standing in wholesale for a story.

I’m not sure this differs much from the model being pursued by Urbs Media, the organisation that’s creating the PA data stories, and that is funded in part at least by a Google Digital News Initiative (DNI) grant: PA awarded €706,000 grant from Google to fund a local news automation service in collaboration with Urbs Media.

FWIW, give me three quarters of a million squids, or Euros, and that’d do me as a private income for the rest of my working life; which means I’d be guilt free enough to play all the time…!

One of the things that I think the Urbs stories are doing is including quotes on the national statistical context taken from the original data release. For example:

Which reminds me – I started to look at the ONS JSON API when it appeared (example links), but I don’t think I got much further than an initial play... One to revisit, to see if it can be used as a source from which automated quote extraction is possible…

Our original jobs stats stories never really got as far as evolving into the inspiration for contextualising reporting – they were more or less a literal restatement of the “data generated press release”. I seem to recall that this notion of data-to-text-to-published-copy started to concern me, and I began to explore it in a series of posts on “robot churnalism” (for example, Notes on Robot Churnalism, Part I – Robot Writers and Notes on Robot Churnalism, Part II – Robots in the Journalism Workplace).

(I don’t know how many of the stories returned in that search were from PA stories. I think that regional news group operators such as Johnston Press and Archant also run national units producing story templates that can be syndicated, so some templated stories may come from there.)

I think there are a couple more posts in that series still in my draft queue somewhere which I may need to finish off… Perhaps we’ll see how the new stories play out – whether the copy gets reprinted as is, or is used to inspire more contextualised local reporting around the data.

I also recall presenting on the topic of “Robot Writers” at ILI in 2016 (I wasn’t invited back this year :-( ).

So what sort of tech is involved in producing the PA data wire stories? From the preview video on the Urbs Media website, the technology behind the Radar project – Reporters and Data and Robots – looks to be the Articulator Lite application developed by Arria NLG. If you haven’t been keeping up, Arria NLG is the UK equivalent of companies like Narrative Science and Automated Insights in the US, which I’ve posted about on and off for the last few years (for example, Notes on Narrative Science and Automated Insights).

Anyway, it’ll be interesting to see how the PA / Urbs Media thing plays out. I don’t know if they’re automating the charts’n’maps production thing yet, but if they do then I hope they generate easily skinnable graphic objects that can be themed using things like ggthemes or matplotlib styles.
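For what it’s worth, the sort of re-skinnable chart output I have in mind is easy enough to sketch with matplotlib style sheets – a minimal example, with purely illustrative dummy numbers:

import matplotlib.pyplot as plt

years = ["2014", "2015", "2016"]
claimants = [130, 120, 95]                 # dummy numbers, purely illustrative

with plt.style.context("ggplot"):          # swap the style name to re-theme the same chart
    positions = range(len(years))
    plt.bar(positions, claimants)
    plt.xticks(positions, years)
    plt.title("Example claimant count")
    plt.savefig("story_chart.png")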

There’s a stack of practical issues and ethical issues associated with this sort of thing, and it’ll be interesting to see if any concerns start to be aired, or oopses appear. The reporting around the Births by parents’ characteristics in England and Wales: 2016 could easily be seen as judgemental, for example.

PS I wonder if they run a Slack channel data wire? (Slackbot Data Wire, Initial Sketch.) Maybe there’s still a gap in the market for one of my ideas?! ;-)

Open ALMS….

A year or two after graduating, and having failed in a bid to become an independent music promoter with a couple of others (we booked Hawkwind rather than Nirvana; oops…), I was playing chess with a stoner of Buddha-like nature who showed me how the pieces moved. (That is, I knew how the pieces moved square-wise; but this was more in the sense of “doh… so if you do that……., then that, and that, and then this, then that and, …okay. Another game?”)

Sometimes, I get that feeling from OLDaily (the being taught how the pieces move thing; I can’t really comment on the adjectival bits). As a case in point, Stephen recently linked to an old piece of work from David Wiley, amongst others: The Four R’s of Openness and ALMS Analysis: Frameworks for Open Educational Resources.

The paper opens with a review of “The Four R’s of Openness” – the rights to reuse, redistribute, revise and remix – and then goes on to consider Wiley’s ALMS framework, which represents Stephen’s move:

One of the primary benefits of an OER is that it can be adapted, or localized, to the needs of specific situations.

Even if a work has been licensed so that users are free to reuse, redistribute, revise and remix it, several technical factors affect openness, particularly in terms of revising and remixing. If producers of OER give people permission to use their resources, they should also consider giving them the technical keys to unlock the OERs so that they can adapt the OER to their own needs. … ALMS is an acronym that stands for: Access to editing tools. Level of expertise required to revise or remix. Meaningfully editable. Source-file access.

Access to editing tools. When people try to revise OER, one of the first questions they will need to ask is ―What software do I use to edit this resource? …

Level of expertise required to revise or remix. … Even if end users have access to editing tools, if they need 100 hours of training to use the tool effectively, revising OERs that rely on those tools will likely be beyond their reach. …

Meaningfully editable. Perhaps the classic example of OER that are not meaningfully editable are scanned PDF documents. If a person takes handwritten notes, scans images of those notes and puts them into PDF format, the contents of the resulting file cannot be meaningfully edited. The only way to revise or remix this work is to type out the words in the PDF into a word processor document and make revisions there. [TH: so you might argue that diagrams are typically not meaningfully editable.]

Source-file access. … A source file is the file that a programmer or developer edits and works with in order to produce a final product. …

Open educational resources will be easy to revise or remix technically if they are meaningfully editable (like a web page), access to the source file is provided (like an HTML file), can be edited by a wide range of free or affordable software programs (like an RTF file), and can be edited with software that is easy to use and is used by many people.

Open educational resources will be difficult to revise or remix technically if they are not meaningfully editable (like scanned handwriting), are not self-sourced (like a Flash file), can only be edited by one, single platform, expensive software program (like a Microsoft OneNote file), and can only be edited with software that requires extensive training and is used by relatively few people.

[T]echnical aspects of OER will affect how “open” they really are. Creators of OER who wish to promote revising and remixing should ensure that OER are designed in such a way that users will have access to editing tools, that the tools needed to will not require a prohibitive level of expertise, and that the OER are meaningfully editable and self-sourced.

So… I’ve tried to explain some of my recent thinking around this topic in Maybe Programming Isn’t What You Think It Is? Creating Repurposable & Modifiable OERs and OERs in Practice: Re-use With Modification, but that didn’t get anywhere. Inspired by the ALMS piece, I’ll try again…

…by talking about Binder and Binderhub, again… Binderhub is a technology for launching, on demand, a Jupyter notebook server that serves notebooks against a particular programming environment. The notebook server essentially provides an interactive user interface via a browser in the form of Jupyter notebooks. The specification for the computing environment accessed from the notebooks, as well as the notebooks themselves, can be published in a public GitHub repository. Binderhub copies the files in the repository, and then uses them to build the computing environment.

Think of the computing environment like your desktop computer. To open particular documents, you need particular applications installed. The installed applications are part of your environment. If I give you a weird document type, you might have to install an application in order to open that document. The document might open easily for me because I have customised my computer with a particular environment (which includes all the applications I have installed on it to open and edit weird document types). But it might be difficult for you to install whatever it is you need to install to open the document and work with it. How much easier if you could just use my computer to do it. Or my environment.

Some of the files in a public GitHub repository, then, can be used to define an environment (they are the “source code” for it). Binder can use them to build you a copy of the environment that I work in, particularly if the environment I work in is one built for me by Binder from the same repository. If the files work for me, they should work for you.

As well as being a technology, Binderhub is, currently, a free, open service. It can be used to launch interactive notebooks running in an environment defined via files in a public GitHub repository. Try one: Binder. Then, from the Cell menu, select Run All.

The demos in that example relate to maths. If you prefer to see music related demos, use this Binder instead: Binder

The definition for the environment, and the notebooks (that is, the “content”) are here: psychemedia/showntell/maths and here:
psychemedia/showntell/music.
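As a minimal sketch of what sharing the environment amounts to: launch links for a public GitHub repository follow the https://mybinder.org/v2/gh/<owner>/<repo>/<ref> pattern (at the time of writing), so pointing someone at the environment is little more than constructing a URL from the repository reference:

owner, repo, ref = "psychemedia", "showntell", "maths"
print(f"https://mybinder.org/v2/gh/{owner}/{repo}/{ref}")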

All the rich assets in a notebook – images, sounds, interactives – can be generated from “source code” contained in the notebook (from the toolbar, select the Toggle selected cell input display button to reveal / hide the code). In many cases the source code can be quite concise, because it calls on other, powerful applications preinstalled in the environment. The environment we defined in the repository.

Actually, that’s not strictly true. We may also pull on third party resources, such as embedded Google or Bing maps. But we could, if we wanted to, add a service to deliver the map tiles into the local environment, by defining the appropriate services in the repository and building them into the environment “from source”. We could, if we wanted to, build a self-contained environment that contains everything it needs in order to render everything that appears in the notebook.
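To make the “concise source code, rich output” point concrete, here’s a minimal sketch of the sort of thing a single notebook cell can do, assuming pandas and matplotlib are preinstalled in the environment:

%matplotlib inline
import pandas as pd

# one short line of "content code" produces a rich, embedded chart in the notebook
pd.Series([3, 1, 4, 1, 5, 9], name="example data").plot(kind="bar");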

What Binder and Binderhub do, then, is remove the technical barrier to entry when it comes to setting up a computing environment that is pre-configured to run a particular set of notebooks. Binder will build the environment for you, from source code. If the author of the notebooks creates and tests the notebooks to their satisfaction in the same environment, they should work…

So now we get to the notebooks. What the notebooks do is provide an environment that can be used to author, which is to say edit, which is to say revise and remix, rich content (where “rich” might be taken to mean “including media resources other than static text”). They also provide an environment for viewing and interacting with those materials. Again, for the viewer / user, there is no need to install anything. Binder will provide, via your browser.

At the moment, most of the focus on the development, and use, of Jupyter notebooks is for scientific research, but there is growing use in education and (data) journalism too.

I think there is a huge opportunity for using notebooks as a general purpose environment for creating materials that can make use of compute stuff, and from there start to work as a gateway drug for learning how to make more effective use of computers by writing easily learned magic incantations, a line at a time.

As an example, see if you can figure out how to embed a map located at an address familiar to you using the following Binder: Binder

To read more about that notebook, see Embedding folium Maps In Jupyter Notebooks Using IPython Magic and Extending the folium Magic…. For other examples, such as how to embed images showing Scratch like programs, see Scratch Materials – Using Blockly Style Resources in Jupyter Notebooks. For an example of how to use a Jupyter notebook to create, and display, an interactive slideshow, see OERs in Practice: Repurposing Jupyter Notebooks as Presentations.

For more references to the ALMS work, see: Measuring Technical Difficulty in Reusing Open Educational Resources with the ALMS Analysis Framework, Seth M. Gurell, PhD Thesis.

Extending the folium Magic…

A couple of days ago I posted about some IPython magic for embedding interactive maps in Jupyter notebooks.

I had a bit more of a play yesterday, and then into the night, and started to add support for other things. For example, you can now build up a map using several pieces of magic, by assigning a magic generated map to a variable and then passing that map in as the basemap to another piece of magic.

If the output of the previously run code cell is a folium map, the magic will use that map as the starting map by default – there is a switch (-b None), or some alternative magic (%folium_new_map) that forces a fresh map if you want to start from scratch.
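So, as a sketch (and assuming -b is also the switch for passing a basemap in by name, and that the magic’s return value can be assigned to a variable in the usual IPython way), building a map up over several steps looks something like this:

m = %folium_map -m 52.0250,-0.7084,"Open University"
%folium_map -b m -m 52.04,-0.75,"Another marker"   # reuse the previous map as the basemap
%folium_new_map -l 52.0250,-0.7084                 # or force a completely fresh map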

The magic now also has some support for trying to guess where to centre a map, as well as logic that tries to guess at what columns you might want to use when generating a choropleth map from some data and a geojson file. This is best exemplified by some helper magic:

The same routines are used to try to guess column names if you omit them when trying to plot a choropleth. For example, here we only pass in a reference to the data file, the geojson file, and the numeric data column to colour the map. The columns used to plot the boundaries are guessed at.

This is likely to be very ropey – I’ve only tested it with a single pair of data/geojson files, but it might work for limited general cases…

You can play with it here: Binder

Code is…

…something that everyone is supposed to learn, apparently, though I’m not sure why; and you don’t have to if you’re an adult learner, as opposed to someone in school?

So here are a couple of reasons why I think knowledge of how to use code is useful.

Code is a means for telling computers, rather than computer users, what to do

Writing instructional material for students that requires them to use a computer application is a real pain. You have to write laborious From the file menu, select…, in the popup wizard, check the… instructions, or use screenshots (which have to look exactly the same as what the learner is likely to see or they can get worried) or screencasts (which have to look exactly the same as what the learner is likely to see or they can get worried).

The problem is, of course, you don’t want the learner to really do any of that. You want them to be able to apply a series of transformations to some sort of resource, such as a file.

On the other hand, you can write a set of instructions for what you want the computer to do to the resource, and give that to the learner. The learner can use that medium for getting the computer to transform the resource rather more literally…
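For example, rather than a page of menu-driven instructions, the learner might be handed a few lines like these (a hedged sketch – the file and column names are hypothetical):

import pandas as pd

df = pd.read_csv("results.csv")                      # open the resource
df["total"] = df["part1"] + df["part2"]              # apply the transformation
df.to_csv("results_with_totals.csv", index=False)    # save the transformed resource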

Code is a tool for building tools

Code lets you build your own tools, or build your own tools on top of your own tools.

One huge class of tools you can build are tools that automate something else, either by doing a repetitive task multiple times over on your behalf, or by abstracting a lengthy set of instructions into a single instruction with some “settings” passed as parameters. And yes, you do know what parameters are: in the oven for thirty minutes at gas mark 5, which is to say, in “pseudo-code”, oven(gas_mark=5, time=30).
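Written out as a (purely illustrative) Python function, that pseudo-code becomes:

def oven(gas_mark=5, time=30):
    return f"Cook at gas mark {gas_mark} for {time} minutes"

oven(gas_mark=5, time=30)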

Here’s one tool I built recently – some magic that makes it easy to embed maps in Jupyter notebooks.

Code is a tool for extending other tools

If you have access to the code that describes a tool, you can extend it, tune it, or customise it with your own code.

For example, here’s that previous magic extended.

Code is something that can be reused in, or by, other bits of code

If you’ve written some code to perform a particular task in order to achieve something in one tool, you don’t need to write that code again if you want to perform the same task in another tool. You can just reuse the code.

Lots of code is “algorithmic boilerplate”, in the sense that it implements a recipe for doing something. Oftentimes, you may want to take the “algorithmic boilerplate” that someone else has got working in their application / tool, and reuse it, perhaps with slight modification (swapping dataframes for datafiles, for example).
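As a minimal sketch of what that reuse looks like in practice (the function is my own made-up example – write the recipe once, reuse it from any other tool or script):

def midpoint(latlngs):
    """Return the mid-point of a list of (lat, lng) pairs."""
    lats, lngs = zip(*latlngs)
    return sum(lats) / len(lats), sum(lngs) / len(lngs)

centre = midpoint([(52.0250, -0.7084), (52.0, -0.70)])   # reuse the recipe wherever it's needed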

Disclaimer

In my student days, many nights were spent between the computer room and the Student Union, playing bridge. In the bridge games, the bidding was ad hoc because no-one could ever remember any of the bidding systems you’re “supposed to use” properly… Obviously, it wasn’t proper bridge, but it passed the time amicably…

So unlike folk who (think code should be taught properly and don’t think to use code themselves for daily tasks), I have a more pragmatic approach and ((think my code only has to work well enough to make it worth using) or (do it because it’s an applied recreational activity, like doing a crossword)).

Embedding folium Maps In Jupyter Notebooks Using IPython Magic

Whilst trying to show how interactive maps can be embedded in a Jupyter notebook, one of the comments I keep getting back is that “It’s too hard” because you have to write two or three lines of code.

So I’ve tried to simplify things by wrapping the two or three lines of code up as IPython magic, which means you can use a one liner.
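For reference, the sort of “two or three lines” being wrapped up looks something like this sketch using the folium API directly (the co-ordinates are the Open University ones used in the examples below):

import folium

m = folium.Map(location=[52.0250, -0.7084], zoom_start=10)
folium.Marker([52.0250, -0.7084], popup="Open University").add_to(m)
m   # display the map inline in the notebook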

The code can be found in this GitHub repo: psychemedia/ipython_magic_folium.

To install:

pip install git+https://github.com/psychemedia/ipython_magic_folium.git

To load the magic in a Jupyter notebook:

%load_ext folium_magic

Then call as: %folium_map

The magic currently only works as line magic.

See the folium_magic_demo.ipynb notebook for examples, or run using Binder.


Display Map

  • -l, --latlong: latitude and longitude values, comma separated. If no value is provided a default location will be used;
  • -z, --zoom (default=10): set initial zoom level;

Add markers

  • -m, --marker: add a single marker, passed as a comma separated string with no spaces after commas; eg 52.0250,-0.7084,"My marker"

  • -M, --markers: add multiple markers from a Python variable; pass in the name of a variable that refers to:
– a single dict, such as markers={'lat':52.0250, 'lng':-0.7084,'popup':'Open University, Walton Hall'}
– a single ordered list, such as markers=[52.0250, -0.7084,'Open University, Walton Hall']
– a list of dicts, such as markers=[{'lat':52.0250, 'lng':-0.7084,'popup':'Open University, Walton Hall'},{'lat':52.0, 'lng':-0.70,'popup':'Open University, Walton Hall'}]
– a list of ordered lists, such as markers=[[52.0250, -0.7084,'Open University, Walton Hall'], [52., -0.7,'Open University, Walton Hall']]

If no -l co-ordinate is set to centre the map, the co-ordinates of the single marker, or the mid-point of the multiple markers, are used instead.
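For example (assuming the magic has been loaded as above), a sketch of the -M switch using the dict form; the variable name is arbitrary:

markers = [{'lat': 52.0250, 'lng': -0.7084, 'popup': 'Open University, Walton Hall'},
           {'lat': 52.0,    'lng': -0.70,   'popup': 'Open University, Walton Hall'}]
%folium_map -M markers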

Display `geojson` file

  • -g, --geojson: path to a geoJSON file

If no -l co-ordinate is set to centre the map, the mid-point of the geojson boundary is used instead.

Display a Choropleth Map

A choropleth map is displayed if enough information is provided to display one.

  • -g, --geojson: path to a geoJSON file
  • -d, --data: the data source, either in the form of a pandas dataframe, or the path to a csv data file
  • -c, --columns: comma separated (no space after comma) column names from the data source, specifying the column to match the geojson key and the column containing the values to display
  • -k, --key: key in geojson file to match areas with data values in data file;
  • optional:
  • -p, --palette: default='PuBuGn'
  • -o, --opacity: default=0.7

For example, you can load the data from a pandas dataframe, or from a CSV data file.
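In each case the call looks something like this hedged sketch (the switches are as documented above, but the file, dataframe and column names are hypothetical, as is the geojson key path):

# from a pandas dataframe already loaded in the notebook
%folium_map -g wards.geojson -d df -c ward_code,claimant_count -k feature.properties.ward_code

# or straight from a CSV data file
%folium_map -g wards.geojson -d jobless.csv -c ward_code,claimant_count -k feature.properties.ward_code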

 

This is still a bit fiddly because it requires you to add lat/longs for the base map and/or markers. But this could probably be addressed (ha!) by building in a geocoder, if I can find one that’s reliable and doesn’t require a key.

Fragment – Breaking Enigma Was Only Part of the Story…

Reading Who, me? They warned you about me? on ethics associated with developing new technologies, this quote jumped out at me: [m]y claim is that putting an invention into a public space inevitably makes that invention safer (Mike Loukides, December 7, 2017).

In recent years, the UK Government has had several goes at passing bills that refer to the collection of communications data – the who, where, when and how of a communication, but not its content.

[N]ot its content.

Many folk are familiar with stories of the World War Two codebreakers, the boffins, Alan Turing among them, who cracked the German Enigma code. How they helped win the war by reading the content of enemy communications.

So given it was the content wot won it, we, cast as “enemies”, might conclude that protecting the content is key. That the communications data is less revealing.

But that’s not totally true. Other important intelligence can be derived from traffic analysis, looking at communications between actors even if you don’t know the content of the messages.

If I know that X sent a message to Y and Z five minutes before they committed a robbery, on several separate occasions, I might suspect that X knew Y and Z, and was implicated in the crime, even if I didn’t know the content of the messages.

Location data can also be used to draw similar inferences. For example, the Bloomberg article Mobile-Phone Case at U.S. Supreme Court to Test Privacy Protections describes a recent US Supreme Court case reviewing an appeal from a convicted armed robber who was in part convicted on the basis of evidence that data obtained from [his] wireless carriers to show he was within a half-mile to two miles of the location of four … robberies when they occurred.

So what has this to do with “putting an invention into a public space”? Perhaps if the stories about how military intelligence made, and makes, use of traffic analysis and location analysis, and not just the content of decrypted messages, were as widely told, the collection of such data might not seem so innocuous…

When invention takes place in public, we (the public) know that it exists. We can become aware of the risks. Mike Loukides, December 7, 2017.

Just sayin’…