Embedding folium Maps In Jupyter Notebooks Using IPython Magic

Whilst trying to show how interactive maps can be embedded in a Jupyter notebook, one of the comments I keep getting back is that “It’s too hard” because you have to write two or three lines of code.

So I’ve tried to simplify things by wrapping the two or three lines of code up as IPython magic, which means you can use a one liner.

The code can be found in this GitHub repo: psychemedia/ipython_magic_folium.

To install:

pip install git+https://github.com/psychemedia/ipython_magic_folium.git

To load the magic in a Jupyter notebook:

%load_ext folium_magic

Then call as: %folium_map

The magic currently only works as line magic.

See the folium_magic_demo.ipynb notebook for examples, or run using Binder.


Display Map

  • -l, --latlong: latitude and longitude values, comma separated. If no value is provided a default location will be used;
  • -z, --zoom (default=10): set initial zoom level;
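
For example, to display a map centred on a particular location at a given zoom level (the co-ordinates here are illustrative):

%folium_map -l 52.0250,-0.7084 -z 12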

Add markers

  • -m, --marker: add a single marker, passed as a comma separated string with no spaces after commas; eg 52.0250,-0.7084,"My marker"

  • -M, --markers: add multiple markers from a Python variable; pass in the name of a variable that refers to:
    – a single dict, such as markers={'lat':52.0250, 'lng':-0.7084, 'popup':'Open University, Walton Hall'}
    – a single ordered list, such as markers=[52.0250, -0.7084, 'Open University, Walton Hall']
    – a list of dicts, such as markers=[{'lat':52.0250, 'lng':-0.7084, 'popup':'Open University, Walton Hall'}, {'lat':52.0, 'lng':-0.70, 'popup':'Open University, Walton Hall'}]
    – a list of ordered lists, such as markers=[[52.0250, -0.7084, 'Open University, Walton Hall'], [52.0, -0.7, 'Open University, Walton Hall']]

If no -l co-ordinate is set to centre the map, the co-ordinates of the single marker, or the mid-point of the multiple markers, are used instead.
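
For example, a minimal sketch that passes markers in via a Python variable (the locations are illustrative). Define the variable in one cell:

markers=[[52.0250, -0.7084, 'Open University, Walton Hall'], [52.0, -0.7, 'Another marker']]

and then call the magic from another:

%folium_map -M markers -z 11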

Display a geoJSON File

  • -g, --geojson: path to a geoJSON file

If no -l co-ordinate is set to centre the map, the mid-point of the geojson boundary is used instead.
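
For example, assuming a boundaries.geojson file in the current directory:

%folium_map -g boundaries.geojson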

Display a Choropleth Map

A choropleth map is displayed if enough information is provided to display one.

  • -g, --geojson: path to a geoJSON file
  • -d, --data: the data source, either in the form of a pandas dataframe, or the path to a csv data file
  • -c, --columns: comma separated (no space after the comma) column names from the data source, specifying the column to match on the geojson key and the column containing the values to display
  • -k, --key: key in geojson file to match areas with data values in data file;
  • optional:
    – -p, --palette: colour palette (default='PuBuGn');
    – -o, --opacity: fill opacity (default=0.7).

For example, load data from a pandas dataframe:
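
(The dataframe contents, column names and geoJSON key path here are all illustrative.)

import pandas as pd

df = pd.DataFrame({'code': ['E1', 'E2', 'E3'],
                   'value': [10, 20, 15]})

%folium_map -g boundaries.geojson -d df -c code,value -k feature.properties.code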

Or load from a data file:
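
(Again, the file names, columns and key are illustrative; the optional palette and opacity flags are shown for good measure.)

%folium_map -g boundaries.geojson -d data.csv -c code,value -k feature.properties.code -p YlGn -o 0.6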


This is still a bit fiddly because it requires you to add lat/longs for the base map and/or markers. But this could probably be addressed (ha!) by building in a geocoder, if I can find one that’s reliable and doesn’t require a key.

Scratch Materials – Using Blockly Style Resources in Jupyter Notebooks

One of the practical issues associated with using the Scratch desktop application (or its OU fork, OUBuild) for teaching programming is that it runs on the desktop (or perhaps a tablet? It’s an Adobe Air app, which I think runs on iOS?). This means that the instructional material is likely to be separated from the application, either as print or as screen based instructional material.


If delivered via the same screen as the application, there can be a screen real estate problem when trying to display both the instructional material and the application.

In OU Build, there can also be issues if you want to have two projects open at the same time, for example to compare a provided solution with your own, or to look back at an earlier project as you create a new one. The workaround is to run two copies of the application, each with its own project open.

Creating instructional materials can also be tricky, requiring screenshots to be captured from the application and inserted into the materials, with the attendant risk, when it comes to updating the materials, that the screenshots captured in the course materials drift away from what the application actually shows.

So here are a couple of ways that we might be able to integrate Scratch like activities and guidance into instructional materials.

Calysto/Metakernel Jigsaw Extension for Jupyter Notebooks

The Calysto/MetaKernel Jigsaw extension for Jupyter notebooks wraps the Google Blockly package for use in a Jupyter notebook.

Program code is saved as an XML file, which means you can save and embed multiple copies of the editor within the same Jupyter notebook. This means an example programme can be provided in one embed, and the learner can build up the programme themselves in another, all in the same page.

The use of the editor is a bit tricky – it’s easy to accidentally zoom in and out, and I’m guessing it’s not very accessible – but it’s great as a scratchpad, and perhaps as an instructional material authoring environment?

Live example on Binderhub

For more examples, see the original Jigsaw demo video playlist.

For creating instructional materials, we should be able to embed multiple steps of a programme in separate cells, hiding the code input cell (that is, the %jigsaw line) and then export or print off the notebook view.
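
By way of example, in a MetaKernel based kernel a cell containing just the magic embeds a Blockly workspace (the language argument here is an assumption based on the demo notebooks, which also show other Jigsaw languages):

%jigsaw Python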

LaTeX Scratch Package

The LaTeX Scratch package provides a way of embedding Blockly style blocks in a document through simple LaTeX script.

Using a suitable magic we can easily add scripts to the document (the code itself could be hidden using the notebook Hide Code Cell Input extension).

We can also create scripts in strings and then render those using line magic.
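
By way of illustration, here’s a minimal sketch of that rendering pattern, assuming pdflatex (with the scratch package installed) and pdftoppm are both available in the environment, as they would need to be in the Binder build; the render_scratch() function and the document template are my own scaffolding rather than part of any package:

import subprocess, tempfile, pathlib
from IPython.display import Image

TEMPLATE = r"""\documentclass{standalone}
\usepackage{scratch}
\begin{document}
%s
\end{document}
"""

def render_scratch(snippet, name='blocks'):
    # Wrap the scratch package snippet in a standalone LaTeX document
    tmp = pathlib.Path(tempfile.mkdtemp())
    (tmp / (name + '.tex')).write_text(TEMPLATE % snippet)
    # Compile the document to PDF
    subprocess.run(['pdflatex', '-interaction=nonstopmode', name + '.tex'],
                   cwd=tmp, check=True)
    # Convert the single page PDF to a PNG for inline display
    subprocess.run(['pdftoppm', '-png', '-r', '150', name + '.pdf', name],
                   cwd=tmp, check=True)
    return Image(filename=str(tmp / (name + '-1.png')))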

Live example on Binderhub

One thing that might be quite interesting is a parser that could take the XML generated by the Jigsaw extension and generate LaTeX script from it, and, going the other way, generate a Jigsaw XML file from the LaTeX script?

Historical Context

The Scratch rebuild – OU Build – used in the OU’s new level 1 introductory computing course is a cross platform Adobe Air application. I’d originally argued that if the earlier decision to use a blocks style environment was irreversible, the browser based BlockPy (review and code) application might be a more interesting choice: it allowed users to toggle between blocks and Python code views, displayed Python error messages in a simplified form, and used a data analysis, rather than animation, context, which meant we could also start to develop data handling skills.


One argument levelled against adopting BlockPy was that it looked to be a one man band in terms of support, rather than having the established Scratch community behind it. I’m not sure how much we benefit from, or are of benefit to, the Scratch community though? If OU Build is a fork, we may or may not be able to benefit from any future support updates to the Scratch codebase directly. I don’t think we commit back?

If the inability to render animations had also been a blocker, adding an animation canvas as well as the charting canvas would have been a possibility? (My actual preference was that we should do a bigger project and look to turn BlockPy into a Jupyter client.)

Fragment – Breaking Enigma Was Only Part of the Story…

Reading Who, me? They warned you about me? on ethics associated with developing new technologies, this quote jumped out at me: [m]y claim is that putting an invention into a public space inevitably makes that invention safer (Mike Loukides, December 7, 2017).

In recent years, the UK Government has had several goes at passing bills that refer to the collection of communications data – the who, where, when and how of a communication, but not its content.

[N]ot its content.

Many folk are familiar with stories of the World War Two codebreakers, the boffins, Alan Turing among them, who cracked the German enigma code. How they helped win the war by reading the content of enemy communications.

So given it was the content wot won it, we, cast as “enemies”, might conclude that protecting the content is key. That the communications data is less revealing.

But that’s not totally true. Other important intelligence can be derived from traffic analysis, looking at communications between actors even if you don’t know the content of the messages.

If I know that X sent a message to Y and Z five minutes before they committed a robbery, on several separate occasions, I might suspect that X knew Y and Z, and was implicated in the crime, even if I didn’t know the content of the messages.

Location data can also be used to draw similar inferences. For example, the Bloomberg article Mobile-Phone Case at U.S. Supreme Court to Test Privacy Protections describes a recent US Supreme Court case reviewing an appeal from a convicted armed robber who was in part convicted on the basis of data obtained from [his] wireless carriers to show he was within a half-mile to two miles of the location of four … robberies when they occurred.

So what has this to do with “putting an invention into a public space”? Perhaps if the stories about how military intelligence made, and makes, use of traffic analysis and location analysis, and not just the content of decrypted messages, were more widely told, the collection of such data may not seem so innocuous…

When invention takes place in public, we (the public) know that it exists. We can become aware of the risks. Mike Loukides, December 7, 2017.

Just sayin’…

OERs in Practice: Repurposing Jupyter Notebooks as Presentations

Over coffee following a maps SIG meeting last week, fellow Jupyter notebooks enthusiast Phil Wheeler wondered about the extent to which tutors / Associate Lecturers might be able to take course materials delivered as notebooks and deliver them in tutorials as slideshows.

The RISE notebook extension allows notebooks to be presented using the reveal.js HTML presentation framework. The slides essentially provide an alternative client to the Jupyter notebook and can be autolaunched using Binder. (I’m not sure if the autostart slideshow can also be configured to Run All cells before it starts?)

To see how this might work, I marked up one of the notebooks in my showntell/maths demo setting some of the cells to appear as slides in a presentation based on the notebook.

I also used the Hide Input Jupyter extension to hide code cell inputs so that the code used to generate an output image or interactive could be hidden from the actual presentation.

Get into the slideshow editor mode from the notebook View menu: select Cell Toolbar and then Slideshow. Reset the notebook display using View > Cell Toolbar > None.
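
The slide metadata can also be set programmatically, which might be handy if a tutor wanted to map a batch of existing course notebooks into slideshows. A minimal sketch using nbformat (the filename and the rule for what starts a new slide are illustrative):

import nbformat

nb = nbformat.read('demo.ipynb', as_version=4)
for cell in nb.cells:
    # Start a new slide at each markdown cell that opens with a heading
    if cell.cell_type == 'markdown' and cell.source.lstrip().startswith('#'):
        cell.metadata.setdefault('slideshow', {})['slide_type'] = 'slide'
nbformat.write(nb, 'demo_slides.ipynb')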

To run the presentation with code cell outputs pre-rendered, you first need to run all the cells: from the notebook Cell menu, select Run All to execute all the cells. You can now enter the slideshow using the Enter/Exit RISE Slideshow toolbar button (it looks like a bar chart). Exit the presentation using the cross in the top left of the slideshow display.

[Live demo on Binderhub]

PS Building on the idea of mapping notebook cells into a reveal.js tagged display, I wonder if we could do something similar using a scrollytelling framework such as scrollama, scrollstory or idyll?

OERs in Practice: Re-use With Modification

Over the years, I’ve never really got my head round what other people mean by OERs (Open Educational Resources) in terms of how they might be used.

From my own perspective, wholesale reuse (“macro reuse”) of a course isn’t relevant to me. When tasked with writing an OU unit, if I just point to a CC licensed course somewhere else and say “use that”, I suspect it won’t go down well.

I may want to quote a chunk of material, but I can do that with books anyway. Or I may want to reuse an activity, and then, depending on how much rework or modification is applied, I may reference the original or not.

Software reuse is another possibility, linking out to or embedding a third party application, but that tends to fall under the banner of openly licensed software reuse as much as OER reuse. Sometimes the embed may be branded; sometimes it may be possible to remove the branding (depending on how the asset is created, and the license terms); sometimes the resource might be a purely white label resource that can be rebranded.

Videos and audio clips are another class of resource that I have reused, partly because they are harder to produce. Video clips tend to come in various forms: on the one hand, things like lectures retain an association with the originator (a lecture given by Professor X of university Y is very obviously associated with Professor X and university Y); on the other hand, an animation, like a software embed, might come in a branded form, white labelled, or branded as distributed but white label licensed, so you can remove or replace the branding if you want to put the effort in.

Images are also handy things to be able to reuse, again because they can be hard to produce in at least two senses: firstly, coming up with the visual or graphical idea, i.e. how to depict something in a way that supports teaching or learning; secondly, actually producing the finished artwork. One widely used form of image reuse in the OU is the “redrawing” of an image originally produced elsewhere. This represents a reuse, or re-presentation, of an idea. In a sense, the image is treated as a sketch that is then redrawn.

This level of “micro reuse” of a resource, rather than the “macro reuse” of a course, is not something that was invented by OERs – academics have always incorporated and referenced words and pictures created by others – but it can make reuse easier by simplifying the permissions pathway (i.e. simplifying what otherwise might be a laborious copyright clearance process).

One of the other ways of making use of “micro” resources is to reuse them with modification.

If I share a text with you as a JPG or a PDF document, it can be quite hard for you to grab the text and elide a chunk of it (i.e. remove a chunk of it and replace it with …). If I share the actual text as text, for example in a Word document, you can edit it as you will.

Reuse with modification is also a fruitful way of reusing diagrams. But it can be harder to achieve in practical terms. For example, in a physics or electronics course, or a geometry course, there are likely to be standard mechanical principle diagrams, electrical circuits or geometrical proofs that you will want to refer to. These diagrams may exist as openly licensed resources, but… the numbers or letters you want to label the diagram with may not be the same as in the original. So what do you do? Redraw the diagram? Or edit the original, which may reduce the quality of the original or introduce some visual artefact that reveals the edit (“photocopy lines”!).

But what if the “source code”, or means of producing the diagram, is also shared? For example, if the diagram is created in Adobe Illustrator or CorelDRAW and made available as an Adobe Illustrator .ai file or a CorelDRAW .cdr file, and you have an editor (such as the original, or an alternative such as Inkscape) that imports those file formats, you can edit and regenerate a modified version of the diagram at the same level of quality as the original. You could also more easily restyle the diagram, even if you don’t change any of the content. For example, you could change line thickness, fonts or font sizes, positioning, and so on.

One of the problems with sharing image project files for particular applications is that the editing and rendering environment for working with the project file is likely to be separate from your authoring environment. If, while writing the text, you change an item in the text and want to change the same item as referenced in the image, you need to go to the image editor, make the change, export the image, and copy it back into your document. This makes document maintenance hard and subject to error: it’s easy for the values of the same item as referenced in the text and in the diagram to drift apart. (In databases, this is why you should only ever store the value of something once and then refer to it by reference. If I have your address stored in two places, and you change address, I have to remember to change both of them; it’s also quite possible that the address I have for you will drift between the two copies I have of it…)

One way round this is to include the means for creating and editing the image within your text document. This is like editing a Microsoft Word document and including a diagram by using Microsoft drawing tools within the document. If you share the complete document with someone else, they can modify the diagram quite easily. If you share a PDF of the document, they’ll find it harder to modify the diagram.

Another way of generating a diagram is to “write” it, creating a “program” that defines how to draw the diagram and that can be run in a particular environment to actually produce it. By changing the “source code” for the diagram, and rerunning it, you can generate a modified version of the diagram in whatever format you choose.

This is what packages like TikZ support [docs].
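
For example, a few lines of TikZ are enough to describe a labelled triangle, and relabelling a vertex is just a text edit (a minimal sketch; the co-ordinates and labels are arbitrary):

\documentclass[tikz]{standalone}
\begin{document}
\begin{tikzpicture}
  % a right angled triangle with easily relabelled vertices
  \draw (0,0) node[below left] {$A$}
     -- (4,0) node[below right] {$B$}
     -- (4,3) node[above right] {$C$}
     -- cycle;
\end{tikzpicture}
\end{document}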

And this is what I’ve been exploring in Jupyter notebooks and Binderhub, where the Jupyter notebook contains all the content in the output document, including the instructions to create image assets or interactives, and the Binder container contains all the software libraries and tools required to generate and embed the image assets and interactives within the document from the instructions contained within the document.

That’s what I was trying to say in Maybe Programming Isn’t What You Think It Is? Creating Repurposable OERs (which also contains a link to a runnable example).

PS by the by, I also stumbled across this old post, an unpursued bid, today, that I have no recollection of at all: OERs: Public Service Education and Open Production. Makes me wonder how many other unfinished bids I started…

Maybe Programming Isn’t What You Think It Is? Creating Repurposable & Modifiable OERs

With all the “everyone needs to learn programming” hype around, I am trying to be charitable when it comes to what I think folk might mean by this.

For example, whilst trying to get some IPython magic working, I started having a look at TikZ, a LaTeX extension that supports the generation of scientific and mathematical diagrams (and which has been around for years…).

Getting LaTeX environments up and running can be a bit of a pain, but several of the Binderhub builds I’ve been putting together include LaTeX, and TikZ, which means I have an install-free route to trying out snippets of TikZ code.

As an example, my showntell/maths demo includes an OpenLearn_Geometry.ipynb notebook containing a few worked examples of how to “write” some of the figures that appear in an OpenLearn module on geometry.

From the notebook:

The notebook includes several hidden code cells that generate a range of geometric figures. To render the images, go to the Cell menu and select Run All.

To view/hide the code used to generate the figures, click on the Hide/Reveal Code Cell Inputs button in the notebook toolbar.

To make changes to the diagrams, click in the appropriate code input cell, make your change, and then run the cell using the Run Cell (“Play”) button in the toolbar or via the keyboard shortcut SHIFT-ENTER.

Entering Ctrl-Z (or CMD-Z) in the code cell will undo your edits…

Launch the demo notebook server on Binder here.

Here’s an example of one of the written diagrams (there may be better ways; I only started learning how to write this stuff a couple of days ago!)
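
A cell along the following lines gives a flavour, assuming a TikZ cell magic such as the one provided by the ipython-tikzmagic extension is installed; the figure is illustrative rather than the notebook original:

%%tikz
\draw (0,0) -- (5,0) -- (2.5,2.5) -- cycle;
\node[below left] at (0,0) {$A$};
\node[below right] at (5,0) {$B$};
\node[above] at (2.5,2.5) {$C$};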

Whilst tinkering with this, a couple of things came to mind.

Firstly, this is programming, but perhaps not as you might have thought of it. If we taught adult novices some of the basic programming and coding skills using TikZ rather than turtle, they’d at least be able to create professional looking diagrams. (Okay, so the syntax is admittedly probably a bit scary and confusing to start with… But it could be simplified with some higher level, more abstracted, custom defined macros that learners could then peek inside.)

So when folk talk about teaching programming, maybe we need to think about this sort of thing as well as enterprise Java. (I spent plenty of time last night on the TeX Stack Exchange site!)

Secondly, the availability of things like Binderhub make it easier to build preloaded distributions that can be run by anyone, from anywhere (or at least, for as long as public Binderhub services exist). Simply by sharing a link, I can point you to a runnable notebook, in this case, the OpenLearn geometry demo notebook mentioned above.

One of the things that excites me, but that I can’t seem to convince others about, is the desirability of constructing documents in the way the OpenLearn geometry demo notebook is constructed: all the assets displayed in the document are generated by the document. What this means is that if I want to tweak an image asset, I can. The means of production – in this example, the TikZ code – is provided; it’s also editable and executable within the Binder Jupyter environment.

When HTML first appeared, web pages were shonky as anything, but there were a couple of buts…: the HTML parsers were forgiving, and would do their best with whatever corruption of HTML was thrown at them; and the browsers supported the ability to View Source (which still exists today; for example, in Chrome, go to the View menu then select Developer -> View Source).

Taken together, this meant that: a) folk could copy and paste other people’s HTML and try out tweaks to “cool stuff” they’d seen on other pages; b) if you got it wrong, the browser would have a go at rendering it anyway; you also wouldn’t feel as if you’d break anything serious by trying things out yourself.

So with things like Binder, where we can build disposable “computerless computing environments” (which is to say, pre-configured computing environments that you can run from anywhere, with just a browser to hand), there are now lots of opportunities to do powerful computer-ingy things (technical term…) from a simple, line at a time notebook interface, where you (or others) can blend notes and/or instruction text along with code – and code outputs.

For things like the OpenLearn demo notebook, we can see how the notebook environment provides a means by which educators can produce repurposeable documents, sharing not only educational materials for use by learners, or appropriation and reuse by other educators, but also the raw ingredients for producing customised forms of the sorts of diagrams contained in the materials: if the figure doesn’t have the labels you want, you can change them and re-render the diagram.

In a sense, sharing repurposeable, “reproducible” documents that contain the means to generate their own media assets (at least, when run in an appropriate environment: which is why Binderhub is such a big thing…) is a way of sharing your working. That is, it encourages open practice, and the sharing of how you’ve created something (perhaps even with comments in the “code” explaining why you’ve done something in a particular way, or where the inspiration/prior art came from), as well as the what of the things you have produced.

That’s it, for now… I’m pretty much burned out on trying to persuade folk of the benefits of any of this any more…

PS TikZ and PGF: TeX packages for creating graphics programmatically. Far more useful than turtle and Scratch?

Amazon AWS re:Invent Round-Up…

At the Amazon AWS re:Invent event last week, Amazon made a slew of announcements relating to new AWS service offerings. Here’s a quick round-up of some of the things I noticed, with links to announcement blog posts rather than the actual services themselves…

First up, AWS Cloud9,  a browser based Integrated Development Environment (IDE) for writing, running, and debugging code. AWS have been moving into developer and productivity tools for some time, and this is another example of that.

For the non-developer,  Amazon Sumerian may be of interest, providing a range of tools and resources that allow anyone to create and run augmented reality (AR), virtual reality (VR), and 3D applications. The interface is a GUI driven one, so it’ll be interesting to see what Amazon have made of it compared to the horrors of their developer service UIs…

Whilst text editors are the environment preferred by “real” developers, many of the rest of us find Jupyter notebooks a more accommodating environment. So it’s interesting to see Amazon using them as part of their SageMaker service, a fully managed end-to-end machine learning service that enables data scientists, developers, and machine learning experts to quickly build, train, and host machine learning models at scale. The notebooks look to be used for data exploration and cleaning, whilst other components include model building, training, and validation using Docker containers, with model hosting that can also provide A/B testing of multiple models simultaneously.

On the back end, AWS already offer a range of database services, but now there’s also a graph database called Neptune. Graphs provide a really powerful way of thinking about and querying datasets – they really should be taught more, and at an earlier stage, in computing education – so it’s nice to see support for graph databases growing.

I’m not sure how closely Neptune ties in to the new AWS AppSync, a fully managed serverless GraphQL service for real-time data queries, synchronization, communications and offline programming features. Skimming the announcement blog post, it looks as if it ties in to DynamoDB, so perhaps not? Perhaps this is more about using it as a responsive data query language and server-side runtime for querying data sources that allow for real-time data retrieval and dynamic query execution?

Whenever I try to use an Amazon Web Service, I find myself in the middle of configuration screen hell. When I saw the AWS Fargate service announcement open with the claim that [a]t AWS we saw [container management solutions] as an opportunity to remove some undifferentiated heavy lifting, it made me laugh. I find it easy to launch containers at places like Digital Ocean via Docker machine, but now perhaps things will be easier on AWS?

Of course, containers are still virtualised offerings – maybe you really want access to the bare metal of servers you probably don’t have hanging around at home? The new EC2 bare metal instances (i3.metal) offer:

  • Processing: two Intel Xeon E5-2686 v4 processors running at 2.3 GHz, with a total of 36 hyperthreaded cores (72 logical processors);
  • Memory: 512 GiB;
  • Storage: 15.2 terabytes of local, SSD-based NVMe storage;
  • Network: 25 Gbps of ENA-based enhanced networking.

That do you?

If you aren’t taken by the idea of running your data through AI models in the cloud, even if it is on your own bare metal rented servers, you might fancy running them locally. AWS DeepLens is a video camera with a 4 megapixel (1080p) sensor and a 2D microphone array, 8GB of memory, and Ubuntu 16.04 on board, that can download prebuilt deep learning models and run your data through them locally. If it’s anything like the AI packing Lockheed Martin F35 stealth jet, I’m not sure what it phones home, though?

Who remembers dialling the speaking clock when they were a kid? Here’s a modern day version of a universal time signal that can help you keep distributed things in synch with each other. The new Amazon Time Sync Service provides a time synchronization service delivered over Network Time Protocol (NTP), which uses a fleet of redundant satellite-connected and atomic clocks in each region to deliver a highly accurate reference clock. NTP has been around for ages, but providing a reliable service as just another AWS service call helps build lock-in to the AWS ecosystem. (In part this reminds me of Google making a bid for DNS domination several years ago.)

Although Amazon have until now shied away from offering their own operating system (Fire OS never really went anywhere), I wonder if they are making a play for the internet of things with Amazon FreeRTOS, an IoT microcontroller operating system … that extends the FreeRTOS kernel, a popular real-time operating system, with libraries that enable local and cloud connectivity, security, and (coming soon) over-the-air updates. Hmm… Android wasn’t a Google initiative originally…

And if they can’t get you to use their IoT O/S, maybe you will avail yourself of the IoT Device Defender. Details look light on this at the moment, but this puts a marker in the sand…

With Amazon Echo, Amazon made an early play for voice devices in the home. One of the benefits of getting services out there and used by folk is that you get more training data to feed your AI services. So it’s perhaps not surprising that there’s a push on a new batch of voice related services:

  • Amazon Comprehend, a continuously-trained Natural Language Processing (NLP) service. Features are typical of this sort of service, as offered already by the likes of Google, Microsoft and IBM: language detection, entity detection, sentiment analysis, and key phrase extraction, but I’m not sure I’ve spotted topic modeling as a service before (but then, I haven’t been looking);
  • Amazon Translate is a high-quality neural machine translation service that … provide[s] fast language translation of text-based content;
  • Amazon Transcribe is an automatic speech recognition (ASR) service. Apparently, audio files stored on the Amazon Simple Storage Service (S3) can be analysed directly, with timestamps provided for each word and inferred punctuation.

It wasn’t so very long ago that YouTube hadn’t even been imagined yet. But the pace of change is such that if you want to build your own, you probably can, complete with monetisation services. AWS Media Services is an array of broadcast-quality media services, offering:

  • file-based transcoding for OTT, broadcast, or archiving. Features apparently include multi-channel audio, graphic overlays, closed captioning, and several DRM options;
  • live encoding to deliver video streams in real time to both televisions and multiscreen devices. Support for ad insertion, multi-channel audio, graphic overlays, and closed captioning;
  • video origination and just-in-time packaging that takes a single input and produces output for multiple devices, with support for multiple monetization models, time-shifted live streaming, ad insertion, DRM, and blackout management;
  • media-optimized storage that enables high performance and low latency applications such as live streaming;
  • monetization services that support ad serving and server-side ad insertion and accurate reporting of server-side and client-side ad insertion.

Of course, if you don’t want to become an over the top TV broadcaster, you could always use a couple more of the new video services as part of your own state surveillance system.

For example, Amazon Kinesis Video Streams can ingest streaming video (or other time-encoded data) from millions of camera devices without you having to set up or run your own infrastructure. Hook that into your public traffic cams and process it…

…perhaps with Amazon Rekognition Video, which provides scalable video recognition and analysis, with object and scene detection, real-time facial recognition, celebrity recognition and text recognition. Apparently, it’s the first video analysis service of its kind that uses the complete context of visual, temporal, and motion of the video to perform activity detection and person tracking. Oh good…