Fragment – ROER: Reproducible Open Educational Resources

Fragment, because I’m obviously not making sense with this to anyone…

In the words of David Wiley (@opencontent), in defining the “open” in open content and open educational resources [link], he identifies “the 5R activities” that are supported by open licensing:

  1. Retain – the right to make, own, and control copies of the content (e.g., download, duplicate, store, and manage)
  2. Reuse – the right to use the content in a wide range of ways (e.g., in a class, in a study group, on a website, in a video)
  3. Revise – the right to adapt, adjust, modify, or alter the content itself (e.g., translate the content into another language)
  4. Remix – the right to combine the original or revised content with other material to create something new (e.g., incorporate the content into a mashup)
  5. Redistribute – the right to share copies of the original content, your revisions, or your remixes with others (e.g., give a copy of the content to a friend)

Whilst the legal framework is the one that has to be in place for educational institutions as publishers of (third party) content, where there is particular emphasis on citing others and not reusing content in an unacknowledged way, I have always been more interested in the practice of reusing content, particularly when that means reuse with modification.

Various others have suggested sixth Rs. For example, Chris Aldrich’s The Sixth “R” of Open Educational Resources identifies “Request update (or maybe pull Request, Recompile, or Report to keep it in the R family?)“, in the sense of keeping stuff in a repo so people can fork the report, update it and track changes (maybe Revisions is a better R for that?). Rather than being an R relating to a right you can assert, Revisions is more about the practice.

As a trawl through the history of this blog suggests (for example, Open Content Anecdotes from nigh on a decade ago), I’ve also been less interested in the legal framework around OERs than I am in the practical reuse (with modification) of particular (micro? atomic?) assets, such as diagrams, or problem sets. That is, the things you are more likely to spot as relevant, useful or interesting and weave into your own materials, or use to replace your own crappy diagrams.

To a large extent, the legal bit doesn’t stop me, particularly if no-one finds out. The blocker is in the practicalities associated with reversioning the physical asset, and making actual changes or modifications to it, so as to make it appropriate for my course.

(You can always redraw diagrams, which can also help you get round copyright on non-openly licensed works, but that takes time, skill, and maybe a drawing package you don’t have access to.)

So the idea that I started trying to crystallise out almost a year ago now — OERs in Practice: Re-use With Modification — is based around another R, a leading R, or a +1 R, which (as with Aldrich’s suggested R) is a practice based R: Reproducibility. (Hmm… maybe this post should have been The 5+2Rs of Open Content?)

By their very nature, these resources are resources that include their own “source code” for creating assets, so that if you want to create a modified version of the asset, you can modify the source and then regenerate the asset. A trivial example is to use diagrams that are “written diagrams” – diagrams generated from textual, written descriptions or codified versions of them and rendered from the description by a particular chart generating tool (for example, Writing Diagrams, Writing Diagrams – Boxes and Arrows and Writing Diagrams (Incl. Mathematical Diagrams)).
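
For example, and as a minimal sketch using the Python graphviz package (one possible tool for this; the node names here are purely illustrative, and the Graphviz binaries need to be installed), the diagram “source” is just a few lines of text, and regenerating a modified asset is simply a matter of editing and re-running them:

from graphviz import Digraph

# The diagram is "written" as text: edit the edges, re-run, and the asset is regenerated
dot = Digraph(comment='Feedback loop')
dot.edge('Sensor', 'Controller')
dot.edge('Controller', 'Actuator')
dot.edge('Actuator', 'Sensor')

dot.render('feedback-loop', format='png')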

As to why this is a fragment, I’m stopping here… discussion about reproducibility is elsewhere, with more to follow, along with why this approach opens up new opportunities for educators as well as learners. For now, see these other fragments on that topic in date order.

[Update: via a comment, @opencontent reminds me that he also made the distinction between legal and practical issues, with practical concerns raised in the ALMS framework – I should have read on from the legal issues to the Poor Technical Choices Make Open Content Less Open section… See the comment thread to this post for more, as well as this related post from the previous time the ALMS model was raised to my attention: Open ALMS.  I also note this recent IRRODL paper on Defining OER-Enabled Pedagogy, which I need to read through…]

PS some more related fragments from a hastily written, unsuccessful internal Esteem Project bid:

Description

One problem associated with producing rich educational materials is that inconsistencies can occur when cross referencing text with media assets such as charts, tables, diagrams and computer code produced via different production routes. The project will explore and demonstrate how emerging technologies and workflows developed to support reproducible work practices can be adopted for the development of reproducible educational resources, including but not limited to educational materials rich in mathematical content, scientific / engineering diagrams, exploratory and explanatory statistics, maps and geospatial analysis, music theory and analysis, interactive browser based activities, animations and dynamically created audio assets.

The aim is to demonstrate:

  • The range of assets that can be produced / directly authored by academics including static, animated and interactive elements such as print quality drawings, animated scientific diagrams, and interactive web activities and applications (eg interactive maps, 3D models, etc.)
  • The workflows associated with demonstrating the production, maintenance, and reuse with modification of the assets and works derived from them, including but not limited to variations on a theme in the production of parameterised assessment materials
  • The potential for using reproducible materials to facilitate maintenance, reversioning / updating and reuse with modification of module materials

The outputs will include:

    • A range of OU module and OpenLearn unit materials reworked using the proposed technologies and workflows
    • A library of reproducible educational resource templates for a range of topic areas capable of being reused with modification in order to produce a range of assets from a common template.

The project will demonstrate how freely available, open source technologies can be used to support the direct authoring of rich and interactive media assets in a reproducible way.

Rationale

Reproducible research tools increasingly support the direct authoring of rich documents that blend text, data, code and code outputs with media assets (audio, video, static and animated images, interactives) generated from text based computer scripts.

The project proposes the co-option of such tools for use as authoring tools for reproducible educational materials. A single “source document” can include text as well as scripts for generating tables and charts, for example, from data contained within the document itself, minimising the distance between the production of assets and the materials they are used in.

The resulting workflow supports consistency in production and maintenance as well as reuse with modification thereafter by allowing updates in situ that can be used to recreate modified assets (diagrams created dynamically using updated values, for example). Materials will also be modifiable by ALs for tutorial use.

Authoring tools also support the direct authoring and creation of interactive components based around templated third party widgets, such as 3D molecule viewers or interactive maps, that can be embedded in online / browser accessed materials.

Examples of the sorts of assets I had in mind to rework can be found in several of the notebooks available here.
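
For instance, here is a minimal sketch of the kind of chart-generating fragment I have in mind (the data values and labels are made up): the data sits alongside the chart code in the source document, so updating the values and re-running the document regenerates the asset in place.

import matplotlib.pyplot as plt

# Data embedded in the source document, next to the code that renders it
energy_use = {'2016': 310, '2017': 295, '2018': 287}

fig, ax = plt.subplots()
ax.bar(list(energy_use.keys()), list(energy_use.values()))
ax.set_ylabel('Energy use (kWh)')
ax.set_title('Household energy use by year')

# The asset referenced by the surrounding text; re-run to regenerate it
fig.savefig('energy_use.png')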

Jupyter Notebooks Seep into the Everyday…

If you ever poke around in developer documentation, you’ll be familiar with the idea that it often contains code fragments, and things to copy and paste into the command line.

This morning, I noticed an announcement around the Android Management API, and in particular the detail of the Quick Start Guide:

To get started with the Android Management API, we’ve created a Colab notebook that you can follow to enroll (sic) an enterprise, create a policy, and provision a device.

 

Colab notebook [their strong emphasis]…

Which is to say, a Jupyter notebook running on Google’s collaborative Colab notebook platform.

Here’s what the start of the quick start docs, notebook style, look like:

As you’d expect, a blend of text and code. You can see the first code block at the bottom of the screenshot where you can enter your own personal project id; later code cells walk you through connecting to the API and working with it:

 

So – that’s interesting thing number one… Google using notebooks, just casually, as part of “operationalised” tutorial materials. (Interactive documentation? Operationalised documentation?)

See also: Jupyter Notebooks for Scheduling HESA Data Returns? If It’s Good Enough for Netflix… (https://blog.ouseful.info/2018/09/25/jupyter-notebooks-for-hesa-data-returns/) for how notebooks can be used in production environments for scheduling operations and not just for analysis.

Second thing: Colab. It’s been some time since I looked at it, but one thing that jumped out at me was the inclusion of code snippets in the sidebar…

These working demo code snippets can be added to the document at the cursor point:

Running them produces the example output, as you’d expect, in the output part of the cell:

Snippets to add linking and brushing (data vis 101; look it up…) or linked charts are also available:

I’m not sure if this sort of interaction (in terms of getting interactive assets into a document) is what the OpenCreate team are looking to provide, but this may be worth looking at just to see how the interaction feels, and the way in which it allows authors to add live interactive charts, for example, to a content notebook.

 

In other news, my belated and hastily written project bid to the internal Esteem group to look at creating “reproducible educational materials” as demonstrated by reworking examples of OpenLearn materials was rejected as “not a clearly defined scholarship project”. I’m going to carry on with it anyway, because I think we can learn a lot from it about how notebook style environments can be used to produce:

  • new forms of interactive educational material, opened up by the availability of “generative” content creation code/magics;
  • educational resources that are by their very “source included” nature capable of reuse with modification;
  • educational resources that can provide students with a rich range of interactive activities, presented inline/contextualised in a narrative document, added to the document by authors directly with little, if any, technical / programming skills;
  • an interactive environment where students/learners can create their own interactives and / or generative examples to explore a particular topic or idea, again without the need for much in the way of technical / programming skills.

I’m also going to start working on a set of training resources around Jupyter notebooks using my 0.2FTE not OU time; depending on how that goes, I may try to turn Jupyter training / development into an exit plan. Please get in touch — tony.hirst@open.ac.uk — if this sounds of interest…

Name (Date) Title, Available at: URL (Accessed: DATE): So What?

Academic referencing is designed, in part, to support the retrieval of material that is being referenced, as well as recognising provenance.

The following guidance, taken from the OU Library’s Academic Referencing Guidelines, is, I imagine, typical:

That page appears in an OU hosted Moodle course (OU Harvard guide to citing references) that requires authentication. So whilst stating the provenance, it won’t necessarily support the retrieval of content from that site for most people.

Where an (n.d) — no date — citation is provided, it also becomes hard for someone checking the page in the future to tell whether or not the content has changed, and if so, which parts.

Looking at the referencing scheme for organisational websites, there’s no suggestion that the need for authentication should be noted in the citation (the same is true in the guidance for citing online newspaper articles).

 

I also didn’t see guidance offhand for how to reference pages where the page presentation is likely customised by “an algorithm” according to personal preferences or interaction history; placement of things like ads are generally dynamic, and often personalised (personalisation may be based on multiple things, such as the cookie state of the browser with which you are looking at a page, or the history of transactions (sites visited) from the IP address you are connecting to a site from).

This doesn’t matter for static content, but it does matter if you want to reference something like a screenshot / screencapture, for example showing the results of a particular search on a web search engine. In this case, adding a date and citing the page publisher (that is, the web search engine, for example) is about as good as you can get, but it misses a huge amount of context. The fact that you got extremist results might be because your web history reveals you to be a raging fanatic, and the fact that you grabbed the screenshot from the premises of your neo-extremist clubhouse just added more juice to the search. One partial solution to disabling personalisation features might be to run a search in a “private” browser session where cookies are disabled, and cite that fact, although this still won’t stop IP address profiling and browser fingerprinting.

I’ve pondered related things before, eg when asking Could Librarians Be Influential Friends? And Who Owns Your Search Persona?, as well as in a talk given 10 years ago and picked up at the time by Martin Weller on his original blog site (Your Search Is Valuable To Us; or should that be: Weller, M. (2008) 'Your Search Is Valuable To Us' *The Ed Techie*, 9 September [Blog] Available at http://nogoodreason.typepad.co.uk/no_good_reason/2008/10/your-search-is-valuable-to-us.html (Accessed 26 September 2018).?).

Most of the time, however, web references are to static content, so what role does the Accessed on date play here? I can imagine discussions way back when, when this form was being agreed on (is there a history of the discussion that took place when formulating and adopting this form?) where someone said something like “what we need is to record the date the page was accessed on and capture it somewhere“, and then the second part of that phrase was lost or disregarded as being too “but how would we do that?”…

One of the issues we face in maintaining OU courses, where content starts being written two years before a course starts and is expected to last for 5+ years of presentation, is maintaining the integrity of weblinks. Over that period of time, you might expect pages to change in a couple of ways, even if the URL persists and the “content” part remains largely the same:

  • the page style (that is, the view as presented) may change;
  • the surrounding navigation or context (for example, sidebar content) may change.

But let’s suppose we can ignore those. Instead, let’s focus on how we can try to make sure that a student can follow a link to the resource we intend.

One of the things I remember from years ago were conversations around keeping locally archived copies of webpages and presenting those copies to students, but I’m not sure this ever happened. (Instead, there was a sort of middle-ground compromise of running link checkers, but I think that was just to spot 404 page not found errors rather than checking a hash made on the content you were interested in, which would be difficult.)

At one point, I religiously kept archived copies of pages I referenced in course materials so that if the page died, I could check back on my own copy to see what the sense of the page now lost was so I could find a sensible alternative, but a year or two off course production and that practice slipped.

Back to the (Accessed DATE) clause. So what? In Fragment – Virtues of a Programmer, With a Note On Web References and Broken URLs I mentioned a couple of Wikipedia bots that check link integrity on Wikipedia (see also: Internet Archive blog: More than 9 million broken links on Wikipedia are now rescued). These can perform actions like archiving web pages, checking links are still working, and changing broken links to point to an archived copy of the same link. I hinted that it would be useful if the VLE offered the same services. They don’t, at least not going by reports from early starters to this year’s TM351 presentation, who are already flagging up broken links. (Do we not run a link checker anymore? I think I asked that in the Broken URLs post a year ago, too…)

Which is where (Accessed DATE) comes in. If you do accede to that referencing convention, why not make sure that an archived copy of that page exists, ideally one made on that date? Someone chasing the reference can then see what you accessed, and perhaps, if they are visiting the page somewhen in the future, see how the future page compares with the original. (This won’t help with authentication controlled content or personalised page content though.)

An easy way of archiving a page in a way that others can access it is to use the Internet Archive’s Wayback Machine (for example, If You See Something, Save Something – 6 Ways to Save Pages In the Wayback Machine).

From the Wayback Machine homepage, you can simply add a link to a page you want to archive:

 

hit SAVE NOW (note, this is saving a different page; I forgot to save the screenshot of the previous one, even though I had grabbed it. Oops…):

and then you have access to the archived page, on the date it was accessed:

A more useful complete citation would now be Weller, M. (2008) 'Your Search Is Valuable To Us' *The Ed Techie*, 9 September [Blog] Available at http://nogoodreason.typepad.co.uk/no_good_reason/2008/10/your-search-is-valuable-to-us.html (Accessed 26 September 2018. Archived at https://web.archive.org/web/20180926102430/http://nogoodreason.typepad.co.uk/no_good_reason/2008/10/your-search-is-valuable-to-us.html).
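
If you wanted to script the archiving step rather than use the web form, something like the following sketch should do it (it uses the web.archive.org/save/ endpoint; the page URL is just an example, and reading the snapshot path out of the Content-Location header is a best guess based on observed behaviour rather than a documented contract):

import requests

page = 'http://nogoodreason.typepad.co.uk/no_good_reason/2008/10/your-search-is-valuable-to-us.html'

# Ask the Wayback Machine's "Save Page Now" service to grab a snapshot of the page
resp = requests.get('https://web.archive.org/save/' + page)

# The snapshot path has typically been returned in the Content-Location header
snapshot = resp.headers.get('Content-Location')
if snapshot:
    print('Archived at: https://web.archive.org' + snapshot)
else:
    print('Request returned status', resp.status_code)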

Two more things…

Firstly, my original OUseful.info blog was hosted on an OU blog server; when that was decommissioned, I archived the posts on a subdomain of .open.ac.uk I’d managed to grab. That subdomain was deleted a few months ago, taking with it the original blog archive. Step in the Wayback Machine. It didn’t have a full copy of the original blog site, but I did manage to retrieve quite a few of the pages using this wayback machine downloader, using the command wayback_machine_downloader http://blogs.open.ac.uk/Maths/ajh59 or, for a slightly later archive, wayback_machine_downloader http://ouseful.open.ac.uk/blogarchive. I made the original internal URLs relative (find . -name '*.html' | xargs perl -pi -e 's/http:\/\/blogs.open.ac.uk\/Maths\/ajh59/./g', or as appropriate for http://ouseful.open.ac.uk/blogarchive), used a similar approach to remove tracking scripts from the pages, uploaded the pages to Github (psychemedia/original-ouseful-blog-archive), enabled the repo as a Github pages site, and the pages are now at https://psychemedia.github.io/original-ouseful-blog-archive/pages/. It looks like the best archive is at the UK Web Archive, but I can’t see a way of getting a bulk export from that? https://www.webarchive.org.uk/wayback/archive/20170623023358/http://ouseful.open.ac.uk/blogarchive/010828.html

Secondly, bots; VLE bots… Doing some maintenance on TM351, I notice it has callouts to other OU courses, including TU100, which has been replaced by TM111 and TM112. It would be handy to be able to automatically discover references to other courses made from within a course to support maintenance. Using some OU-XML schema markup to identify such references would be sensible? The OU-XML document source structure should provide a veritable playground for OU bots to scurry around. I wonder if there are any, and if so, what do they do?
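
As a crude sketch of what such a bot might do in the absence of explicit markup (the directory name and module code pattern below are illustrative guesses), a simple scan over exported OU-XML source files could at least surface candidate cross-references:

import re
from pathlib import Path
from collections import defaultdict

# Pattern for things that look like OU module codes, e.g. TM351, TU100, TM112
MODULE_CODE = re.compile(r'\b[A-Z]{1,3}\d{3}\b')

references = defaultdict(set)
for path in Path('ouxml-source').glob('**/*.xml'):
    for code in MODULE_CODE.findall(path.read_text(errors='ignore')):
        references[code].add(path.name)

for code, files in sorted(references.items()):
    print(code, '->', ', '.join(sorted(files)))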

 

PS via Richard Nurse, reminding me that Memento is also useful when trying to track down original content and/or retrieve content for broken link pages from the Internet Archive: UK Web Archive Mementos search and time travel.

Richard also comments that “OU modules are being web archived by OU Archive – 1st, mid point and last presentation -only have 1 in OUDA currently staff login only – on list to make them more widely available but prob only to staff given 3rd party rights in OU courses“. Interesting…

PPS And via Herbert Van de Sompel, a list of archives accessed via time travel, as well as a way of decorating web links to help make them a bit more resilient: Robust Links – Link Decoration.

By the by, Richard, Kevin Ashley and @cogdog/Alan also point me to the various browser extensions that make life easier adding pages to archives or digging into their history. Examples here: Memento tools. I’m not sure what advice the OU Library gives to students about things like this; certainly my experience of interactions with students, academics and editors alike around broken links suggests that not many of them are aware of the Internet Archive, the UK Web Archive, the Wayback Machine, etc etc?

OUseful.info – where the lede is usually buried…

Health and Safety At Work – AI

Via @willperrin, a statement about the Health and Safety at Work Act in the context of AI (Industrial Health and Safety: Artificial Intelligence: Written question – HL8200, asked 23rd May, 2018):

Department for Work and Pensions
Industrial Health and Safety: Artificial Intelligence

Asked by Lord Stevenson of Balmacara
Asked on: 23 May 2018
HL8200

To ask Her Majesty’s Government what assessment they have made of the extent to which section 6 of the Health and Safety at Work etc. Act 1974 applies to artificial intelligence or machine learning software that is used in the workplace to (1) control or animate physical things in the workplace, (2) design articles for use in the workplace, or (3) support human decision-making processes running on computers under the control of the employer with an impact on people’s health and safety; and whether, in each case, testing regimes exist as set out in section 6(1)(b) of that Act.

And the response?

Answered by: Baroness Buscombe
Answered on: 05 June 2018

Section 6 of the Health and safety at Work etc. Act 1974 places duties on any person who designs, manufacturers, imports or supplies any article for use at work to ensure that it will be safe and without risks to health, which applies to artificial intelligence and machine learning software. Section 6(1)(b) requires such testing and examination as may be necessary to ensure that any article for use at work is safe and without risks but does not specify specific testing regimes. It is for the designer, manufacturer, importer or supplier to develop tests that are sufficient to demonstrate that their product is safe.

The Health and Safety Executive’s (HSE) Foresight Centre monitors developments in artificial intelligence to identify potential health and safety implications for the workplace over the next decade. The Centre reports that there are likely to be increasing numbers of automated systems in the workplace, including robots and artificial intelligence. HSE will continue to monitor the technology as it develops and will respond appropriately on the basis of risk.

(By the by, what happened to the capitalisation of the Act name in the answer?)

My idle tweets to HSC on this never got far, so this answer is interesting…

Thinks: one to add to the RSS feed, a standing query on mentions of artificial intelligence in written questions and answers (feed).

Jupyter Notebooks for Scheduling HESA Data Returns? If It’s Good Enough for Netflix…

I’ve had so much of the Jupyter Kool Aid I now wonder of most computer related activities “could notebooks help with that?“.

Adopting a new technology like notebooks is not necessarily an easy thing to do, because it can transform workflows and allows, perhaps even encourages, people to do things differently. Adopting a technology that, to get the most from it, requires change means that right there you have a blocker to adoption: change has a high activation energy.

If I was going to draw a diagram (in a reproducible way), I might start by poking around the LaTeX endiagram package to see if I could tweak the labels to something I wanted (or disable labels and add my own as an overplot).

As an end user development environment, notebooks give you web browser access to fully blown programming languages, an environment to run them in, all manner of interactive widgets, and, through other Jupyter warez, various means of deploying them as interactive web apps or web APIs.

Enter stage left this OU job ad for a Developer (HESA Data Futures):

HESA Data Futures is a new project being set up to deliver systems to support the move from annual retrospective returns of student data to the Higher Education Statistics Agency (HESA) to in-year continuous data submission from 2019/20. We are looking for an enthusiastic Developer to work as part of a small team to redevelop our current systems.

You will be responsible for designing, building, testing and implementing software components for the University’s HESA Data Futures system.  The University is participating in the HESA Data Futures pilots and you will be actively engaging with HESA and other institutions participating in the pilot.

You will have experience of developing and maintaining systems to a high standard using languages such as SAS, Python, SQL, XML.  The project is at an early stage and you will be encouraged to use your experience to influence the choice of software platform, programming language, database and tools.

Hmmm… The project is at an early stage and you will be encouraged to use your experience to influence the choice of software platform, programming language, database and tools.

Thinks… Netflix use notebooks across the organisation for working with data:

When thinking about the future of analytics tooling, we initially asked ourselves a few basic questions:

  • What interface will a data scientist use to communicate the results of a statistical analysis to the business?
  • How will a data engineer write code that a reliability engineer can help ensure runs every hour?
  • How will a machine learning engineer encapsulate a model iteration their colleagues can reuse?

We also wondered: is there a single tool that can support all of these scenarios?

Netflix Tech Blog: Scheduling Notebooks at Netflix.

The answer…? Yep, you guessed it….

In particular, to support reporting tasks, Netflix contributed to the papermill project. As the docs tell it, Papermill lets you:

  • parameterize notebooks
  • execute and collect metrics across the notebooks
  • summarize collections of notebooks

This opens up new opportunities for how notebooks can be used. For example:

  • Perhaps you have a financial report that you wish to run with different values on the first or last day of a month or at the beginning or end of the year, using parameters makes this task easier.
  • Do you want to run a notebook and depending on its results, choose a particular notebook to run next? You can now programmatically execute a workflow without having to copy and paste from notebook to notebook manually.
  • Do you have plots and visualizations spread across 10 or more notebooks? Now you can choose which plots to programmatically display a summary collection in a notebook to share with others.
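
A minimal papermill run looks something like the following (the notebook names and the parameter are illustrative):

import papermill as pm

# Execute a template notebook with injected parameters, saving the executed
# copy (complete with its outputs) as a new notebook
pm.execute_notebook(
    'monthly_report_template.ipynb',
    'monthly_report_2018-09.ipynb',
    parameters={'report_month': '2018-09'}
)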

The list of papermill committers includes at least one person with a Netflix affiliation, and I only checked two…

See also the Scheduling Notebooks post for more detail of how papermill is used at Netflix.

Perhaps not surprisingly, the way in to the Jupyter ecosystem for Netflix was data analysts using them, but now they provide a unifying interface across all areas of the organisation (Beyond Interactive: Notebook Innovation at Netflix):

Data Scientist: run an experiment with different coefficients and summarize the results

Data Engineer: execute a collection of data quality audits as part of the deployment process

Data Analyst: share prepared queries and visualizations to enable a stakeholder to explore more deeply than Tableau allows

Software Engineer: email the results of a troubleshooting script each time there’s a failure

For a more detailed dive, listen to episode 54 of the Data Engineering podcast: Using Notebooks As The Unifying Layer For Data Roles At Netflix with Matthew Seal.

As for the coding requirements? 1) “everyone should learn to code” ;-); 2) a line at a time is often all you need (this was the basis of our free online Learn to Code for Data Analysis short course). Try it and see…

Here’s a 5 minute keynote from this year’s Jupytercon which also describes how notebooks are currently used at Netflix, although you may want to skip the first two minutes, which is largely a Netflix ad (Beyond Interactive: Scaling Impact with Notebooks at Netflix – Michelle Ufford (Netflix)):

There’s another, more techie talk from a year ago here: Jupyter at Netflix – Kyle Kelley (Netflix).

So this makes me wonder: would Jupyter notebooks provide a useful candidate for processing and submitting HESA returns on the one hand, but also for analysing, interpreting and storytelling around the data internally?

In passing, I note that I attended part of a briefing from IET(?) on same course data where Excel, Tableau, whatever charts were the basis of the presentation and handouts. Within the course, we use notebooks to analyse our course survey (SEaM analyser), and I’ve used notebooks to prototype automating various bits of assessment script downloading (Rolling Your Own IT – Automating Multiple File Downloads) and to support third marking (originally in Excel — I wanted a tool that could be shared with and used by Excel users for an advocacy attempt that went nowhere — but now in Jupyter notebooks (unblogged, as yet… oops…)). Another tool downloads all the marks/tutor comments data for courses I have marking access to into a SQLite database, but that probably breaks all manner of IT policies (I do hash all the personal IDs with a random salt, and I do delete the data as soon as we’re done with it!). I’ve also dabbled in the past with a notebook based reporting script for FutureLearn logs (FutureLearn Data Doodles Notebook and a Reflection on unLearning Analytics), and explored a way we could author a FutureLearn course from Jupyter notebooks by splitting monolithic notebooks into page length fragments, grabbing the markdown and POSTing it to FutureLearn, which as I understand it accepts markdown (Authoring Multiple Docs from a Single IPython Notebook). Only, in the OU, authors don’t have permission to use the FutureLearn authoring tool, so I couldn’t proof of concept that final submission bit…
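
For what it’s worth, the notebook-splitting step is straightforward enough with nbformat; here’s a rough sketch (filenames illustrative, starting a new fragment at each top-level markdown heading), with the FutureLearn POSTing step left out as the bit I couldn’t test:

import nbformat

nb = nbformat.read('monolithic.ipynb', as_version=4)

# Split the cell list into fragments, starting a new one at each top-level heading
fragments, current = [], []
for cell in nb.cells:
    if cell.cell_type == 'markdown' and cell.source.lstrip().startswith('# ') and current:
        fragments.append(current)
        current = []
    current.append(cell)
if current:
    fragments.append(current)

# Write each fragment out as a standalone notebook
for i, cells in enumerate(fragments):
    part = nbformat.v4.new_notebook(cells=cells, metadata=nb.metadata)
    nbformat.write(part, 'fragment_{:02d}.ipynb'.format(i))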

[REDACTED]

PS by the by, I note I first posted an idle wondering about using notebooks to support devops back in 2015, back before they were known as Jupyter notebooks…: Literate DevOps? Could We Use IPython Notebooks To Build Custom Virtual Machines?.

PPS an example of notebooks being used in Gitlab devops:

Reproducible Modifiable Inset Maps

Over the weekend, I started having a look at generating static maps (rather than interactive web maps) using matplotlib/Basemap, with one eye on the reproducible educational materials production idea, and the ways in which Basemap might be useful to authors creating new materials that are capable of being reused and/or maintained, with modification.

One of the things that struck me was how authors may want to produce different sorts of map. Basemap has importers for several different flavours of map tile that can essentially be treated as map styles, so once you have defined your map, you should be able to tile it in different ways without having to redefine the map.

It’s also worth noting how easy it is to change the projection…
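
For example (a minimal sketch; the centre point and projection list are arbitrary), the same few lines of plotting code can be rerun with just the projection arguments changed:

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

# Same plotting code, different projections: only the Basemap arguments change
for proj in ['cyl', 'moll', 'ortho']:
    plt.figure()
    m = Basemap(projection=proj, lat_0=52, lon_0=0, resolution='c')
    m.drawcoastlines()
    m.drawmapboundary(fill_color='#f8f8ff')
    plt.title(proj)
    plt.show()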

Another thing that struck me was how maps-on-maps (inset maps?) can often help situate a region that is being presented in some detail within its wider geographical context.

I couldn’t offhand find an off the shelf function to create inset maps, so I hacked my own together:

There are quite a few hardwired defaults baked in and the generator could be parameterised in all sorts of ways (eg changing plot size, colour themes, etc.)

Also on my to do list for the basic maps is a simple way of adding things like great circle connectors between two points, adding clear named location points, etc etc.

You can find my work in progress/gettingstarted demos on this theme on Azure Notebooks here.

If you’re interested, here’s what I came up with for the inset maps… It’s longer than it needs to be because it incorporates various bits and pieces for rendering default / demo views if no actual regions are specified.

from operator import itemgetter

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes
from mpl_toolkits.axes_grid1.inset_locator import mark_inset
import numpy as np

#Default bounding boxes used by the demo modes below
#(the corner values here are approximate / illustrative)
ukllcrnrlon, ukllcrnrlat, ukurcrnrlon, ukurcrnrlat = -11.0, 49.5, 2.5, 59.5
iwllcrnrlon, iwllcrnrlat, iwurcrnrlon, iwurcrnrlat = -1.6, 50.55, -1.05, 50.8

def createInsetMap(base=None, inset=None,
                   zoom=None, loc=1, baseresolution=None,
                   insetresolution=None, connector=True):

    ''' Create an inset map for a particular region. '''

    fig = plt.figure()
    ax = fig.add_subplot(111)

    #Create a basemap

    #Add some default logic to provide some demos
    #If no args, use a World map
    if base is None or (isinstance(base, str) and base.lower() == 'world'):
        baseargs={'projection':'cyl','lat_0':0, 'lon_0':0}
    #If set UK base, then we can add a more detailed inset...
    elif isinstance(base,str) and base.lower()=='uk':
        baseargs={'llcrnrlon':ukllcrnrlon,'llcrnrlat':ukllcrnrlat,
                  'urcrnrlon':ukurcrnrlon,'urcrnrlat':ukurcrnrlat,
                  'resolution':'l'}
    else:
        #should really check base is a dict
        baseargs=base

    if baseresolution is not None:
        baseargs.update({'resolution':baseresolution})

    map1=Basemap(**baseargs)

    map1.drawmapboundary(fill_color='#f8f8ff')
    map1.drawcoastlines()

    plt.xticks(visible=False)
    plt.yticks(visible=False)

    #Now define the inset map

    #This default is to make for some nice default demos
    #With no explicit settings, inset UK on World map
    if base is None and inset is None:
        insetargs={'llcrnrlon':ukllcrnrlon,'llcrnrlat':ukllcrnrlat,
                  'urcrnrlon':ukurcrnrlon,'urcrnrlat':ukurcrnrlat,
                  'resolution':'l'}
        zoom=10
        loc=3
    #If the base is UK, and no inset is selected, demo an IW inset
    elif (isinstance(base,str) and base.lower()=='uk') and inset is None:
        insetargs={'llcrnrlon':iwllcrnrlon,'llcrnrlat':iwllcrnrlat,
                   'urcrnrlon':iwurcrnrlon,'urcrnrlat':iwurcrnrlat,
                   'resolution':'h'}
        zoom=10
        loc=2
    else:
        #Should really check inset is a dict...
        insetargs=inset

    #Fall back to a default zoom level if none was given with explicit args
    if zoom is None:
        zoom = 10

    axins = zoomed_inset_axes(ax, zoom, loc=loc)

    #The following seem to be set automatically?
    #axins.set_xlim(llcrnrlon, llcrnrlat)
    #axins.set_ylim(llcrnrlat, urcrnrlat)

    if insetresolution is not None:
        insetargs.update({'resolution':insetresolution})

    map2 = Basemap(ax=axins,**insetargs)

    #map2.drawmapboundary(fill_color='#7777ff')
    map2.fillcontinents(color='#ddaabb', lake_color='#7777ff', zorder=0)
    map2.drawcoastlines()
    map2.drawcountries()

    #Add connector lines from marked area on large map to zoomed area
    if connector:
        mark_inset(ax, axins, loc1=2, loc2=4, fc="none", ec="0.5")

    plt.show()
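
For example, the baked-in demo modes, and a call with explicit Basemap argument dicts (the inset bounding box here is an illustrative one, roughly covering Cornwall), can be run as:

#World map with a UK inset (the no-argument demo mode)
createInsetMap()

#UK map with an Isle of Wight inset
createInsetMap(base='uk')

#Explicit argument dicts for the base and inset maps
createInsetMap(base={'projection': 'cyl', 'lat_0': 0, 'lon_0': 0},
               inset={'llcrnrlon': -6.5, 'llcrnrlat': 49.8,
                      'urcrnrlon': -4.0, 'urcrnrlat': 51.0,
                      'resolution': 'h'},
               zoom=8, loc=3)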

So, what other fragments of workflow, or use case, might be handy when creating (reusable / modifiable) maps in an educational context?

PS inset diagrams/maps in ggplot2 / ggmaps. See also Inset maps with ggplot2.

A Visit to Lublin…

I’ve never been to Poland before, but a wedding invite to a traditional Polish wedding – a short ceremony followed by a 12 hour party with food every couple of hours or so, with vodka on tap, and a second day of celebrations the following day – took us to Lublin (tourist guide) last week for a few days.

I’m not sure how building regs work but I was intrigued to see that some of the buildings just outside the Old Town appeared to be strapped together for mutual support.

(There were people continually milling through the gate, so I couldn’t get a photo to the ground that wasn’t busy with folk…) Another aerial sculpture had a person on a tightrope…

In the Lithuanian Square, a wonderful set of water lights that my crappy old phone camera couldn’t really cope with…

The fountains and lights were independently programmable, and gave a wonderful show. Overlooking them, some art I couldn’t decipher the caption for, but that I took to be a commentary on banking… (The Lubliners seem keen on art that makes you look up…)

Our evening trip to the following day’s wedding venue caused some concern in the telling… our taxi had pulled into a shopping center carpark midway through the journey, and we’d been ushered out of the car and into another waiting taxi with no word of explanation to us. We assumed it was one of those things rather than paying heed to the joke(?) in the joining instructions about crazy cabs and kidnapping…

As I was (naively) photoing a warbus with a committed driver (in travel, the mundane yet misaligned become ever more interesting)…

… the taxi driver uttered something unintelligible to us and then pointed at a (live) Google translation of explanation – a family emergency had meant the other taxi driver had needed to be elsewhere…

Back in Lublin, we did the sights… More looking up at the angles:

I also found the facade of this church unsettling in the way that it gave you the eye…


We also got to look up at (and inside) Lublin Cathedral, and then down at it from the Trinitarian Tower. Rather than lug bags up and down we popped them in a (very cheap) locker in the tourist information office in the Old Town.

Inside the tower were lots of interesting angles to be seen, as well as a bell close-up…

One day I may get a proper camera… but even so, some of the weird visual effects my crappy old phone camera comes out with when trying to capture panoramic scenes (I wanted to capture the red and white tower – that didn’t really work out…) endear it to me at times…


Lublin Old Town itself is quite compact — about the same size as Newport, on the island, perhaps — but offers plenty to do for a weekend.


And some of the food you can find there is superb (as well as very affordable). The Jewish menu restaurant on the center left right(?) in the view below is well worth a visit… (the chicken broth was crystal clear and the cinnamon spiced apple drink I should really try making from some of our own apples). Fish was also much in evidence, and well worth asking for off the seasonal menus. Apples were in season too, and I couldn’t help but notice a guest cider in one of the bars from… Westons…


The English translation menus also offered much that I suspect @nogbad would have delighted in photographing… “Food additives” is one way of calling it maybe…

From the Polish menus, I missed several opportunities to snap telling of the burgery options on offer…

Another of the sites of Lublin is Lublin Castle, home, as with many sites nearby (including the concentration camp, which we couldn’t face), to institutional mass murder within living memory.


I wonder if planning regs keep tall buildings below the horizon from this viewing point, or whether it’s just turned out like that?

Inside the Castle, some haunting corridors and artwork, but also several galleries of historical portraits and artefacts, and exhibition space for more contemporary artworks.

The current exhibition (Tadeusz Myslowski. Studio / Workshop,  Lublin Castle, May – September 2018) is of a new to me artist, Tadeusz Myslowski, and it blew me away…

Geometrics based on New York cityblocks, and other geometric pieces, reminded me of the osmnx package for working with OpenStreetMap data.

A collection of radial geometrics were just beautiful along with black and white city block progressions….

And the sculptures were impressive too…

There were also a couple of nice surprises from other artists, including a Mondrian (the white is so white, and the colour so colour), and a Bridget Riley, although being behind glass it suffered really badly from  reflections, presumably from the artist’s own collection, which I’d love to see.

I’m really regretting not taking the opportunity to pick up copies of the two or three books of his work in the Castle shop.. Bah… I should have exited through the gift shop…

By the by, I happened to notice a set up for an event in the park from the Castle Tower… Not sure who was playing…

In a weird way, the highlight of the trip – aside from the wedding – was a visit to Dom Slow, aka The House of Words, which lies just over the road from the Cathedral and the Trinitarian Tower.

A museum of printing, we opted for, and were treated to, a compelling two hour English tour. Starting with the history of a clandestine press set up in Lublin in 1974(?), where I missed the opportunity of grabbing a photo of Susan (?IIRC), a duplicator that moved around the town and was used to create copies of the underground news print (“Susan is hungry” -> “we need more ink”, and other such subterfuges), a brief history of the various Solidarity organisations and their means of communication, and a flick through various banned texts bound in false covers (making it look to all intents and purposes as if the reader were brushing up on their Communist teachings rather than indulging in a Shakespearean sonnet), we then moved on to a fine collection of original printing machines, many of which are still used today to print limited edition poster runs and small books. (The center includes its own small paper factory, as well as a book bindery.)

One of the things I was surprised to learn was how early keyboards were added to printing presses to set the type, although I suspect the one on their original Linotype machine does not itself include the original keys…

As well as running original presses (although not, unfortunately, the Linotype), print has to be set. I’ve seen printers’ boxes in antique shops and at craft fayres filled with all sorts of tat, but it was nice to see some still being used in earnest.

The detail of some of the set type for special posters was incredible. As was the detail of some original copper etchings, that are still used for small print run specialist books.

To appreciate the detail, you’ll just have to go and see for yourself…

Dom Slow was a real find, and it would have been good to be able to spend some money in there on an (English) history of the underground press and maybe one or two posters. Exit through gift shop is not just a way of extracting tourist zlotys, it also provides a convenient collection that tourists can browse for thematic/subject related “specialist” books and guides… (I’ve since tracked down an interesting sounding book on the topic — Duplicator Underground : The Independent Publishing Industry in Communist Poland, 1976-89 — and am waiting for it now…)

Another interesting thing we learned from the tour is how (oral) storytelling still seems to be an important means of cultural communication. This has perhaps been kept relevant by circumstance, not least German and Soviet Russian occupation. The past is not so very far away.

I’m now looking for an opportunity to go back to Lublin for another weekend, not least to try to capture and write down some of the tales from Dom Slow, as well as learn a bit more of the history of the town.

If you’re interested, flights can be had from Luton to Lublin with Wizz Air. In Lublin, we stayed at the IBB Grand Hotel Lublinianka.

Thanks to Basia and James for the invite:-)

Build More of Your Own Learning Tools

In Build Your Own Learning Tools I described how I was inspired to try to build a simple colour palette explorer to complement an OpenLearn unit on art history that asked learners various questions about the colour themes used in particular paintings.

This is in part driven by the Jupyter notebook solutionism obsession that’s currently consuming me, where I’m trying to put together various demos that show how Jupyter notebooks might be used as an authoring tool to support the creation  – and enhancement – of reproducible (which is also to say, easily modifiable) (open) educational resources.

Over the last couple of days, I’ve started looking at cltk, a classics fork of the nltk Python natural language toolkit. This provides access to a wide range of classical texts in a variety of languages, and text analysis tools to work with them. I’m struggling to find easy ways in to work with this package, so progress is a bit slower than I’d like in just familiarising myself with it, and I’ve yet to look at it in the context of some actual Latin or Greek open teaching material (eg Discovering Ancient Greek and Latin or Getting started on classical Latin).

As I’ve been trying to familiarise myself with the package, I’ve been reflecting on things that may be helpful to me as a learner if I was trying to get started with reading Latin or Greek texts.

One thing would be accessing texts: cltk does have a wide range of corpuses available, but I’ve struggled to find any index files or metadata files to help me know what’s in them and how to retrieve them.
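
For reference, the route I’ve been using to pull corpora down, with module paths as per the version of cltk current at the time of writing (they may well move):

from cltk.corpus.utils.importer import CorpusImporter

corpus_importer = CorpusImporter('latin')

# List the names of the corpora available for the language...
print(corpus_importer.list_corpora)

# ...and fetch one (downloaded under ~/cltk_data by default); what's actually *in*
# a corpus still takes some digging, which is the metadata problem mentioned above
corpus_importer.import_corpus('latin_text_latin_library')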

Apols for screenshots rather than code, and incomplete code screenshots at that. A link to the notebook is provided at the end of the post if you want the src.

Once you have found and loaded a text, it’s easy to search:

You can also find concordances:

(Related: n-gram / Multi-Word / Phrase Based Concordances in NLTK.)
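
(Given cltk’s nltk heritage, a quick and dirty concordance over a loaded text can also be had from nltk’s Text wrapper; the file path and search term below are illustrative.)

from nltk.text import Text

# Point this at any plain text file from a downloaded corpus (path is illustrative)
with open('path/to/some_latin_text.txt') as f:
    tokens = f.read().lower().split()

Text(tokens).concordance('arma')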

There is a trained named entity tagger, although it seems to be a bit basic/ ropey. It’s something to work with, though:

On my to do list generally is learn How to Train your Own Model with NLTK and Stanford NER Tagger? (for English, French, German…).

Latin declensions are something I remember from my schooldays. Enumerating them may be handy when creating resources, and might also be useful to students. One thing I did find, though, was a lack of documentation about how to decode the, anyone?, anyone?, person/tense information? Anyone, anyone? Really… Anyone? (I imagine v1spia--- is verb, 1st person singular, present, maybe, but then what? Also, isn’t it conjugation for verbs, rather than declension?)

From a declension/conjugation/whatever it is, we can do a lookup, but to do this you need to know the root(?), and for it to be useful, you need to know how to read the grammar(?) code string:

So how do I properly decode those strings? Docs anywhere? Maybe even a simple py function that turns a string like v2pfia--- into words?

Another issue I’m guessing students face when reading classical texts, and that educators must grapple with when teaching people how to read the texts in a prosodically meaningful way, is how to split the words into syllables and how to sound them out.

Splitting texts into syllables may help with this, and is also likely to be useful when working out the meter of a verse, for example?
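
The syllabification step itself is a one-liner in the version of cltk I’ve been poking at (the module path may have changed since):

from cltk.stem.latin.syllabifier import Syllabifier

syllabifier = Syllabifier()
print(syllabifier.syllabify('sidera'))   # e.g. ['si', 'de', 'ra']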

Transliteration into a phonetic alphabet may also be useful, although this requires that the learner also knows how to read and sound out characters in that alphabet. (The one used in cltk, the IPA phonetic transliteration alphabet, differs from the one used in the OpenLearn texts I skimmed; any idea what that one is called? Anyone, anyone? I’m not sure if cltk offers other alphabets, or how you’d train a system to use one?)

As I mentioned, I’ve been struggling to find many useful docs/tutorials for working with cltk, and haven’t managed to find any meaningful corpus metadata (or a recipe for building it in a standard way). If you can point me to anything useful, please do so via the comments…

You can find my current work in progress notebook on Azure notebooks: Getting Started With Notebooks/4.2.0 Classics.ipynb.

Helluva Job – Still Not Running Libvirt vagrant plugin on a Mac…

Note to self after struggling for ages trying to install the vagrant-libvirt plugin on a Mac…

Revert to vagrant 2.0.0 then follow recipe given by @ccosby

 

Then more ratholes…

Try to start a libvirtd daemon: /usr/local/sbin/libvirtd, which I had hoped would write a socket connection file to somewhere that I could use as the basis of a connection (/var/run/libvirt/libvirt-sock?), but that didn’t seem to work? :-(

Help (/usr/local/sbin/libvirtd --help) suggests:

Configuration file (unless overridden by -f):

$XDG_CONFIG_HOME/libvirt/libvirtd.conf

Sockets:

$XDG_RUNTIME_DIR/libvirt/libvirt-sock

TLS:

CA certificate:     $HOME/.pki/libvirt/cacert.pem

Server certificate: $HOME/.pki/libvirt/servercert.pem

Server private key: $HOME/.pki/libvirt/serverkey.pem

PID file:

$XDG_RUNTIME_DIR/libvirt/libvirtd.pid

but $XDG_RUNTIME_DIR/ doesn’t appear to be set and I can’t see anything in local dir… Setting it to /var/run/ doesn’t seem to help? So I’m guessing I need a more complete way of starting libvirtd such as passing a process definition/config file?

Take, Take, Take…

Alan recently posted (Alan recently posts a lot…:-) about a rabbit hole he fell down when casually eyeing something in his web stats (Search, Serendipity, Semantically Silent).

Here’s what I see from my hosted WordPress blog stats:

  • traffic from Google but none of the value about what folk were searching for, although Google knows full well. WordPress also have access to that data from the Google Analytics tracking code they force into my hosted WordPress blog, but I don’t get to see it unless I pay for an upgrade…
  • traffic from pages on educational sites that I can’t see because they require authentication; I don’t even know what the course was on… So how can I add further value back to support that traffic?
  • occasional links from third party sites back in the day when people blogged and included links…

See also: Digital Dementia – Are Google Search and the Web Getting Alzheimer’s? etc…