Browser Based “Custom Search Engines”

Back in the days when search was my first love of web tech, and blogging was a daily habit, I’d have been all over this… But for the last year or two, work has sucked blogging all but dry, and by not writing post(card)s I don’t pay as much attention to my digital flâneury now as I used to and as I’d like.

There are still folk out there who do keep turning up interesting stuff on a (very) regular basis, though, and who do manage to stick to the regular posting which I track, as ever, via a filter I curate: RSS feed subscriptions.

So for example, via @cogdog (Alan Levine’s) CogDogBlog post A Tiny Tool for Google Image Searches set to Creative Commons, I learn (?…) about something I really should immediately just be able to recall off the top of my head, and yet also seems somehow familiar…:

 In Chrome, create a saved search engine under Preferences -> Search Engine -> Manage Search Engines or directly chrome://settings/searchEngines.

Alan Levine

So what’s there?

Chrome browser preferences: search engine settings

First up, we notice that this is yet another place where Google appears to have essentially tracked a large part of my web behaviour, creating Other search engine links for sites I have visited. I guess this is how it triggers behaviours such as “within site search” in the location bar:

Within-site search in Chrome location bar

…which I now recall is (was?) called the omnibar, from this post from just over 10 years ago in February 2011: Coming Soon, A Command Line to the Web in Google Chrome?.

That post also refers to Firefox’s smart keywords which were exactly what I was trying to recall, and which I’d played with back in 2008: Time for a TinyNS? (a post which also features guess who in the first line…).

Firefox: smart keywords, circa 2008

So with that idea brought to mind again, I’ll be mindful of opportunities to roll my own again…
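For anyone who hasn't set one of these up, the mechanism is simple. The browser stores a keyword plus a URL template, with `%s` marking where the search terms go, and substitutes whatever you type after the keyword. The following is a minimal Python sketch of that substitution, not how Chrome or Firefox actually implement it. The keywords and the Wikipedia and GitHub search URLs in the table are illustrative assumptions.

```python
from urllib.parse import quote_plus

# Hypothetical keyword -> URL template table, in the style of Chrome's
# site search entries and Firefox smart keywords: %s marks where the
# search terms are substituted.
SEARCH_ENGINES = {
    "w": "https://en.wikipedia.org/w/index.php?search=%s",
    "gh": "https://github.com/search?q=%s",
}

def expand_search(command, engines=SEARCH_ENGINES):
    """Expand 'keyword search terms' into a search URL.

    Returns None if the first word isn't a registered keyword
    or if no search terms follow it."""
    keyword, _, terms = command.strip().partition(" ")
    template = engines.get(keyword)
    if template is None or not terms:
        return None
    # Encode the terms so spaces and punctuation survive in the query string
    return template.replace("%s", quote_plus(terms))
```

So `expand_search("gh jupytext pairing")` gives `https://github.com/search?q=jupytext+pairing`, roughly what typing `gh jupytext pairing` into the location bar does once such a keyword is registered.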

Alan’s recent post also refers to the magic that is (are) bookmarklets. I still use a variety of these all the time, but I haven’t been minded to create any in months and months… in fact, probably not in years…

My top three bookmarklets:

  • pinboard bookmarklet: social bookmarking; many times a day;
  • nbviewer bookmarklet: reliably preview Jupyter notebooks from Github repos and gists, with js output cells rendered; several times a day;
  • OU Library ezproxy: open subscription academic content (online journals, ebooks etc.) via OU redirect (access to full text, where OU subscription available).

What this all makes me think is that the personal DIY productivity tools that gave some of us, at least, some sort of hope, back in the day, have largely not become common parlance. These tools are likely still alien to the majority, not least in terms of the very idea of them, let alone how to write incantations of your own to make the most of them.

Which reminds me. At one point I did start to explore bookmarklet generator tools (I don’t recall pitching an equivalent for smart keyword scripts), in a series of posts I did whilst on the Arcadia Project:

An Introduction to Bookmarklets
The ‘Get Current URL’ Bookmarklet Pattern
The ‘Get Selection’ Bookmarklet Pattern
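As a reminder of how little there is to a bookmarklet: it's just JavaScript saved as a `javascript:` URL. A bookmarklet generator can be as simple as the following Python sketch, which wraps a snippet in a self-invoking function and percent-encodes it. The `current_url_bookmarklet()` helper illustrates the 'Get Current URL' pattern by sending the current page's address to a service. The `https://example.com/check?url=` prefix used below is a placeholder, not a real service.

```python
from urllib.parse import quote

def make_bookmarklet(js_body):
    """Wrap a JavaScript snippet as a javascript: URL (a bookmarklet)."""
    # A self-invoking function keeps variables out of the page's global scope,
    # and because it returns undefined the browser stays on the current page
    wrapped = f"(function(){{{js_body}}})();"
    # Percent-encode anything awkward (spaces, double quotes, %, #, newlines)
    # but leave common JavaScript punctuation readable
    return "javascript:" + quote(wrapped, safe="():;,=+'./!*?&[]{}")

def current_url_bookmarklet(service_prefix):
    """'Get Current URL' pattern: send the current page's URL to a service."""
    js = ("window.location.href='" + service_prefix +
          "'+encodeURIComponent(window.location.href);")
    return make_bookmarklet(js)
```

Pasting the string returned by `current_url_bookmarklet("https://example.com/check?url=")` into the URL field of a new bookmark gives a one-click "send this page to..." button.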

Happy days… Now I’m just counting down the 5000 or so days to retirement from the point I wake up to the point I go to sleep.

Thanks, Alan, for the reminders; your daily practice is keeping this stuff real…

Fragment: More Typo Checking for Jupyter Notebooks — Repeated Words and Grammar Checking

One of the typographical error types that isn’t picked up in the recipe I used in Spellchecking Jupyter Notebooks with pyspelling is the repeated word error type (for example, the the).

A quick way to spot repeated words is to use egrep on the command line over a set of notebooks-as-markdown (via Jupytext) files: egrep -o "\b(\w+)\s+\1\b" */.md/*.md

I do seem to get some false positives with this; generating an output file of the report and then doing a quick filter on that would tidy that up.
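If you'd rather stay in Python (for example, to feed the hits into a report), the same regular expression can be applied with the `re` module. This sketch assumes the same Jupytext `*/.md/*.md` layout as the `egrep` example and reports the line each repeat starts on. `re.IGNORECASE` also catches "The the" at the start of a sentence, since the backreference then matches case-insensitively too. It will still throw up the same sort of false positives (a legitimate "that that", say).

```python
import re
from pathlib import Path

# Same idea as the egrep pattern: a word, some whitespace, then the same word.
# \s+ lets a repeated pair span a line break.
REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def find_repeated_words(text):
    """Return (line_number, repeated_phrase) tuples for each repeated word."""
    hits = []
    for m in REPEATED_WORD.finditer(text):
        # Line number of the start of the match (1-based)
        line_no = text.count("\n", 0, m.start()) + 1
        hits.append((line_no, m.group(0)))
    return hits

def scan_markdown(pattern="*/.md/*.md"):
    """Print repeated words found in Jupytext markdown files matching a glob."""
    for path in sorted(Path(".").glob(pattern)):
        text = path.read_text(encoding="utf-8")
        for line_no, phrase in find_repeated_words(text):
            print(f"{path}:{line_no}: {phrase!r}")
```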

An alternative route might be to extend pyspelling and look at tokenised word pairs for duplicates. Packages such as spacy also support things like Rule-Based Phrase Text Extraction and Matching at a token-based, as well as regex, level. Spacy also has extensions for hunspell [spacy_hunspell]. A wide range of contextual spell checkers are also available (for example, neuspell seems to offer a meta-tool over several of them), although care would need to be taken when it comes to (not) detecting US vs UK English spellings as typos. For nltk based spell-checking, see eg sussex_nltk/spell.
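To give a flavour of the token-pair approach without pulling in spacy, here's a minimal sketch that tokenises with a regular expression, drops Markdown emphasis characters (so that a repeat wrapped in emphasis, such as “the **the**”, is still caught), and skips an allowlist of repeats that are often legitimate. The allowlist entries are illustrative assumptions; a real one would be built up from the false positives you actually see.

```python
import re

# Hypothetical allowlist of repeated pairs that are often legitimate
ALLOWED_PAIRS = {("had", "had"), ("that", "that")}
# Markdown emphasis/code characters to ignore when pairing tokens
MARKUP = set("*_`")

def repeated_token_pairs(text, allowed=ALLOWED_PAIRS):
    """Return (token_index, word) for each adjacent duplicate word pair."""
    # Tokens are runs of word characters or single punctuation characters
    tokens = [t for t in re.findall(r"\w+|[^\w\s]", text) if t not in MARKUP]
    hits = []
    for i in range(1, len(tokens)):
        prev, curr = tokens[i - 1].lower(), tokens[i].lower()
        # Only flag alphabetic words, so "..." or "1 1" are not reported
        if prev == curr and prev.isalpha() and (prev, curr) not in allowed:
            hits.append((i, tokens[i]))
    return hits
```

Unlike the plain regex, this sees through the markup, and because punctuation is tokenised, a pair split by a full stop ("the. The") is not reported.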

Note that adding an autofix would be easy enough but may make for false positives if there is a legitimate repeated word pair in a text. Falsely autocorrecting that, then detecting the created error / tracking down the incorrect deletion so it can be repaired, would be non-trivial.

Increasingly, I think it might be useful to generate a form with suggested autocorrections, and checkboxes pre-checked by default, that could be used to script corrections. It could also generate a change history.
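As a rough sketch of that idea, the following generates suggestions for the repeated-word case. Each suggestion carries an `apply` flag that defaults to `True`, standing in for a pre-checked checkbox a reviewer could untick. Only the ticked suggestions are applied, and the function returns a change history. How the suggestions are presented as a form (HTML, a notebook widget, a spreadsheet) is left open, and the field names are arbitrary.

```python
import re
from datetime import datetime, timezone

REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def suggest_fixes(text):
    """Suggest collapsing each repeated word pair to a single word.

    Each suggestion is pre-'checked' (apply=True) for a reviewer to untick."""
    return [
        {"start": m.start(), "end": m.end(), "before": m.group(0),
         "after": m.group(1), "apply": True}
        for m in REPEATED_WORD.finditer(text)
    ]

def apply_fixes(text, suggestions):
    """Apply ticked suggestions; return (new_text, change_history)."""
    history = []
    # Work from the end of the text backwards so earlier offsets stay valid
    for s in sorted(suggestions, key=lambda s: s["start"], reverse=True):
        if not s["apply"]:
            continue
        text = text[:s["start"]] + s["after"] + text[s["end"]:]
        history.append({**s, "applied_at": datetime.now(timezone.utc).isoformat()})
    # Report changes in document order
    return text, history[::-1]
```

Unticking a suggestion (say, a legitimate "had had") leaves that part of the text alone, which avoids the silent autocorrect deletions that are so hard to track down later.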

For checking grammar, the Java based LanguageTool seems to be one of the most popular tools out there, being as it is available as a grammar checking extension for OpenOffice and LibreOffice. Python wrappers are available for it (for example, jxmorris12/language_tool_python).

Partly Solipsist Conversational Blogging Over Time

This blog is one of the places where I have, over the years, chatted not just to myself, but also to others.

Microblogging tools, as they used to be called — things like Twitter — are now often little more than careless public resharing sites for links and images and “memes” (me-me-me-me, look what I can do in taking this object other people have repurposed and I have repurposed too).

Blog posts, too, are linky things, or can be. Generally, most of my blog posts include links because they situate the post in a wider context of “other stuff”. The stuff that prompted me to write the post, the stuff that I’ve written previously that relates to or informs my current understanding, other stuff around the web that I have gone out and discovered to see what other folk have written about the thing I’m writing about, stuff that riffs further on something I don’t have space or time or inclination to cover in any more depth than a casual tease of a reference, or that I’d like to follow up later, stuff that posts I haven’t written yet will refer back to, and so on.

So, this post was inspired by a conversation I heard between Two Old Folks in the Park Waxing On about something or other, and also picks up on something I didn’t mention explicitly in something I posted yesterday, but touched on in passing: social bookmarking.

Social bookmarking is personal link sharing: saving a link to something you think you might want to refer to later in a collection that you deliberately add to, rather than traipse through your browser search history.

At this point, I was going to link to something I thought I’d read on Simon Willison’s blog about using datasette to explore browser search history sqlite files, but I can’t seem to find the thing I perhaps falsely remember? There is a post on Simon’s blog about trawling through Apple photos sqlite db though.

That said, I will drop in this aside about how to find and interrogate your Safari browser history as a SQLite database, and how to find the myriad number of Google Chrome data stashes, which I should probably poke through in a later post. And in auto-researching another bit of this post, I also note that I have actually posted on searching Google history sqlite files using datasette: Practical DigiSchol – Refinding Lost Recent Web History.
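As a taster for that later post, here's a minimal Python sketch for searching a Chrome `History` file directly with `sqlite3`. It assumes Chrome's `urls` table (with `url`, `title`, `visit_count` and `last_visit_time` columns) and Chrome's timestamp convention of microseconds since 1 January 1601 UTC. Work on a copy of the file, since the live one is usually locked while the browser is running. Schema details may vary between Chrome versions.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Chrome stores times as microseconds since 1601-01-01 UTC
CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def chrome_time(microseconds):
    """Convert a Chrome timestamp to a timezone-aware datetime."""
    return CHROME_EPOCH + timedelta(microseconds=microseconds)

def search_history(db_path, term, limit=20):
    """Search page URLs and titles in a *copy* of a Chrome History database.

    Assumes a `urls` table with url, title, visit_count, last_visit_time."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT url, title, visit_count, last_visit_time FROM urls "
            "WHERE url LIKE ? OR title LIKE ? "
            "ORDER BY last_visit_time DESC LIMIT ?",
            (f"%{term}%", f"%{term}%", limit),
        ).fetchall()
    finally:
        conn.close()
    return [(url, title, visits, chrome_time(t)) for url, title, visits, t in rows]
```

The same copied file can, of course, also be opened in datasette for a more browsable view.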

The tool I use for social bookmarking now is pinboard (I note in passing from @simonw’s dogsheep personal analytics page that there is a third-party dogsheep aligned pinboard-to-sqlite exporter), but as one of the old folk muttering, Alan Levine, mentioned, I also have fond memories of de.licio.us (are the .’s in the right place?!), the first social bookmarking tool I got into a daily habit of using.

One of the handy things about those early web apps was that they had easy to use public APIs that didn’t require any tokens or authentication: you just created a URL and pulled back JSON data. This meant it was relatively easy to roll your own applications, often as simple single page web apps. One tool I created off the back of de.licio.us was deliSearch, which let you retrieve a list of tagged links from delicious for a particular tag, then do an OR’d search over the associated pages, or the domains they were on, using Yahoo search. (For several years now, if you try running more than a couple of hand crafted, advanced searches using logical constructs and/or advanced search limits using Google web search, you get a captcha challenge under the assumption that you’re a robot.)
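The core of the deliSearch trick is easy enough to sketch: take the domains of a set of bookmarked links and OR them together as `site:` limits alongside the search terms. This is an illustrative reconstruction, not the original deliSearch code. Given the captcha caveat above, firing lots of these at a web search engine may well get you treated as a robot.

```python
from urllib.parse import urlparse

def or_site_query(bookmarked_urls, terms=""):
    """Build a search query limited to the domains of a set of bookmarks,
    with the site: limits OR'd together."""
    domains = []
    for url in bookmarked_urls:
        host = urlparse(url).hostname
        # Keep first-seen order and drop duplicate domains
        if host and host not in domains:
            domains.append(host)
    site_clause = " OR ".join(f"site:{d}" for d in domains)
    return f"{terms} ({site_clause})" if terms else site_clause
```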

deliSearch led to several other search experiments, generalised as searchfeedr. This would let you roll a list of links from numerous sources, such as social bookmark feeds, all the links in a set of course materials, or even just the links on a web page such as this one, and roll your own search engine over them. See, for example, this presentation from ILI 2007, back in those happy days when I used to get speaking gigs: Search Hubs and Custom Search Engines (ILI2007).

Something else I picked up from social bookmarking sites was the power of collections recast as graphs over people and/or tags. Here’s an early example of a crude view over folk using the edupunk tag on delicious circa 2008 (eduPunk Chatter):

edupunk bookmarks

And here was another experiment from several years later (Dominant Tags in My Delicious Network), looking at the dominant tags used across folk in my delicious network, defined as the folk I followed on delicious at the time:

Another experiment used a delicious feature that let you retrieve the 100 most recent bookmarks saved with a particular tag, along with data on who bookmarked each one and what other tags they used, to give us a snapshot at a very particular point in time around a particular tag (Visualising Delicious Tag Communities Using Gephi). Here’s the activity around the ds106 tag one particular day in 2011:
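The last of those used Gephi for the visualisation. As a minimal sketch of the data wrangling step, the following counts how often pairs of tags appear together on the same bookmark and writes the result as a CSV edge list with `Source`, `Target` and `Weight` columns, a layout Gephi's spreadsheet import accepts. The `(user, url, tags)` input shape is an assumption about how you might hold bookmarks pulled back from an API.

```python
import csv
import io
from collections import Counter
from itertools import combinations

def tag_cooccurrence(bookmarks):
    """Count how often each pair of tags is used together on a bookmark.

    `bookmarks` is a list of (user, url, tags) tuples (a hypothetical shape)."""
    edges = Counter()
    for _user, _url, tags in bookmarks:
        # sorted(set(...)) so each unordered pair is counted once per bookmark
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges

def to_gephi_csv(edges):
    """Render the counts as a Source,Target,Weight CSV edge list."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in edges.most_common():
        writer.writerow([a, b, weight])
    return buf.getvalue()
```

Swapping the tag pairs for (user, tag) pairs would give the people-and-tags view instead.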

And finally, pinboard also has a JSON API that I think replicated many of the calls that could be made to the delicious API. I thought I’d done some recipes updating some of my old delicious hacks to use the pinboard API, but offhand, I can’t seem to find them anywhere. Should’a bookmarked them, I guess!

PS in getting on for decades of using WordPress, I don’t think I have ever mistakenly published a page rather than a post. With the new WordPress-com UI, perhaps showing me some new features I don’t want, I made that mistake today, for the first time I can think of. The WordPress-com hosted blog new authoring experience sucks, and if it’s easy to make publishing mistakes like that, I imagine other f**d-up ness can follow along too.

Running SQLite in a Zero2Kubernetes (Azure) JupyterHub Spawned Jupyter Notebook Server

I think this is an issue, or it may just be a quirk of a container I built for deployment via JupyterHub using Kubernetes on Azure to run user containers, but it seems that SQLite does things with file locks that can break the sqlite3 package…

For example, the hacky cross-notebook search engine I built, the PyPI installable nbsearch (which is not the same as the IBM semantic notebook search of the same name, WatViz/nbsearch), indexes notebooks into a SQLite database saved into a hidden directory in home.

The nbsearch UI is published using Jupyter server proxy. When the Jupyter notebook server starts, the jupyter-server-proxy extension looks for packages with jupyter-server-proxy registered start hooks (code).

If the jupyter-server-proxy setup fails for one registered service, it seems to fail for them all. During testing of a deployment, I noticed none of the jupyter-server-proxy services I expected to be visible from the notebook homepage New menu were there.

Checking logs (via @yuvipanda, kubectl logs -n <namespace> jupyter-<username>) it seemed that an initialisation script in nbsearch was failing the whole jupyter-server-proxy setup (sqlite3.OperationalError: database is locked; related issue).
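One defensive pattern, sketched below, is for initialisation code run at startup to catch `sqlite3.OperationalError`, log it and carry on, so that a locked database doesn't propagate an exception out of the initialisation. The `timeout` argument to `sqlite3.connect()` makes SQLite wait for a lock held by another connection rather than failing immediately, but it won't fix file locking that is broken at the filesystem level. The function name and table schema here are hypothetical, not nbsearch's actual code.

```python
import logging
import sqlite3
from pathlib import Path

logger = logging.getLogger(__name__)

def init_index_db(db_path, timeout=30.0):
    """Create a (hypothetical) index table if needed; return True on success.

    sqlite3.OperationalError (e.g. 'database is locked') is logged rather
    than raised, so one failing service doesn't abort whatever called it."""
    try:
        Path(db_path).parent.mkdir(parents=True, exist_ok=True)
        # timeout: seconds to wait for a lock held by another connection
        conn = sqlite3.connect(db_path, timeout=timeout)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS notebooks "
                    "(path TEXT PRIMARY KEY, content TEXT)"
                )
        finally:
            conn.close()
        return True
    except sqlite3.OperationalError as err:
        logger.warning("Could not initialise %s: %s", db_path, err)
        return False
```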

Scanning the JupyterHub docs, I noted that:

> The SQLite database should not be used on NFS. SQLite uses reader/writer locks to control access to the database. This locking mechanism might not work correctly if the database file is kept on an NFS filesystem. This is because fcntl() file locking is broken on many NFS implementations. Therefore, you should avoid putting SQLite database files on NFS since it will not handle well multiple processes which might try to access the file at the same time.

This relates to setting up the JupyterHub service, but it did put me on the track of various other issues, perhaps related to mine, posted variously around the web. For example, this issue, Allow nobrl parameter like docker to use sqlite over network drive, suggests alternative file mountOptions which seemed to fix things…
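Since the underlying problem is where the database file lives, it may be worth having code check before it relies on SQLite there. The sketch below finds the filesystem type for a path by looking for the longest matching mount point in text in `/proc/mounts` format (so Linux only). Azure Files shares are typically mounted over SMB as `cifs`. The set of filesystem types treated as network filesystems is an assumption.

```python
import os

# Assumed set of filesystem types where SQLite locking may be unreliable
NETWORK_FS = {"nfs", "nfs4", "cifs", "smb3", "smbfs"}

def fs_type(path, mounts_text):
    """Return the filesystem type of the mount containing `path`.

    `mounts_text` is in /proc/mounts format: device mountpoint fstype options ..."""
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        inside = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        # The longest mount point that contains the path wins
        if inside and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type
```

In a running container, something like `fs_type(os.path.expanduser("~"), open("/proc/mounts").read()) in NETWORK_FS` would flag a home directory on a network share.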

Simple Link Checking from OU-XML Documents

Another of those very buried lede posts…

Over the years, I’ve spent a lot of time pondering the way the OU produces and publishes course materials. The OU is a publisher and a content factory, and many of the production modes model a factory system, not least in terms of the scale of delivery (OU course populations can run at over 1000 students per presentation, and first year undergrad equivalent modules can be presented (in the same form, largely unchanged) twice a year for five years or more).

One of the projects currently being undertaken internally is the intriguingly titled Redesigning Production project, although I still can’t quite make sense (for myself, in terms I understand!) of what the remit or the scope actually is.

Whatever. The project is doing a great job soliciting contributions through online workshops, forums, and the painfully horrible Yammer channel (it demands third party cookies are set and repeatedly prompts me to reauthenticate. With the rest of the university moving gung ho to Teams, that a future looking project is using a deprecated comms channel seems… whatever.) So I’ve been dipping my oar in, pub bore style, with what are probably overbearing and overlong (and maybe out of scope? I can’t fathom it out…) “I remember when”, “why don’t we…” and “so I hacked together this thing for myself” style contributions…

So here’s a little something inspired by a current, and ongoing, discussion about detecting broken links in live course materials: a simple link checker.

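```python
# Run a link check on a single link

import requests

def link_reporter(url, display=False, redirect_log=True, timeout=10):
    """Attempt to resolve a URL and report on how it was resolved.

    Returns a list of (ok, url, status_code, reason) tuples: one per
    redirect step if redirect_log is True, otherwise just the final one.
    """
    if display:
        print(f"Checking {url}...")

    try:
        # Make a lightweight HEAD request and follow redirects
        r = requests.head(url, allow_redirects=True, timeout=timeout)
        # Some servers refuse HEAD requests, so retry those with GET
        if r.status_code in (403, 405):
            r = requests.get(url, allow_redirects=True, timeout=timeout, stream=True)
            r.close()  # only the status is needed, not the page body
    except requests.exceptions.RequestException as err:
        # DNS failures, refused connections, timeouts etc. have no HTTP status
        if display:
            print(f"\tok=False :: {url} :: {type(err).__name__}")
        return [(False, url, None, type(err).__name__)]

    # Optionally create a report including each step of redirection/resolution
    steps = r.history + [r] if redirect_log else [r]

    step_reports = []
    for step in steps:
        step_reports.append((step.ok, step.url, step.status_code, step.reason))
        if display:
            print(f"\tok={step.ok} :: {step.url} :: {step.status_code} :: {step.reason}")

    return step_reports
```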

That bit of Python code, which took maybe 10 minutes to put together, will take a URL and try to resolve it, keeping track of any redirects along the way as well as the status from the final page request (for example, whether the page was successfully loaded with status code 200 or whether a 404 page not found was encountered). Other status messages are also possible.

[UPDATE: I am informed that there is a VLE link checker to check module links, available from the administration block on a module’s VLE site. If there is, and I’m looking in the right place, it’s possibly not something I can see or use due to permissioning… I’d be interested to see what sort of report it produces though:-)]

The code is a hacky recipe intended to prove a concept quickly that stands a chance of working at least some of the time. It’s also the sort of thing that could probably be improved on, and evolved, over time. But it works, mostly, now, and could be used by someone who could create their own simple program to take in a set of URLs and iterate through them generating a link report for each of them.

Here’s an example of the style of report it can create, using a link that was included in the materials as a Library proxied link (http://ieeexplore.ieee.org.libezproxy.open.ac.uk/xpl/articleDetails.jsp?arnumber=4376143) that I cleaned to give a non-proxied link (note to self: I should perhaps create a flag that identifies links of that type as Library proxied links; and perhaps also flag at least one other link type, library managed (proxied) links keyed by a link ID value and routed via https://www.open.ac.uk/libraryservices/resource/website:):

[(True,
  'http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4376143',
  301,
  'Moved Permanently'),
 (True,
  'https://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=4376143',
  302,
  'Moved Temporarily'),
 (True,
  'https://ieeexplore.ieee.org/document/4376143/?arnumber=4376143',
  200,
  'OK')]
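Picking up that note to self, a minimal sketch of flagging link types might look something like the following. It only handles the two patterns mentioned above: the dotted `.libezproxy.open.ac.uk` host suffix seen in the example (other proxy configurations may rewrite hostnames differently) and the `https://www.open.ac.uk/libraryservices/resource/website:` prefix. Proxied links are cleaned back to the publisher URL so they can be checked without Library authentication.

```python
from urllib.parse import urlsplit, urlunsplit

EZPROXY_SUFFIX = ".libezproxy.open.ac.uk"
LIBRARY_RESOURCE_PREFIX = "https://www.open.ac.uk/libraryservices/resource/website:"

def classify_link(url):
    """Return (link_type, checkable_url) for a link found in course materials."""
    if url.startswith(LIBRARY_RESOURCE_PREFIX):
        # Library managed link keyed by an ID: check it as-is
        return "library_managed", url
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.endswith(EZPROXY_SUFFIX):
        # Strip the proxy suffix to recover the publisher's own host
        clean_host = host[: -len(EZPROXY_SUFFIX)]
        netloc = clean_host if parts.port is None else f"{clean_host}:{parts.port}"
        return "library_proxied", urlunsplit(parts._replace(netloc=netloc))
    return "direct", url
```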

So… link checker.

At the moment, module teams manually check links in web materials published on the VLE over many, many pages. To check a hundred links spread over a hierarchical tree of pages to depth two or three takes a lot of time and navigation.

More often than not, dead links are reported by students in module forums. Some links are perhaps never clicked on and have been broken for years without us knowing it; others have been clicked on but the breakage never reported. (This raises another question: why do never-clicked links remain in the materials anyway? Reporting about link activity is yet another of those stats we could and should act on internally (course analytics, a quality issue) but we don’t: the institution prefers to try to shape students by tracking them using learning analytics, rather than improving things we have control over using course analytics. We analyse our students, not our materials, even if our students’ performance is shaped by our materials. Go figure.)

This is obviously not a “production” tool, but if you have a set of links from a set of course materials, perhaps collected together in a spreadsheet, and you had a Python code environment, and you were prepared to figure out how to paste a set of URLs into a Python script, and you could figure out a loop to iterate through them and call the link checker, you could automate the link checking process in some sort of fashion.
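For anyone inclined to try, here is roughly what that loop might look like. It's a sketch that assumes a checker returning the same list of `(ok, url, status_code, reason)` steps as `link_reporter()` above. It records the final step for each URL and catches any exception, so that one awkward URL doesn't stop the whole run. The CSV column names are arbitrary.

```python
import csv

FIELDS = ["url", "ok", "final_url", "status", "reason", "redirects"]

def check_links(urls, checker):
    """Run `checker` over each URL without letting one bad URL stop the run.

    `checker(url)` should return a list of (ok, url, status_code, reason)
    steps, the last of which is the final resolution."""
    rows = []
    for url in urls:
        try:
            steps = checker(url)
            ok, final_url, status, reason = steps[-1]
            rows.append({"url": url, "ok": ok, "final_url": final_url,
                         "status": status, "reason": reason,
                         "redirects": len(steps) - 1})
        except Exception as err:  # record the failure and carry on
            rows.append({"url": url, "ok": False, "final_url": None,
                         "status": None, "reason": f"checker error: {err!r}",
                         "redirects": None})
    return rows

def write_report(rows, path):
    """Save the rows as a CSV file that opens in a spreadsheet."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `rows = check_links(urls, link_reporter)` followed by `write_report(rows, "link_report.csv")`.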

So: tools to make life easier can be quickly created for, and made available to (and can also be created or extended by) folk with access to certain environments that let them run automation scripts and who have the skills to use the tools provided (or the skills and time to make them for themselves).

Is it worth the time? https://xkcd.com/1205/

By the by, anyone who has been tempted to create their own (end user development) automation tools, or has actually attempted to, will know that even though you know it should only take a half hour hack to create a thing, that half an hour is elastic:

Having created that simple link checker fragment to drop into the “broken link” Redesigning Production forum thread, in part to demonstrate that a link checker that works at the protocol level can identify a range of redirects and errors (for example, ‘content not available in your region’ / HTTP 451 Unavailable For Legal Reasons is one that GDPR has resulted in when trying to access various US based news sites), I figured I really should get round to creating a link checker that will trawl through links automatically extracted from one or more OU-XML documents in a local directory. (I did have code to grab OU-XML documents from the VLE, but the OU auth process has changed since I last used that code, which means I need to move the scraper from mechanicalsoup to selenium…) You can find the current OU-XML link checker command line tool here: https://github.com/innovationOUtside/ouxml-link-checker

So, we now have a link checker that anyone can use, right? Well, not really… It doesn’t work like that. You can use the link checker if you have Python 3 installed, and you know how to go onto the command line to install the package, and you know that copying the pip install instruction I posted in the Yammer group won’t work because the Github url is shortened by an ellipsis, and if you call “pip” and Python 2 has the focus on pip you’ll get an error, and when you try to run the command to run the link checker on the command line you know how to navigate to, or specify, the path (including paths with spaces…) and you know how to open a CSV file and/or open and make sense of a JSON file with the full report, and you can get copies of the OU-XML files for the materials you are interested in and get them onto a path you can call the link checker command line command with in the first place, then you have access to a link checker.

So this is why it can take months rather than minutes to make tools generally available. Plus there is the issue of scale: what happens if folk on hundreds of OU modules start running link checkers over the full set of links referenced in each of their courses on a regular basis? If (when) the code breaks parsing a document, or trying to resolve a particular URL, what does the user do then? (The hacker who created it, or anyone else with the requisite skills, could possibly fix the script quite quickly, even if just by adding in an exception handler or excluding particular source documents or URLs and remembering they hadn’t checked those automatically.)

But it does also raise the issue that quick fixes that could save chunks of time, and that some (maybe even many, eventually) could make use of right now, aren’t generally available. So every time a module presents, some poor soul on each module has to manually check, one at a time, potentially hundreds of links in web materials published on the VLE, spread over many, many pages in a hierarchical tree to depth two or three.

PS As I looked at the link checker today, deciding whether I should post about it, I figured it might also be useful to add in a couple of extra features, specifically a screenshot grabber to grab a snapshot image of the final page retrieved from each link, and a tool to submit the URL to a web archiving service such as the Internet Archive or the UK Web Archive, or create a proxy link to an automatically archived version of it using something like the Mementoweb Robust Links service. So that’s the tinkering for my next two coffee breaks sorted… And again, I’ll make them generally available in a way that probably isn’t…
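For the archiving side, some of the plumbing is just URL construction. This sketch assumes the Internet Archive's Save Page Now endpoint (`https://web.archive.org/save/` followed by the URL) and the Wayback Machine's date-prefixed URLs, which redirect to the nearest capture. It emits an HTML anchor carrying the `data-versionurl` and `data-versiondate` attributes used by Robust Links. It only builds the URLs; actually requesting a capture (and being polite about rate limits) is left out.

```python
from html import escape

def save_page_now_url(url):
    """URL that asks the Internet Archive's Wayback Machine to capture `url`."""
    return "https://web.archive.org/save/" + url

def wayback_url(url, when):
    """Wayback Machine URL for the capture of `url` nearest the date `when`."""
    return f"https://web.archive.org/web/{when:%Y%m%d}/{url}"

def robust_link(url, when, text):
    """HTML anchor in the Robust Links style: the original link plus
    data-versionurl/data-versiondate attributes pointing at an archived copy."""
    return (f'<a href="{escape(url)}" '
            f'data-versionurl="{escape(wayback_url(url, when))}" '
            f'data-versiondate="{when:%Y-%m-%d}">{escape(text)}</a>')
```

Here `when` is a `datetime.date`, for example the date the link was last checked.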

And maybe I should also look at more generally adding in a typo and repeated word checker, eg as per More Typo Checking for Jupyter Notebooks — Repeated Words and Grammar Checking?

PPS the quality question of never-clicked links also raises a question that for me would be in scope as a Redesigning Production question and relates to the issue of continual improvement of course material, contrasted with maintenance (fixing broken links or typos that are identified, for example) and update (more significant changes to course materials that may happen after several years to give the course a few more years of life).

Our TM351 Data Management and Analysis module has been a rare beast in that we have essentially been engaged in a rolling rewrite of it ever since we first presented it. Each year (it presents once a year), we update the software and review the practical activities distributed via Jupyter notebooks, which take up about 40% of the module study time. (Revising the VLE materials is much harder because there is a long, slow production process associated with making those updates. Updating notebooks is handled purely within the module team and without reference to external processes that require scheduling and formal scheduled handovers.)

To my mind, the production process for some modules at least should be capable of supporting continual improvement, and move away from “fixed for three years then significant update” model.

Sketching a Non-linear Software Installation Guide (Docker) Using TiddlyWiki

Writing software installation guides is a faff. On OU modules, where students provide their own machines, we face a range of issues:

  • student provided machines could be any flavour of Windows, Mac or Linux machine from the last five (or more) years;
  • technical skills and confidence in installing software cannot be assumed (make of that what you will! ;-)
  • disk space may be limited (“so am I expected to delete all my photos or buy a new computer?”);
  • the hardware configuration may be “overspecced” and cause issues (“I’m trying to run it on my gamer PC with a brand new top of the range GPU card and I’m getting an error…”), although more typically, hardware may be rather dated…

The following is, in part, a caricature of “students” (and probably generalises to “users” in a more general case!) and is not intended to be derogatory in any way…

In many cases, software installation guides, when the installation works and you have an experienced and confident user, can be condensed to a single line, such as “download and install X from the official X website” or “run the following command on the command line”. Historically, we might have published an installer, for the dominant Windows platform at least, so that students just had to download and double click the installer, but even that route could cause problems.

This is an example of a “just do this” sort of instruction and, as anyone who has ever had to provide any sort of tech support will know, “just” unpacks a long way. (“Just knock me up a quick crème brûlée… The eggs are over there…” Erm…? “Just make a custard…” Erm…? “Just separate the egg yolks…” Erm…? Later… “Right, just set up a bain-marie…” Erm…?! “…until it’s just set…” Erm…? Much later… “(Sighs…) Finally, now just caramelise the…” Erm…? Etc.)

More generally, we tend to write quite lengthy and explicit guides for each platform. This can make for long and unwieldy instructions, particularly if you try to embed instruction inline for “predictable” error messages. (With 1k+ students on a module per presentation, even a 1% issue rate is 10 students with problems that need sorting in a module’s online Technical Help forum.)

Another problem is that the longer, and apparently simpler, the instructions, the more likely that students will start to not follow the instructions, or miss a step, or misread one of the instructions, creating an error that may not manifest itself until something doesn’t work several steps down the line.

Which is why we often add further weight to instructions by showing screen captures of before, “do this” and after views of each step. For each platform. Which takes time, and raises questions of how you get screenshots for platforms other than your own, or for versions of a particular operating system other than the one you have.

In many cases we also create screencasts, but again these add overhead to production, raise the question of which platform you produce them for, and will cause problems if the screencast vision varies from the actuality of what a particular student sees.

(Do not underestimate: a) how literal folk can be in following instructions, and how easily they freeze if something they see in guidance is not exactly the same as what they see on their screen, whilst at the same time b) not exactly following instructions (either deliberately or in error), and also c) swearing blind they did follow each instruction step when it comes to tech support (even if you can see they didn’t from a screenshot they provided; in which case they may reply that they did the step the first time they tried but not the second, because they thought they didn’t need to do it again, or that they did do it on a second attempt when they didn’t need to, and that’s why it’s throwing an “already exists” error etc.).)

So, a one-liner quickstart guide can become pages and pages and pages and pages of linked documents in an online installation guide, and that can then go badly for folk who would have been happy with the one-liner. Plus the pages and pages and pages of instruction then need testing (and maintaining over the course life, typically 5 years; plus the guide may well be written during course production a year or more before the first use date by students). And in pages and pages and pages and pages of instruction, errors, omissions, ordering errors or something else can slip through. Which causes set up issues in turn.

If we then try to simplify the materials and move troubleshooting steps out of the install guide, for example, and into a troubleshooting section, that makes life harder for students who encounter “likely” problems that we have anticipated. And so on.

And as to why we don’t just always refer to “official” installation guides: the reason is that they are often based on the “quick start by expert users” principle and often assume a recent and updated platform on which to install the software. And we know from experience that such instructions are in many cases not fit for our purposes.

So… is there a better way? A self-guided (self-guiding?) installation guide, maybe, perhaps built on the idea of “choose your own adventure” linked texts? To a certain extent, any HTML based software guide can do this; but often, there is a dominant navigation pane that steers the order in which people navigate a text, when a more non-linear navigation path (by comparison with the structure of an explicit tree based hierarchical menu, for example) may be required.

For years, I’ve liked the idea of TiddlyWiki [repo], a non-linear browser based web notebook that lets you easily transclude content in a dynamically growing linear narrative (for example, a brief mention here: Using WriteToReply to Publish Committee Papers. Is an Active Role for WTR in Meetings Also Possible?). But I’ve finally got round to exploring how it might be useful (or not?!) as the basis of a self-directed software installation guide for our Docker based, locally run, virtual computing environments.

Note that the guide is not production ready or currently used by students. It’s a quick proof of concept for how it might work that I knocked up last week using bits of the current software guide and some forum tech support responses.

To try it out, you can currently find it here: https://ouseful-testing.github.io/docker-tiddly-test/

So what is TiddlyWiki? A couple of things. Firstly, it’s a novel way of navigating a set of materials within a single tab of a web browser. Each section is defined in its own subpage or section, and is referred to as a tiddler. Clicking a link doesn’t “click you forward” to a new page in the same tab/window or a new tab/window; by default, it displays the content in a block immediately below the current block.

You can thus use the navigation mechanism to construct a linear narrative (referred to as the story river) dynamically, with the order of chunks in the document determined by the link you clicked in the previous chunk.

If you trace the temporal history of how chunks were inserted, you can also come up with other structures. Because a chunk is inserted immediately below the block that contains the link you clicked, if you repeatedly click different links from the same block you get a different ordering of blocks in terms of accession into the document: START, START-BLOCK1, START-BLOCK2-BLOCK1, and so on.

If you don’t like that insertion point, a TiddlyWiki control panel setting lets you choose alternative insertion points:

Tiddler opening behaviour: as well as inserting immediately below, you can choose immediately above or at the top or the bottom of the story river.

The current page view can also be modified by closing a tiddler, which removes it from the current page view.

If you click on a link to a tiddler that is already displayed, you are taken to that tiddler (by scrolling the page and putting the tiddler into focus) rather than opening up a duplicate of it.
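As a rough illustration of those opening behaviours, here’s a toy Python model of the story river. It’s my own sketch, not TiddlyWiki’s code: the StoryRiver class and the mode names ("below", "above", "top", "bottom") are made up to mirror the control panel options, closing a tiddler just removes it, and clicking a link to an already open tiddler moves the focus rather than duplicating it.

```python
class StoryRiver:
    """Toy model of TiddlyWiki's story river (an illustration, not TiddlyWiki's implementation)."""

    MODES = ("below", "above", "top", "bottom")

    def __init__(self, start, mode="below"):
        if mode not in self.MODES:
            raise ValueError(f"unknown opening mode: {mode}")
        self.story = [start]  # tiddler titles, top to bottom of the page
        self.mode = mode
        self.focus = start

    def click(self, from_title, to_title):
        """Follow a link found in tiddler `from_title` to tiddler `to_title`."""
        if to_title in self.story:
            # Already open: navigate (scroll/focus) to it rather than opening a duplicate
            self.focus = to_title
            return
        i = self.story.index(from_title)
        position = {
            "below": i + 1,           # immediately below the tiddler containing the link
            "above": i,               # immediately above it
            "top": 0,                 # top of the story river
            "bottom": len(self.story),  # bottom of the story river
        }[self.mode]
        self.story.insert(position, to_title)
        self.focus = to_title

    def close(self, title):
        """Closing a tiddler removes it from the current page view."""
        self.story.remove(title)
        if self.focus == title:
            self.focus = None


river = StoryRiver("START")
river.click("START", "BLOCK1")
river.click("START", "BLOCK2")
print(river.story)  # ['START', 'BLOCK2', 'BLOCK1']
```

Switching the mode to "bottom" gives the more conventional START, BLOCK1, BLOCK2 ordering for the same clicks.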

TiddlyWiki has several other powerful navigation features. The first of these is tag-based navigation. Tiddlers can be tagged, and clicking on a tag pops up a menu of similarly tagged tiddlers (I haven’t yet figured out if/how to affect their presentation order in the list).

Tag based navigation in TiddlyWiki

A Tag Manager tool, raised from the Tools tab, gives a summary of what tags have been used, and to what extent. (I need to play with the colours!)

The TiddlyWiki tag manager.

Another form of navigation is based on dynamically created lists of currently open tiddlers, as well as recently opened (and potentially now closed) tiddlers:

Currently opened tiddlers; note also the tab for recently opened tiddlers.

By default, browser history is not affected by your navigation through the TiddlyWiki, although another control panel setting does let you add steps to the browser history:

Update browser history setting.

A powerful search tool across tiddlers is also available.

TiddlyWiki search shows tiddlers with title matches and content matches.

To aid accessibility, a full range of keyboard shortcuts is available for both viewing and editing TiddlyWiki.

Sample of TiddlyWiki keyboard shortcuts viewed from the Control Panel Keyboard Shortcuts tab.

One feature I haven’t yet made use of is the ability to transclude one tiddler within another. This allows you to create reusable blocks of content that can be inserted in multiple other tiddlers.
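In TiddlyWiki markup, a transclusion is written as {{Tiddler Title}}. To get a feel for how reusable blocks might work in an installation guide, here’s a toy Python resolver that expands the simple {{Title}} form recursively, with a guard against a tiddler transcluding itself. It’s a sketch of the idea, not TiddlyWiki’s renderer: it ignores template and field variants, and the tiddler titles in the example are hypothetical.

```python
import re

# Matches the simple {{Title}} transclusion form only
TRANSCLUDE = re.compile(r"\{\{([^{}|]+)\}\}")


def render(title, tiddlers, _seen=()):
    """Expand {{Title}} transclusions in tiddler `title` (toy model)."""
    if title in _seen:
        return f"[recursive transclusion of {title}]"
    text = tiddlers.get(title, "")  # a missing tiddler transcludes as nothing
    seen = _seen + (title,)
    return TRANSCLUDE.sub(lambda m: render(m.group(1).strip(), tiddlers, seen), text)


tiddlers = {
    "Docker prerequisites": "Check you have Docker Desktop installed.",
    "Windows install": "{{Docker prerequisites}} Then pull the image.",
    "Mac install": "{{Docker prerequisites}} Then open a terminal.",
}
print(render("Windows install", tiddlers))
# Check you have Docker Desktop installed. Then pull the image.
```

The point is that the prerequisites text lives in one tiddler, so fixing it there fixes it in every platform-specific tiddler that transcludes it.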

To control the initial view of the TiddlyWiki, the first tab of the Control Panel allows you to define the default tiddlers to be opened when the TiddlyWiki is first viewed, and the order they should appear in.

TiddlyWiki control panel, displaying the Info tab and current default open tiddlers and their start order.

The view of the wiki at https://ouseful-testing.github.io/docker-tiddly-test/ is a standalone HTML document, generated as an export from a hosted TiddlyWiki, that is essentially being used in an offline mode.

Menu showing option to export offline TiddlyWiki.

The “offline” TiddlyWiki is still editable, but the changes are not preserved when you leave the page (although extensions are available to save a TiddlyWiki in browser storage, I think, so you may be able to update your own copy of an “offline” TiddlyWiki?).

To run the “online” TiddlyWiki in a fully interactive read/write/save mode, I followed the instructions I found here: How to use TiddlyWiki as a static website generator in 3 steps:

  • Install node.js if you don’t already have it installed;
  • Run: npm install -g tiddlywiki
  • Initialise a new TiddlyWiki: tiddlywiki installationGuide --init server; this will create a new directory, installationGuide, hosting your wiki content;
  • Publish the wiki: tiddlywiki installationGuide --listen;
  • By default, the wiki should now be available at: http://127.0.0.1:8080/ . When you create and edit tiddlers, change settings, etc., the changes will be saved.

New tiddlers can be created by editing a pre-existing tiddler to add a link to the new (not yet created) tiddler, and then clicking the link. This will create the tiddler if it doesn’t yet exist, or open the current version of it if it does.

You can edit a currently open tiddler by clicking on its edit button.

The tiddler editor is a simple text editor that uses its own flavour of text markup, although a toolbar helps handle the syntax for you.

TiddlyWiki editor.

Image assets are loaded from an image gallery:

Insert image into tiddler.

The gallery is populated via an import function in the Tools tab.

Tools panel, with import button highlighted.

Looking at the structure of the TiddlyWiki directory, we see that each tiddler is saved to its own (small) .tid file:

View over some TiddlyWiki tiddler (.tid) files showing “system” tiddlers (filenames prefixed with $_) and user created tiddlers.

The .tid files are simple text files with some metadata at the top and then the tiddler content.

Text file structure of a tiddler.

It strikes me that it should be easy enough to generate these files.

Looking at a report such as one of my rally results reports, there are lots of repeated elements with a simple structure:

Rally results: lots of elements that could be tiddlers?

That report is generated using knitr from various Rmd templated documents. I wonder if I could knit2tid and then trivially create a TiddlyWiki of rally result tiddlers?
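That generating step looks simple enough to sketch. Here’s a minimal Python version, assuming the .tid layout shown above (header fields, a blank line, then the body) and the usual TiddlyWiki conventions of a YYYYMMDDHHMMSSmmm UTC timestamp and [[...]] around tags that contain spaces. The function names, and the filename sanitising in write_tid, are my own and are not how TiddlyWiki itself names files.

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def tw_timestamp(when=None):
    """Format a datetime as a TiddlyWiki-style UTC timestamp: YYYYMMDDHHMMSSmmm."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return when.strftime("%Y%m%d%H%M%S") + f"{when.microsecond // 1000:03d}"


def format_tags(tags):
    """Join tags with spaces, wrapping any tag that contains a space in [[...]]."""
    return " ".join(f"[[{tag}]]" if " " in tag else tag for tag in tags)


def make_tid(title, text, tags=(), when=None):
    """Build .tid content: 'field: value' header lines, a blank line, then the body text."""
    stamp = tw_timestamp(when)
    fields = {
        "created": stamp,
        "modified": stamp,
        "tags": format_tags(tags),
        "title": title,
        "type": "text/vnd.tiddlywiki",
    }
    # Skip empty fields (e.g. no tags) rather than writing "tags: "
    header = "\n".join(f"{name}: {value}" for name, value in fields.items() if value)
    return f"{header}\n\n{text}\n"


def write_tid(tiddlers_dir, title, text, tags=()):
    """Write a tiddler to <tiddlers_dir>/<sanitised title>.tid (a simplified, hypothetical naming scheme)."""
    filename = re.sub(r"[^\w\-]+", "_", title).strip("_") or "untitled"
    path = Path(tiddlers_dir) / f"{filename}.tid"
    path.write_text(make_tid(title, text, tags), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical rally example: one tiddler per stage result
    print(make_tid("SS1 Results", "Stage results go here.", tags=["rally", "stage results"]))
```

Pointing write_tid at the tiddlers/ folder inside the installationGuide directory would then drop new tiddlers in alongside the hand-edited ones, though I suspect the server needs restarting to pick up files added behind its back.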

Fragment: Open Leaning? Pondering Content Production Processes

A long time ago I read The Toyota Way; I remember being struck at the time how appealing many of the methods were to me, even if I couldn’t bring them to mind now. My reading was, and is, also coloured by my strong belief in the OU’s factory model of production (or the potential for the same), even though much of the time module teams operate as cottage industries.

Even in the first few pages, a lot still resonates with me:

We place the highest value on actual implementation and taking action. There are many things one doesn’t understand and therefore, we ask them why don’t you just go ahead and take action; try to do something? You realize how little you know and you face your own failures and you simply can correct those failures and redo it again and at the second trial you realize another mistake or another thing you didn’t like so you can redo it once again. So by constant improvement, or, should I say, the improvement based upon action, one can rise up to the higher level of practice and knowledge.

Fujio Cho, President, Toyota Motor Corporation, 2002, quoted in The Toyota Way, JK Liker, 2004.

As I start rereading the book, more than fifteen years on, I realise quite a few of the principles were ones I already implicitly aspired to at the time, and which have also stuck with me in some form or other:

  • hansei, reflection, to try to identify shortcomings in a project or process. (Back in the day, when I still chaired modules, I remember scheduling a “no blame” meeting to try to identify things that had gone wrong or not worked so well in the production of a new module; folk struggled with even the idea of it, let alone working it. I suspect that meeting had been inspired by my earlier reading of the book.) This blog (and its previous incarnation) also represents over fifteen years of personal reflection;
  • jidoka, “automation with a human touch” / “machines with human intelligence”, which includes “build[ing] into your equipment the capability of detecting problems and stopping itself” so that humans can then work on fixing the issue, and andon, visual alerting and signalling systems, with visual controls at the place where work is done (for example, visualising notebook structure).
  • nemawashi, discussing problems and potential solutions with all those affected; I am forever trying to interfere with other people’s processes, but that’s because they affect me;
  • genchi genbutsu, which I interpret as trying to understand through doing, getting your hands dirty and making mistakes, as well as “problem solving at the actual place to see what is really going on”, which I interpret as a general situational awareness through personal experience of each step of the process (which is why it makes sense to try doing someone else’s job every so often, perhaps?)
  • kaizen, continuous improvement, a process we try to embody, with reflection (hansei) in the continual rewrite process we have going in TM351, which continually reflects on the process (workflow), as well as the practice (eg pedagogy) and the product (the content we’re producing (teaching and learning materials));
  • heijunka, leveled out production in terms of volume and variety, which I get the sense we are not so good at, but which I don’t really understand;
  • standardised processes and interfaces, which I interpret in part as some of our really useful building blocks, such as the OU-XML gold master document format that is in many respects a key part of our content production system even if our processes are not as efficiently organised around it as they might be, and what I regarded as one of the OU’s crown jewels for many years: course codes.
  • continuous process flow “to bring problems to the surface”: we suck at this, in part because of various waterfall processes we have in place, and in part because the distance from production of a particular piece of content to first presentation to the end user customer (the student) can be two or more years. You can have two iterations of a complete Formula One car in that period, and 40+ iterations of pieces on the car between race weekends in the same period. In the OU, we have a lot of stuck inventory (for example, materials that have been produced and are still 18 months from student first use);
  • one piece flow, which I now realise has profoundly affected my thinking when it comes to “generative production” and the use of code to generate assets at the point of use in our content materials; for example, a line of code to generate a chart that references and is referenced by some surrounding text (see also Educational Content Creation in Jupyter Notebooks — Creating the Tools of Production As You Go).

I also think we have some processes backwards; I get the feeling that the production folk see editing as a pull process on content from authors; with my module team cottage industry head on (and I know this is a contradiction and perhaps contravenes the whole Toyota Way model), I take a module team centric view, and see the module team as the responsible party for getting a module in front of students, and as such they (we) have a pull requirement on editorial services.

I’m really looking forward to sitting down again with The Toyota Way, and have also just put an order in for a book that takes an even closer look at the Toyota philosophy: Taiichi Ohno’s Toyota Production System: Beyond Large-Scale Production.

PS via an old post on Open Course Production I rediscover (original h/t Owen Stephens) some old internal reports on various aspects of Course Production: Some Basic Problems, Activities and Activity Networks, Planning and Scheduling and The Problem of Assessment. Time to reread those too, I think. Perhaps along with Daniel Weinbren’s The Open University: A History.

Supporting Remote Students: Programming Code Problems

One of the issues of working in distance education is providing individual support to individual students. In a traditional computer teaching lab environment, students encountering an issue might be able to ask for help from teaching assistants in the lab; in a physical university, students might be able to get help in a face to face supervisory setting. In distance education, we have spatial separation and potentially temporal separation, which means we need to be able to offer remote and asynchronous support in the general case, although remote, synchronous support may be possible at certain times.

Related to this, I just spotted a preview of a Docker workflow thing, Docker dev environments, for sharing work in progress:

The key idea seems to be that you can be working inside a container on your machine, grab a snapshot of the whole thing, shove it on to dockerhub (we really need an internal image hub….) and let someone else download it and check it out. (I guess you could always do this in part, but you may have lost anything in mounted volumes, which this presumably takes care of.)

This would let a student push a problem and let their tutor work inside that environment and then return the fixes. (Hmm… ideally, if I shared an image with you, you’d fix that, update the same image, and I’d pull it back???? I maybe need to watch the video again and find some docs!)

One thing that might be an issue is permissions: limiting who can access the pushed image. But that would presumably be easier if we ran our own image hub/registry server? I note that Gitlab lets you host images as part of a project (Gitlab container registry), which maybe adds another reason as to why we should at least be exploring the potential for an internal Gitlab server to support teaching and learning activities?

(By the by, I note that Docker and Microsoft are doing a lot of shared development, eg around using Docker in the VS Code context, hooks into Azure / ACI (Azure Container Instances) etc.)

In passing, here are various other strategies we might explore for providing “live”, or at least asynchronous, shared coding support.

One way is to gain access to a student’s machine to perform remote operations on it, but that’s risky for various reasons (we might break something, we might see something we shouldn’t etc etc); it also requires student and tutor to be available at the same time.

Another way is via a collaborative environment where live shared access is supported. For example, using VS Code Live Share or the new collaborative notebook model in JupyterLab (see also this discussion regarding the development of the user experience around that feature). Or the myriad commercial notebook hosting platforms that offer their own collaborative workspaces. Again, note that these offer a synchronous experience.

A third approach is to support better sharing of notebooks so that a student can share a notebook with a tutor or support forum and get feedback / comment / support on it. Within a containerised environment, where we can be reasonably sure of the same environment being used by each party, the effective sharing of notebooks allows a student to share a notebook with a tutor, who might annotate it and return it. This supports an asynchronous exchange. There are various extensions around to support sharing in a simple JupyterHub environment (eg https://github.com/lydian/jupyterlab_hubshare or potentially https://github.com/yuvipanda/jupyterhub-shared-volumes ), or sharing could be achieved via a notebook previewing site, perhaps with access controls (for example, Open Design Studio has a crude notebook sharing facility that is available to, but not really used by, our TM351 Data Management and Analysis students, and there are codebases out there for notebook sharing that could potentially be reused internally behind auth (eg https://github.com/notebook-sharing-space/ipynb.pub )).
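Because an .ipynb file is just JSON, the “annotate it and return it” step needn’t depend on any particular sharing platform. As a minimal sketch, assuming the tutor works on a copy of the student’s notebook file, the following inserts a markdown feedback cell after a given cell using only the standard library. The function names and the tutor-feedback cell tag are hypothetical conventions of my own, not part of any existing extension.

```python
import json
import uuid


def annotate_notebook(nb, cell_index, comment, author="Tutor"):
    """Return a copy of notebook dict `nb` with a markdown feedback cell after cell `cell_index`."""
    annotated = json.loads(json.dumps(nb))  # deep copy; the student's original dict is untouched
    feedback = {
        "cell_type": "markdown",
        "metadata": {"tags": ["tutor-feedback"]},  # hypothetical tag for spotting comments later
        "source": [f"**{author}:** {comment}"],
    }
    # nbformat 4.5+ notebooks expect every cell to carry an id
    if annotated.get("nbformat") == 4 and annotated.get("nbformat_minor", 0) >= 5:
        feedback["id"] = uuid.uuid4().hex[:8]
    annotated["cells"].insert(cell_index + 1, feedback)
    return annotated


def annotate_file(src, dest, cell_index, comment):
    """Read an .ipynb file, add a feedback cell, and write the annotated copy to `dest`."""
    with open(src, encoding="utf-8") as f:
        nb = json.load(f)
    with open(dest, "w", encoding="utf-8") as f:
        json.dump(annotate_notebook(nb, cell_index, comment), f, indent=1)
```

Tagging the feedback cells would also make it easy for the student (or a later script) to find, hide or strip the tutor’s comments.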

A fourth approach is now the aforementioned Docker container dev sharing workflow.

On Not Hating Jupyter…

Having spent most of yet another weekend messing around with various Jupyter related projects, not least OpenJALE (still a WIP), an extensions guide for my Open Jupyter Authoring and Learning Environment, frustration at one of the things I was linking to breaking in a MyBinder launch prompted me to tweet.

This morning, seeing a collection of liked tweets noting how much I apparently hate the whole Jupyter project, I checked back to see what I said.

Stupidly, I then deleted the tweet.

Crap.

What the tweet said, and this isn’t true, was how much I hated Jupyter every time I encountered it, showing a screenshot of a failed MyBinder launch breaking on a JupyterLab dependency.

The break was in a launch of one of my own repos, I might add, where I had been trying to install a JupyterLab extension to provide a launcher shortcut to a jupyter-server-proxy wrapped application.

For those of you who don’t know, jupyter-server-proxy is a really, really useful package that lets you start up and access web applications running via a Jupyter notebook server. (See some examples here, from which the following list is taken.)

The jupyter-server-proxy idea is useful in several respects:

  • a container running a Jupyter server and jupyter-server-proxy only needs to expose a single http port (the one that the notebook / JupyterLab is accessed via). All other applications can be proxied along the same path using the same port;
  • many simple web applications do not have any authentication; proxying an application behind a Jupyter server means you can make use of the notebook server authenticator to provide a challenge before providing access to the application;
  • the jupyter-server-proxy will start an application when it is first requested, so applications do not need to be started when the environment is started. If a repeated request is made for an application that has already been started, the user will be taken directly to it.
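As well as being configured in the notebook server config file, jupyter-server-proxy can pick up application definitions from Python packages via an entry point. Here’s a minimal sketch, assuming a hypothetical package called my_proxy_pkg and using Python’s own (unauthenticated) http.server as a stand-in for a real web application; the launcher_entry block is what puts a tile in the JupyterLab launcher.

```python
import sys


def setup_simplehttp():
    """Return a jupyter-server-proxy spec for a hypothetical 'simplehttp' app."""
    return {
        # jupyter-server-proxy substitutes {port} with a free port when the app is first requested
        "command": [sys.executable, "-m", "http.server", "{port}"],
        # Seconds to wait for the app to start responding before giving up
        "timeout": 30,
        "launcher_entry": {
            "title": "Simple HTTP server",  # label on the JupyterLab launcher tile
            "enabled": True,
        },
    }
```

The function would then be registered in the package’s setup.py with something like entry_points={"jupyter_serverproxy_servers": ["simplehttp = my_proxy_pkg:setup_simplehttp"]}, after which the app should be reachable at the /simplehttp/ path under the notebook server URL, behind the notebook server’s own authentication.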

The extension I was loading provided an icon in the JupyterLab launcher so the app could be accessed from that environment as well as from the classic notebook environment.

I don’t use JupyterLab very much, preferring the classic notebook UI for a lot of reasons that I properly need to document, but I was trying to play into that space for folk who do use it.

JupyterLab itself, the next generation Jupyter interface, is a brilliant technology demonstrator, helping push the development of Jupyter server protocols and demonstrating their use.

And I hate it as a UI. (Again, I need to properly document why.)

And I get really frustrated about how, over the years, it has broken (and perhaps continues to break) numerous unrelated demos.

Until eighteen months or so ago, when work work started to suck all my time, I’d been posting an occasional Tracking Jupyter newsletter (archive), spending a chunk of time trying to keep track of novelty and emerging use cases in the Jupyterverse. Jupyter projects are far wider than the classic notebook and JupyterLab UI, and when viewed as part of a system offer a powerful framework for making arbitrary computing environments available at scale to multiple users. In many respects, the UI elements are the least interesting part, even if, as in my org, “Jupyter” tends to equate with “notebook”.

As part of the tracking effort, I’d scour Github repos for things folk were working on, trying to launch each one (each one) using MyBinder. (Some newsletter editions referenced upwards of fifty examples. Fifty. Each one;-) Some worked, some didn’t. Some I filed issues against, some PRs, some I just cloned and tried to get working as a quick personal test. Items I shared in the newsletter I’d pretty much always tried out and spent a bit of time familiarising myself with. These were not unqualified link shares.

One of the blockers to getting things working in MyBinder was missing operating system or Python package dependencies. In many cases, if a Python package is in a repo on Github you can just paste the repo URL into MyBinder and the package will install correctly. In some cases, it doesn’t and the fix is adding one or two really simple text files (requirements.txt for Python packages, apt.txt for Linux packages) to the repo that install any essential requirements.

That’s an easy fix, and quick to do if you fork the repo, add the files, and launch MyBinder from your fork.
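For what it’s worth, that fork-and-fix step is simple enough to script. This is a minimal sketch, assuming you already have a local clone of your fork: it writes requirements.txt (pip packages) and apt.txt (Linux packages), one name per line as repo2docker/MyBinder expects, and builds the MyBinder launch URL for a GitHub repo. The helper names are my own, the package and repo names in any call are placeholders, and the HEAD default for the ref assumes MyBinder resolves it to the repo’s default branch.

```python
from pathlib import Path


def add_binder_files(repo_dir, pip_packages=(), apt_packages=()):
    """Write requirements.txt (pip) and/or apt.txt (Linux packages), one name per line."""
    repo = Path(repo_dir)
    written = []
    for filename, packages in (("requirements.txt", pip_packages), ("apt.txt", apt_packages)):
        if packages:
            (repo / filename).write_text("\n".join(packages) + "\n", encoding="utf-8")
            written.append(filename)
    return written


def binder_url(github_user, repo, ref="HEAD"):
    """Build a MyBinder launch URL for a GitHub repo (HEAD assumed to mean the default branch)."""
    return f"https://mybinder.org/v2/gh/{github_user}/{repo}/{ref}"
```

So something like add_binder_files("my-fork", pip_packages=["pandas"], apt_packages=["graphviz"]), followed by a commit and push, and then a visit to binder_url("my-github-name", "my-fork").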

But perhaps the most frustrating blocker, and one I encountered on numerous occasions, and still do, was a MyBinder launch fail caused by a JupyterLab dependency mismatch.

Now I know, and appreciate, that JupyterLab is a very live project. And while I personally don’t get on with the UI (did I say that already?!) I do appreciate the effort that goes into it, at least in the sense that I see it as a demonstrator and driver of Jupyter server protocols and the core Jupyter (notebook) server itself, which can lead to many non-JupyterLab related developments.

(For example, the move to the new Jupyter server from the original notebook server is very powerful, not least in terms of making it easier to launch arbitrary application containers via JupyterHub or Binderhub that can use the new server to send the necessary lifesigns and heartbeats back to the hub to keep the container running.)

My attacks against JupyterLab are not intended as ad hominem attacks or as disparaging to the developers; the work they do is incredible. They are a statement of my preferences about a particular user interface in the context of the impact it may have on uptake of the wider Jupyter project.

If I had encountered JupyterLab, rather than the classic notebook, eight years ago, I would not have thought it useful for the sorts of open online education I’m engaged with.

If things had started with JupyterLab, and not classic notebook, I’m not convinced that there would be the millions of notebooks there are now on Github.

I am happy to believe that the JupyterLab UI has gone through peak complexity in terms of first contact and that when it replaces the classic notebook to become the default UI it will not be overly hostile to users. But I remain to be convinced that it will be as relatively straightforward for non-developers with a smattering of basic HTML and Javascript skills to develop for as the classic notebook UI was, and is.

I am reminded of the earlier days of Amazon, Google, Twitter et al, when their APIs were easy to use, didn’t require keys and authentication etc. With a few Yahoo Pipes you could build all manner of things armed with nothing other than a few simple URL patterns and a bit of creative thinking. Then the simple APIs got complex, required various sorts of auth, and the play for anyone other than seasoned developers with complex toolchains stopped.

So: failed builds. Over the years, many of the failed builds I have encountered, many of the failed demos from repos labeled with a MyBinder button, have resulted from mismatches, somewhere, in JupyterLab versions.

The complexity of JupyterLab (from my perspective, as a non-developer with no familiarity with node or TypeScript) means I would struggle to know whether, how, or which dependencies to fix, even if I had the time to.

But more pressing is the effect of JupyterLab dependencies and package conflicts breaking things. (Pinning dependencies doesn’t necessarily help either. MyBinder puts in place recent packages in the core environment it builds, so users are dependent on it in particular ways. As far as simplicity of use goes (which I take as a key aspiration for MyBinder), pinned requirements are just way too complicated for most people anyway. But the bigger problem is, there are certain things (like the core Jupyter environment MyBinder provides) that you may not be able to pin against.)

Now I may be misreading the problem, but it’s based on seeing literally hundreds of error messages over the years that suggest JupyterLab package conflicts cause MyBinder build fails.

And this is a problem. Because if you are trying to lobby for uptake of Jupyter related technologies, and you give folk a link for something you tried yesterday that worked, and they try it today and it fails because of a next generation user interface package conflict that has nothing to do with the classic notebook demonstration you’re trying to share, then you may have lost your one chance (at least for the foreseeable future) to persuade that person to take that Jupyter step.

So, as ever, reflective writing has helped me refine my own position. There are three things that I have issues with relating to JupyterLab:

  • the complexity of the UI (but this seems to be being simplified as the UI matures away from peak developer scaffolding);
  • the complexity of the development environment (I keep trying to make sense of JupyterLab extensions but have to keep on giving up; IANAWD);
  • (the new realisation): the side effects in breaking unrelated demos launched via MyBinder.

(Another issue I have had over the years was the very long, slow build time that resulted from JupyterLab related installs in MyBinder builds. This has improved a lot over recent versions of JupyterLab (again, the scaffolding seems to be being taken down) but I think it has also caused a lot of harm in the meantime in terms of the way it has impacted trying to demonstrate Jupyter notebooks or their application that themselves have no direct JupyterLab requirement.)

Now, it might be that this is a side effect of how MyBinder works, (and I don’t mean to attack or disparage the efforts of the MyBinder team), and may not be a JupyterLab issue per se, but it does impact on the Jupyter user experience.

At the end of the day, at the end pretty much of every day for the last seven years, there’s rarely been a day where I haven’t spent some time, if not many hours, using Jupyter services.

So to be clear, I do not hate Jupyter (that was a typo). But I do have real issues with JupyterLab as the default UI which caters more to the development aesthetic than the narrative computational notebook reader, on the way it can impact negatively on my Jupyter user experience, and on the way I believe it makes it harder for folk to engage with at the level of contributing their own extensions.

PS as another reflection, I know that I do myself reputational harm, may cause reputational harm, and may offend individuals via my tweets and blog posts. First, the account is me, but it’s also an angry-frustrated-by-tech-but-hopeful-for-it persona. Second, I often admit ignorance, my opinions change and I do try to correct errors of fact. Third, attacks are never intended as ad hominem attacks, they are typically against process in context, how “the system” has led to, allowed, enabled or forced certain sorts of behaviour or action. (If I say “why would you do that?” the you is a generic “anyone” acting in that particular context.) And fourth, my own tweets and blog posts typically have little direct impact that I can see in terms of RTs, likes or comments. On the rare occasions they do, they often result in moments of reflection, as per this blog post…

Custom _repr_ printing in Jupyter Notebooks

A simple trick that I’ve been sort of aware of for years, but never really used: creating your own simple _repr_ functions to output pretty reports of Python objects (I’m guessing, but haven’t yet checked, that this should work for a Jupyter Book HTML publishing workflow too).

The formulation is:

html_formatter = get_ipython().display_formatter.formatters['text/html']
html_formatter.for_type(CLASS, HTML_REPORT_FUNCTION);

Via: IPython — Formatters for third-party types.
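To make that placeholder formulation concrete, here’s a sketch using a made-up StageResult class (the rally theme is just for illustration) and a small register_html_formatter helper of my own that only calls the IPython formatter API when there is actually an IPython shell to register with, so the same code also runs as a plain script.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class StageResult:
    """Hypothetical record type, standing in for a class you can't (or don't want to) modify."""
    driver: str
    stage: str
    time_s: float


def stage_result_html(result):
    """Render a StageResult as a small HTML snippet, escaping the text fields."""
    return (
        f"<div><strong>{escape(result.driver)}</strong> "
        f"{escape(result.stage)}: {result.time_s:.1f}s</div>"
    )


def register_html_formatter(cls, func):
    """Register func as the text/html formatter for cls; return False outside IPython."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    shell = get_ipython()
    if shell is None:  # IPython installed, but not running in a shell/kernel
        return False
    shell.display_formatter.formatters["text/html"].for_type(cls, func)
    return True


register_html_formatter(StageResult, stage_result_html)
```

In a notebook, a cell ending with StageResult("A. Driver", "SS1", 62.34) should then render the HTML snippet rather than the dataclass’s default text repr.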

Via a tweet, a simple example of adding repr magic to a class:

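class HelloWorld:
    # Jupyter's display system looks for _repr_html_ (single leading and trailing underscores)
    def _repr_html_(self):
        return "<div>Hello world!</div>"

# Then, with the object as the last expression in a notebook cell:
a = HelloWorld()
a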
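A slightly fuller sketch, again with a hypothetical class: it pairs _repr_html_ with a plain __repr__ fallback (used by print() and by plain-text frontends) and escapes values before dropping them into the HTML, so odd characters in the data don’t break the rendered table.

```python
from html import escape


class ResultsTable:
    """Hypothetical example: a list of (name, value) rows with rich and plain reprs."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __repr__(self):
        # Plain-text fallback, used by print() and non-HTML frontends
        return "\n".join(f"{name}: {value}" for name, value in self.rows)

    def _repr_html_(self):
        # Picked up by Jupyter's display system when the object is a cell's last expression
        body = "".join(
            f"<tr><td>{escape(str(name))}</td><td>{escape(str(value))}</td></tr>"
            for name, value in self.rows
        )
        return f"<table>{body}</table>"
```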