
Digital Dementia – Are Google Search and the Web Getting Alzheimer’s?

According to the Alzheimer’s Society, in Alzheimer’s disease – one of the most common forms of dementia – memory lapses tend to be one of the first symptoms sufferers become aware of, along with “difficulty recalling recent events and learning new information”.

One of the things I have been aware of for some time, but only started trying to pay more attention to recently, is how Google search increasingly responds to many of my tech-related web queries with results dated 2013 and 2014. In addition, the majority of traffic to my blog is directed to a few posts that are themselves several years old, and that were shared – through blog links and links from other public websites – at the time they were posted.

(I also note that Google web search is increasingly paranoid. If I run search-limited queries, for example using the site: or inurl: or filetype: search limits, it often interrupts the search with a dialog asking if I am a robot.)

So I’m wondering, has Google web search, and the web more generally, got a problem?

Google’s early days of search, which helped promote its use, were characterised by a couple of things I remember from discovering it via the MetaCrawler web search engine, which aggregated results from several other web search engines: one was that the results were relevant; the other was that the Google search engine’s results came back quickly.

Part of Google’s secret sauce at the time was PageRank, an algorithm that mined the link structure of the web – how websites linked to pages on other sites – to try to work out which pages the web thought were important. The intuition was that folk link to things they are (generally) happy to recommend, or reference as in some way authoritative, and so by mining all these links you could rank pages, and websites, according to how well referenced they were, and how well regarded the linking sites were in turn.
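By way of illustration, here’s a toy sketch of the kind of calculation PageRank performs (my own simplification, for illustration only – not Google’s actual algorithm): rank is repeatedly redistributed along the links of a small made-up web graph until it settles.

    # Toy PageRank via power iteration over a tiny, made-up link graph.
    links = {
        "a": ["b", "c"],  # page a links to pages b and c
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],       # d links to c, but nothing links back to d
    }
    pages = list(links)
    rank = {p: 1 / len(pages) for p in pages}  # start with equal rank everywhere

    d = 0.85  # damping factor: chance of following a link vs jumping at random
    for _ in range(50):
        rank = {
            p: (1 - d) / len(pages)
               + d * sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            for p in pages
        }

    # Pages with more (and better regarded) incoming links float to the top.
    print(sorted(rank.items(), key=lambda kv: -kv[1]))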

Since its early days, Google has added many more ranking factors (that is, decision criteria it uses to decide which results to put at the top of a search results listing for a particular query) to its algorithm.

To the extent that Google can generate a significant proportion of a website’s traffic from its search results pages, this has led many websites to engage in “search engine optimisation”, where they try to identify Google’s secret ranking factors and maximise their webpages’ scores against them. It also means that the structural properties of web content itself may be being shaped by Google, or at least by web publishers’ ideas of what the Google search engine favours.

If it is true that many of the pages from 2013 or 2014 are the most appropriate web results for the technical web searches I run, this suggests that the web may have a problem: as a memory device, new memories are not being laid down (new, relevant content, is not being posted).

On the other hand, it may be that content is still being laid down (I still post regularly to this blog, for example), but it is being overlooked – or forgotten – by the gateways that mediate our access to it, which for many is Google web search.

To the extent that Google web search still uses PageRank, this may reflect a problem with the web. If other well regarded sites don’t link to a particular web page, then the link structure of the web that gives sites their authority (based, as it is in PageRank, on the quality of links incoming from other websites) is impoverished, and the old PageRank factors, deeply embedded in the structure of the web that holds over from 2013 or 2014, may dominate. Add into the mix that one other ranking factor is likely to be the number of times a link is followed from the Google search results listing (which in turn is influenced by how high up the results the link appears), and you can start to see how a well told story, familiar in the telling, keeps on being retold: the old links dominate.

If the new “memories” are still being posted into the web, then why aren’t they appearing in the search results? Some of them may do, at least in the short term. Google’s web crawlers never sleep, so content is being regularly indexed, often shortly after it was posted. (I still remember a time when it could take days for a web page to be indexed by the search engines; nowadays it can be near instant.) If a ranking factor is recency (as well as relevance), a new piece of content can get a boost if a search to which it is relevant is executed soon after the content is posted.

Recently posted content may also get a boost from social media shares (maybe?), in which a link to a piece of content is quickly – and easily – shared via a social network. The “half-life” of links shared on such media is not very long, links typically being shared soon after they are first seen, and then forgotten about.

Such sharing causes a couple of problems when it comes to laying down structural “web memories”. For example, links shared on any given social media site may not be indexable – usefully, or at all, in whole or in part – by the public web search engines, for several reasons:

  • shares are often “ephemeral”, in that they may disappear (to all intents and purposes) from the social network after a short period of time. (Just try searching for a link you saw shared on Twitter three or four weeks ago, if you can remember one from that far back…).
  • the sheer volume of links shared on global social networks can be overwhelming;
  • the authority of people sharing links may be suspect, and the fact that links are shared by large numbers of unauthoritative actors may swamp signal in noise. (There is also the issue of the number of false actors on social media – easily created bot accounts, for example, slaved to sharing or promoting particular sorts of content.)

Whilst it’s never been easier to “share” a link, or highlight it (through “favouriting” or “liking”), the lack of effort in doing so is matched by the lack of interest reflected in the deeper structure of the web. If you don’t add your recommendations to the structural web, or contribute content to it, it starts to atrophy. However, if you take the time to make a permanent mark in the structure of the web – by posting a blog post to a lasting, public domain, with a persistent URL that others can link to, and in turn embedding your content in a contextually meaningful way by linking to other posts that you value as useful context for your own post – you can help build new memories and help the web keep digital dementia at bay.

See also: The Web Began Dying in 2014, Here’s How and The Web We Lost /via @charlesarthur

First Attempt At Using IPywidgets in Jupyter Notebooks to Display V-REP Robot Simulator Telemetry

Having got a thing together that lets me use some magic to load a V-REP robot simulator scene, connect to it and control a robot contained inside it, I also started to wonder about how we could build instrumentation on the Jupyter notebook client side.

The V-REP simulator itself has graph objects that can record and display logged data within the simulator:

But we can also capture data from the simulator as part of the Python control loop running via a notebook.
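For example, a minimal polling loop might look something like the following sketch; read_sensors() here is a hypothetical stand-in for whatever sensor-reading calls your own simulator connection actually exposes:

    import random
    import time

    def read_sensors():
        """Hypothetical stand-in for the real V-REP remote API sensor reads."""
        return random.random(), random.random()

    telemetry = []  # accumulate (timestamp, left, right) readings

    for _ in range(50):  # a simple polling control loop
        left, right = read_sensors()
        telemetry.append((time.time(), left, right))
        time.sleep(0.1)  # poll roughly ten times a second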

(I’m not sure if streaming data from the simulator is possible, or how to go about either setting that up in the simulator connection or rendering it in the notebook?)

So here’s my quick starter for ten: getting a simple data display running in a notebook using IPython widgets.

Here’s a simple text display to give a real time (ish) view of a couple of sensor values:
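Something along the following lines should do it (a minimal sketch using ipywidgets; the widget names are my own):

    import ipywidgets as widgets
    from IPython.display import display

    # One text widget per sensor; setting .value updates the display in place.
    left_w = widgets.FloatText(description='left sensor')
    right_w = widgets.FloatText(description='right sensor')
    display(left_w, right_w)

    # Then, inside the control loop, push new readings into the widgets:
    #     left_w.value, right_w.value = read_sensors()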

As the robot runs, the widget values update in real time (ish).

I couldn’t figure out offhand how to generate a live-updating chart, and couldn’t quickly see how to return data from inside the magic cell as part of the magic function. (In fact, I’m not convinced I understand at all the scoping in there!)

But it seems that if we set a global variable inside the magic cell, we can get data out and plot it when the simulation is stopped:
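Something like the following sketch, assuming a telemetry list populated as a global from inside the magic cell (dummy values are included here so the snippet runs standalone):

    import matplotlib.pyplot as plt

    # In the magic cell: global telemetry; telemetry.append((t, left, right))
    telemetry = [(0.0, 0.12, 0.30), (0.1, 0.15, 0.28), (0.2, 0.11, 0.33)]

    times = [t for t, _, _ in telemetry]
    plt.plot(times, [left for _, left, _ in telemetry], label='left')
    plt.plot(times, [right for _, _, right in telemetry], label='right')
    plt.xlabel('time (s)')
    plt.ylabel('sensor reading')
    plt.legend()
    plt.show()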

Example notebook here.

If anyone can show me how to create and update a live chart, that would be fantastic:-)

Oh How I Have Failed Thee, Jupyter Notebooks…

Although I first came across Jupyter – then IPython – notebooks in October 2012 (I think…), it took me another six months or so before I started playing regularly with them and pitched them for the then nascent TM351 course (geeknotes/history). We decided to explore the notebooks when the course/module team first met around about October 2013. Four years ago. Notebooks were also adopted for the Learn to Code for Data Analysis FutureLearn course (H/T to Michel Wermelinger for driving that) and only get the briefest of look-ins in the new level 1 course TM112 (even after I showed we could probably get turtle running in them…).

But to my shame I haven’t lobbied more on campus, and haven’t done the rounds giving talks and workshops and putting together meaningful demos.

Which is possibly the sort of activity that this newly advertised, and hugely attractive, role at the University of Edinburgh (h/t @PhilBarker) is designed to support – eLearning Officer Computational Notebooks.

Do you have a sound knowledge of technology and an enthusiasm for evaluating new approaches in education? We are looking for a learning technologist with a passion for communication and relationship management to lead a pilot of Jupyter notebooks for learning and teaching at the University of Edinburgh.

Jupyter notebooks are open-source web applications that enable learners to create, share and reuse computational narratives. Based within the central Information Services you will work closely with academic colleagues in Science and Engineering. You will analyse user requirements, advise on and support the use of Jupyter and evaluate the success of the pilot.

After clicking on the Apply button, we get to some more detail. Part of the purpose of the job is to “scope, assess demand and support requirements for a computational notebook (Jupyter Notebook) service”, something we’re trying to push through in a very limited form in the OU in order to support the TM112 notebook activity.

Here’s how the responsibilities unpick:

  1. To help academic and support staff make best use of learning technology services (in this case Jupyter Computational Notebook Service) and where required supporting and managing service change. Documenting use cases and sharing good practice and innovative solutions to improve the user experience. (Approx % of time 40%)
  2. To work with the user community and project partners in academic departments in order to continually improve the services and range of tools on offer. To maintain an up-to-date knowledge of the broader e-learning landscape in order to influence strategic direction and to develop innovative and appropriate use of learning technologies. (Approx % of time 30%)
  3. To participate and lead user and partner engagement events, in order to promote collaboration, knowledge sharing and greater awareness of services. To organise testing, training and workshops to support users. To represent the University and its interests both internally and externally. (Approx % of time 20%)
  4. Contribute to process improvement within both ISG and the wider University. Liaise and negotiate within members of University committees, user forums and working groups to formulate policy in accordance with the university strategic aims for learning and teaching. (Approx % of time 10%)

(On process improvement, I think Jupyter notebooks can provide a useful authoring environment (along with things like “written diagrams”) for “reproducible” – which is to say, maintainable – course materials in the OU context; an approach I have had a total lack of success in promoting.)

I couldn’t help but try out a quick search for other notebooks related job ads, and turned up a handful of research posts, including one for a Bioinformatics Training Developer at the University of Cambridge – Cancer Research UK Cambridge Institute. The job duties and requirements provide an interesting complement to the skills required of a data journalist:

The training courses and summer schools already established are very popular and have gained a strong reputation. In this role, you will further develop the existing courses to reflect new advances. You will also create and deliver new training courses and materials in scientific data analysis and visualization, … . You will be responsible for assessing the training needs of research scientists and shaping a programme to meet those needs. This is an excellent opportunity to develop and apply new training approaches, making use of technologies such as R/Python Notebooks, Shiny web applications and Docker.

The successful candidate will have a degree in a scientific or computational discipline and preferably a postgraduate degree (MSc or PhD) and/or significant experience in Bioinformatics or Computational Biology, including the analysis of omics datasets using R. The role requires a high level of interpersonal and organizational skills and previous experience in preparation and delivery of training courses is essential. Strong practical skills in R and/or Python are highly desirable, including the use of version control systems, e.g. GitHub. …

[My emphasis.]

It’s maybe also worth mentioning here the current consultation around the draft Data Scientist Integrated Degree Apprenticeship (level 6) standard. Please comment if you can…

PS I popped together a feed for a search for “notebooks” on jobs.ac.uk using fetchrss.com to try to keep track of future upcoming academic job ads making mention of notebooks.
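If you want to consume such a feed programmatically, something like the following sketch should work (the feed URL is a made-up placeholder; substitute the one fetchrss.com generates for you):

    import feedparser

    # Hypothetical URL; substitute the feed fetchrss.com generates for you.
    FEED_URL = 'http://fetchrss.com/rss/EXAMPLE.xml'

    feed = feedparser.parse(FEED_URL)
    for entry in feed.entries:
        print(entry.title, entry.link)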

Writing Diagrams (Incl. Mathematical Diagrams)

Continuing an occasional series of posts on approaches to “writing” diagrams in a textual form and then letting the machine render them, here are some recent examples that caught my eye…

Via this Jupyter notebook on inverse kinematics, I came across Asymptote, “a standard for typesetting mathematical figures, just as TeX/LaTeX is the de-facto standard for typesetting equations” (file suffix: .asy). The language uses LaTeX for labels, and is a high-level programming language in its own right – which means it can do calculations as part of the diagram creation process.

Asymptote is also available via IPython magic, as demonstrated in this Asymptote demo notebook:

The inverse kinematics notebook is also worth reviewing in a couple of other respects. Firstly, it demonstrates embedding of another “written diagram” approach, using Graphviz:

One of the easiest ways to use Graphviz scripting in Jupyter notebooks is via some IPython Graphviz magic.
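If you’d rather avoid the magic, the graphviz Python package offers another route. A minimal sketch; the returned object renders inline when it’s the last expression in a notebook cell:

    from graphviz import Digraph

    dot = Digraph(comment='A simple written diagram')
    dot.node('A', 'Start')
    dot.node('B', 'End')
    dot.edge('A', 'B', label='next')

    dot  # as the last expression in a cell, this renders the diagram inline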

It also demonstrates how to use Sympy to “implement” equations relating to the diagram and then generate animations based on them. (I still think it would be nice if we could unify the various maths rendering and calculating scripts.)
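To give a flavour of the approach, here’s a minimal Sympy sketch of my own (a standard two-link planar arm example, not necessarily the equations used in that notebook):

    import sympy as sp

    theta1, theta2, l1, l2 = sp.symbols('theta1 theta2 l1 l2')

    # End effector position of a two-link planar arm.
    x = l1 * sp.cos(theta1) + l2 * sp.cos(theta1 + theta2)
    y = l1 * sp.sin(theta1) + l2 * sp.sin(theta1 + theta2)

    # Evaluate symbolically for particular values...
    print(x.subs({l1: 1, l2: 1, theta1: sp.pi / 4, theta2: sp.pi / 4}))

    # ...or lambdify for fast numerical evaluation inside an animation loop.
    fx = sp.lambdify((theta1, theta2), x.subs({l1: 1, l2: 1}))
    print(fx(0.5, 0.5))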

Way back when I learned to program, I remember being given “railroad diagrams” (though I’m not sure they were called that? Syntax diagrams, maybe?) that described, in a visual way, a programming language grammar defined in BNF. Here’s a tool for generating them:

It’s a bit shiny, and a bit of a pain that it doesn’t just take BNF. On the other hand, this Railroad Diagram Generator looks far more powerful:

Unfortunately, it looks to be web only and there’s no source. However, if you’re okay running Java, here’s an alternative – Chrriis/RRDiagram:

I did find a Python railroad diagram generator, Syntrax [code], but it didn’t accept BNF.

Along similar lines to the blockdiag tools I’ve described before is the purely in-browser mermaid.js.

Supported chart types include flowcharts, sequence diagrams and Gantt charts.

Finally, another Grammar of Graphics style charting language – Brunel – for generating d3 output (among other things?). It can be used in notebooks, but it does require Java to be installed (as you might expect of something with an IBM relationship…?!)

PS Although I think of writing diagrams more in the sense of generating rendered diagrams from text (which makes generating the diagram reproducible and maintainable), these are maybe also relevant as a capture step:

Secrets and Lies Amongst Facebook Friends – Surprise Party Planning OpSec

Noting that: surprise parties can be organised and co-ordinated on Facebook between the friends and family of the person who will be the subject of the surprise, using private groups and private events. Potential party-goers can be mined using the friends list of the subject, as well as the friends lists of the friends themselves.

Observing that: Facebook users seem to quickly get the hang of operational security (opsec), using the “public” medium of Facebook to mount a clandestine operation against one of the members of the same social circle.

Wondering whether: the Facebook algorithm either helps maintain that form of social/party planning opsec, or could possibly threaten it. For example, if someone accidentally makes a public post about the upcoming surprise party, does the Facebook algo suppress showing that post to the target (algorithmically, noting that a group within a particular social circle seems to be actively excluding one of the people you might expect to be in that circle), or might it prioritise showing that post to the target (algorithmically, on the grounds that this person should normally be included in discussions within a particular social circle and for some reason appears to have been excluded – which Facebook can spot and fix…)?

From the Archive – Browser OS and the Facebook Feed Mixing Desk

Skimming through my blog history looking for examples from my past of search thinkses, I came across several things I’d forgotten about; so as part of what may become an occasional series of posts that trip back in time, here are a couple of things I came across, one that imagined the future, another that is maybe revealing from the past.

First up, from my original blog, Micro-Info, a post about Browser OS – A Single Application Operating System.

I’m not sure how this would go down with my colleagues who still believe that everyone has a desktop or laptop computer, rather than a phone, tablet or Chromebook-style computer (what’s the Microsoft equivalent?), as their primary computing device…

What do I really need from my operating system, if I’m doing everything through a browser?

Part of the point behind BOS  [Browser Operating System] is that we expect to be online most of the time, ideally with a persistent connection. Once I have the BOS customised for my hardware set-up, I don’t really need a thousand and one drivers available, just in case I add a periheral [sic], if I know I’m going to be able to install the appropriate driver from the web.

What I see for BOS, therefore, is a simple, if hefty, installation profiling client that looks at my system, works out what’s there, gets the drivers I need, and bundles them for me with a single application – my heavyweight browser – in a customised BOS installer.

And that’s what I install.

Just one application – the browser. Only the drivers I need. And only the supporting functions I need to get the browser to work on my particular system.

Okay – so I thought that I’d still be plugging things into the computer and that it would require drivers for them… But the browser-centricity…? Hmmm… (for those of you who don’t particularly follow operating systems, see things like Chrome_OS.)

So, the second thing that caught my eye: the Facebook feed mixing desk, this time captured by the archive of the original OUseful.info blog:

Move the sliders and tune the content of your Facebook feed. What’s revealing about this is that the user was given some control over the ranking factors for posts that appear in the feed, along with an indication of what those ranking factors were.

So for folk who today don’t understand that the content they see is tuned (forgive the pun!) by Facebook algorithms, this provides a visual metaphor for what’s going on and who has the control. Because you can bet that: a) there are many more ranking factors now; and b) it’s up to Facebook how the faders are set. And it also hints at the oft unconsidered point: c) whose ear are the faders tuning the mix to?

(By the by, see the ad? 1 minute response time?!)

Contextualised Search Result Displays

One of the prevalent topics covered in the early days of this blog concerned appropriating search tools and technologies and exploring how they could be used as more general-purpose technologies. Related to this were posts on making the most of document collections, or exploiting technologies complementary to search that returned results or content based on context.

For example:

There’s probably more…

(Like looking for shared text across documents to try to work out the provenance of a particular section of text as a document goes through multiple versions…)

Whatever…

So where are we at now…?

Mulling over recent updates to Parliamentary search, I started wondering about ranking and the linear display of results. I’ve always quite liked facet limits that filter the results returned for a search term down to a subset based on a particular attribute. For example, in an Amazon search, we’re probably all familiar with entering a general search term then using category filters / facets in the left hand sidebar to narrow results down to books, or subject categories within books.

Indeed, the faceted search basis of “search + filter” is one that inspired many of my own hacks.

As well as linear displays of ranked results (ranked how is always an issue), every so often multi-dimensional result displays appear. For example, things like Bento box displays (examples) were all the rage in university library catalogues several years ago, where multiple topical results panels display results from different facets or collections in different boxes distributed on a 2D grid. I’m not sure if they’re still “a thing” or whether preferences have gone back to a linear stream of results, perhaps with faceting to limit results within a topic? I guess one of the issues now is limited real estate on portrait orientation mobile phone displays compared to more expansive real estate you get in a landscape oriented large screen desktop display? (Hmmm, thinks… are Netvibes and PageFlakes still a thing?)

Anyway, I’ve not been thinking about search much for years, so in a spirit of playfulness, here’s a line of thinking I think could be fun to explore: contextualised search results, or more specifically, contextualised search result displays.

This phrase unpacks in several ways depending on where you think the emphasis on “contextualised” lies. (“contextualised lies”… “contextualies”…. hmmm…)

For example, if we interpret contextualised in the sense of context sensitive relative to the “natural arranging” of the results returned, we might trivially think of things like map based displays for displaying the results of a search where search items are geotagged. Complementary to this are displays where the results have some sort of time dependency. This is often displayed in the form of a date based ranking, but why not display results in a calendar interface or timeline (e.g. tracking Parliamentary bill progress via a timeline)? Or where dates and locations are relevant to each resource, return the results via a calendar map display such as TimeMapper (more generally, see these different takes on storymaps). (I’ve always thought such displays should have two modes: a “show all” mode, and then a filtered mode, e.g. one that shows just the results for a particular time/geographical area limit.)

(One of the advantages of making search results available via a feed is that tinkerers can then easily wire the results into other sorts of display, particularly when feed items are also tagged, eg with facet information, dates, times, names of entities identified in the text, etc.)

A second sense in which we might think of contextualised search result displays is to identify the context of the user based on their interests. Given a huge set of linear search results, how might they then group, arrange or organise the results so that they can work with them more effectively?

Bento box displays offer a trivial start here for the visual display, for example by grouping differently faceted results in their own results panel. Looking at something like Parliamentary search, this might mean the user entering a search term and the results coming back in panels relating to different content types: results from research briefings in one panel, for example, from Hansard in another, written questions / answers in a third, and so on.
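As a sketch of the idea, grouping a tagged result set into candidate panels is only a few lines of Python (the results and facet names here are made up):

    from collections import defaultdict

    # Hypothetical search results, each tagged with a content type facet.
    results = [
        {'title': 'Brexit briefing', 'type': 'research briefing'},
        {'title': 'Debate on EU withdrawal', 'type': 'hansard'},
        {'title': 'Question on customs union', 'type': 'written question'},
        {'title': 'Second Brexit briefing', 'type': 'research briefing'},
    ]

    panels = defaultdict(list)
    for result in results:
        panels[result['type']].append(result['title'])

    # Each key is then a candidate bento box panel.
    for content_type, titles in panels.items():
        print(content_type, '::', titles)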

(Again, if results from the search engine are available as a tagged feed, it should be easy enough to roll your own display? Hmm… thinks… what Javascript libraries would you use to display such a thing nowadays?)

It might also be possible to derive additional information from the results. For example, if results are tagged with the members associated with them (sat on the committee, asked the question, was the person speaking in the returned Hansard extract), then a simple ranked facet of the members appearing across all the resource types might identify who is most interested in the topic (expert search / discovery also used to be a big thing, I seem to remember?).
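That sort of ranked facet is trivial to compute if member tags come back with the results; a made-up sketch:

    from collections import Counter

    # Hypothetical results, each tagged with associated members.
    results = [
        {'title': 'Committee report', 'members': ['A. Member', 'B. Member']},
        {'title': 'Written question', 'members': ['A. Member']},
        {'title': 'Hansard extract', 'members': ['C. Member', 'A. Member']},
    ]

    member_counts = Counter(m for r in results for m in r['members'])
    print(member_counts.most_common(3))  # a crude "interested members" facet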

In terms of trying to imagine differently contextualised displays, what sorts of user / user interest might there be? Off the top of my head, I can imagine:

  • someone searching for a topic “in general”: so just give them a list of stuff ranked however the search algo ranks it;
  • someone searching for a topic in general, organised by format or type (e.g. research briefing, written question/answer, parliamentary debate, committee report, etc), in which case a faceted display or bento box display might work;
  • someone searching for something in response to a news item, in which case they might want something ordered by time and maybe boosted by news mentions as a ranking factor (reminds me of trying to track media mentions of press releases and my press release / poll report CSE);
  • someone searching around the activities of an MP, in which case, you might want something like TheyWorkForYou member pages or perhaps a calendar or timeline view of their activity, or a narrative chart (e.g. with one line for a member, then other lines for the sorts of interaction they have with a topic – committee, question, debate – with each node linking to the associated document);
  • someone trying to track something in the context of the progress of a piece of legislation (or committee inquiry), in which case you may want a timeline, narrative chart or storyline style view; and maybe a custom search hub that searches over all documents relating to that piece of evolving legislation;
  • someone interested in people interested in a topic – expert search, in other words;
  • someone interested in the engagement of a person or organisation with Parliamentary processes, such as witness appearances at committee, submissions of written evidence, etc; it would also be handy if this turned up government relations, such as an association with a government group (it would be nice if that were a register, with each group having a formal register entry that included things like members…). Showing the different sorts of process, and the stage of the process at which the interaction or mention occurred, could also be useful….

There are probably more…

Anyway, perhaps thinking about search could be fun again… So: does the new Parliamentary search make feeds available? And when are the Release TBC items listed on explore.data.parliament.uk going to be available?!:-)