Category: Anything you want

Helluva Job – Still Not Running Libvirt vagrant plugin on a Mac…

Note to self after struggling for ages trying to install the vagrant-libvirt plugin on a Mac…

Revert to vagrant 2.0.0 then follow recipe given by @ccosby


Then more ratholes…

Try to start a libvirtd daemon: /usr/local/sbin/libvirtd, which I had hoped would write a socket connection file somewhere (/var/run/libvirt/libvirt-sock?) that I could use as the basis of a connection, but that didn’t seem to work… :-(

Help (/usr/local/sbin/libvirtd --help) suggests:

Configuration file (unless overridden by -f):

CA certificate:     $HOME/.pki/libvirt/cacert.pem
Server certificate: $HOME/.pki/libvirt/servercert.pem
Server private key: $HOME/.pki/libvirt/serverkey.pem

PID file:
but $XDG_RUNTIME_DIR/ doesn’t appear to be set and I can’t see anything in local dir… Setting it to /var/run/ doesn’t seem to help? So I’m guessing I need a more complete way of starting libvirtd such as passing a process definition/config file?

Take, Take, Take…

Alan recently posted (Alan recently posts a lot…:-) about a rabbit hole he fell down when casually eyeing something in his web stats (Search, Serendipity, Semantically Silent).

Here’s what I see from my hosted WordPress blog stats:

  • traffic from Google but none of the value about what folk were searching for, although Google knows full well. WordPress also have access to that data from the Google Analytics tracking code they force into my hosted WordPress blog, but I don’t get to see it unless I pay for an upgrade…
  • traffic from pages on educational sites that I can’t see because they require authentication; I don’t even know what the course was on… So how can I add further value back to support that traffic?
  • occasional links from third party sites back in the day when people blogged and included links…

See also: Digital Dementia – Are Google Search and the Web Getting Alzheimer’s? etc…

More Cookie Notices…

One of the cookie acceptance notices I’ve started noticing on first visits to several sites lately comes from TrustArc:


This categorises cookies into three classes — required, functional and advertising — and lets you make decisions on whether to accept those cookies at the category level or at the individual provider level if you look at the Detailed Settings:


However, opting out at the category level doesn’t necessarily mean you have opted out of all the cookies provided in that category:


So whilst I like things like the TrustArc display in principle, it would be nicer if it also had a percentage bar display, for example, showing the percentage of cookie providers in each category that were successfully opted out from.
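By way of illustration only (the category name, provider names and opt-out results here are all invented), such a percentage bar might be rendered along these lines:

```python
def optout_bar(category, providers, width=20):
    """Render a text bar showing what fraction of the cookie
    providers in a category were successfully opted out from."""
    opted_out = sum(1 for ok in providers.values() if ok)
    frac = opted_out / len(providers)
    filled = int(round(frac * width))
    bar = "#" * filled + "-" * (width - filled)
    return f"{category:12s} [{bar}] {frac:.0%} ({opted_out}/{len(providers)})"

# Hypothetical opt-out results: True means the opt-out cookie was set
advertising = {"providerA": True, "providerB": False,
               "providerC": True, "providerD": True}
print(optout_bar("advertising", advertising))
```

Nothing clever, but it would make at a glance how leaky a category-level opt-out actually was.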

I’m not quite sure how cookie opt-outs work. I can see how it’d work on the user side – e.g. follow a link provided by someone like IgnitionOne – to a page that sets opt-out cookies in your browser, but how about a publisher setting the opt-out from a third party on your behalf? This description — OpenX – how the cookie-based opt-out mechanism works — suggests that the publisher calls a link and then “OpenX sets the user opt-out cookie and performs a test to see that the user accepted the cookie“, but how do the bits of data flow and what state or flags are set where? Another one to add to the “I don’t understand how this actually works; try to find out at some point” list…

Data Ethics – Guidance and Code – And a Fragment on Machine Learning Models

By chance, I notice that the Department for Digital, Culture, Media & Sport (DDCMS) has published guidance on a Data Ethics Framework, with an associated workbook, intended to guide the design of appropriate data use in government and the wider public sector.

The workbook is based around providing answers to questions associated with seven principles:

  1. A clear user need and public benefit
  2. Awareness of relevant legislation and codes of practice
  3. Use of data that is proportionate to the user need
  4. Understanding of the limitations of the data
  5. Using robust practices and working within your skillset
  6. Making work transparent and being accountable
  7. Embedding data use responsibly

It’s likely that different sector codes will start to appear, such as this one from the Department of Health & Social Care (DHSC): Initial code of conduct for data-driven health and care technology. In this case, the code incorporates ten principles:

1 Define the user
Understand and show who specifically your product is for, what problem you are trying to solve for them, what benefits they can expect and, if relevant, how AI can offer a more efficient or quality-based solution. Ensure you have done adequate background research into the nature of their needs, including any co-morbidities and socio-cultural influences that might affect uptake, adoption and ongoing use.

2 Define the value proposition
Show why the innovation or technology has been developed or is in development, with a clear business case highlighting outputs, outcomes, benefits and performance indicators, and how exactly the product will result in better provision and/or outcomes for people and the health and care system.

3 Be fair, transparent and accountable about what data you are using
Show you have utilised privacy-by-design principles with data-sharing agreements, data flow maps and data protection impact assessments. Ensure all aspects of GDPR have been considered (legal basis for processing, information asset ownership, system level security policy, duty of transparency notice, unified register of assets completion and data privacy impact assessments).

4 Use data that is proportionate to the identified user need (data minimisation principle of GDPR)
Show that you have used the minimum personal data necessary to achieve the desired outcomes of the user need identified in 1.

5 Make use of open standards
Utilise and build into your product or innovation, current data and interoperability standards to ensure you can communicate easily with existing national systems. Programmatically build data quality evaluation into AI development so that harm does not occur if poor data quality creeps in.

6 Be transparent to the limitations of the data used and algorithms deployed
Show you understand the quality of the data and have considered its limitations when assessing if it is appropriate to use for the defined user need. When building an algorithm be clear on its strengths and limitations, and show in a transparent manner if it is your training or deployment algorithms that you have published.

7 Make security integral to the design
Keep systems safe by integrating appropriate levels of security and safeguarding data.

8 Define the commercial strategy
Purchasing strategies should show consideration of commercial and technology aspects and contractual limitations. You should only enter into commercial terms in which the benefits of the partnerships between technology companies and health and care providers are shared fairly.

9 Show evidence of effectiveness for the intended use
You should provide evidence of how effective your product or innovation is for its intended use. If you are unable to show evidence, you should draw a plan that addresses the minimum required level of evidence given the functions performed by your technology.

10 Show what type of algorithm you are building, the evidence base for choosing that algorithm, how you plan to monitor its performance on an ongoing basis and how you are validating performance of the algorithm.

One of the discussions we often have when putting new courses together is how to incorporate ethics-related issues in a way that makes sense (which is to say, in a way that can be assessed…). One way might be to apply things like the workbook or the code of conduct to a simple case study. Creating appropriate case studies can be a challenge, but via an O’Reilly post, I note that a joint project between the Center for Information Technology Policy and the Center for Human Values, both at Princeton, has recently produced a set of fictional case studies designed to elucidate and prompt discussion about issues at the intersection of AI and ethics, covering a range of topics: an automated healthcare app (foundations of legitimacy, paternalism, transparency, censorship, inequality); dynamic sound identification (rights, representational harms, neutrality, downstream responsibility); optimizing schools (privacy, autonomy, consequentialism, rhetoric); law enforcement chatbots (automation, research ethics, sovereignty).

I also note that a recent DDCMS Consultation on the Centre for Data Ethics and Innovation has just closed… One of the topics of concern that jumped out at me related to IPR:

Intellectual Property and ownership Intellectual property rights protect – and therefore reward – innovation and creativity. It is important that our intellectual property regime keeps up with the evolving ways in which data use generates new innovations. This means assigning ownership along the value chain, from datasets, training data, source code, or other aspects of the data use processes. It also includes clarity around ownership, where AI generates innovations without human input. Finally, there are potentially fundamental questions around the ownership or control of personal data, that could heavily shape the way data-driven markets operate.

One of the things I think we are likely to see more of is a marketplace in machine learning models, either sold (or rented out?) as ‘fixed’ or ‘further trainable’, building on the shared model platforms that are starting to appear (a major risk here, of course, is that models with built-in biases – or vulnerabilities – might be exploited if bad actors know what models you’re running…). For example:

  • Seedbank [announcement], “a [Google operated] place to discover interactive machine learning examples which you can run from your browser, no set-up required“;
  • TensorFlow Hub [announcement], “a [Google operated] platform to publish, discover, and reuse parts of machine learning modules in TensorFlow“;
  • see also this guidance on sharing ML models on Google Cloud
  • [announcement], “the first infrastructure- and workflow-agnostic machine learning platform.“. No, me neither…

I couldn’t offhand spot a marketplace for Amazon SageMaker models, but I did notice some instructions for how to import your Amazon SageMaker trained model into Amazon DeepLens, so if model portability is a thing, the next thing Amazon will likely do is find a way to take a cut from people selling that thing.

I wonder, too, if the export format has anything to do with ONNX, “an open format to represent deep learning models”?


(No sign of Google there?)

How the IPR around these models will be regulated can also get a bit meta. If data is licensed to one party so they can train a model, should the license also cover who might make use of any models trained on that data, or how any derived models might be used?

And what counts as “fair use” data when training models anyway? For example, Google recently announced Conceptual Captions, “A New Dataset and Challenge for Image Captioning”. The dataset:

consist[s] of ~3.3 million image/caption pairs that are created by automatically extracting and filtering image caption annotations from billions of web pages.

So how were those images and text caption training data gathered / selected? And what license conditions were associated with those images? Or when compiling the dataset, did Google do what Google always does, which is conveniently ignore copyright because it’s only indexing and publishing search results, not actually re-publishing material (allegedly…)?

Does that sort of consideration fall into the remit of the current Lords Communications Committee inquiry into The Internet: to regulate or not to regulate?, I wonder?

A recent Information Law and Policy Centre post on Fixing Copyright Reform: How to Address Online Infringement and Bridge the Value Gap, starts as follows:

In September 2016, the European Commission published its proposal for a new Directive on Copyright in the Digital Single Market, including its controversial draft Article 13. The main driver behind this provision is what has become known as the ‘value gap’, i.e. the alleged mismatch between the value that online sharing platforms extract from creative content and the revenue returned to the copyright-holders.

This made me wonder, is there a “mismatch” somewhere between:

a) the data that people share about themselves, or that is collected about them, and the value extracted from it;
b) the content qua data that web search engine operators hoover up with their search engine crawlers and then use as a corpus for training models that are used to provide commercial services, or sold / shared on?

There is also a debate to be had about other ways in which the large web cos seem to feel they can just grab whatever data they want, as hinted at in this report on Google data collection research.

The Growing Popularity of Jupyter Notebooks…

It’s now five years since we first started exploring the use of Jupyter notebooks — then known as IPython notebooks — in the OU for the course that became (and still is) TM351 Data management and analysis.

At the time, the notebooks were evolving fast (they still are…) but knowing the length of time it takes to produce a course, and the compelling nature of the notebooks, and the traction they already seemed to have, it felt like a reasonably safe technology to go with.

Since then, the Jupyter project has started to rack up some impressive statistics. Using Google BigQuery on public datasets, we can find how many times the Jupyter Python package is installed from PyPI, the centralised repository for Python package distribution, each month by querying the the-psf:pypi.downloads dataset (if you don’t like the code in this post, how else do you expect me to demonstrably source the supporting evidence…?!):

SELECT
  STRFTIME_UTC_USEC(timestamp, "%Y-%m") AS yyyymm,
  COUNT(*) AS download_count
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -1, "year"),
    CURRENT_TIMESTAMP()
  )
WHERE file.project="jupyter"
GROUP BY yyyymm
ORDER BY yyyymm
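If it helps to see what that query is doing, the STRFTIME_UTC_USEC(timestamp, "%Y-%m") grouping is just bucketing download timestamps by month, something like this Python sketch over invented timestamps:

```python
from collections import Counter
from datetime import datetime

# Invented download timestamps of the sort the pypi.downloads tables record
timestamps = ["2018-06-03T10:15:00", "2018-06-17T22:01:00", "2018-07-01T09:30:00"]

# Group into "%Y-%m" buckets, as STRFTIME_UTC_USEC(timestamp, "%Y-%m") does
download_count = Counter(
    datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S").strftime("%Y-%m")
    for ts in timestamps
)
print(sorted(download_count.items()))  # → [('2018-06', 2), ('2018-07', 1)]
```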

(Our other long term bet for TM351 was on the pandas Python package, and that has also gone from strength to strength…)

Google BigQuery datasets also include a Stack Overflow dataset (Stack Overflow is a go-to technical question-and-answer site for developers), so something like the following crappy query will count the jupyter-notebook tagged questions appearing each year:

SELECT tags, COUNT(*) c, year
FROM (
  SELECT SPLIT(tags, '|') tags, YEAR(creation_date) year
  FROM [bigquery-public-data:stackoverflow.posts_questions] a
  WHERE YEAR(creation_date) >= 2014 AND tags LIKE '%jupyter-notebook%'
)
WHERE tags='jupyter-notebook'
GROUP BY year, tags
ORDER BY year

I had thought we might be able to use BigQuery to query the number of notebooks on Github (a site widely used by developers for managing code repositories and, increasingly, by educators for managing course notes/resources), but it seems that the Github public data tables [(bigquery-public-data:github_repos)] only represent a sample of 10% or so of public projects?

FWIW, here are a couple of sample queries on that dataset anyway. First, a count of projects identified as Jupyter Notebook projects:

SELECT COUNT(*)
FROM [bigquery-public-data:github_repos.languages]
WHERE language.name = "Jupyter Notebook"

And secondly, a count of .ipynb notebooks:

SELECT COUNT(*)
FROM [bigquery-public-data:github_repos.files]
#filter on files with a .ipynb suffix
WHERE RIGHT(path,6) = ".ipynb"

We can also look to see what the most popular Python packages imported into the notebooks are, using a recipe I found here:

numpy, matplotlib and pandas, then, perhaps not surprisingly… Here’s the (cribbed) code for that query:


#The query finds line breaks and then tries to parse import statements
SELECT
  CASE WHEN package LIKE '%\\n",' THEN LEFT(package, LENGTH(package) - 4)
    ELSE package END AS package, n
FROM (
  SELECT REGEXP_EXTRACT( line, r'(?:\S+\s+)(\S+)' ) AS package, COUNT(*) AS n
  FROM (
    SELECT SPLIT(content, '\n \"') AS line
    FROM (SELECT * FROM [bigquery-public-data:github_repos.contents]
      WHERE id IN (
        SELECT id FROM [bigquery-public-data:github_repos.files]
         WHERE RIGHT(path, 6) = '.ipynb'))
    HAVING LEFT(line, 7) = 'import ' OR LEFT(line, 5) = 'from ')
  GROUP BY package
)
ORDER BY n DESC

(I guess a variant of the above could be used to find out what magics are most commonly loaded into notebooks?)
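As a sanity check on that cribbed regular expression, (?:\S+\s+)(\S+) grabs the token after import or from; here's a Python sketch of it at work over some invented notebook source lines, which also suggests how the same trick might surface %load_ext magics:

```python
import re
from collections import Counter

# Invented lines of the sort found in .ipynb JSON cell source
lines = [
    "import numpy as np",
    "from pandas import DataFrame",
    "import matplotlib.pyplot as plt",
    "%load_ext sql",
]

# (?:\S+\s+)(\S+) skips the first whitespace-separated token and
# captures the second, i.e. the package name in an import/from line
packages = Counter(
    re.match(r"(?:\S+\s+)(\S+)", l).group(1)
    for l in lines if l.startswith(("import ", "from "))
)

# The same pattern pulls out which magics are loaded via %load_ext
magics = Counter(
    re.match(r"(?:\S+\s+)(\S+)", l).group(1)
    for l in lines if l.startswith("%load_ext ")
)
print(packages, magics)
```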

That said, Github also seems to expose a count of files directly, as described in parente/nbestimate, from where the following estimates of growth in the number of Jupyter notebook / .ipynb files on Github are taken:

The number returned by making the following request is approximate – maybe we get the exact count if we make an authenticated request? …;ref=searchresults&type=Code

On Docker hub, where various flavours of official notebook container are hosted, there’ve been over a million pulls of the datascience notebook:


Searching Docker hub for jupyter notebook also identifies over 5000 notebook related Docker container images at the current count, of which over 1500 are automatically built from Github repos.

Mentions of Jupyter notebooks are also starting to appear on ITJobswatch, a site that tracks IT and programming job ads in the UK:

By the by, if you fancy exploring other sorts of queries, this simple construction will let you run a Google web search to look for .ipynb files that mention the word course or module on websites…

filetype:ipynb (course OR module)

Anyway, in the same way that it can take many years to get anything flying regularly on NASA space flights, OU courses take a couple of years to put together, and innovations often have to prove themselves over a couple of presentations of one course before a second course will take them on; so it’s only now that we’re starting to see an uptick of interest in using Jupyter notebooks in other courses… but that interest is growing, and it’s looking like there’s a flurry of courses about to start production that are likely to be using notebooks.

We may have fallen a little bit behind places like UC Berkeley, where 50% of undergrads now take a core, Jupyter notebook powered, datascience course, with subject-specific bolt-on mini-modules, and notebooks now provide part of the university core infrastructure:

…but even so, there could still be exciting times ahead…:-)

Even more exciting if we could get the library to start championing it as a general resource…


Robot Workers?

A lazy post that does nothing much more than rehash and link bullet points from the buried lede that is someone else’s post…

It seems like folk over at the Bank of England have been in the news again about robots taking over human jobs (Bank of England chief economist [Andy Haldane] warns on AI jobs threat); this follows on from a talk earlier this year by Mark Carney at the Public Policy Forum in Toronto [slides] and is similar in kind to other speeches coming out of the Bank of England over the last few years (Are Robots Threatening Jobs or Are We Taking Them Ourselves Through Self-Service Automation?).

The interview(?) was presumably in response to a YouGov survey on Workers and Technology: Our Hopes and Fears associated with the launch of a Fabian Society and Community Commission on Workers and Technology.

(See also a more recent YouGov survey on “friends with robots” which asked “In general, how comfortable or uncomfortable do you think you would be working with a colleague or manager that was a robot?” and “Please imagine you had received poor service in a restaurant or shop from a robot waiter/ shop assistant that is able to detect tone and emotion in a human’s voice… Do you think you would be more or less likely to be rude to the robot, than you would to a human waiter/ shop assistant, or would there be no difference? (By ‘rude’, we mean raising your voice, being unsympathetic and being generally impolite… )“.)

One of the job categories that is being enabled by automation is human trainers that help generate the marked up data that feeds the machines. A recent post on The Lever, “Google Developers Launchpad’s new resource for sharing applied-Machine Learning (ML) content to help startups innovate and thrive” [announcement] asks Where Does Data Come From?. The TLDR answer?

  • Public data
  • Data from an existing product
  • Human-in-the-loop (e.g. a human driver inside an “autonomous” vehicle)
  • Brute force (e.g. slurping all the data you can find; hello Google/Facebook etc etc)
  • Buying the data (which means someone is also selling the data, right?)

A key part of many machine learning approaches is to use labelled datasets that the machine learns from. This means taking a picture of a face, for example, that a human has annotated with areas labelled “eyes”, “nose”, “mouth”, and then training the ‘pootah to try to identify particular features in the photographs that allow the machine to associate those labels with those features, and hopefully the corresponding elements in a previously unseen photo.

Here’s a TLDR summary of part of the Lever post, concerning where these annotations come from:

  • External annotation service providers
  • Internal annotation team
  • Getting users to generate the labels (so the users do folk in external annotation service providers out of a job…)

The post also identifies several companies that provide external annotation services… Check them out if you want to get a glimpse of a future of work that involves feeding a machine…

  • Mechanical Turk: Amazon’s marketplace for getting people to do piecemeal bits of work for pennies that other people often sell as “automated” services, which you might have thought meant “computerised”. Which it is, in that a computer assigns the work to essentially anonymous, zero-hours contract workers. Where it gets really amusing is when folk create bots to do the “human work” that other folk are paying Mechanical Turk for
  • Figure Eight: a “Human-in-the-Loop Machine Learning platform transforms unstructured text, image, audio, and video data into customized high quality training data”… Sounds fun, doesn’t it? (The correct answer is probably “no”);
  • Mighty AI: “a secure cloud-based annotation software suite that empowers you to create the annotations you need for developing and validating autonomous perception systems”, apparently.. You get a sense of how it’s supposed to work from the blurb:
    • “Mighty Community”, a worldwide community that provides our customers with timely, high-quality annotations, offloading the burden to find, train, and manage a pool of annotators to generate ground truth data.
    • Expert global community allows for annotations 24 hours/day
    • Training on Mighty Tools eliminates annotator on-boarding time
    • Available at a moment’s notice to instantly scale customer annotation programs
    • Community members covered by confidentiality agreement
    • Automated annotation management process with Mighty Studio
    • Close integration with Mighty Quality eliminates the need to find and correct annotation errors
  • Playment: “With 300,000+ skilled workers ready to work round-the-clock on Playment’s mobile and web app, we can generate millions of labels in a matter of hours. As the workers do more work, they get better and Playment is able to accomplish much more work in lesser time.” (And then when they’ve done the work, the machine does the “same” work with reduced marginal cost… Hmm, thinks, how do the human worker costs (pennies per task) compare with the server costs for large ML services?)

Happy days to come, eh…?

Legislating For Mandatory Software Updates

From the Automated and Electric Vehicles Act 2018, I notice the following:

4 Accident resulting from unauthorised software alterations or failure to update software

(1) An insurance policy in respect of an automated vehicle may exclude or limit the insurer’s liability under section 2(1) for damage suffered by an insured person arising from an accident occurring as a direct result of—
(a) software alterations made by the insured person, or with the insured person’s knowledge, that are prohibited under the policy, or
(b) a failure to install safety-critical software updates that the insured person knows, or ought reasonably to know, are safety-critical.

(2) But as regards liability for damage suffered by an insured person who is not the holder of the policy, subsection (1)(a) applies only in relation to software alterations which, at the time of the accident, the person knows are prohibited under the policy.

(3) Subsection (4) applies where an amount is paid by an insurer under section 2(1) in respect of damage suffered, as a result of an accident, by someone who is not insured under the policy in question.

(4) If the accident occurred as a direct result of—
(a) software alterations made by an insured person, or with an insured person’s knowledge, that were prohibited under the policy, or
(b) a failure to install safety-critical software updates that an insured person knew, or ought reasonably to have known, were safety-critical,the amount paid by the insurer is recoverable from that person to the extent provided for by the policy.

(5) But as regards recovery from an insured person who is not the holder of the policy, subsection (4)(a) applies only in relation to software alterations which, at the time of the accident, the person knew were prohibited under the policy.

(6) For the purposes of this section—
(a) “software alterations” and “software updates”, in relation to an automated vehicle, mean (respectively) alterations and updates to the vehicle’s software;
(b) software updates are “safety-critical” if it would be unsafe to use the vehicle in question without the updates being installed.

It looks like this is the first time that “software update” appears in legislation…

Is this the start of things to come?

And what tools exist on the and websites to make it easier to keep track of such phrases in passed, enacted and draft legislation?

PS Michael posted some useful thoughts lately on why the Parliament website isn’t for everyone. It also reminded me of an important rule when writing bids for outreach activities: “if you claim the project event is for everybody, that means it’s targeted at nobody and won’t be funded”. After all, who should go, and why?

Algorithms That Work, A Bit… On Teaching With Things That Don’t Quite Work

Grant Potter picked up on a line in yesterday’s post on Building Your Own Learning Tools:

including such tools as helpers in an arts history course would help introduce students to the wider question of how well such algorithms work in general, and the extent to which tools or applications that use them can be trusted

The example in point related to the use of k-means algorithms to detect common colour values in images of paintings in order to produce a “dominant colour” palette for the image. The random seed nature of the algorithm means that if you run the same algorithm on the same image multiple times, you may get a different palette each time. Which makes a point about algorithms, if nothing else, and encourages you to treat them just like any other (un)reliable witness. In short, the tool demoed was a bit flakey, but no less useful (or instructive) for that…
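To make the point concrete, here is a toy k-means sketch (not the code from the tool actually demoed; kmeans_palette and the pixel values are invented for this example) that shows where the run-to-run variation comes from: the randomly seeded choice of starting centroids.

```python
import random

def kmeans_palette(pixels, k=2, seed=None, iters=10):
    """Toy k-means: cluster RGB pixels and return k 'dominant colour'
    centroids. The random choice of starting centroids is what can make
    repeated runs on the same image return different palettes."""
    rng = random.Random(seed)
    centroids = rng.sample(pixels, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            # assign each pixel to its nearest centroid (squared distance)
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # move each centroid to the mean of its cluster
        centroids = [
            tuple(sum(ch) / len(cl) for ch in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return sorted(centroids)

# Two invented colour blobs: reds and blues
pixels = [(200, 10, 10), (210, 20, 15), (190, 5, 25),
          (10, 10, 200), (20, 5, 210), (25, 15, 190)]
print(kmeans_palette(pixels, k=2, seed=1))
```

On nicely separated data like this the runs tend to agree; on a real painting, with many overlapping colour clusters, different seeds can settle on noticeably different palettes.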

Grant’s pick-up got me thinking about another advantage of bringing interactive digital — algorithmic — tools into the wider curriculum using things like responsive Jupyter notebooks: by trying to apply algorithms to the analysis of things people care about, like images of paintings in art history, like texts in literature, the classics, or history, like maps in geography and politics, and so on, and showing how they can be a bit rubbish, we help students see the limits of the power of similar algorithms that are applied elsewhere, and help them develop a healthy — and slightly more informed — scepticism about algorithms through employing them in a context they (academically) care about.

Using machine “intelligence” for analysis or critique in the arts or humanities also shows how open to multiple perspectives those subject matters can be, and how biases can creep in. (We can also use algorithmic exploration to dig one level deeper in the sciences. In this respect, when anyone quotes an average at me I think of Anscombe’s Quartet, or for a population trend, Simpson’s Paradox.)
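Anscombe's point is easy to demonstrate in a couple of lines: the first two datasets of the quartet share the same means (and, though not computed here, essentially the same variances and correlation) despite looking utterly different when plotted.

```python
# x values are shared by the first two datasets in Anscombe's Quartet
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def mean(vals):
    return sum(vals) / len(vals)

# The summary statistic agrees even though y1 is roughly linear in x
# and y2 follows a smooth curve
print(round(mean(y1), 2), round(mean(y2), 2))  # → 7.5 7.5
```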

Introducing computational tools as part of the wider curriculum, even in passing, also helps reveal what’s possible with a simple (perhaps naive) application of technology, and the possible benefits, limits and risks associated with such application.

(Academic algorithm researchers often tout things like a huge increase in performance on some test problem from 71.2% success to 71.3% for example, albeit at the expense of using a sizeable percentage of the world’s computers to run the computation. Which means 3 times in 10 it’s still wrong and the answer still needs checking all the time to catch those errors. If I can do the same calculation to 60% success on my Casio calculator watch, I still get a benefit 60% of the time, and the risk of failure isn’t that much different. You pays your money and you takes your choice.)

This relates to the “everyone should learn to code” dogma too. Putting barebones computational tools inline into course materials shows how you can often achieve some sort of result with just a line or two of code, but more importantly, it shows you what sorts of thing can be done with a line or two of code, and what sorts of thing can’t be done that easily…

In putting together educational materials, we often focus on teaching people the right way to do something. But perhaps when it comes to code, rather than “everyone should learn to code”, maybe we need “everyone should experience the limitations of simple algorithms”? The solutionist good reason for using computational methods in the curriculum would be to show how useful computers can be (for some things) for automating, measuring, sorting and filtering. The resistance reason is that we can start to show people what the limits of the algorithms are, and how the computer answer isn’t always (isn’t often?) the right one.

My Festival Round-Up – Beautiful Days, 2018

Belatedly, a round-up of my top bands from Beautiful Days, our best festival (once again…) of the year…

Thursday is get in day, and no formal music events, with the stages kicking off on Friday. I left the obligatory Levellers acoustic session to catch the full Rews set opening up the main stage, and a cracking start to the festival proper:

They’ve got some great riffs and infectious lyrics, but I think the festival set list order could do with a tweak… A lot of the songs have an early hook, presumably to catch the interest of the Spotify / Youtube generation before they click away, but in a festival set when you’ve caught the interest of the audience, there’s an opportunity to give a tune a bit more space and play with the instrumental breaks a bit more. The audience was asked for an opinion on a new song and I gave it a thumbs up, in general, but in need of a bit of reworking: let it build a bit more… Dropping one of the songs from the set list to give more space to songs that were rocking would have made this an even better set for me, but it was still a blast… They’re touring later in the year, so if you get a chance to see ’em, take it whilst the tickets are still affordable…

A quick saunter over to the Big Top to see the Army’s Justin Sullivan on a solo set, then back to the main stage for My Baby, the stand out act of the weekend for me… I’m not generally in to the dance thing, but they were just incredible, building songs beautifully and just FRICKIN’ AWESOME…

If you ever get a chance to see them in a festival setting, don’t miss it… Rews could learn a lot from them…

I’d also happily spend another lazy afternoon sipping Long Island Iced Tea and foot-tapping along with Kitty, Daisy and Lewis

For myself and many of the old gang I met up with at the festival, Feeder are the band whose songs you may remember well – and they have a lot of them – but for which you could never remember who played them. Solid, slick, but forgettably memorable…

Then it was Hives… meh.. irritating, irritating, irritating; but so irritating that a lot sat through it to see just how much more irritating they could be… I wandered off to see Suzanne Vega, who had a couple of session musicians alongside her stalwart guitarist, but the set was the same as the two piece offering and perhaps more suited to a theatrical setting than a festival alt-stage. TBH, for that hour or two, a little bit of me would rather have been back on the island watching 77/78

A slow start to Saturday – too much iced tea the day before – but whilst Kitty McFarlane’s story-backed folk songs were quite beautiful (I’d love to see her at the Quay Arts) it wasn’t quite rockin’ enough to kick start my day, so I regret not catching more of Emily Capell who was boogie woogie -licious -lightful:

I’m not sure whether she has a thing going with any of the Spitfires, who were on after, but it’s the first time I’ve seen an early after support act: a) brag about how they’re the tightest band at the festival, and b) suggest everyone go to the clean toilets a walk away by the acoustic tent for a poo when the next main stage band are on…

Undeterred, I still gave the Spitfires a chance – should’ve brought me docks… It’d be good to see them at Strings to give the Orders a bit of a lesson in how to be’ave…

I’m still not sure how I get on with 3 Daft Monkeys… perhaps depends on the mood I’m in when I catch ’em… would be nice to see them again at Rhythm Tree.

Elephant Sessions, on the other hand, I have no doubts about… The video below doesn’t do the power of them justice… played loud in a big tent: brull-yunt. It’d be a bit of a trek to get them down to the Island but I’d give them a floor to sleep on any day…

(I also thought this was interesting – their tech spec / stage plan.)

Saturday headliners were the Manics… whilst I admire their performance, persistence and continued inventiveness, I’ve never really got on with them. I should perhaps have made it down to the front (Beautiful Days is a beautifully sized festival – you can get as close to the stage as you fancy and have a bit of a jig, though it can get a little boisterous from time to time…).

Sunday opened with homage to The Bar Steward Sons of Val Doonican, the second band I now rate from Barnsley (Hands Off Gretel being the other…). Lyrical reworkings of well known tunes, Paint ‘Em Back had me in stitches, and the audience management was incredible – an inflatable boat surfed completely round the Big Top, and the whole tent – a packed tent at that – down low for a jump; a perfick Sunday awakening…

The Bar Stewards took inspiration from another “comedy” act who were playing in the pub, Hobo Jones and the Junkyard Dogs.

We gave them a bit of a listen over a(nother) cocktail whilst sat on the hill, before heading back for Dub Pistols:

I skipped out with the last 2 or 3 songs to go (not least because I needed a rest!) to check out Skinny Lister, but that was a mistake… should’a stayed at the Pistols for the full set… Oysterband a bit later didn’t really work for me either…

So that was pretty much it, then, (I missed Gogol Bordello in favour of hitting the tent to replenish the cocktail readymix bottles) until day’s end with the Levellers’ full set. I’d watched them spotting the lights the night before – although the lighting guy hadn’t been in an overly chatty mood (“concentrating….”).

Post festival shopping list:

  • Rews, My Baby, Elephant Sessions, Emily Capell, maybe the Spitfires.

I also saw a lot of Ferocious Dog t-shirts, so maybe something from them too…

Build Your Own Learning Tools

A long time ago, I realised that one of the benefits of using simple desktop (Lego) robots to teach programming was that it allows you to try – and test – code out in a very tangible way. (Programming a Logo turtle provides a similar, though less visceral, form of direct feedback*.)

One of the things I’ve started exploring recently is the extent to which we can create “reproducible” (open) educational resources using Jupyter notebooks. Part of the rationale for this is that if the means of producing a particular asset are provided, then it becomes much easier to reuse the assets with modification. However, I’ve also come to appreciate that having a computational environment to hand also means we can explore the taught subject matter in a wider variety of ways.

In that context, one of the units I am looking at is an art history course on OpenLearn (Making sense of art history). One of the activities asks learners “Are the colours largely bright or dull?” in a selection of paintings, although I struggled to find a definition of what “bright” and “dull” may be. This got me thinking about how images can be represented, and the extent to which we could create simple tools using powerful libraries to support student exploration of a particular topic. For example, helping them “test” particular images for different attributes in both a mechanical way (based on physical measurements) as well as personal experience.
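As a minimal sketch of what such a “mechanical” test might look like – assuming, for the sake of argument, that we operationalise “bright” as the mean of the HSV value channel and “dull” as low mean saturation (the function names and thresholds here are my own invention, not anything from the course materials):

```python
from PIL import Image
import numpy as np

def mean_brightness(img):
    """Crude 'brightness' score: mean of the HSV value channel (0-255)."""
    v = np.asarray(img.convert("HSV"))[..., 2]
    return float(v.mean())

def mean_saturation(img):
    """Crude 'dullness' probe: low mean saturation suggests a washed-out palette."""
    s = np.asarray(img.convert("HSV"))[..., 1]
    return float(s.mean())
```

A learner could then run these over a pair of paintings and compare the scores against their own perception – which is exactly where the interesting disagreements start.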

As another example, the unit introduces the notion of a colour wheel, which made me wonder if I could find a way of filtering the “blueish” colours by doing some image processing on the colour wheel image provided in the materials:

The original image is the one on the left; the “blueish” values filtered image is the one on the right.

(I couldn’t find a general colour filter function – just an example on Stack Overflow for filtering blues… What I did wonder though was about a control that would let you select an area of the colour wheel and then apply that as a filter to another image.)
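The Stack Overflow recipe I found used OpenCV, but the same idea can be sketched with just PIL and numpy: mask pixels whose hue falls in a “blueish” band and blank out everything else. The band edges (and the small saturation floor to ignore near-greys) are judgement calls on my part, not anything canonical:

```python
from PIL import Image
import numpy as np

def keep_blues(img, lo=120, hi=190):
    """Keep only 'blueish' pixels (a hue band on PIL's 0-255 hue scale);
    everything else is turned white. Band edges are assumptions."""
    hsv = np.asarray(img.convert("HSV"))
    hue, sat = hsv[..., 0], hsv[..., 1]
    mask = (hue >= lo) & (hue <= hi) & (sat > 30)  # ignore near-grey pixels
    rgb = np.asarray(img.convert("RGB")).copy()
    rgb[~mask] = 255  # non-blue pixels go white
    return Image.fromarray(rgb)
```

Generalising this to an arbitrary hue band – picked, say, by clicking a segment of the colour wheel – would give exactly the sort of select-and-filter control mentioned above.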

With such a filter in place, when the course materials suggest an image is “predominantly blue” I can check it to see exactly where it is predominantly blue…

(This also raises questions about our perception of colour, which is another important aspect of art appreciation and perhaps demonstrates certain limitations with computational analysis; which is a Good Thing to demonstrate, right? And it also makes us think about things like cognitive psychology and the art of the artist…)

Another question in the OpenLearn unit asked students to compare the colour palette used in two different images. I tried to make sense of that in terms of trying to build some sort of instrumentation that would identify the dominant palette / colours in an image, which is – and isn’t – as simple as it sounds. From a quick search, it seems that the best approach is to use cluster analysis to try to identify the dominant colour values. Several online recipes demonstrated how to use k-means clustering to achieve this, whilst the color-thief-py package uses a median cut algorithm:
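A minimal version of the k-means approach, sketched here with scikit-learn’s `KMeans` (an assumption on my part that it’s available in the environment; the recipes I found were along these lines, but this exact function is mine):

```python
import numpy as np
from sklearn.cluster import KMeans

def dominant_palette(img, k=5, seed=None):
    """Return k 'dominant' RGB colours via k-means over the pixel values,
    most dominant (largest cluster) first. Pinning seed makes runs repeatable."""
    pixels = np.asarray(img.convert("RGB")).reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pixels)
    counts = np.bincount(km.labels_, minlength=k)
    order = np.argsort(counts)[::-1]  # biggest clusters first
    return km.cluster_centers_[order].round().astype(int)
```

The cluster centres are averages, of course, so the “palette” may contain colours that never appear in the painting at all – which is itself a talking point for learners.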

(Another advantage of analysing images in this way is that it may provide us with things we can describe (automatically, as text) when trying to make our materials accessible. For example, Automatically Generating Accessible Text Descriptions of ggplot Charts in R.)

Basic code for generating the palette was quite easy to find, and the required packages (PIL, opencv2, scikit-image) are all preinstalled on Azure notebooks (my demos are here – check the image processing notebook). This meant I could get started relatively quickly with a crappy tool for exploring the images – but a tool that provided immediate feedback relating to questions I could ask of arbitrary images, and one that could be iterated on (for example, improving the palette display by ordering the palette relative to a colour wheel).

One of the tricks I saw in the various palette-cluster demos was to add the palette to the side of an image, which is a neat way of displaying it. This also put me in mind of a two-axis display in which we might display the dominant colour in particular horizontal and vertical bands of an image as a sidebar/bottom bar. That’s on my to do list.
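The palette-strip trick itself is only a few lines of PIL – roughly along these lines (a sketch, assuming the palette arrives as a list of RGB tuples, e.g. from a clustering step):

```python
from PIL import Image

def with_palette_bar(img, palette, bar_height=40):
    """Paste a horizontal strip of palette swatches beneath the image."""
    img = img.convert("RGB")
    w, h = img.size
    out = Image.new("RGB", (w, h + bar_height), "white")
    out.paste(img, (0, 0))
    swatch_w = w // len(palette)
    for i, colour in enumerate(palette):
        swatch = Image.new("RGB", (swatch_w, bar_height), tuple(colour))
        out.paste(swatch, (i * swatch_w, h))
    return out
```

The two-axis version would replace the single strip with per-row and per-column dominant colours pasted along the right and bottom edges.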

Using techniques such as k-means clustering for the palette analysis also made me think that including such tools as helpers in an arts history course would help introduce students to the wider question of how well such algorithms work in general, and the extent to which tools or applications that use them can be trusted. k-means algorithms typically have a random seed, so each time you run them you may get a different answer (a different palette, in the above case, even when the same algorithm is applied to the same image). This encourages learners to see the computer as providing an opinion about an image rather than a truth. The computer’s palette analysis can thus be seen by the learner as another perspective on how to read an image, say, but not the only way of reading it, nor even a necessarily reliable way of reading it, although one that is “informed” in a particular sense. (The values used to seed the k-means clusterer can be viewed as biases of the clusterer that change each time it’s run; many times, these differing biases may not result in significantly different palettes – but sometimes they may…)
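That seed-dependence is easy to demonstrate in a notebook. A toy sketch (again assuming scikit-learn; the data here is random pixels standing in for an image): with a single random initialisation, pinning the seed makes a run exactly repeatable, while different seeds may – or may not – converge to different local optima, i.e. different palettes:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for an image's pixels: 500 random RGB values (an assumption).
rng = np.random.default_rng(42)
pixels = rng.integers(0, 256, size=(500, 3)).astype(float)

def palette(seed):
    """One k-means run with a single random initialisation (n_init=1)."""
    km = KMeans(n_clusters=4, n_init=1, random_state=seed).fit(pixels)
    centres = km.cluster_centers_.round(1)
    # Sort rows so comparisons don't depend on arbitrary cluster label order
    return centres[np.lexsort(centres.T)]

# Same seed -> identical palette; different seeds may disagree.
```

Running `palette(0)` twice gives identical results; comparing `palette(0)` with `palette(1)` in class is a nice way to show learners the algorithm’s “opinion” shifting under their feet.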

Anyway… the point of this post was supposed to be: the computational engine that is available to us when we present educational materials as live Jupyter notebooks means that we can build quite simple computational tools to extend the environment and allow students to interact with, and ask a range of questions of, the subject matter we are trying to engage them with. Because after all, everyone should learn to code / programme, right? Which presumably includes educators…? Which means we can all be ed-techies now…

See also: Jupyter Notebooks, Cognitive Tools and Philosophical Instruments.

* I’ve recently started to learn to play a musical instrument, as well as read music, for the first time, and this also provides a very powerful form of direct feedback. In case you’re wondering: a Derwent Adventurer 20 harp from the Devon Harp Center in Totnes. By the by, there is also an Isle of Wight harp festival in Ryde each year.