Fragment – Product Privacy Labelling For Electronic Devices

Many consumer items have specific labelling requirements associated with them: children’s products, food, tobacco products, for example.

Manufactured products, including but in no way limited to toys, electrical products and telecommunications equipment, also need to be safe (Product safety for manufacturers).

The labels typically identify things that can cause harm, or act as a way to allow the consumer to manage risk. Labels may also be used to try to regulate a means of production through the market (e.g. certification schemes, “free range eggs”, etc).

So do our new electronic devices also need to be labelled with the environmental sensors they incorporate, based on the potential for privacy-breaking harms?

This post is a stub for examples where such sensors have not been clearly identified (please let me know via the comments if you come across further examples):

So what sensors should be identified?

  • cameras
  • microphones


Location sensing would be harder to label, because this may be done by the device itself (eg GPS), with help from other services (eg looking up location by cell tower or wifi hotspot localisation), or identified from your device by a remote service (eg IP address based localisation).

So maybe radios should also be clearly identified on the label (i.e. any wireless means by which a device can connect to a communications network). (Radios already have to comply with regulations around electromagnetic interference.)

Is this something the Lords Communications Committee, perhaps, or the Commons Science and Technology Committee, has looked at?

Tangentially related: Surveillance Camera Commissioner Guidance on the use of domestic CCTV on the one hand, and domestic video surveillance on the other.

Another thought: what starts out as spyware, eg USB cable with hidden microphone may also end up being commoditised as security devices: light bulbs with cameras/microphones for example; then just everyday: of course your music speaker needs to have a microphone in it, and your telly must have a microphone AND a camera, because, well, obvs…

PS an example UK privacy opinions survey.

PPS internet harms, eg fake news? House of Commons Digital, Culture, Media and Sport Committee — Disinformation and ‘fake news’: Final Report

Gig Night At Strings – Holly Kirby and the Great Outdoors Supported by Doug Alldred and the Silver Lining

Drunk tweeting is one thing, drunk blogging another… so here goes whatever…


Earlier today, I did a 250 mile + ferry sprint back to the Island (“She thinks of nothing but the Isle of Wight, and she calls it the Island, as if there were no other island in the world” etc. Look it up, if you don’t recognise it…) to catch a couple of local bands in a local venue that I’m starting to class as “a proper touring band venue”: Strings.

There weren’t many in on the door, but that’s their loss…

I missed most of the second support (from Platform One stock, maybe?) but caught Doug Alldred and (a chunk of) the Silver Lining’s support… Ever solid, great conversation, previous BBC Radio 2 airplay; if you had nothing better to do, you’d always do this (that may sound like a negative review; it’s not… If you have nothing better to do, you won’t be disappointed by seeing a Doug Alldred set…).

(The above embedded video sounds a bit like a bedroom demo. If you’ve listened to bedroom demos, you know the live set can be markedly different…)

Tonight’s set put me in mind of 3am in Ronnie Scott’s: missed last train home, something entertaining to do whilst waiting for the next day’s first train…. You wouldn’t not do it again…

Headline was Holly Kirby and the Great Outdoors.

The first time I saw Holly play was an intimate and nervous solo acoustic, a small venue with a maxed couple of dozen audience. Like a shy teen’s bedroom demos, nervously playing her personal songs. But a songstress, nevertheless.

Not hard to imagine her sounding Amy Macdonald-like, if a band had been there.

Last year (?) Holly started playing with an island backing band ((&) The Great Outdoors), folk style.

(If you fancy an Island break, with a musical interlude, the RhythmTree Festival is a gentle, musically interesting, family friendly weekend… and I’ll see you there…)

Perfick… the first time… but the more I see them play, the more it misses…

Holly’s voice is so fragile that it can so easily get lost, even with a folk acoustic band (and they’re not unsympathetic).

When I imagine her voice, it’s in the top of the back of my throat, enunciating each word clearly. Classically trained? No idea. But a songstress. But not voice front, throaty and shouty.

I would love to hear an REM acoustic cover of one of Holly’s songs:

Stipe always struck me as fragile, too…

It would be a masterclass, I think, in where the songs could go. (Stipe’s lyrical complexity and execution, the band’s interpretation, and a good sound-engineer-cum-producer’s gloss, would just: master to apprentice… But an apprentice with promise…)

Some of the songs could hit big time…

More than hints of Enya:

But the orchestration, and production, how the swell of the song is managed, how a clever sound engineer can thicken the sound, and etc etc… It needs some more work… But with that more work, it could really work…

(I, just… what would Holly sound like with classical strings backing, I wonder?!)

From my limited experience of sound engineering a couple of decades ago (a couple of studio tracks, the odd live gig — I am the worst sound engineer I know), I would love to have a couple of multitracks of a couple of Live Outdoors backed tracks to see if I could find a way to visualise them and see the bits that work for me, as well as the bits that don’t…

There’s space in the songs — I don’t know if a female backing singer could thicken the sound whilst still letting Holly’s voice float in space over the top — but it’s so frustrating…

The more I hear the songs played, the more I hear the bits that are missing…

One of the things I learned early on from following bands is that the same band playing the same song: never the same…

But my feeling is that Holly’s songs could go so much further than the band can currently execute…

Which isn’t to say you shouldn’t see them if you get the chance. Because if they can make it click, you may not get the chance…

That said, some of the tunes — and execution — are pretty cracking anyway…

Mike Oldfield multi-instrumentalism, meh… Holly’s way poppier…

I Just Try to Keep On Keeping On Looking at This Virtual(isation) Stuff…

Despite working in a computing department in a distance education institution, where providing course software to students running a wide range of arbitrary devices, in a cross-platform way, can be a challenge at times, the phrase “what’s Docker?” is still par for the course…

…and at times I wonder why we aren’t more aggressively exploring virtualisation as a way of regaining control over software environments we provide to students and making them available in a variety of ways.

To listen along, open this in another tab…

I don’t want to wait anymore I’m tired of looking for answers

TM351, our data management course, uses a Virtualbox VM to deliver course software to students (Jupyter notebooks, OpenRefine, PostgreSQL, MongoDB). I’ve dabbled on and off with various Dockerised versions — a docker-compose composition, a monolithic multi-application container — and explored a version of the VM that runs on OpenStack.
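For reference, a docker-compose composition along those lines looks something like the following. This is a minimal sketch with stand-in image names, ports and passwords, not the actual TM351 build:

```yaml
# Hypothetical sketch of a TM351-style docker-compose composition.
# Service names, images and ports are illustrative stand-ins.
version: "3"
services:
  notebook:
    image: jupyter/base-notebook
    ports:
      - "8888:8888"
    depends_on:
      - postgres
      - mongo
  postgres:
    image: postgres:9.5
    environment:
      POSTGRES_PASSWORD: tm351
  mongo:
    image: mongo:3.4
```

A `docker-compose up` then brings the whole stack up together, which is one of the attractions over a monolithic VM: each application gets its own maintainable, versioned container.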

We were supposed to have a version of the VM running on the Faculty OpenStack server for ALs (tutors) and “needy” students who struggled with getting the VM running in 18J (that’s October, 2018), but it didn’t happen. I got a test version up by November, and tried to pull together my blockers and questions into an internal document in early December.

So much for that. Not a single question answered, not a single comment questioning any of the points I raised, nor any come-backs saying how incomprehensible the document is.

There’s no starting over, no new beginnings, time races on
And you’ve just gotta keep on keeping on

Elsewhere, the TM129 Frankenstein course that’s watched too many Dracula movies keeps on not dying… As with TM351, it faces issues: a Linux desktop is delivered using Virtualbox; and as on TM351, there’s grief every presentation with the least capable students having the most problems getting it running. On Windows, the default Hyper-V hypervisor clashes with Virtualbox (only one of them can run at any one time), and in some circumstances, fixing it requires a trip to the BIOS.

Gotta keep on going, looking straight out on the road
Can’t worry ’bout what’s behind you or what’s coming for you further up the road

But I’m not wrong. I’m happy to take any bet that anyone cares to make about it: virtualisation is what we should be doing, because it holds the promise of anywhere running (on students’ machines, on OU servers, on remote servers whether OU provided or BYO); it allows us to control environments, and maintain them at whatever package version numbers we want.

I try not to hold on to what is gone, I try to do right what is wrong
I try to keep on keeping on

Yeah I just keep on keeping on

One of the points of difference between TM129 and TM351 is that TM129 uses desktop Linux whereas TM351 runs a headless Linux server. Desktop access creates its own issues, such as how to provide desktop access via a browser if the VM is remotely hosted. One way is via a noVNC-style client, another is RDP (Remote Desktop Protocol).

Take a minute out to listen to the song, mull over just how much you know about virtualisation, and wonder about how we might be able to make use of it in a distance learning context… What don’t you know that might be preventing you from thinking this through? Your peripheral vision will cue you back into reading along…

Something good comes with the bad

One of the advantages of the desktop, rather than headless, approach is that it provides us with a context in which we can run legacy apps. One of the TM129 applications is an old, old Windows desktop app, RobotLab, packaged under WINE and PlayOnMac to provide cross-platform support. Each year it seems to require another patch, so how much easier it would be if we dumped it into a VM and let folk run it from there.

The desktop approach also means that where an application is a desktop app, rather than an HTML-GUI fronted one, we can make it available to students, wheresoever they happen to be, on whatsoever platform they happen to be on…

There’s hope, there’s a silver lining

Discussions are currently afoot about how we can make life easier for TM129 students in a TM129 rewrite. One option is that we give them a Linux Live-USB, another that we ship them a Raspberry Pi. My preference would be to provide remote access. But there may be another way of keeping it local…

Show me my silver lining

In Autumn last year, Microsoft seemed to up support for Ubuntu under Hyper-V. As this post — Using Enhanced Mode Ubuntu 18.04 for Hyper-V on Windows 10 — describes, it seems that some flavours of Windows 10 will now let you run Ubuntu with a graphical desktop and access it via the Windows RDP client.

Show me my silver lining

But how does that help TM351, where we use a headless server?

Having no idea who or what or where I am

I don’t remember if I’ve explored this before — not having access to a Windows machine means I tend to shirk on (that is, completely ignore!) Windows integration and testing — but this recent post on How to Migrate VirtualBox VMs to Windows 10 Hyper-V could be handy… Rather than just ship the TM351 Virtualbox VM, why don’t we ship a Hyper-V version too?

(The TM351VM build is actually managed under Vagrant, and the original Vagrantfile supported builds for Virtualbox, AWS, Linode and Azure. I was trying to be a Good Citizen by enabling run anywhere, at least insofar as I could test it…)

These shackles I’ve made in an attempt to be free

One thing I find particularly frustrating is that rather than spend an hour putting together a working and reusable demo of how some of these virtualisation pieces can fit together (for example, a demo of how to run the Microsoft VS Code desktop electron app in a container, probably by cribbing CoCalc’s approach to delivering “collaborative persistent graphical Linux applications to your browser, with integrated clipboard and HiDPI support” based on what’s in the CoCalc-Docker image), I have to keep writing things like this because no-one else seems to be willing to play along…

I won’t take the easy road

PS if you’re wondering about the lyrics, they’re from First Aid Kit’s beautiful song “Silver Lining”. [Wipes tear from eye…]

I had it playing on repeat as I wrote this post using ListenOnRepeat.


Generating Fake Data – Quick Roundup

When developing or testing a data system, it often makes sense to try it out with some data that looks real, but isn’t real, just in case something goes wrong…

It also means you can test as much as you want without having to expose any real data.

According to this article — Synthetic data generation — a must-have skill for new data scientists — knowing how to create effective test data is one of those new skills folk are going to have to learn.

(We’re about to start looking at producing a new machine learning course, so stumbling across that sort of possible requirement is quite timely…)

So what data can you use?

By chance, whilst searching for something else, I spotted this article describing pydbgen, a simple Python package for generating fake data tables to test simple database systems.

A quick trawl turns up other packages for doing similar things, such as mimesis, or faker, which also inspired this more general R package, charlatan.
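For a flavour of what these packages automate, here’s a minimal stdlib-only sketch of a fake table row generator. The column names and value pools are invented for illustration; faker and friends do this properly, with large, locale-aware providers:

```python
import random

# Tiny pools of fake values; packages like faker ship much larger,
# locale-aware providers for names, addresses, phone numbers, etc.
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dan"]
LAST_NAMES = ["Smith", "Jones", "Patel", "Garcia"]

def fake_row(rng):
    """Generate one fake 'customer' record as a dict."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return {
        "name": f"{first} {last}",
        "email": f"{first.lower()}.{last.lower()}@example.com",
        "age": rng.randint(18, 90),
    }

def fake_table(n, seed=0):
    """Generate n fake rows, reproducibly, from a seeded RNG."""
    rng = random.Random(seed)
    return [fake_row(rng) for _ in range(n)]
```

Seeding the generator means the “random” test data is reproducible, which matters if you want test runs to be comparable.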

You can also generate numerical data with various statistical properties. For example, you can generate test datasets using SciKit Learn; and here’s one of my early attempts at generating 2D numerical data to demonstrate different correlation coefficients.
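The 2D correlation demo boils down to a trick that’s easy to sketch with just the stdlib (no sklearn needed): to hit a target Pearson correlation rho, mix independent Gaussian noise into a scaled copy of the first variable:

```python
import math
import random

def correlated_pairs(n, rho, seed=0):
    """Generate n (x, y) samples with Pearson correlation ~rho.

    x and the noise term are independent standard normals;
    y = rho*x + sqrt(1 - rho**2)*noise then has unit variance
    and corr(x, y) = rho in expectation.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
    return xs, ys
```

With a few thousand samples the observed sample correlation lands close to the requested value, which is handy for generating scatterplots that demonstrate particular correlation coefficients.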

For text strings, generators are often referred to as ‘lorem ipsum’ generators (why?). For example, loremipsum or collective.loremipsum. Searching for “sentence generator” will also turn up some handy packages: markovify or markovipy, for example. If you prefer using neural network models, there are those too: textgenrnn. If you like waffle, here’s a WaffleGenerator.
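Under the hood, the markovify-style approach is simple enough to sketch with the stdlib: build a first-order word chain from a corpus, then random-walk it to generate “sentences” (the toy corpus here is obviously made up):

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def make_sentence(chain, start, max_words=15, seed=0):
    """Random-walk the chain from a start word to produce a 'sentence'."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < max_words and chain.get(out[-1]):
        out.append(rng.choice(chain[out[-1]]))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the cat"
chain = build_chain(corpus)
```

Real packages add sentence-boundary handling, higher-order chains and checks that output doesn’t simply quote the corpus, but the core idea is just this.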

If it’s technical waffle you want, SciGen is the classic pure fake computer science paper generator, though it may take a bit of digging to find the required dependencies to run it…

Sometimes a real world source document can be used to bootstrap the production of a related, fake, item. For example, this “semiautomated scientific survey writing tool” that will create a scientific review paper for you: HackingScience. (I wonder, are there educational possibilities there that may help draft materials for, or support researching, new courses?)

The state of the art in text generation was evidenced by a blog post from OpenAI that was doing the rounds this week. gpt-2 looks like a related repo, but with a smaller model and no examples in the README…

Another use of text is for testing OCR (optical character recognition) systems: TextRecognitionDataGenerator. If you need to put some text into images in order to test text extraction from images: SynthText.

If it’s faces you need, then deep learning networks may help. For example, stylegan generates the sort of faces you can see on ThisPersonDoesNotExist. And here are some tips on how to spot fake face photos…

More general image synthesis from text is still a bit ropey, at least in some of the repos I found, but some look okay: text-to-image. If you do get a grainy image, though, I wonder what happens if you then tidy it up using something like this? deep-image-prior.

If you can find a way of generating semantic image sketches, you can generate real images by letting a network fill in the detail: PhotographicImageSynthesis.

If you need text surrounding an image, there are lots of examples of generating tags from images, but how about then using that to generate more sentence-like captions? image2story.

Many of these packages can generate plausible looking data for a wide definition of data, although they won’t necessarily model the mess of real data (any mess you build in will be a model of messy data, but not necessarily a realistic one). This is something to bear in mind when testing. You should be particularly careful with how you use them if you are testing machine learning models against them, and expect weird things to happen if you make like Ouroboros and use them to train models…

Notebook Practice – Data Sci Ed, 101 With a Nod To TM351

One of the ways our Data management and analysis (TM351) course differs from the module it replaced, a traditional databases module, is the way in which we designed it to cover a full data pipeline, from data acquisition, through cleaning, management (including legal issues), analysis, visualisation and reporting. The database content takes up about a third of the course and covers key elements of relational and noSQL databases. Half the course content is reading, half is practical. To help frame the way we created the course, we imagined a particular persona: a postdoc researcher who had to manage everything data related in a research project.

Jupyter notebooks, delivered with PostgreSQL and MongoDB databases, along with OpenRefine, inside a VirtualBox virtual machine, provide the practical environment. Through the assessment model, two tutor marked assessments as continuous assessment, and a final project style assessment, we try to develop the idea that notebooks can be used to document small data investigations in a reproducible way.

The course has been running for several years now — the notebooks were still called IPython notebooks when we started — but literature we can use to post hoc rationalise the approach we’ve taken is now starting to appear…

For example, this preprint recently appeared on arXiv — Three principles of data science: predictability, computability, and stability (PCS),  Bin Yu, Karl Kumbier, arxiv:1901.08152 — and provides a description of how to use notebooks in a way that resembles the model we use in TM351:

We propose the following steps in a notebook

1. Domain problem formulation (narrative). Clearly state the real-world question one would like to answer and describe prior work related to this question.

2. Data collection and relevance to problem (narrative). Describe how the data were generated, including experimental design principles, and reasons why data is relevant to answer the domain question.

3. Data storage (narrative). Describe where data is stored and how it can be accessed by others.

4. Data cleaning and preprocessing (narrative, code, visualization). Describe steps taken to convert raw data into data used for analysis, and why these preprocessing steps are justified. Ask whether more than one preprocessing method should be used and examine their impacts on the final data results.

5. PCS inference (narrative, code, visualization). Carry out PCS inference in the context of the domain question. Specify appropriate model and data perturbations. If necessary, specify null hypothesis and associated perturbations (if applicable). Report and post-hoc analysis of data results.

6. Draw conclusions and/or make recommendations (narrative and visualization) in the context of domain problem.

As early work for a new course on data science kicks off, a module I’m hoping will be notebook mediated, I’m thinking now would be a good time for us to revisit how we use the notebooks for teaching, and how we expect the students to use them for practice and assessment, and pull together some comprehensive notes on emerging notebook pedagogy from a distance HE perspective.

So How Do I Export Styled Pandas Tables From a Jupyter Notebook as a PNG or PDF File?

In order to render tweetable PNGs of my WRC rally stage chartables, I’ve been using selenium to render the table in its own web page and then grab a screenshot (Converting Pandas Generated HTML Data Tables to PNG Images), but that’s really clunky. So I started to wonder: are there any HTML2PNG converters out there?

This post has some handy pointers, including reference to the following packages:

  • HTML2Canvas, which allows you to take “[s]creenshots with JavaScript” and export them as PNG files;
  • TableExport, which seems to work with jspdf (“the leading HTML5 client solution for generating PDFs”) and jsPDF-AutoTable (a “jsPDF plugin for generating PDF tables with javascript”) to allow you to export an HTML table as a PDF.

The html2canvas route is also demonstrated in this Stack Overflow answer to a query wondering how to Download table as PNG using JQuery.

So now I’m wondering a couple of things about styled pandas tables in Jupyter notebooks.

Firstly, would some magic __repr__ extensibility that provides a button for exporting a styled pandas table as a PNG or a PDF be handy?

Secondly, how could the above recipes be woven into a simple end user developed ipywidgets powered app (such as the one I used to explore WRC stage charts) to provide a button to download a rendered, styled HTML table as a PNG or PDF?
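I don’t have the PNG rendering half, but the “download what’s been rendered” half can be sketched with the stdlib alone: base64-encode the styled table’s HTML into a data-URI download link. (Displaying the returned snippet, e.g. via IPython’s HTML display class, would give the click-to-download button; the PNG/PDF conversion would still need html2canvas, jsPDF or a headless browser screenshot on top.)

```python
import base64

def html_download_link(html, filename="table.html", link_text="Download table"):
    """Wrap an HTML fragment in a data-URI <a download> link.

    The returned snippet, when rendered in a notebook, acts as a
    download button for the raw HTML; converting that HTML to
    PNG/PDF is a separate step.
    """
    payload = base64.b64encode(html.encode("utf-8")).decode("ascii")
    return (
        f'<a download="{filename}" '
        f'href="data:text/html;base64,{payload}">{link_text}</a>'
    )
```

The data-URI trick keeps everything client side, so it should drop into an ipywidgets button callback without needing any server support.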

Anyone know of such magic / extensions already out there? Or can knock up a demo, ideally a Binderised one, showing me how to do it?

Coding for Graphics and a BBC House Style

Since discovering the ggplot2 R graphics package, and reading Leland Wilkinson’s The Grammar of Graphics book that underpins its design philosophy, I’ve found they have hugely influenced the way I think about the creation of custom graphics.

The separation of visual appearance from the underlying graphical model is very powerful. The web works in a similar way: the same HTML coded website can be presented in a myriad different ways by the application of different CSS styles, without making any changes to the underlying HTML at all. Check out the CSS Zen Garden for over 200 examples of the same web content, presented differently using CSS styling.

A recent blog post by the BBC Visual and Data Journalism team — How the BBC Visual and Data Journalism team works with graphics in R — suggests they had a similar epiphany:

In March last year, we published our first chart made from start to finish using ggplot2.

Since then, change has been quick.

ggplot2 gives you far more control and creativity than a chart tool and allows you to go beyond a limited number of graphics. Working with scripts saves a huge amount of time and effort, in particular when working with data that needs updating regularly, with reproducibility a key requirement of our workflow.

In short, it was a game changer, so we quickly turned our attention to how best manage this newly-discovered power.

The approach they took was to create a cookbook in which to collect and curate useful recipes for creating particular types of graphic: you can find the BBC R Cookbook here.

The other thing they did was create a BBC News ggplot2 house style: bbplot.

At the OU, where graphics production is still a cottage industry (academics producing badly drawn sketches that get given to artists, who return proper illustrations in a house style that need a tweak here or a tweak there, communicated by the author making red pen annotations to printed out versions of the graphic, etc etc), I keep on wondering why we don’t use powerful code based graphics packages for writing diagrams, and why we don’t have a house style designed for use with them.

PS by the by, via Downes, a post on The Secret Weapon to Learning CSS, that references another post on Teaching a Correct CSS Mental Model, asking what are “the mental patterns: ways to frame the problem in our heads, so we can break problems into their constituent parts and notice recurring patterns” that help a CSS designer ply their craft. Which is to say: what are the key elements of The Grammar of Webstyle?