Via several tweets today, a story in the Guardian declaring Robots threaten 15m UK jobs, says Bank of England’s chief economist:
The Bank of England has warned that up to 15m jobs in Britain are at risk of being lost to an age of robots where increasingly sophisticated machines do work that was previously the preserve of humans.
The original source appears to be a speech (“Labour’s Share”) given by Andrew G Haldane, Chief Economist of the Bank of England, to the Trades Union Congress, London, 12 November 2015. It has bits and pieces in common with recent reports such as this one on The Future of Employment: how susceptible are jobs to computerisation? or this one asking Are Robots Taking Our Jobs, or Making Them?, or this on The new hire: How a new generation of robots is transforming manufacturing, or this collection of soundbites gathered by Pew, or this report from a robotics advocacy group on the Positive Impact of Industrial Robots on Employment. (Lots of consultancies and industry lobby groups seem to have been on the robot report bandwagon lately…) There’s also a recent report from Bank of America/Merrill Lynch on Creative Disruption that seems to have generated some buzz, and which also picks up on several trends in robotics.
But I wonder – is it robots replacing jobs through automating out, or robots replacing jobs by transferring work from the provider of a service or good directly on to the consumer, turning customers into unpaid employees? That is, what proportion of these robots are actually self-service technologies (SSTs)? So for example, have you ever:
- used a self-service checkout in a supermarket rather than waiting in line for a cashier to scan your basketload of goods, let alone bought a bag of crisps or bottle of water from a (self-service) vending machine?
- used a self-service banking express machine or kiosk to pay in a cheque, let alone used an ATM to take cash out?
- used a self-service library kiosk to scan out a library book?
- used a self-service check-in kiosk or self-service luggage drop off in an airport?
- used a self-service ticket machine to buy a train ticket?
- collected goods from a (self-service) Amazon locker?
- commented in a “social learning” course to support a fellow learner?
- etc etc
Who’s taken the job (or rather, the work) there? If you scan it yourself, you’re an unpaid employee…
Via a couple of tweets, it seems that 1-click launching of runnable docker container compositions to the cloud is almost possible with Tutum – deploy to Tutum button [h/t @borja_burgos] – with collections of ready-to-go compositions (or in Tutum parlance, stacks) available on stackfiles.io [h/t @tutumcloud].
The deploy to Tutum button is very much like the binder setup, with URLs taking the form:
Tutum will look in the repository – such as a github repository – for tutum.yml, docker-compose.yml and fig.yml files (in that order) and pre-configure a Tutum stack dialogue with the information described in the first file it finds.
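As a rough illustration of that lookup order, here is a minimal Python sketch. The function name `pick_stackfile` and the example file listing are made up for illustration; only the three filenames and their order come from the description above, and this is not Tutum's actual code.

```python
# Priority order as described above: tutum.yml, then docker-compose.yml, then fig.yml.
STACKFILE_PRIORITY = ("tutum.yml", "docker-compose.yml", "fig.yml")


def pick_stackfile(repo_files):
    """Return the first stack definition file present, in priority order, or None.

    repo_files is a hypothetical listing of filenames at the top of a repository.
    """
    available = set(repo_files)
    for name in STACKFILE_PRIORITY:
        if name in available:
            return name
    return None


# Hypothetical repository listing: both a fig.yml and a docker-compose.yml are present.
chosen = pick_stackfile(["README.md", "fig.yml", "docker-compose.yml"])
print(chosen)
```

In this example docker-compose.yml wins over fig.yml because it comes earlier in the priority list.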
The stack can then be deployed to one or more already running nodes.
The stackfiles.io site hosts a range of pre-defined configuration files that can be used with the deploy button, so in certain respects it acts much the same way as the panamax directory (Panamax marketplace?)
One of the other things I learned about Tutum is that they have a container defined that can cope with load balancing: if you launch multiple container instances of the same docker image, you can load balance across them (tutum: load balancing a web service). At least one of the configurations on stackfiles.io (Load balancing a Web Service) seems to show how to script this.
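To make the load balancing idea a little more concrete, here is a toy Python sketch of sharing requests across several instances of the same image. Round-robin is assumed purely for illustration (I haven't checked which strategy the Tutum load balancer container actually uses), and the container names are invented.

```python
from itertools import cycle


def round_robin(backends):
    """Return an iterator that hands out backends in turn, forever."""
    return cycle(backends)


# Hypothetical names for three containers launched from the same image.
backends = ["web-1", "web-2", "web-3"]
picker = round_robin(backends)

# Route five incoming requests: each goes to the next backend in the rotation.
assigned = [next(picker) for _ in range(5)]
print(assigned)
```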
One of the downsides of the load balancing, and indeed the deploy to Tutum recipe generally, is that there doesn’t seem to be a way to ensure that server nodes on which to run the containers are available: presumably, you have to start these yourself?
What would be nice would be the ability to also specify an autoscaling rule that could be used to fire up at least one node on which to run a deployed stack? Autoscaling rules would also let you power up/power down server nodes depending on load, which could presumably keep the cost of running servers down to a minimum needed to actually service whatever load is being experienced. (I’m thinking of occasional, and relatively low usage models, which are perhaps also slightly different from a normal web scaling model. For example, the ability to fire up a configuration of several instances of OpenRefine for a workshop, and have autoscaling cope with deploying additional containers (and if required, additional server nodes) depending on how many people turn up to the workshop or want to participate.)
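The sort of rule I have in mind could be as simple as the following Python sketch. Everything here is hypothetical (the function names, the users-per-container and containers-per-node figures); it just shows the arithmetic an autoscaling rule for a workshop might use, not anything Tutum offers.

```python
import math


def containers_needed(users, users_per_container, min_containers=1):
    """Number of application containers for a given number of users.

    Always keeps at least min_containers running, so a deployed stack
    has something to run on even before anyone turns up.
    """
    return max(min_containers, math.ceil(users / users_per_container))


def nodes_needed(containers, containers_per_node):
    """Number of server nodes required to host the given containers."""
    return math.ceil(containers / containers_per_node)


# Hypothetical workshop: 23 people, 5 users per OpenRefine container,
# and room for 4 containers on each server node.
containers = containers_needed(23, users_per_container=5)
nodes = nodes_needed(containers, containers_per_node=4)
print(containers, nodes)
```

Re-running the calculation as people arrive or leave gives the power up/power down behaviour: the node count only rises when the containers no longer fit on the nodes already running.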
There seems to be a discussion thread about autoscaling on the Tutum site, but I’m not sure there is actually a corresponding service offering? (Via @tutumcloud, there is a webhook triggered version: Tutum triggers.)
One final thing that niggles at me particularly in respect of personal application hosting is the ability to “stash” a copy of a container somewhere so that it can be reused later, rather than destroying containers after each use. (Sandstorm.io appears to have this sorted…) A major reason for doing this would be to preserve user files. I guess one way round it is to use a linked data container and then keep the server node containing that linked data container alive, in between rounds of destroying and starting up new application containers (application containers that link to the data container to store user files). The downside of this is that you need to keep a server alive – and keep paying for it.
What would be really handy would be the ability to “stash” a container in some cheap storage somewhere, and then retrieve that container each time someone wanted to run their application (this could be a linked data container, or it could be the application container itself, with files preserved locally inside it?) (Related: some of my earlier notes on how to share docker data containers.)
I’m not sure whether there are any string’n’glue models that might support this? (If you know of any, particularly if they work in a Tutum context, please let me know via the comments…)
After a fun chat with Jim Groom this morning – even after all these years, we’ve still never met in person – I thought I should get round to finishing off this post, modified slightly in light of today’s chat…
A couple of months ago, I signed up for some online webhosting from Reclaim Hosting (which I can heartily recommend:-), in part because I wanted to spend a bit of time hacking some #opendata related WordPress plugins (first attempt described here); and my hosting on WordPress-dot-com doesn’t allow much in the way of tech customisation…
Reclaim offers web hosting, which is to say: a place to park several domains of my own, host blogs of my own customisation, manage email associated with my domains, handle analytics and logging, and publish a variety of other web style applications of my own choosing.
This is great for web hosting BUT the applications on offer are, in the main, applications associated with web stuff, as compared to applications associated with scientific, engineering, or digital humanities coursework, for example (“scholarly apps”, perhaps?!). So for example, for OUr Data Management and Analysis (TM351) course, students will be running Jupyter notebooks, OpenRefine, MongoDB and PostgreSQL (I had hoped early on that RStudio might make it in there too, but that was over ambitious!;-) It’s not surprising that some of these apps also appear on the ResBaz Cloud.
Jupyter, OpenRefine and RStudio share the common feature of presenting graphical user interfaces via a browser. MongoDB and PostgreSQL, on the other hand, along with services like the Apache Tika Document Text Extraction Service, provide “headless” services via an http port. Which is to say – they work over the web, and, if appropriate, they can be accessed via a browser.
So here’s what I want, what I think I really, really want: an online application hosting provider. Reclaim lets me do the web social and web publishing stuff, but at the moment I can’t 1-click install my web-runnable course software there. Nor can I easily share my own “scholarly app” creations: for example, I could pay $9 a month to host shiny apps I’ve built in RStudio on ShinyApps.io, but if I just wanted to share a little something with my friends that I’d built on a course for a day or two, that would probably be overkill compared to hosting it briefly on my own site. If I’d built a cool Jupyter notebook and wanted to let you have a play with it, I could share the notebook file with you and you could then download it and run it on your own notebook server, assuming you know how to do that, but it might also be nice if you could 1-click launch an interactive version of it on my site. (Actually, there is a string’n’glue solution to this: I could pop the notebook onto github and then run it via binder.)
So looking around for bits of stick’n’string’n’glue that could perhaps be glued together to let me do this, what I’d quite like is to have my own online, course-app running StringLE (remember StringLE…? A learning environment, made from string’n’glue, where you could actually do stuff as well as be given stuff to read…).
On the one hand, the social webhosting side, I’d have my webhosting apps cPanel; on the other, to meet my course related scientific computing needs, I’d have something like Kitematic:
Note there may be some overlap in the applications. More what I’m thinking about are use cases where the applications operate on a different timescale. The web hosting apps I start once and they run for ever. I want my blog to be there all the time, and I want my email to work all the time. The personal apps are more like applications that only run when I’m using them: RStudio, or a Jupyter notebook. That is, I start them/launch them when I want to use them, then shut them down when I’ve done, ideally persisting any files I’ve been working on somewhere until the next time I want to use the application. Containers are ideal for this because you can start them when you need them, then throw them away when your study session is done.
So that’s one take – a Kitematic complement to cPanel that lets me fire up applications for short term use, whilst persisting my files in a storage area part of my online hosting, which is perhaps even synched to something like Dropbox.
Here’s another take – imagine what this might mean…:
In this case, imagine that the binder button takes an image on a dockerhub and launches it via my web host. So the binder button takes me to a central clearing house where I have an authenticated account that I’ve configured with details of my web host. Clicking the binder button says to the binder server: “authenticated person X wants to run a container based on image Y”, and the binder server looks up my host details, and fires up a container there, with a URL as a subdomain of my domain.
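As a thought experiment, the clearing house lookup might go something like the following Python sketch. All of it is hypothetical: the `HostConfig` fields, the example domain and endpoint, and the way the subdomain is derived from the image name are my own inventions for illustration, not anything binder or Tutum actually does.

```python
from dataclasses import dataclass


@dataclass
class HostConfig:
    domain: str        # the user's own domain
    api_endpoint: str  # hypothetical endpoint on the user's web host that starts containers


# Hypothetical registry held by the clearing house: authenticated user -> host details.
USER_HOSTS = {
    "person_x": HostConfig(domain="example.org",
                           api_endpoint="https://host.example.org/api"),
}


def plan_launch(user, image, registry=USER_HOSTS):
    """Work out where to launch a container for "user wants to run image".

    Returns a plan only; actually starting the container on the host is out of scope.
    """
    host = registry.get(user)
    if host is None:
        raise KeyError(f"no web host configured for {user}")
    # Derive a subdomain from the image name, e.g. "jupyter/notebook:latest" -> "notebook".
    app_name = image.split("/")[-1].split(":")[0]
    return {
        "endpoint": host.api_endpoint,
        "image": image,
        "url": f"https://{app_name}.{host.domain}",
    }


plan = plan_launch("person_x", "jupyter/notebook:latest")
print(plan["url"])
```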
I could imagine something like Tutum – recently acquired by docker – being able to support something like this: from Tutum, I can currently start up servers (droplets) on a third party host (I use Digital Ocean for this), and then deploy containers from dockerhub on those servers. At the moment it takes a few clicks in Tutum to set up the various settings and start the servers, but it could perhaps all be streamlined into a few setup screens for the first time I launch a container application, and the parameters saved to a config file that could be used by default on future starts of the same application? So a tutum button, rather than a binder button, on dockerhub perhaps?
As to security, I think that running arbitrary containers fills IT folk with security dread, so it may make more sense to only support containers based on images held in a trusted repository, such as an institutional repository. This does put something of an application gatekeeper role back on the institution, but the institution could be a trusted commercial or community partner. (I wonder: is there support for trusted/signed docker images?)
As to how achievable this is – I wish I had time to explore and play with the Tutum API a little! In the meantime, Jim mentioned the rather intriguing sounding sandstorm.io:
What this seems to be is an app store where the apps are Linux virtual machines, packaged using vagrant…: Sandstorm application packaging.
From a quick peek, it seems that a Sandstorm application is a Linux image built up from a Sandstorm base image and a set of user defined shell scripts. (UPDATE: for a description of how the Sandstorm.io approach differs from docker, see Why doesn’t Sandstorm just run Docker apps?) Rather than running a single application within a single container, and then linking containers to make application compositions, it looks as if Sandstorm containers may run several applications that talk to each other within the container? State can also be persisted, so whilst application running containers are destroyed if you close a browser session running against the container, the state is recoverable if you launch another container from the same image. Which means that the Sandstorm folk have got the user-authentication thing sussed? (Sandstorm know I’m me. When I fire up a Jupyter container, they can link it to my stash of notebook files.) Hmm…
My TM351 VM build files are based on puppet – with a few shell scripts – orchestrated by vagrant. I wonder how hard it would be to create a version of the TM351 virtual machine that could be deployed via Sandstorm? Hmm…
PS Hmm.. it seems that a “Deploy to Tutum” button already exists (h/t @borja_burgos), though I’ve not had time to look at this properly yet… Exciting:-)
PPS and via @tutumcloud, stackfiles.io – a bit like Panamax compositions, deployable via Tutum… Thinks: so I should be able to do a stack for TM351 linked containers…:-)
A couple of news stories came to my attention today, one via a news stand, one via a BBC news report that – it turns out – was recapitulating an earlier story.
Both of them demonstrate how you, the user of the online service, are only a “valued customer” insofar as you help generate revenue.
In the first case, it seems that Amazon – doyens of good employment and tax practice, ahem – are going to start suing folk who publish fake reviews on the site.
Ah, bless… Amazon fighting on behalf of the consumer…
A more timely report – including a copy of the complaint – is posted on Geekwire.
Apparently, “[d]efendants are misleading Amazon’s customers and tarnishing Amazon’s brand for their own profit and the profit of a handful of dishonest sellers and manufacturers. Amazon is bringing this action to protect its customers from this misconduct, by stopping defendants and uprooting the ecosystem in which they participate.”.
It seems that each reviewer of a product has agreed to and is bound by the Conditions of Use of the Amazon site, so I guess that before posting a review, we should all be reading the Ts & Cs… Of course, Amazon ensures that folk using any of its websites have read – and understood – all the terms and conditions of the relevant site. Ahem, again… Cough, splutter, choke….
In case you’re interested, here are the US Terms and Conditions, under acceptance of which “you agree that the Federal Arbitration Act, applicable federal law, and the laws of the state of Washington” apply, presumably because even though Amazon is a Delaware corporation (not that Delaware not being the most transparent of jurisdictions is likely to have anything to do with that?!), its principal place of business is in Seattle, Washington. Sort of. In the UK, for example, which is to say the EU, it’s based in Luxembourg, presumably to help its tax position…? It must be nice being a big enough company to choose what jurisdiction to put what part of your business in, so you can, erm, maximise the benefits…
Although the current case is playing out against Amazon.com users, in case you’re interested, here are the UK Ts & Cs. Read and digest… Remember: you almost undoubtedly signed up to them… Ahem…
(Another by the by on partially related matters – it seems like if you work for Facebook in the UK, business has been good and you can expect a pretty good bonus, but if you’re a member of HM’s Inland Revenue, there’s not been that much business as far as tax is concerned (Facebook paid £4,327 corporation tax despite £35m staff bonuses). In passing, I wonder if Facebook pay the cleaning and ancillary staff that service Facebook’s UK premises a living wage, or whether we should be sleeping happy that soon enough these employees won’t be receiving as much “UK taxpayer’s money” in the form of (soon to be cut) working tax credits, that form of corporate welfare payment used to support companies at state expense, in lieu of them paying a reasonable wage (even leaving tax affairs aside)…) I’m not sure who’s taking the p**s more – international corporations or UK Gov…?
Here’s the root of the second story that caught my eye, in which it seems that successful online gamblers are deemed persona non grata if they’re no longer “revenue positive”, or whatever the phrase is.
So bear in mind that when companies collect your personal data, the benefit that they ultimately want to derive is to the company, not to you, the individual user. Sucker…
One of the blogs on my “must read” list is Bill Slawski’s SEO by the Sea, which regularly comments on a wide variety of search related patents, both recent and in the past, obtained by Google and what they might mean…
The US patent system is completely dysfunctional, of course, acting as a way of preventing innovative competition in a way that I think probably wasn’t intended by its framers, but it does provide an insight into some of the crazy bar talk ideas that Silicon Valley types thought they might just go and try out on millions of people, or perhaps already are trying out.
As an example, here are a couple of recent patents from Facebook that recently crossed my radar.
Images uploaded by users of a social networking system are analyzed to determine signatures of cameras used to capture the images. A camera signature comprises features extracted from images that characterize the camera used for capturing the image, for example, faulty pixel positions in the camera and metadata available in files storing the images. Associations between users and cameras are inferred based on actions relating users with the cameras, for example, users uploading images, users being tagged in images captured with a camera, and the like. Associations between users of the social networking system related via cameras are inferred. These associations are used beneficially for the social networking system, for example, for recommending potential connections to a user, recommending events and groups to users, identifying multiple user accounts created by the same user, detecting fraudulent accounts, and determining affinity between users.
Which is to say: traces of the flaws in a particular camera that are passed through to each photograph are unique enough to uniquely identify that camera. (I note that academic research picked up on by Bruce Schneier demonstrated this getting on for a decade ago: Digital Cameras Have Unique Fingerprints.) So when a photo is uploaded to Facebook, Facebook can associate it with a particular camera. And by association with who’s uploading the photos, a particular camera, as identified by the camera signature baked into a photograph, can be associated with a particular person. Another form of participatory surveillance, methinks.
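To give a feel for how such matching might work, here is a toy Python sketch that treats a camera signature as nothing more than a set of faulty pixel positions and compares sets using Jaccard similarity. The similarity measure, the threshold and the data are all my own assumptions for illustration; the patent abstract doesn't say how signatures are actually compared.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of faulty pixel positions."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_camera_match(image_defects, known_cameras, threshold=0.5):
    """Return the id of the known camera whose defects best match, or None.

    known_cameras maps a camera id to its set of faulty pixel positions.
    The threshold is an arbitrary illustrative cut-off.
    """
    best_id, best_score = None, 0.0
    for cam_id, defects in known_cameras.items():
        score = jaccard(image_defects, defects)
        if score > best_score:
            best_id, best_score = cam_id, score
    return best_id if best_score >= threshold else None


# Made-up signatures for two cameras, and the defects spotted in a newly uploaded photo.
known = {
    "cam_a": {(10, 12), (200, 45), (301, 7)},
    "cam_b": {(5, 5), (99, 100)},
}
photo = {(10, 12), (200, 45), (301, 7), (400, 400)}
print(best_camera_match(photo, known))
```

Once a photo is tied to a camera id like this, associating the camera with whoever uploads its photos is just a lookup.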
Note that this is different to the various camera settings that get baked into photograph metadata (you know, that “administrative” data stuff that folk would have you believe doesn’t really reveal anything about the content of a communication…). I’m not sure to what extent that data helps narrow down the identity of a particular camera, particularly when associated with other bits of info in a data mosaic, but it doesn’t take that many bits of data to uniquely identify a device. Like your web-browser’s settings, for example, that are revealed to webservers of sites you visit through browser metadata, and uniquely identify your browser. (See eg this paper from the EFF – How Unique Is Your Web Browser? [PDF] – and the associated test site: test your browser’s uniqueness.) And if your camera’s also a phone, there’ll be a wealth of other bits of metadata that let you associate camera with phone, and so on.
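A back-of-the-envelope Python sketch shows just how few bits that is. The population figure is a rough assumption (about 7.3 billion people), and the "one in 1024" example is invented; the surprisal measure is the usual information theory one, which is the sort of measure the EFF paper reports in bits.

```python
import math


def bits_to_identify(population):
    """Whole bits needed to give every member of a population a distinct label."""
    return math.ceil(math.log2(population))


def surprisal_bits(fraction_sharing_value):
    """Identifying information (in bits) revealed by an attribute value
    that only the given fraction of devices share."""
    return -math.log2(fraction_sharing_value)


# Rough assumption: about 7.3 billion people in the world.
print(bits_to_identify(7_300_000_000))

# Invented example: a setting shared by 1 in 1024 devices.
print(surprisal_bits(1 / 1024))
```

If attributes are independent, their bits add up, so a handful of moderately unusual settings can be enough to pick out one device.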
Facebook’s face recognition algorithms can also work out who’s in an image, so more relationships and associations there. If kids aren’t being taught about graph theory in school from a very young age, they should be… (So for example, here’s a nice story about what you can do with edges: SELECTION AND RANKING OF COMMENTS FOR PRESENTATION TO SOCIAL NETWORKING SYSTEM USERS. Here’s a completely impenetrable one: SYSTEMS, METHODS, AND APPARATUSES FOR IMPLEMENTING AN INTERFACE TO VIEW AND EXPLORE SOCIALLY RELEVANT CONCEPTS OF AN ENTITY GRAPH.)
Here’s another one – hinting at Facebook’s role as a future publisher:
An online publisher provides content items such as advertisements to users. To enable publishers to provide content items to users who meet targeting criteria of the content items, an exchange server aggregates data about the users. The exchange server receives user data from two or more sources, including a social networking system and one or more other service providers. To protect the user’s privacy, the social networking system and the service providers may provide the user data to the exchange server without identifying the user. The exchange server tracks each unique user of the social networking system and the service providers using a common identifier, enabling the exchange server to aggregate the users’ data. The exchange server then applies the aggregated user data to select content items for the users, either directly or via a publisher.
I don’t really see what’s clever about this – using an ad serving engine to serve content – even though Business Insider try to talk it up (Facebook just filed a fascinating patent that could seriously hurt Google’s ad revenue). I pondered something related to this way back when, but never really followed it through: Contextual Content Server, Courtesy of Google? (2008), Contextual Content Delivery on Higher Ed Websites Using Ad Servers (2010), or Using AdServers Across Networked Organisations (2014). Note also this remark on the University of Bedfordshire using Google Banner Ads as On-Campus Signage (2011).
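For what it’s worth, the aggregation step in the patent abstract quoted above seems to amount to little more than a join on a shared key, as in this Python sketch. The record layout, the `common_id` field name and the data are all hypothetical; the abstract doesn’t describe an implementation.

```python
from collections import defaultdict


def aggregate_by_common_id(*sources):
    """Merge user records from several sources, keyed on a shared pseudonymous id.

    Each source is a list of dicts with a "common_id" key plus arbitrary attributes;
    no source needs to reveal who the user actually is.
    """
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            attributes = dict(record)           # copy so the input is not modified
            common_id = attributes.pop("common_id")
            merged[common_id].update(attributes)
    return dict(merged)


# Made-up data from a social network and one other service provider.
social = [{"common_id": "u1", "likes": ["cycling"]}]
retailer = [{"common_id": "u1", "recent_purchase": "bike lights"},
            {"common_id": "u2", "recent_purchase": "dog lead"}]

profiles = aggregate_by_common_id(social, retailer)
print(profiles["u1"])
```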
(By the by, I also note that Google has a complementary service where it makes content recommendations relating to content on your own site via AdSense widgets: Google Matched Content.)
PS not totally unrelated, perhaps, a recent essay by Bruce Schneier on the need to regulate the emerging automatic face recognition industry: Automatic Face Recognition and Surveillance.
…although I haven’t actually seen the letter yet, but our HoD announcement went round about the latest successful promotions earlier today, so I’m hoping that counts…!
I took considerable persuading to put a case in (again…) but thanks to everyone (particularly David, Karen and Mark, and Andy from attempts past) who put the hours in improving on the multiple revisions of the case as it progressed through the OU promotions process and supporting me in the process – as well as those Deans and HoDs past who’ve allowed me to get away with what I’ve been doing over the last few years;-)
If anyone wants to see a copy of the case I made, I’m happy to let you have a copy…
Anyway… is this now the time to go traditional, stop blogging, and start working on getting the money in and preparing one or two journal publications a year for an academic readership in the tens?!
Two or three weeks ago, whilst in Cardiff, I noticed one of these things for the first time:
It counts the number of cyclists who pass by it and is a great example of the sort of thing that could perhaps be added to a “data walk”, along with several other examples of data-revealing street furniture as described by Leigh Dodds in Data and information in the city.
It looks like it could be made by a company called Falco – this Falco Cycle Counter CB650 (“[a]lready installed for Cardiff County Council as well as in Copenhagen and Nijmegen”)? (Falco also make another, cheaper one, the CB400.)
From the blurb:
The purpose of the Falco Cycle Counter is to show the number of cyclists on a bicycle path. It shows the number of cyclists per day and year. At the top of the Counter there is a clock indicating time and date. On the reverse it is possible to show city map or other information, alternatively for a two-way bicycle path it is possible to have display on both side of the unit. Already installed for Cardiff County Council as well as in Copenhagen and Nijmegen, three very strong cycling areas, the cycle counter is already proving to be an effective tool in managing cycle traffic.
As with many of these sorts of exhibit, it can phone home:
When configured as a Cycle Counter, the GTC can provide a number of functions depending on the configuration of the Counter. It is equipped with a modem for a SIM card use which provides a platform for mobile data to be exported to a central data collection system.
This makes possible a range of “on-… services”, for example: [g]enerates individual ‘buy-in’ from local people via a website and web feed plus optional Twitter RSS enabling them to follow the progress of their own counter personally.
I was reminded of this appliance (and should really have blogged it sooner) by a post today from Pete Warden – Semantic Sensors – in which he remarked on spotting an article about “people counters” in San Francisco that count passing foot traffic.
In that case, the counters seem to be provided by a company called Springboard who offer a range of counting services using a camera based counting system: a small counting device … mounted on either a building or lighting/CCTV column, a virtual zone is defined and pedestrians and cars who travel through the zone are recorded.
Visitor numbers are recorded using the very latest counting software based on “target specific tracking”. Data is audited each day by Springboard and uploaded daily to an internet server where it is permanently stored.
Target specific tracking software monitors flows by employing a wide range of characteristics to determine a target to identify and track.
Here’s an example of how it works:
As Pete Warden remarked, [t]raditionally we’ve always thought about cameras as devices to capture pictures for humans to watch. People counters only use images as an intermediate stage in their data pipeline, their real output is just the coordinates of nearby pedestrians.
He goes on:
Right now this is a very niche application, because the systems cost $2,100 each. What happens when something similar costs $2, or even 20 cents? And how about combining that price point with rapidly-improving computer vision, allowing far more information to be derived from images?
Those trends are why I think we’re going to see a lot of “Semantic Sensors” emerging. These will be tiny, cheap, all-in-one modules that capture raw noisy data from the real world, have built-in AI for analysis, and only output a few high-level signals.
For all of these applications, the images involved are just an implementation detail, they can be immediately discarded. From a systems view, they’re just black boxes that output data about the local environment.
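As a toy illustration of a sensor that discards the images and only outputs a high-level signal, here is a Python sketch that counts tracked targets passing through a rectangular virtual zone. The track format, the zone coordinates and the "count each track once" rule are my own assumptions; neither Springboard’s system nor Pete Warden’s post is described at this level of detail.

```python
def in_zone(point, zone):
    """True if an (x, y) point lies inside a rectangular zone (x0, y0, x1, y1)."""
    x, y = point
    x0, y0, x1, y1 = zone
    return x0 <= x <= x1 and y0 <= y <= y1


def count_zone_visits(tracks, zone):
    """Count tracked targets that were inside the zone at least once.

    tracks is a list of tracks, each a list of (x, y) positions over time
    (as might come out of some upstream tracking step). Each track counts once.
    """
    return sum(1 for track in tracks if any(in_zone(p, zone) for p in track))


zone = (0, 0, 10, 10)  # hypothetical virtual zone, in image coordinates
tracks = [
    [(-5, 5), (2, 5), (12, 5)],  # walks straight through the zone
    [(20, 20), (30, 30)],        # never enters it
    [(5, 5)],                    # seen once, inside the zone
]

# The only thing that leaves the "sensor" is this number.
print(count_zone_visits(tracks, zone))
```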
Using cameras to count footfall appears to be nothing new – for example, the Leeds Data Mill openly publish Leeds City Centre footfall data collected by the council from “8 cameras located at various locations around the city centre [which monitor] numbers of people walking past. These cameras calculate numbers on an hourly basis”. I’ve also briefly mentioned several examples regarding the deployment of related technologies before, for example The Curse of Our Time – Tracking, Tracking Everywhere.
From my own local experience, it seems cameras are also being used (apparently) to gather evidence about possible “bad behaviour” by motorists. Out walking the dog recently, I noticed a camera I hadn’t spotted before:
It’s situated at the start of a footpath, hedged on both sides, that runs along the road, although the mounting suggests that it doesn’t have a field of view down the path. Asking in the local shop, it seems as if the camera was mounted to investigate complaints of traffic accelerating off the mini-roundabout and cutting up pedestrians about to use the zebra crossing:
(I haven’t found any public consultations about mounting this camera, and should really ask a question, or even make an FOI request, to clarify by what process the decision was made to install this camera, when it was installed, for how long, for what purpose, and whether it could be used for other ancillary purposes.)
On a slightly different note, I also note from earlier this year that Amazon acquired internet of things platform operator 2lemetry, “an IoT version of Enterprise Application Integration (EAI) middleware solutions, providing device connectivity at scale, cross-communication, data brokering and storage”, apparently. In part, this made me think of an enterprise version of Pachube, as was (now Xively?).
So is Amazon going to pitch against Google (in the form of Nest), or maybe Apple, perhaps building “home things” services around a home hub server? After all, they already have a connected, listening voice service for your home in the form of the voice controlled Amazon Echo (a bit like a standalone Siri, Cortana or Google Now). (Note to self: check out the Amazon Alexa Voice Services Developer Kit some time…)
As Pete Warden concluded, “it seems obvious to me that machine vision is becoming a commodity”. What might we expect as and when listening and voice services also become a commodity?
Related: a recent article from the Guardian posing the question What happens when you ask to see CCTV footage?, as is your right by making a subject access request under the Data Protection Act, picks up on a recently posted paper by OU academic Keith Spiller: Experiences of accessing CCTV data: the urban topologies of subject access requests.