A recent post on the BBC News / Technology blog — Why Big Tech pays poor Kenyans to teach self-driving cars — describes how Kenyan knowledge workers spend 8-hour shifts creating machine learning training data for the likes of Google, Microsoft and VW as employees of Samasource, providers of “humans-in-the-loop to help you build quality ground truth training data for your natural language or computer vision algorithms”. (Seems like I missed Samasource when I blogged about these sorts of companies previously: Robot Workers?)
No matter how little you pay people, they’re still expensive, so it’s better if you can get free labour. That’s what captchas do. One of the tasks the Samasource workers perform is tracing around meaningful objects that appear in an image and associating them with labels that describe the thing: “car”, “bus”, “bicycle” and so on. But if you can extract the time and attention of folk browsing the web to do it for you, even better.
For example, I got captcha’d the other day hacking URLs on the Bloomberg website (sites often take umbrage and challenge you to prove you aren’t a robot if you do anything other than click links on their site, such as hacking URLs or using advanced search queries).
But surely selecting things that appear in a grid isn’t as good as tracing around them? Well, it is if you run the test thousands of times, shift the grid around a pixel at a time, and do some sums.
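Here’s a toy sketch of what those “sums” might look like. All the names are mine, and this is a guess at the mechanism rather than anything Google has documented: if each captcha presents the same image under a slightly offset grid, and users click the cells containing the object, then summing the clicked cells back into image space gives a per-pixel vote map whose high-vote region approximates the traced outline.

```python
import numpy as np

def aggregate_grid_votes(selections, image_shape, cell=16):
    """Accumulate per-pixel votes from many shifted-grid captcha responses.

    selections: list of (dx, dy, clicked_cells), where (dx, dy) is the grid
    offset used for that test and clicked_cells is a set of (row, col) grid
    cells the user selected as containing the object.
    Returns the fraction of tests in which each pixel fell inside a
    selected cell; thresholding this gives a rough object mask.
    """
    votes = np.zeros(image_shape, dtype=float)
    for dx, dy, clicked in selections:
        for r, c in clicked:
            y0 = r * cell + dy  # top-left pixel of this cell under the offset grid
            x0 = c * cell + dx
            votes[y0:y0 + cell, x0:x0 + cell] += 1  # numpy slicing clips at edges
    return votes / max(len(selections), 1)
```

Pixels the object actually covers get selected under almost every offset, while pixels near the boundary only get selected for some offsets, so the vote map falls off across the true edge: the finer-than-cell-size detail comes from the shifting, not from any single response.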
It’s much the same with lots of the sites and services you use “for free”. They’re not free to run, of course; they may cost tens or even hundreds of millions of dollars to put together and deliver, so someone has to pay. Ads cover some of it (the money there is advertising dollars in exchange for targeted audiences (Ad-Tech – A Great Way in To OSINT), and those audiences are constructed by mining user data to find, for example, all the people who work in universities and look at pr0n on the bus for a bit of excitement). Surveys have also been used as a “partial payment” mechanism (From Paywalls and Attention Walls to Data Disclosure Walls and Survey Walls).
Another partial payment mechanism is your time. For example, when your GPS app sends you on a weird route, it’s quite possibly using you as a guinea pig to see how effective that part of the route is at that time of day. It needs to learn somehow, right? Gives new meaning to the rat run, doesn’t it? (Never thought of yourself as a lab rat running a maze before?)
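The learning mechanism here is the classic explore/exploit tradeoff. I’ve no idea what any real navigation service actually runs, but the simplest textbook version is an epsilon-greedy bandit: mostly send drivers down the fastest known route, but occasionally sacrifice one to refresh a stale travel-time estimate. A minimal sketch (all names hypothetical):

```python
import random

def pick_route(routes, est_times, epsilon=0.05):
    """Epsilon-greedy route choice: with probability epsilon, explore a
    random route (the guinea-pig trip); otherwise exploit the route with
    the lowest estimated travel time."""
    if random.random() < epsilon:
        return random.choice(routes)
    return min(routes, key=lambda r: est_times[r])

def update_estimate(est_times, route, observed_time, alpha=0.2):
    """Exponential moving average, so estimates track the time of day
    rather than averaging over all history."""
    est_times[route] += alpha * (observed_time - est_times[route])
```

The point the papers below pick up on is that “epsilon” is a real person being deliberately routed down a possibly slow road, without being asked.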
Which leads to a handful of things on my to read list…
First up, Exploring or Exploiting? Social and Ethical Implications of Autonomous Experimentation in AI, the abstract for which reads as follows:
In the field of computer science, large-scale experimentation on users is not new. However, driven by advances in artificial intelligence, novel autonomous systems for experimentation are emerging that raise complex, unanswered questions for the field. Some of these questions are computational, while others relate to the social and ethical implications of these systems. We see these normative questions as urgent because they pertain to critical infrastructure upon which large populations depend, such as transportation and healthcare. Although experimentation on widely used online platforms like Facebook has stoked controversy in recent years, the unique risks posed by autonomous experimentation have not received sufficient attention, even though such techniques are being trialled on a massive scale. In this paper, we identify several questions about the social and ethical implications of autonomous experimentation systems. These questions concern the design of such systems, their effects on users, and their resistance to some common mitigations.
Here’s how they set the scene:
Consider, for example, navigation services that are responsible for providing millions of users with real-time directions. Given the current traffic conditions, these services attempt to suggest optimal routes for drivers. Experimentation is likely a core part of suggesting optimal routes. This is because service providers often lack information about traffic conditions on those routes to which they have purposefully not directed drivers. To determine whether a previously slow route is still slow, these services will deliberately send some users along it.
As I said, on my to-read pile. I’ll try to pull out my own TL;DR nuggets in another post when I have the spare cycles to take it in properly.
Second up, Two Cheers for Corporate Experimentation: The A/B Illusion and the Virtues of Data-Driven Innovation, a much longer, footnoted piece which again I’m not in a mood to read right now… Maybe later…
Finally, a brief review article — What’s Behind Your Navigation App — which perhaps leads to more things to read…
I have such a backlog of half-started posts, of which this is one… Normally, I’d have tried to complete it, but I’m losing stuff in the queue, so posting it as is means this bit is done and I may be more likely to get round to reading those papers and doing a part 2…