Google’s Appetite for Training Data

A placeholder post – I’ll try to remember to add to this as and when I see examples of Google explicitly soliciting training data from users that can be used to train its AI models…

Locations – “Popular Times”


For example: Google Tracking How Busy Places are by Looking at Location Histories [SEO by the Sea] which also refers to a patent describing the following geo-intelligence technique: latency analysis.

A latency analysis system determines a latency period, such as a wait time, at a user destination. To determine the latency period, the latency analysis system receives location history from multiple user devices. With the location histories, the latency analysis system identifies points-of-interest that users have visited and determines the amount of time the user devices were at a point-of-interest. For example, the latency analysis system determines when a user device entered and exited a point-of-interest. Based on the elapsed time between entry and exit, the latency analysis system determines how long the user device was inside the point-of-interest.
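As a rough illustration of the idea in that description (not the method in the patent itself), here is a minimal sketch that takes entry and exit times for a few devices and works out a typical dwell time per place. The visit records, the field layout and the choice of the median as the summary are all assumptions made for the example.

```python
from datetime import datetime
from statistics import median

# Hypothetical visit records: (device_id, place, entered, exited)
visits = [
    ("dev-a", "cafe", "2016-12-01 12:00", "2016-12-01 12:20"),
    ("dev-b", "cafe", "2016-12-01 12:05", "2016-12-01 12:35"),
    ("dev-c", "cafe", "2016-12-01 12:10", "2016-12-01 12:50"),
    ("dev-a", "bank", "2016-12-01 13:00", "2016-12-01 13:10"),
]

FMT = "%Y-%m-%d %H:%M"

def dwell_minutes(entered, exited):
    """Elapsed time between entry and exit, in minutes."""
    delta = datetime.strptime(exited, FMT) - datetime.strptime(entered, FMT)
    return delta.total_seconds() / 60

def latency_by_place(records):
    """Group dwell times by place and summarise each group by its median."""
    durations = {}
    for _device, place, entered, exited in records:
        durations.setdefault(place, []).append(dwell_minutes(entered, exited))
    return {place: median(times) for place, times in durations.items()}

latency = latency_by_place(visits)
print(latency)
```

The median is used here simply because it is less sensitive to the odd device that lingers for hours; any other summary statistic could be swapped in.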

Pondering a Jupyter Notebooks to WordPress Publishing Pattern: MultiMarker Map Widget

On my to do list for next year is to finally get round to doing something consistently with open data in an Isle of Wight context. One of the things I particularly want to explore is customisable WordPress plugins that either pull data from an online data source or can be configured as part of an external publishing system.

For example, the following code, saved as MultiMarkerLeafletMap2.php and zipped up, implements a WordPress plugin that can render an interactive Leaflet map with clustered markers.


<?php
/*
Plugin Name: MultiMarkerLeafletMap2
Description: Shortcode to render an interactive map displaying clustered markers. Markers to be added as JSON. Intended primarily to support automated post creation. Inspired by folium python library and Google Maps v3 Shortcode multiple Markers WordPress plugin
Version: 1.0
Author: Tony Hirst
*/

function MultiMarkerLeafletMap2_custom_styles() {

	wp_deregister_style( 'oi_css_map_leaflet' );
	wp_register_style( 'oi_css_map_leaflet', '//',false, '0.7.3' );
	wp_enqueue_style( 'oi_css_map_leaflet' );

	wp_deregister_style( 'oi_css_map_bootstrap' );
	wp_register_style( 'oi_css_map_bootstrap', '//', false, '3.2.0' );
	wp_enqueue_style( 'oi_css_map_bootstrap' );

	wp_enqueue_style( 'oi_css_map_bootstrap_theme');

	wp_enqueue_style( 'oi_css_map_fa');

	wp_enqueue_style( 'oi_css_map_lam');

	wp_enqueue_style( 'oi_css_map_lmcd');

	wp_enqueue_style( 'oi_css_map_lmc');

	wp_enqueue_style( 'oi_css_map_lar');
}

function MultiMarkerLeafletMap2_custom_scripts() {

	wp_deregister_script( 'oi_script_leaflet' );
	wp_register_script( 'oi_script_leaflet', '//',array('oi_script_jquery'),'0.7.3');
	wp_enqueue_script( 'oi_script_leaflet' );

	wp_deregister_script( 'oi_script_jquery' );
	wp_register_script( 'oi_script_jquery', '//', false, '1.11.3' );
	wp_enqueue_script( 'oi_script_jquery' );

	wp_deregister_script( 'oi_script_bootstrap' );
	wp_register_script( 'oi_script_bootstrap', '//',false,'3.2.0');
	wp_enqueue_script( 'oi_script_bootstrap' );

	wp_deregister_script( 'oi_script_lam' );
	wp_register_script( 'oi_script_lam', '//',array('oi_script_leaflet'),'2.0');
	wp_enqueue_script( 'oi_script_lam' );

	wp_deregister_script( 'oi_script_lmc' );
	wp_register_script( 'oi_script_lmc', '//',array('oi_script_leaflet'),'0.4.0');
	wp_enqueue_script( 'oi_script_lmc' );	

	wp_deregister_script( 'oi_script_lmcsrc' );
	wp_register_script( 'oi_script_lmcsrc', '//',array('oi_script_leaflet'),'0.4.0');
	wp_enqueue_script( 'oi_script_lmcsrc' );
}

add_action( 'wp_enqueue_scripts', 'MultiMarkerLeafletMap2_custom_scripts' );
add_action( 'wp_enqueue_scripts', 'MultiMarkerLeafletMap2_custom_styles' );

// Add items to header
add_action('wp_head', 'MultiMarkerLeafletMap2_header');
add_action('wp_head', 'MultiMarkerLeafletMap2_fix_css');

function MultiMarkerLeafletMap2_fix_css() {
	echo '<style type="text/css">#map {
      }</style>' . "\n";
}

function MultiMarkerLeafletMap2_header() {
}

function MultiMarkerLeafletMap2_call($attr) {
// Generate the map template

	// Default attributes - can be overwritten from shortcode
	$attr = shortcode_atts(array(
									'lat'   => '0',
									'lon'    => '0',
									'id' => 'oimap_1',
									'zoom' => '7',
									'width' => '600',
									'height' => '400',
									'type' => 'multimarker',
									), $attr);

	$html = '<div class="folium-map" id="'.$attr['id'].'" style="width: '. $attr['width'] .'px; height: '. $attr['height'] .'px"></div>
<script type="text/javascript">
      var base_tile = L.tileLayer("https://{s}{z}/{x}/{y}.png", {
          maxZoom: 18,
          minZoom: 1,
          attribution: "Map data (c) OpenStreetMap contributors -"
      });

      var baseLayer = {
        "Base Layer": base_tile
      };

      // list of layers to be added
      var layer_list = {};

      // Bounding box.
      var southWest = L.latLng(-90, -180),
          northEast = L.latLng(90, 180),
          bounds = L.latLngBounds(southWest, northEast);

      // Creates the map and adds the selected layers
      var map = L.map("'.$attr['id'].'", {
                                       center:['.$attr['lat'].', '.$attr['lon'].'],
                                       zoom: '.$attr['zoom'].',
                                       maxBounds: bounds,
                                       layers: [base_tile]
      });

      L.control.layers(baseLayer, layer_list).addTo(map);

      //cluster group
      var clusteredmarkers = L.markerClusterGroup();
      //section for adding clustered markers
';
	// Get our custom fields
	global $post;

	$premarkers=get_post_meta( $post->ID, 'markers', true );
	$markers = json_decode($premarkers,true);
	$legend = get_post_meta( $post->ID, 'maplegendtemplate', true );
	$legendkeys = get_post_meta( $post->ID, 'maplegendkeys', true );

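	if (is_array($markers) && count($markers)>0){
		for ($i = 0;$i < count($markers);$i ++){
			// Assumed reconstruction: fill each %key% placeholder in the legend template
			$popup = $legend;
			foreach (explode(',', $legendkeys) as $k) {
				$k = trim($k);
				$popup = str_replace('%'.$k.'%', $markers[$i][$k], $popup);
			}

			$html .='
			     var marker_'.$i.'_icon = L.AwesomeMarkers.icon({ icon: "info-sign",markerColor: "blue",prefix: "glyphicon",extraClasses: "fa-rotate-0"});
      var marker_'.$i.' = L.marker(['.$markers[$i]['lat'].','.$markers[$i]['lon'].'], {"icon":marker_'.$i.'_icon});
      marker_'.$i.'.bindPopup('.json_encode($popup).');
      marker_'.$i.'._popup.options.maxWidth = 300;
      clusteredmarkers.addLayer(marker_'.$i.');
';
		}
	}

	$html .= '//add the clustered markers to the group anyway
      map.addLayer(clusteredmarkers);
</script>';
	return $html;
}

add_shortcode('MultiMarkerLeafletMap2', 'MultiMarkerLeafletMap2_call');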

Data is passed to the plugin embedded in a WordPress post via three custom fields associated with the post:

  • markers: a JSON list that contains information associated with each marker;
  • maplegendkeys: a comma separated list of key values that refers to keys in each marker object that are referenced when constructing the popup legend for each marker;
  • maplegendtemplate: a template that is used to construct each popup legend, of the form ‘Asset type: %typ% (%loc%)’, where each %VAR% element is replaced by the value of the key VAR in the corresponding marker object.
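
To make the way the three fields fit together a little more concrete, here is a minimal Python sketch of the substitution step: each key named in maplegendkeys is looked up in a marker object and swapped into the template. The function name fill_legend and the sample marker values are made up for illustration; the plugin itself does this job in PHP.

```python
def fill_legend(template, marker, keys):
    """Replace each %key% placeholder in the template with the marker's value."""
    legend = template
    for key in keys.split(","):
        key = key.strip()
        # Keys missing from the marker are replaced with an empty string
        legend = legend.replace("%" + key + "%", str(marker.get(key, "")))
    return legend

# Hypothetical marker object, as it might appear in the markers JSON list
marker = {"lat": "50.7", "lon": "-1.3", "typ": "Car park", "loc": "Newport"}

legend = fill_legend("Asset type: %typ% (%loc%)", marker, "typ,loc")
print(legend)
```

Placeholders whose keys are not listed in maplegendkeys are left untouched, which makes it easy to spot a template and key list that have drifted out of step.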

In the set up I have, the post content – including the plugin code – is generated from a Python script running in a Jupyter notebook that can be posted using the following code fragment:

#!pip3 install python-wordpress-xmlrpc

from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.compat import xmlrpc_client
from wordpress_xmlrpc.methods import media, posts
from wordpress_xmlrpc.methods.posts import NewPost

wpoi = Client(WORDPRESS_BLOG_URL+'/xmlrpc.php', 'robot1', WORDPRESS_API_KEY)

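def wp_customPost(client, title='ping', content='pong, <em>pong</em>', custom=None):
    """Create a new post, attaching each item in `custom` as a custom field."""
    post = WordPressPost()
    post.title = title
    post.content = content
    post.custom_fields = []
    for c in (custom or {}):
        post.custom_fields.append({
            'key': c,
            'value': custom[c]
        })
    # Publish the post via the XML-RPC client
    response = client.call(NewPost(post))
    return response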

A list of objects is created from a pandas dataframe where each object contains the information associated with each marker – we limit the list to only include items for which we have latitude and longitude information:

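def itemiser(row):
    # Build one marker object per dataframe row
    # Further keys used by the legend template (e.g. loc, tenure) are added in the same way
    item = {'lat': row['latlong'].split(',')[0],
            'lon': row['latlong'].split(',')[1],
            'typ': row['Asset Type Description'],
            'location': '{}, {}, {}'.format(row['Address 1'], row['Address 2'], row['Post Code']),
            }
    return item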

jd1=df[(df['latlong']!='') & (df['latlong'].notnull())].apply(itemiser,axis=1)

A post is then constructed that includes a reference to the plugin shortcode (as part of the body text of the post) along with the data that is to be passed via the custom fields.

import json

#txt contains the content for the blog post
txt='''[MultiMarkerLeafletMap2 zoom=11 lat=50.675 lon=-1.31 width=800 height=500]
<div><em>Data produced by <a href="">Isle of Wight Council</a>.</em></div>'''

#jsondata contains the custom field data that will be associated with the post
#maplegendkeys is assumed here to list the keys named in the template
jsondata={'markers':json.dumps( jd1.tolist() ),
          'maplegendkeys':'typ,loc,tenure,location',
          'maplegendtemplate':'Asset type: %typ% (%loc%)<br/>Tenure: %tenure%<br/>%location%'}

wp_customPost(wpoi, "Properties on the Isle of Wight Council property register", txt, jsondata)

Simple Interactive View Controls for pandas DataFrames Using IPython Widgets in Jupyter Notebooks

A quick post to note a couple of tricks for generating simple interactive controls that let you manipulate the display of a pandas dataframe in a Jupyter notebook using IPython widgets.

Suppose we have a dataframe df, and we want to limit the display of rows to just the rows for which the value in a particular column matches a particular categorical value. We can create a drop down list containing the distinct/unique values contained within the column, and use this to control the display of the dataframe rows. Adding an “All” option allows us to display all the rows:


import ipywidgets as widgets
from ipywidgets import interactive

items = ['All']+sorted(df['Asset Type Description'].unique().tolist())

def view(x=''):
    if x=='All': return df
    return df[df['Asset Type Description']==x]

w = widgets.Select(options=items)
interactive(view, x=w)

We can also create multiple controller widgets and bind them into a single interactive display function. For example, we might have one control to filter rows according to the value contained within a particular column, and another widget limiting how many rows are displayed by creating two widgets and accessing them via an interactive ipywidgets construction call:


def view2(x='',y=3):
    if x=='All': return df.head(y)
    return df[df['Asset Type Description']==x].head(y)

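a_slider = widgets.IntSlider(min=0, max=15, step=1, value=5)
b_select = widgets.Select(options=items)

# Bind both widgets to the arguments of the view function
interactive(view2, x=b_select, y=a_slider)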

The attributes of a widget can also be set dynamically, even to the effect of setting the attribute values of one widget as some ultimate function of the value returned from another widget. For example, suppose we want to limit the rows displayed to range from 0 to the total number of rows associated with a particular “subselected” dataframe. Using the .observe() method applied to a widget, we can call a function whenever that widget is interacted with that acts on its new value:


c_slider = widgets.IntSlider(min=0, max=len(df), step=1, value=5)
d_select =  widgets.Select(options=items)

def update_c_range(*args):
    # Reset the slider's upper limit to the size of the currently selected subset
    if d_select.value=='All':
        c_slider.max = len(df)
    else:
        c_slider.max = len(df[df['Asset Type Description']==d_select.value])

d_select.observe(update_c_range, 'value')

def view3(x='',y=3):
    if x=='All': return df.head(y)
    return df[df['Asset Type Description']==x].head(y)

interactive(view3, x=d_select, y=c_slider)


This is Why We Need Info/Digital Literacy…

A couple of weeks ago, an interesting enough weekend piece in the Observer on how Google search results may not always give you the “factual” sort of result you might expect: Google is not ‘just’ a platform. It frames, shapes and distorts how we see the world.

This weekend just gone, an absolute piece of tosh: How to bump Holocaust deniers off Google’s top spot? Pay Google.

Let me summarise for you: “How to bump legitimate news headlines from the front page – pay the Observer”


Fragments – Should Algorithms, Deep Learning AI Models and/or Robots be Treated as Employees?

Ever ones for pushing their luck when it comes to respecting codes, regulations, and maybe even the law, Uber hit the news again last week when it turned out that at least one of its automated vehicles was caught running a red light (Uber blames humans for self-driving car traffic offenses as California orders halt).

My first thought on this was to wonder who’s nominally in control of an automated vehicle if the vehicle itself detects a likely accident and quickly hands control over to a human driver (“Sh********t… you-drive”), particularly if reaction times are such that even the most attentive human operator doesn’t have time to respond properly.

My second was to start pondering the agency associated with an algorithm, particularly a statistical one where the mapping from inputs to outputs is not necessarily known in advance but is based on an expectation that the model used by the algorithm will give an “appropriate” response based on the training and testing regime.

[This is a v quick and poorly researched post, so some of the references are first stabs as I go fishing… They could undoubtedly be improved upon… If you can point me to better readings (though many appear to be stuck in books), please add them to the comments…]

In the UK, companies can be registered as legal entities; as such, they can act as an employer and become “responsible” for the behaviour of their employees through vicarious liability ([In Brief] Vicarious Liability: The Liability of an Employer).

According to solicitor Liam Lane, writing for HR Magazine (Everything you need to know about vicarious liability):

Vicarious liability does not apply to all staff. As a general rule, a business can be vicariously liable for actions of employees but not actions of independent contractors such as consultants.

This distinction can become blurred for secondees and agency workers. In these situations, there are often two ‘candidates’ for vicarious liability: the business that provided the employee, and the business that received him or her. To resolve this, courts usually ask:

(i) which business controls how the employee carries out his or her work; and

(ii) which business the employee is more integrated into.

Similarly, vicarious liability does not apply to every wrongful act that an employee carries out. A business is only vicariously liable for actions that are sufficiently close to what the employee was employed to do.

The CPS guidance on corporate prosecutions suggests that “A corporate employer is vicariously liable for the acts of its employees and agents where a natural person would be similarly liable (Mousell Bros Ltd v London and North Western Railway Co [1917] 2 KB 836).”

Findlaw (and others) also explore limitations, or otherwise, to an employer’s liability by noting the distinction between an employee’s “frolics and detours”: A detour is a deviation from explicit instructions, but so related to the original instructions that the employer will still be held liable. A frolic on the other hand, is simply the employee acting in his or her own capacity rather than at the instruction of an employer.

Also on limitations to employer liability, a briefing note from Gaby Hardwicke Solicitors (Briefing Note: Vicarious Liability and Discrimination) brings all sorts of issues to mind. Firstly, on scope:

Vicarious liability may arise not just where an employment contract exists but also where there is temporary deemed employment, provided either the employer has an element of control over how the “employee” carries out the work or where the “employee” is integrated into the host’s business. The employer can be vicariously liable for those seconded to it and for temporary workers supplied to it by an employment business. In Hawley v Luminar Leisure Ltd, a nightclub was found vicariously liable for serious injuries as a result of an assault by a doorman, which it engaged via a third party security company.

The Equality Act 2010 widens the definition of employment for the purposes of discrimination claims so that the “employer” is liable for anything done by its agent, under its authority, whether within its knowledge or not. This would, therefore, confer liability for certain acts carried out by agents such as solicitors, HR consultants, accountants, etc.

In addition, the briefing describes how “Although vicarious liability is predominantly a common law concept, for the purposes of anti-discrimination law, it is enshrined in statute under section 109 Equality Act 2010. This states that anything that is done by an individual in the course of his employment, must be treated as also done by the employer, unless the employer can show that it took all reasonable steps to prevent the employee from doing that thing or from doing anything of that description.”

So, I’m wondering… Companies act through their employees and agents. To what extent might we start to see “algorithms” and/or “computational models” (eg trained “Deep Learning”/neural networks models) starting to be treated as legal entities in their own right, at least in so far as they may be identified qua employees or agents when it comes to acting on behalf of a company. When one company licenses an algorithm/model to another, how will any liability be managed? Will algorithms and models start to have their own (employment) agents? Or are statistical models and algorithms (with parameters set) actually just patentable inventions, albeit with very specifically prescribed dimensions and attribute values?

In terms of liability, companies currently seem keen to try wriggling out of accountability by waving their hands in the air when an algorithmic mishap occurs. But where does the accountability lie, and where does the agency lie (in the sense that algorithms and models make (automated) decisions on behalf of their operators)? Are there any precedents around ruling decision making algorithms as something akin to “employees” when it comes to liability? Or companies arguing for (or against) such claims? Can an algorithm be defined as a company, with its articles and objects enshrined in code, and if so, how is liability then limited as far as its directors are concerned?

I guess what I’m wondering is: are we going to see algorithms/models/robots becoming entities in law? Whether defined as their own class of legal entity, companies, employees, agents, or some other designation?

PS pointers to case law and examples much appreciated. Eg this sort of thing is maybe relevant? Search engine liability for autocomplete suggestions: personality, privacy and the power of the algorithm in a discussion of how an algorithm operator is liable for the actions of the algorithm?

Doing it Local… Maybe Next Year…

New Year coming up, so time to start mulling over a resolution or two that might actually make a difference*. One of the things I meant to do this year – but didn’t get round to – was working on a site that I’d planned to start populating as a demonstrator for local data led news stories generated from national datasets. My thought was if I could get into some sort of habit around that, I might actually get round to starting to build up a data driven wire service for hyperlocals and local monitoring groups (several domains were purchased and parked for this… they’re still unpopulated…).

One plan I had for trying to sneak this project up on myself was to pick a data release every day (or at least, one a week on my 0.2FTE not-OU day, which keeps getting leeched away, somehow…) from the UK Gov daily “published statistics” feed and write a Jupyter notebook to start to explore it. Over the course of a year, I should have been able to get through a fair few datasets and start to return to them, and further work up ones I’ve visited before, as well as starting to build up some sort of longitudinal collection. (Here’s one false start on that around NHS datasets. Here’s another placeholder for some notebooks I was going to work up for OnTheWight before we fell out over openness!) Never really happened though..:-( On the other hand, I did start to play with company data again, courtesy of an invite from Global Witness to their “persons of significant control” datadive, as well as a wondering about Trump, and I’m fired up to start playing with that data again. As to the local data stories and toolkits – maybe next year…

To that end, the presence of several other projects that look set to be ramping up next year may prompt me into action as a form of mild competition and “could I do that?” inspiration. One example is Will Perrin’s Local News Engine, another the Bureau of Investigative Journalism Local Data Lab, to be headed by Times data journalist Megan Lucero. (At the time of writing, it’s not too late to apply for a data journalist or data lab developer role. I’m not sure if they’re also open to speculative applications…? Hmmm….) Both of those projects are funded from the Google Digital News Initiative Innovation Fund, but I’m not sure what, if anything, that means…

My year should also be kickstarted (hopefully) energy level wise with a few days at the reproducible research using Jupyter Notebooks curriculum development hackathon. One of the things I’ll be interested in is the extent to which any curriculum – and resources produced for it – can also be used to support training initiatives around the use of reproducible scripts for national-to-local data wrangling notebooks for use by local journalists, watchdogs, researchers etc. (I suspect the user skill levels the workshop/hackathon will be focussing on are a skill level one or two up from a more amateur (and I use the word advisedly…) audience, but it’ll be interesting to see how accessible we can make things…)

This might also provide an opportunity for me to think more about using “databoxes”, Raspberry Pi SD card images blown with all you need to get up and running immediately with a particular dataset. Think RPi runnable Infinite Intern SD cards.

Also lined up (nearly… fingers crossed) is taking a more detailed look at Parliamentary open data, and how that can be used to support wider research and “holding to account”, as well as policy development. Whilst that will probably involve some amount of poking around in the data, seeing what’s there, and what can be done with it, it might also set the scene for rethinking how consultations and Parliamentary research briefings might work as informal learning resources requiring a critical read…

Hmmm… thinks again… there’s not a lot of 0.8 interest in there, is there…?

A change for me this year was starting to follow a band again, after 20 years off – though that wasn’t one of the resolutions last time round… Maybe finding some ways to start getting involved with promoting again should be on the list for next year…

Trump’s UK Company Holdings – And Concerns About Companies House Director Name Authority Files

A couple of days ago I had the briefest of looks at Companies House data to see what the extent of Trump’s declared (current) corporate roles is in the UK. Not many, it seems. Of the companies with which Trump has a declared officer interest, the list of co-directed companies in his UK empire seems small.


In my code as it currently stands, two directors are the same if they have the same director number according to Companies House records (I think! Need to check… can’t remember if I also added a fuzzy match…).

Unfortunately, Companies House has issues with name authority files (they need to talk to the librarians who’ve been grappling with the question of whether two people with the same, or almost the same, name are actually the same person for ages… “VIAF” is a good keyword to start on…). For example, I strongly suspect these are the same person, given that I found them by mining co-directed companies seeded on two separate Trump companies:

(u'qcmgW-bhHd3TT1MSNuqHIjWBxLI', 1946, u'TRUMP, Donald J')
(u'8WlV7G8p1ojhFks_i4ljYwW5WvI', 1946, u'TRUMP, Donald John')
(u'65Cc7HAVpXHqcLR_-CczJ80C724', 1946, u'TRUMP, Donald')

Or how about:

(u'sj7c-OeX84Ww_JJudaY_D-DZDm4', 1981, u'TRUMP, Ivanka')
(u'omdexC3tGVn8JnozQ9ZazJL_MT8', 1981, u'TRUMP, Ivanka')
(u'PCrNv-j3ABqrisHsT_PKL3yAlc0', 1981, u'TRUMP, Ivanka')
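
By way of a sketch of how records like these might be flagged as candidates for the same person, the following groups director records on a crude key made from surname, first forename and birth year. The key choice is my own assumption for illustration; it is not how Companies House reconciles directors, and it will both miss some matches and throw up false ones.

```python
from collections import defaultdict

# (director_id, birth_year, name) tuples, as listed above
records = [
    ("qcmgW-bhHd3TT1MSNuqHIjWBxLI", 1946, "TRUMP, Donald J"),
    ("8WlV7G8p1ojhFks_i4ljYwW5WvI", 1946, "TRUMP, Donald John"),
    ("65Cc7HAVpXHqcLR_-CczJ80C724", 1946, "TRUMP, Donald"),
    ("sj7c-OeX84Ww_JJudaY_D-DZDm4", 1981, "TRUMP, Ivanka"),
    ("omdexC3tGVn8JnozQ9ZazJL_MT8", 1981, "TRUMP, Ivanka"),
    ("PCrNv-j3ABqrisHsT_PKL3yAlc0", 1981, "TRUMP, Ivanka"),
]

def blocking_key(name, birth_year):
    """Crude key: lower-cased surname, first forename and birth year."""
    surname, _, forenames = name.partition(",")
    parts = forenames.split()
    first = parts[0] if parts else ""
    return (surname.strip().lower(), first.lower(), birth_year)

def candidate_groups(rows):
    """Return the keys shared by more than one director identifier."""
    groups = defaultdict(list)
    for director_id, birth_year, name in rows:
        groups[blocking_key(name, birth_year)].append(director_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

groups = candidate_groups(records)
for key, ids in groups.items():
    print(key, len(ids))
```

Anything grouped this way would still need checking by eye, or against something like a correspondence address, before treating the identifiers as one person.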

FWIW, Companies House seem to be increasingly of the opinion that month and year discriminators on birthday are plenty, and day doesn’t need to be publicly shared any more (if, indeed, it will still be collected). Occasional name/month/year collisions aside, this may be true (if you’re happy to accept the collisions). But until they sort their authority files out, and use a common director ID (reconcilable to a Person of Significant Control identifier from the PSC register) for the same person, they should be providing as much info as possible to help the rest of us reconcile director identifiers from their inconsistent data.

PS I started to doubt whether Companies House at least attempts to use the same identifier for the same person, but here’s another example that I’m pretty sure refers to the same person… – note the first result associates 35 appointments with the name:


If you click the top link, you’ll see the appointment dates to the various companies are different, so it’s presumably not as if the commonality arises from the appointments all being declared on the same form. I’m not actually sure how Companies House reconciles directors. Anyone know? (Let me know via the comments if you do…) For now, I assume it to be something like a (case insensitive?) exact string match on name, birthdate, and maybe correspondence address (or at least, a recognisable part of it)?

The following records, this time from Formula One co-directed companies, presumably relate to the same person (an accountant…):

(u'keWSNSl6V3Zg2FNV7vPy6BBVPVw', 1968, u'LLOWARCH, Duncan Francis')
(u'dzIMC8ot_A9rJThNdKQ5yQC-M3Y', 1968, u'LLOWARCH, Duncan Francis')
(u'm5FeeEsclwF0s57UkL2NcB6MIBk', None, u'LLOWARCH, Duncan')
(u'S9zuBVuv1LXtbR62_r-x9RzJzRE', None, u'LLOWARCH, Duncan')
(u'BTfAza-kduWKPnuUYPDd3w2i9fc', None, u'LLOWARCH, Duncan')
(u'3e8laCMUijwG6FdTnqGcDqMsXr4', None, u'LLOWARCH, Duncan')
(u'1Qgz-VCSMqjZZgyaibcvBAyGKUU', None, u'LLOWARCH, Duncan')