Sketching With OpenCorporates – Fragmentary Notes in Context of Thames Water Corporate Sprawl
The Observer newspaper today leads with news of how the UK’s water companies appear to be as, if not more, concerned with running tax efficient financial engines as they are maintaining the water and sewerage network. Using a recipe that’s probably run its course (which is to say – I have some thoughts on how to address some of its many deficiencies) – Corporate Sprawl mapping – I ran a search on OpenCorporates for mentions of “Thames Water” and then plotted the network of companies as connected by directors identified through director dealings also indexed by OpenCorporates:
With the release of the new version 0.2 of the OpenCorporates API, I notice that information regarding directors is now addressable, which means that we should be able to pivot from one company, to its directors, to other companies associated with that director…
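As a rough sketch of that pivot (company, to directors, to other companies), here is a small function that works over appointment records already pulled down from somewhere. The (officer_id, company_name) pair shape, the `pivot` name and the sample data are all my own assumptions for illustration; this is not the API's response format.

```python
from collections import defaultdict

def pivot(company, appointments):
    """Map each other company to the officers it shares with `company`.

    `appointments` is an iterable of (officer_id, company_name) pairs; this
    shape is an assumption, not the API's response format.
    """
    companies_by_officer = defaultdict(set)
    officers_by_company = defaultdict(set)
    for officer, comp in appointments:
        companies_by_officer[officer].add(comp)
        officers_by_company[comp].add(officer)

    linked = defaultdict(set)
    # Step 1: company -> its officers; step 2: each officer -> other companies.
    for officer in officers_by_company.get(company, set()):
        for other in companies_by_officer[officer]:
            if other != company:
                linked[other].add(officer)
    return dict(linked)

# Made-up data, for illustration only.
appointments = [
    ("officer-1", "COMPANY A"),
    ("officer-1", "COMPANY B"),
    ("officer-2", "COMPANY A"),
    ("officer-2", "COMPANY C"),
    ("officer-1", "COMPANY C"),
    ("officer-3", "COMPANY D"),
]
linked = pivot("COMPANY A", appointments)
```

Keeping track of which officers are responsible for each link means the result can be used directly as a weighted edge list for a network plot.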
To get a feel for what may be possible, let’s run a search on /Thames Water/, and then click through on one of the director links – we can see (through the search interface), records for individual corporate officers, along with sidebar links to similarly named officers (with different officer IDs):
(At this point, I don’t know the extent to which the API reconciles these individual references, if at all – I’m still working my way through the web search interface…)
Let’s assume for a moment that the similarly named individuals in the sidebar are the same as the person whose officer record we are looking at. We notice that as well as Thames Water companies, other companies are listed that would not be discovered by a simple search for /Thames Water/ – INNOVA PARK MANAGEMENT COMPANY LIMITED, for example. (Note that we can also see dates the directorial appointments were valid, which means we may be able to track the career of a particular director; FWIW, offering tools to support ordering directors by date of appointment, or using something resembling a Gantt chart layout, may help patterns jump out of this data…?)
Innova Park Management Company Ltd may or may not have anything to do with Thames Water of course, but there are a couple of bits of evidence we can look for to see whether it is likely that it is part of the Thames Water corporate sprawl using just OpenCorporates data: firstly, we might look to see if this company concurrently shares several directors with Thames Water companies; secondly, we might check its registered address:
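The first of those checks, concurrently shared directors, might be sketched as follows. The record layout (dicts with `officer`, `company`, `start` and `end` keys), the function names and the sample appointments are hypothetical; an `end` of None is taken to mean the appointment is still current.

```python
from datetime import date

def overlaps(start_a, end_a, start_b, end_b):
    """True if two appointment periods overlap; an end of None means 'still in post'."""
    end_a = end_a or date.max
    end_b = end_b or date.max
    return start_a <= end_b and start_b <= end_a

def concurrent_shared_directors(candidate, known_companies, appointments):
    """Officers who sat on `candidate` and on any known company at the same time."""
    shared = set()
    for a in appointments:
        if a["company"] != candidate:
            continue
        for b in appointments:
            if (b["officer"] == a["officer"]
                    and b["company"] in known_companies
                    and overlaps(a["start"], a["end"], b["start"], b["end"])):
                shared.add(a["officer"])
    return shared

# Made-up appointments, for illustration only.
appointments = [
    {"officer": "officer-1", "company": "CANDIDATE LTD", "start": date(2005, 1, 1), "end": date(2010, 1, 1)},
    {"officer": "officer-1", "company": "KNOWN A LTD", "start": date(2008, 1, 1), "end": None},
    {"officer": "officer-2", "company": "CANDIDATE LTD", "start": date(2001, 1, 1), "end": date(2003, 1, 1)},
    {"officer": "officer-2", "company": "KNOWN A LTD", "start": date(2005, 1, 1), "end": date(2007, 1, 1)},
    {"officer": "officer-3", "company": "CANDIDATE LTD", "start": date(2006, 1, 1), "end": None},
]
shared = concurrent_shared_directors("CANDIDATE LTD", {"KNOWN A LTD"}, appointments)
```

Here officer-2 served on both companies but at different times, so only officer-1 counts as a concurrent link.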
(In this case, we also note that /Thames Water/ appears in the previous name of Innova Park Management Company Ltd (something I think that the OpenCorporates search does pick up on?).)
One of the things I’ve mentioned to Chris Taggart before is how geocoding company addresses might give us a good way in to finding colocated companies. One reason why this might be useful is that it might be able to show how companies evolve through different times and yet remain registered at the same address. It also provides a possible way in to sprawl mapping if many of the sprawl companies are registered at the same address at the same time (though there may be other reasons for companies being registered at the same address: companies may be registered by an accountancy or legal firm, for example, that offers registered address services; or be co-located in a particular building. But for investigations, this may also be useful, for example in cases of tracking down small companies serviced by, erm, creative accountants…)
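Short of proper geocoding, a cruder first pass at spotting colocated companies is to normalise the registered address strings and group on the result. This is only a sketch: the normalisation rules, the function names and the sample addresses are made up, and real addresses will need rather more care (abbreviations, missing postcodes and so on).

```python
import re
from collections import defaultdict

def normalise_address(address):
    """Crude normalisation: upper-case, drop punctuation, collapse whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]+", " ", address.upper())
    return " ".join(cleaned.split())

def colocated(companies):
    """Group (name, address) pairs by normalised address; keep shared addresses only."""
    groups = defaultdict(list)
    for name, address in companies:
        groups[normalise_address(address)].append(name)
    return {addr: names for addr, names in groups.items() if len(names) > 1}

# Made-up companies and addresses, for illustration only.
companies = [
    ("COMPANY A", "1 Example Way, Exampletown, ZZ1 1ZZ"),
    ("COMPANY B", "1 EXAMPLE WAY  EXAMPLETOWN ZZ1 1ZZ."),
    ("COMPANY C", "2 Other Street, Elsewhere"),
]
shared_addresses = colocated(companies)
```

String matching like this will miss near-duplicates that geocoding would catch, but it is cheap enough to run over a whole set of search results.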
(By the by, this Google Refine/OpenRefine tutorial contains a cunning trick – geocode addresses using Google maps to get lat/long coordinates, then use a scatterplot facet to view a lat/long grid and select rectangular regions within it – that is, it gives you an ad hoc spatial search function… very cunning;-)
Note to self: I think I’ve pondered this before – certainty factors around the extent to which two companies are part of the same sprawl, or two similarly named directors are the same person. Something along the lines of:
– corporate sprawl: certainty = f( number_of_shared_directors, shared_address, similar_words_in_company_name, ...)
– same person (X, Y): certainty related to number of companies both X and Y are directors of that share other directors, share same address, share similar company name.
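One way of making the sprawl certainty function f concrete is a "noisy-OR" combination, where each piece of evidence independently chips away at the chance that two companies are unrelated. The weights below are pulled out of thin air, as are the function names and the stopword list; treat this as a sketch of the shape of the thing rather than a calibrated model.

```python
# Words too common in company names to count as evidence (an assumed list).
STOPWORDS = {"LIMITED", "LTD", "PLC", "COMPANY", "THE", "HOLDINGS"}

def shared_name_words(name_a, name_b):
    """Count distinctive words that appear in both company names."""
    words_a = set(name_a.upper().split()) - STOPWORDS
    words_b = set(name_b.upper().split()) - STOPWORDS
    return len(words_a & words_b)

def sprawl_certainty(number_of_shared_directors, shared_address, similar_words,
                     p_director=0.3, p_address=0.4, p_word=0.2):
    """Noisy-OR: 1 minus the probability that every piece of evidence is coincidence.

    The p_* weights are arbitrary placeholders, not fitted values.
    """
    p_unrelated = (1 - p_director) ** number_of_shared_directors
    if shared_address:
        p_unrelated *= (1 - p_address)
    p_unrelated *= (1 - p_word) ** similar_words
    return 1 - p_unrelated

# Made-up company names, for illustration only.
words = shared_name_words("EXAMPLE WATER UTILITIES LIMITED", "EXAMPLE WATER PROPERTY LTD")
score = sprawl_certainty(number_of_shared_directors=2, shared_address=True, similar_words=words)
```

The handy property of this form is that the score starts at zero with no evidence, only ever goes up as evidence is added, and never exceeds one.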
If we quickly look at the new OpenCorporates API, we see that there are a couple of officer-related calls: GET officers/search and GET officers/:id.
Based on the above ‘note to self’, I wonder if it’d be useful to code up a recipe that takes an officer ID, fetches the name of the director, runs a name search, then tries to assign a likelihood that each person in the returned set of search results is the same as the person whose ID was supplied in the original lookup? This is the sort of thing that Google Refine reconciliation API services offer, so maybe this is already available via the OpenCorporates reconciliation API?
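A minimal sketch of the scoring half of that recipe, assuming the officer lookup and name search results have already been fetched and flattened into simple dicts. The `id`, `name` and `codirectors` keys, the weights and the sample records are all hypothetical; the name comparison uses Python's standard library difflib, and the co-director overlap is a simple Jaccard measure.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Similarity of two names in the range 0..1, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def rank_candidates(target, candidates, name_weight=0.6, overlap_weight=0.4):
    """Rank candidate officer records by how likely they are the same person as `target`.

    Each record is a dict with 'id', 'name' and 'codirectors' (a set of the other
    directors of that officer's companies). Weights are arbitrary placeholders.
    """
    ranked = []
    for cand in candidates:
        if cand["id"] == target["id"]:
            continue  # skip the record we started from
        union = target["codirectors"] | cand["codirectors"]
        common = target["codirectors"] & cand["codirectors"]
        overlap = len(common) / len(union) if union else 0.0
        score = (name_weight * name_similarity(target["name"], cand["name"])
                 + overlap_weight * overlap)
        ranked.append((round(score, 3), cand["id"]))
    return sorted(ranked, reverse=True)

# Made-up officer records, for illustration only.
target = {"id": 1, "name": "Jane Q Example", "codirectors": {"A", "B"}}
candidates = [
    target,
    {"id": 2, "name": "Jane Example", "codirectors": {"A", "B", "C"}},
    {"id": 3, "name": "John Smith", "codirectors": {"Z"}},
]
ranked = rank_candidates(target, candidates)
```

The output is a list of (score, officer id) pairs, best match first, which is roughly the shape of result a reconciliation service would hand back.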
PS I use the phrase “corporate sprawl” to refer to something similar to what OpenCorporates’ user-curated corporate_groupings capture. One thing that interests me is the extent to which we can build tools to automatically make suggestions about corporate_grouping membership.
PPS running the scraper, I noticed that Scraperwiki have a job opening for a “data scientist”…