OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Mapping the Tesco Corporate Organisational Sprawl – An Initial Sketch

A quick sketch, prompted by Tesco Graph Hunting on OpenCorporates of how some of Tesco’s various corporate holdings are related based on director appointments and terminations:

The recipe is as follows:

- grab a list of companies that may be associated with “Tesco” by querying the OpenCorporates reconciliation API for tesco
– grab the filings for each of those companies
– trawl through the filings looking for director appointments or terminations
– store a row for each directorial appointment or termination including the company name and the director.

You can find the scraper here: Tesco Sprawl Grapher

import scraperwiki, simplejson,urllib

import networkx as nx

#Keep the API key [private - via http://blog.scraperwiki.com/2011/10/19/tweeting-the-drilling/
import os, cgi
try:
    qsenv = dict(cgi.parse_qsl(os.getenv("QUERY_STRING")))
    ockey=qsenv["OCKEY"]
except:
    ockey=''

rurl='http://opencorporates.com/reconcile/gb?query=tesco'
#note - the opencorporates api also offers a search:  companies/search
entities=simplejson.load(urllib.urlopen(rurl))

def getOCcompanyData(ocid):
    ocurl='http://api.opencorporates.com'+ocid+'/data'+'?api_token='+ockey
    ocdata=simplejson.load(urllib.urlopen(ocurl))
    return ocdata

#need to find a way of playing nice with the api, and not keep retrawling

def getOCfilingData(ocid):
    ocurl='http://api.opencorporates.com'+ocid+'/filings'+'?per_page=100&api_token='+ockey
    tmpdata=simplejson.load(urllib.urlopen(ocurl))
    ocdata=tmpdata['filings']
    print 'filings',ocid
    #print 'filings',ocid,ocdata
    #print 'filings 2',tmpdata
    while tmpdata['page']<tmpdata['total_pages']:
        page=str(tmpdata['page']+1)
        print '...another page',page,str(tmpdata["total_pages"]),str(tmpdata['page'])
        ocurl='http://api.opencorporates.com'+ocid+'/filings'+'?page='+page+'&per_page=100&api_token='+ockey
        tmpdata=simplejson.load(urllib.urlopen(ocurl))
        ocdata=ocdata+tmpdata['filings']
    return ocdata

def recordDirectorChange(ocname,ocid,ffiling,director):
    ddata={}
    ddata['ocname']=ocname
    ddata['ocid']=ocid
    ddata['fdesc']=ffiling["description"]
    ddata['fdirector']=director
    ddata['fdate']=ffiling["date"]
    ddata['fid']=ffiling["id"]
    ddata['ftyp']=ffiling["filing_type"]
    ddata['fcode']=ffiling["filing_code"]
    print 'ddata',ddata
    scraperwiki.sqlite.save(unique_keys=['fid'], table_name='directors', data=ddata)

def logDirectors(ocname,ocid,filings):
    print 'director filings',filings
    for filing in filings:
        if filing["filing"]["filing_type"]=="Appointment of director" or filing["filing"]["filing_code"]=="AP01":
            desc=filing["filing"]["description"]
            director=desc.replace('DIRECTOR APPOINTED ','')
            recordDirectorChange(ocname,ocid,filing['filing'],director)
        elif filing["filing"]["filing_type"]=="Termination of appointment of director" or filing["filing"]["filing_code"]=="TM01":
            desc=filing["filing"]["description"]
            director=desc.replace('APPOINTMENT TERMINATED, DIRECTOR ','')
            director=director.replace('APPOINTMENT TERMINATED, ','')
            recordDirectorChange(ocname,ocid,filing['filing'],director)

for entity in entities['result']:
    ocid=entity['id']
    ocname=entity['name']
    filings=getOCfilingData(ocid)
    logDirectors(ocname,ocid,filings)

The next step is to graph the result. I used a Scraperwiki view (Tesco sprawl demo graph) to generate a bipartite network connecting directors (either appointed or terminated) with companies and then published the result as a GEXF file that can be loaded directly into Gephi.

import scraperwiki
import urllib
import networkx as nx

import networkx.readwrite.gexf as gf

from xml.etree.cElementTree import tostring

scraperwiki.sqlite.attach( 'tesco_sprawl_grapher')
q = '* FROM "directors"'
data = scraperwiki.sqlite.select(q)

DG=nx.DiGraph()

directors=[]
companies=[]
for row in data:
    if row['fdirector'] not in directors:
        directors.append(row['fdirector'])
        DG.add_node(directors.index(row['fdirector']),label=row['fdirector'],name=row['fdirector'])
    if row['ocname'] not in companies:
        companies.append(row['ocname'])
        DG.add_node(row['ocid'],label=row['ocname'],name=row['ocname'])   
    DG.add_edge(directors.index(row['fdirector']),row['ocid'])

scraperwiki.utils.httpresponseheader("Content-Type", "text/xml")


writer=gf.GEXFWriter(encoding='utf-8',prettyprint=True,version='1.1draft')
writer.add_graph(DG)

print tostring(writer.xml)

Saving the output of the view as a gexf file means it can be loaded directly in to Gephi. (It would be handy if Gephi could load files in from a URL, methinks?) A version of the graph, laid out using a force directed layout, with nodes coloured according to modularity grouping, suggests some clustering of the companies. Note the parts of the whole graph are disconnected.

In the fragment below, we see Tesco Property Nominees are only losley linked to each other, and from the previous graphic, we see that Tesco Underwriting doesn’t share any recent director moves with any other companies that I trawled. (That said, the scraper did hit the OpenCorporates API limiter, so there may well be missing edges/data…)

And what is it with accountants naming companies after colours?! (It reminds me of sys admins naming servers after distilleries and Lord of the Rings characters!) Is there any sense in there, or is arbitrary?

Written by Tony Hirst

April 12, 2012 at 3:56 pm

8 Responses

Subscribe to comments with RSS.

  1. > And what is it with accountants naming companies after colours?

    Big Reservoir Dogs fans?

    PaulGeraghty

    April 12, 2012 at 7:53 pm

  2. [...] on from Mapping the Tesco Corporate Organisational Sprawl – An Initial Sketch, where I graphed relations between Tesco registered companies based on co-directorships, I also [...]

  3. [...] The recipe is as follows: – grab a list of companies that may be associated with “Tesco” by querying the OpenCorporates reconciliation API for tesco – grab the filings for each of those companies Mapping the Tesco Corporate Organisational Sprawl – An Initial Sketch « OUseful.Info, the blog… [...]

  4. [...] more information on these graphics, you can find a step-by-step explanation of how the corporate structure graph was made and a brief guide to the address graphic at [...]

  5. [...] as collated by OpenCorporates, to sketch out maps of how corporate sprawls are structured (Mapping the Tesco Corporate Organisational Sprawl – An Initial Sketch. Here’s a recap of the recipe I used: – grab a list of companies that may be associated with [...]

  6. [...] I don’t remember if I tried grabbing Pearson data from OpenCorporates to do a corporate sprawl graph..? [...]

  7. [...] is to say – I have some thoughts on how to address some of its many deficiencies) – Corporate Sprawl mapping – I ran a search on OpenCorporates for mentions of “Thames Water” and then [...]

  8. [...] do a search around a company with many corporate entities – Tesco, for example – and map out which entities were connected to which by virtue of common director names. Directors’ data is [...]


Comments are closed.

Follow

Get every new post delivered to your Inbox.

Join 784 other followers

%d bloggers like this: