Category: Tinkering

Using IPython on Lego EV3 Robots Running Ev3Dev

In part so I don’t lose the recipe, here are some notes for getting up and running with IPython on a Lego EV3 brick.

The command lines are prefixed to show whether we’re running them on the Mac or the brick…

To start with, we need to flash a microSD card with an image of the ev3dev operating system. The instructions are on the ev3dev site – Writing an SD Card Image Using Command Line Tools on OS X – but I repeat the ones for a Mac here for convenience.

  1. Download an image from the repository – I used the ev3dev-jessie-2015-12-30 image because the current development version is in a state of flux and the python bindings don’t necessarily work with it just at the moment…
  2. Assuming you have downloaded the file to your home Downloads directory (that is, ~/Downloads), launch a new terminal and run:
    [Mac] cd Downloads
    [Mac] unzip ev3-ev3dev-jessie-2015-12-30.img.zip
    [Mac] diskutil list
  3. Put the microSD card (at least 2GB, but no more than 16GB, I think? I used a 4GB microSD (HC) card) into an SD card adapter and pop it into the Mac SD card reader. Check that it’s there and what it’s called (you’re looking for the new disk…):
    [Mac] diskutil list
  4. Now unmount the new disk corresponding to the SD card:
    [Mac] diskutil unmountDisk /dev/disk1s1
    If you don’t see the /dev/disk1 listing, check that the write protect slider on your SD card holder isn’t in the write protect position.
  5. We’re going to write some raw bits to the card (/dev/disk1 in my example), so we need to write to /dev/rdisk1 (note the r before the disk name). The write will take some time – 5 minutes or so – but if you’re twitchy, ctrl-T will show you a progress message. You’ll also need to enter your Mac password. (NOTE: if you use the wrong disk name, you can completely trash your computer. So be careful;-)
    [Mac] sudo dd if=~/Downloads/ev3-ev3dev-jessie-2015-12-30.img of=/dev/rdisk1 bs=4m
    GOTCHA: when I tried, I got a Permission Denied message at first. It seems that for some reason my Mac thought the SD card was write protected. On the SD card adapter is a small slider that sets the card to “locked” or “unlocked”. The SD card reader in the Mac has a mechanical switch that detects which position the slider is in, and hence whether the card is locked. I don’t know if it was a problem with the card adapter or the Mac reader, but I took the card adapter out of the Mac, changed the slider setting, put the adapter back in, and did the unmount and then sudo dd steps again. It still didn’t work, so I took the card out again, put the slider back the other way, and tried again. This time it did work.
  6. Once the copy is complete, take the SD card adapter out, remove the microSD card and put it in the EV3 brick. Start the EV3 brick in the normal way.

Hopefully, you should see the brick boot into the Brickman UI (it may take two or three minutes, including a period when the screen is blank and the orange lights are ticking for a bit…)

Navigate the brick settings to the Networks menu, select Wifi and scan for your home network. Connect to the network (the password settings will be saved, so you shouldn’t have to enter them again).

By default, the IP address of the brick should be set to 192.168.1.106. To make life easier, I set up passwordless ssh access to the brick. Using some guidance in part originally from here, I accessed the brick from my Mac terminal and set up an ssh folder. First, log in to the brick from the Mac terminal:

[Mac] ssh robot@192.168.1.106

When prompted, the password for the robot user is maker.

This should log you in to the brick. Run the following command to create an ssh directory into which the ssh key will be placed, and then exit the brick commandline to go back to the Mac command line.

[Brick] install -d -m 700 ~/.ssh
[Brick] exit

If you don’t already have an ssh key on your Mac, create one with ssh-keygen. Then clear out any stale host key for the brick’s IP address and copy your public key over to the brick:

[Mac] ssh-keygen -R 192.168.1.106
[Mac] cat ~/.ssh/id_rsa.pub | ssh robot@192.168.1.106 'cat > .ssh/authorized_keys'
You will be prompted again for the password – maker – but next time you try to log in to the brick, you should be able to do so without having to enter the password. (Instead, the ssh key will be used to get you in.)

If you login to the brick again – from the Mac commandline, run:

[Mac] ssh robot@192.168.1.106

you should be able to run a simple Python test program. Attach a large motor to output A on the brick. From the brick commandline, run:

[Brick] python

to open up a Python command prompt, and then enter the following commands to use the preinstalled ev3dev-lang-python Python bindings to run the motor for a few seconds:

[Python] import ev3dev.ev3 as ev3
[Python] m = ev3.LargeMotor('outA')
[Python] m.run_timed(time_sp=3000, duty_cycle_sp=75)

Enter:

[Python] exit()

to exit from the Python interpreter.

Now we’re going to install IPython. Run the following commands on the brick command line (update, but DO NOT upgrade the apt packages). If prompted for a password, it’s still maker:

[Brick] sudo apt-get update
[Brick] sudo apt-get install -y ipython

You should now be able to start an IPython interpreter on the brick:

[Brick] ipython

The Python code to test the motor should still work (hit return if you find you are stuck in a code block). Typing:

[Brick] exit

will take you out of the interpreter and back to the brick command line.

One of the nice things about IPython is that we can connect to it remotely. What this means is that I can start an IPython editor on my Mac, but connect it to an IPython process running on the brick. To do this, we need to install another package:

[Brick] sudo apt-get install -y python-zmq

Now you should be able to start an IPython process on the brick that we can connect to from the Mac:

[Brick] ipython kernel

The process will start running and you should see a message of the form:

To connect another client to this kernel, use:
--existing kernel-2716.json

This file contains connection information for the kernel.

Now open another terminal on the Mac (cmd-T in the terminal window should open a new Terminal tab) and let’s see if we can find where that file is. Actually, here’s a crib – in the second terminal window, go into the brick:

[Mac] ssh robot@192.168.1.106

And then on the brick command line in this second terminal window, show a listing of the directory that should contain the connection file:

[Brick] sudo ls /home/robot/.ipython/profile_default/security/

You should see the kernel-2716.json file (or whatever it’s exactly called) there. Exit the brick command line:

[Brick] exit

Still in the second terminal window, and now back on the Mac command line, copy the connection file from the brick to your current directory on the Mac:

[Mac] scp robot@192.168.1.106:/home/robot/.ipython/profile_default/security/kernel-2716.json ./

If you have IPython installed on your Mac, you should now be able to open an IPython interactive terminal on the Mac that is connected to the IPython kernel running on the brick:

[Mac] ipython console --existing ${PWD}/kernel-2716.json --ssh robot@192.168.1.106

You should be able to execute the Python motor test command as before (remember to import the necessary ev3dev.ev3 package first).
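
For example, typing the following into the remote console should run the motor attached to the brick, just as in the earlier test:

[Python] import ev3dev.ev3 as ev3
[Python] m = ev3.LargeMotor('outA')
[Python] m.run_timed(time_sp=3000, duty_cycle_sp=75)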

Actually, when I ran the ipython console command, the recent version of jupyter on my Mac gave me a deprecation warning, which means I would have been better off running:

[Mac] jupyter console --existing ${PWD}/kernel-2716.json --ssh robot@192.168.1.106

So far so good – can we do any more with this?

Well, yes, a couple of things. When starting the IPython process on the brick, we could force the name and location of the connection file:

[Brick] ipython kernel -f /home/robot/connection-file.json

Then on the Mac we could directly copy over the connection file:

[Mac] scp robot@192.168.1.106:/home/robot/connection-file.json ./

Secondly, we can configure a Jupyter notebook server running on the Mac so that it will create a new IPython process on the brick for each new notebook.

Whilst you can configure this yourself, it’s possibly easier to make use of the remote_ikernel helper:

[Mac] pip3 install remote_ikernel
[Mac] remote_ikernel manage --add --kernel_cmd="ipython kernel -f {connection_file}" --name="Ev3dev" --interface=ssh --host=robot@192.168.1.106

Now you should be able to connect a notebook to an IPython kernel running on the brick.

Note that it may take a few seconds for the kernel to connect and the first cell to run – but from then on it should be quite responsive.

To show a list of available kernels for a particular jupyter server, run the following in a Jupyter code cell:

import jupyter_client
jupyter_client.kernelspec.find_kernel_specs()

PS for ad hoc use, I thought it might be useful to try setting up a quick wifi router that I could use to connect the brick to my laptop, in the form of an old Samsung Galaxy S3 Android phone (without a SIM card). Installing the Hotspot Control app provided a quick way of doing this… and it worked:-)

PPS for a more recent version of IPython, install it from pip.

If you installed IPython using the apt-get route, uninstall it with:

[Brick] sudo apt-get remove ipython

Install pip and some handy supporting tools that pip may well require at some point:

[Brick] sudo apt-get install build-essential python-dev

Running:

[Brick] sudo apt-get install python-pip

would install an old version of pip – pip --version shows 1.5.6 – which could be removed using sudo apt-get remove python-setuptools.

To grab a more recent version, use:

[Brick] wget https://bootstrap.pypa.io/get-pip.py
[Brick] sudo -H python get-pip.py

which seems to take a long time to run with no obvious sign of progress, and then tidy up:

[Brick] rm get-pip.py

Just to be sure, then update it:

[Brick] sudo pip install --upgrade setuptools pip

which also seems to take forever. Then install IPython:

[Brick] sudo pip install ipython

I’m also going to see if I can give ipywidgets a go, although the install requirements look as if they’ll bring down a huge chunk of the Jupyter framework too, and I’m not sure that’s all necessary?

[Brick] sudo pip install ipywidgets

For a lighter install, try sudo pip install --no-deps ipywidgets to install ipywidgets without dependencies. The only required dependencies are ipython>=4.0.0, ipykernel>=4.2.2 and traitlets>=4.2.0.

The widgets didn’t seem to work at first, but it seems that following a recent update to the Jupyter server on the host, it needed a config kick before running jupyter notebook:

jupyter nbextension enable --py --sys-prefix widgetsnbextension
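
Once that’s done, here’s the sort of thing I’m hoping to be able to do – a minimal sketch only, assuming the widgets do play nicely with the remote kernel (the motor code is just the earlier test wrapped in an interact call):

from ipywidgets import interact
import ev3dev.ev3 as ev3

m = ev3.LargeMotor('outA')

@interact(duty=(10, 100, 5))
def spin(duty=50):
    # run the motor for a second at the duty cycle set by the slider
    m.run_timed(time_sp=1000, duty_cycle_sp=duty)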

PPS It seems to take a bit of time for the connection to the IPython server to be set up:

A timeout warning appears quite quickly, but then I have to wait a few dozen seconds for the ssh tunnels onto the kernel ports to be set up. Once this is done, you should be able to run things in a code cell. I usually start with print("Hello World!") ;-)

PPPS For plotting charts:

sudo apt-get install -y python-matplotlib

Could maybe even go the whole hog…

sudo apt-get install -y python-pandas
sudo pip install seaborn
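
With matplotlib installed on the brick, a quick sanity check from a notebook cell running against the remote kernel might look something like this (just a sketch – any simple plot will do):

%matplotlib inline
import matplotlib.pyplot as plt

plt.plot([0, 1, 2, 3], [0, 1, 4, 9])
plt.title("Plotted on the EV3 kernel")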

PPPPS Here’s my current build file (under testing atm – it takes about an hour or so…) – ev3_ipy_build.sh, so:
[Mac] scp ev3_ipy_build.sh robot@192.168.1.106:
[Brick] chmod +x ev3_ipy_build.sh
[Brick] sudo ./ev3_ipy_build.sh

sudo apt-get update
sudo apt-get install -y build-essential python-dev
sudo apt-get install -y python-zmq python-matplotlib python-pandas

wget https://bootstrap.pypa.io/get-pip.py
sudo -H python get-pip.py
rm get-pip.py
sudo pip install --upgrade setuptools pip

sudo pip install ipython ipykernel traitlets seaborn
sudo pip install --no-deps ipywidgets

PPPPPS to clone the SD card on a Mac, insert the SD card and run:

[Mac] diskutil list
[Mac] sudo dd if=/dev/disk1 of=~/Desktop/my_ev3dev_image.dmg

The corresponding restore (in the process described at the start of this post) would use ~/Desktop/my_ev3dev_image.dmg rather than the ev3-ev3dev-jessie-2015-12-30.img image.
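
That is, something along the lines of the following, assuming the card once again appears as /dev/disk1 (check with diskutil list first):

[Mac] diskutil unmountDisk /dev/disk1
[Mac] sudo dd if=~/Desktop/my_ev3dev_image.dmg of=/dev/rdisk1 bs=4m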

PPPPPPS Connecting to remote kernel on brick – start IPython kernel on brick:

[Brick] ipython kernel -f /home/robot/test.json

Copy the connection file over to the host:
[Mac] scp robot@192.168.1.106:/home/robot/test.json ./

Check the path you copied it to
[Mac] pwd

For me, that returned /Users/ajh59.

Start a console on the host using the existing connection file – use a full, explicit path to the file. Also works with things like Spyder:

[Mac] jupyter console --existing /Users/ajh59/test.json --ssh robot@192.168.1.106

Steps Towards Some Docker IPython Magic – Draft Magic to Call a Contentmine Container from a Jupyter Notebook Container

I haven’t written any magics for IPython before (and it probably shows!) but I started sketching out some magic for the Contentmine command-line container I described in Using Docker as a Personal Productivity Tool – Running Command Line Apps Bundled in Docker Containers.

What I’d like to explore is a more general way of calling command line functions accessed from arbitrary containers via a piece of generic magic, but I need to learn a few things along the way, such as handling arguments for a start!

The current approach provides crude magic for calling the contentmine functions included in a public contentmine container from a Jupyter notebook running inside a container. The commandline contentmine container is started from within the notebook container and uses a volumes-from mount against the notebook container to pass files between the containers. The path to the directory mounted from the notebook is identified by a bit of jiggery pokery, as is the method for spotting which container the notebook is actually running in (I’m all ears if you know of a better way of doing either of these things?:-)

The magic has the form:

%getpapers /notebooks rhinocerous

to run the getpapers query (with fixed switch settings for now) and the search term rhinocerous; files are shared back from the contentmine container into the /notebooks folder of the Jupyter container.

Other functions include:

%norma /notebooks rhinocerous
%cmine /notebooks rhinocerous

These functions are applied to files in the same folder as was created by the search term (rhinocerous).
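
By way of illustration, here’s a minimal sketch of how this sort of line magic can be wired up – this is not the code from the gist; the notebook container name and the getpapers switches are just placeholders:

# Sketch only: a %getpapers line magic that shells out to the contentmine container.
import subprocess
from IPython.core.magic import register_line_magic

NOTEBOOK_CONTAINER = 'notebook'  # hypothetical: however the notebook's own container id is detected

@register_line_magic
def getpapers(line):
    "Usage: %getpapers /notebooks SEARCHTERM"
    path, query = line.split()
    cmd = ['docker', 'run', '--rm',
           '--volumes-from', NOTEBOOK_CONTAINER,
           'psychemedia/contentmine',
           'getpapers', '--query', query, '--outdir', '{}/{}'.format(path, query)]
    print(subprocess.check_output(cmd))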

The magic needs updating so that it will also work in a Jupyter notebook that is not running within a container – this should simply be a case of switching in a different directory path. The magics also need tweaking so we can pass parameters in. I’m not sure if more flexibility should also be allowed on specifying the path (we need to make sure that the paths for the mounted directories are the correct ones!)

What I’d like to work towards is some sort of line magic along the lines of:

%docker psychemedia/contentmine -mountdir /CALLING_CONTAINER_PATH -v ${MOUNTDIR}:/PATH COMMAND -ARGS etc

or cell magic:

%%docker psychemedia/contentmine -mountdir /CALLING_CONTAINER_PATH -v ${MOUNTDIR}:/PATH
COMMAND -ARGS etc
...
COMMAND -ARGS etc

Note that these go against the docker command line syntax – should they be closer to it?

The code, and a walked through demo, are included in the notebook available via this gist, which should also be embedded below.


Thinking Around the Edges – Lego EV3 Robots Running Remote Jupyter Kernels

I’ve been pondering the use of Lego EV3 robots on the OU TXR120 residential school again, and after a couple of chats with OU colleagues Jane Bromley (who just won a sci-fi film pitch competition run by New Scientist) and Jon Rosewell (who leads on our level 1 robotics offerings), here’s where I’m at on “visioning” it…?!;-)

The setting for the residential school is 8 groups of 3 students per group working with their own Lego EV3 robot to complete a set of challenges. Although we were hoping to have laptops from which the robots could be programmed wirelessly, it seems as if we might end up with wifi-less desktop machines (which will require tethering the robots to the desktop machine to programme them) with big screens, though I could be wrong… and we may be able to work around that anyway (a quick trip to Maplin to buy some wifi dongles, for example, or some cables we can plug into a wifi routing switch…) Having laptops would have been a boon for making the room a bit more flexible to work with (but many students have their own laptops…;-) and having a big screen that laptops could be mirror-displayed against would possibly improve the student experience.

The Lego bricks come with their own firmware, but as described in Pondering New Ways of Programming Lego EV3 Mindstorms Bricks we can also put Linux on them, and make use of a python wrapper library to actually programme the bricks using Python.

In what follows, I’ll refer to the EV3 brick as the client and a laptop or desktop computer that connects to it as the host.

One of the easiest ways to access the Python on the brick is to connect the brick to the network and then ssh in to it – for example (use the correct IP address for the brick):

ssh root@192.168.1.106

Each time you ssh in, you have to provide a password (r00tme is the default on my brick), but passwordless ssh can be set up which makes things quicker (you don’t have to enter the password).

To set up passwordless ssh onto the brick, you need to check that an .ssh folder is available in the home directory (~) on the brick for the ssh key. To do this, ssh in to the EV3, cd ~ to make sure you’re in your home directory, then ls -a to list the directory contents. If there is no .ssh directory, create one (this post suggests install -d -m 700 ~/.ssh).

If you don’t have an ssh key set up on the host machine you want to get passwordless ssh access from to the brick, follow the above link to find out how to create one. Once you do have ssh keys set up on host, run something along the lines of the following, using the IP address of your connected EV3, to copy the key across to the EV3:

cat ~/.ssh/id_rsa.pub | ssh root@192.168.1.106 'cat > .ssh/authorized_keys'

You will need to provide the password to execute this command (r00tme for the version of Ev3dev I’m running).

You should now be able to ssh in without a password: ssh root@192.168.1.106

Rather than prompting for the password, ssh will use the key provided to log you in.

In my previous post, I described how it was possible to run an old IPython notebook server on the brick that can expose notebooks over the network, although it was a little slow. It’s also possible to ssh in to the brick and run an IPython terminal on the brick:

ssh root@192.168.1.106
ipython

A third way I’d have expected to work is to access a remote IPython kernel on the brick from an IPython console on my laptop: ssh into the EV3 brick, launch an IPython kernel, and pick up the location of the connection file. For example, if the command:

ssh root@192.168.1.106
ipython kernel

to run an IPython kernel on the EV3 responds with something like:

To connect another client to this kernel, use:
    --existing kernel-1129.json

Back on the laptop I’d expect to be able to run:
scp root@192.168.1.106:/root/.ipython/profile_default/security/kernel-1129.json ./
to grab a local copy of the connection file from the brick and copy it over to my host machine (which works) and then use it as an existing connection for a Jupyter console on the laptop host:
jupyter console --existing ./kernel-1129.json --ssh server

But that doesn’t work for some reason? CRITICAL | Could not find existing kernel connection file ./kernel-1129.json

Whatever…

So far, so much noise – the important thing to take away from the above is to get the passwordless ssh set up, because it makes the following possible…

Recall the basic scenario – we want to run code on the brick from a computer. The bricks will run an IPython notebook, but it’s slow. Or we can ssh in to the brick, start an IPython process up, and run things from an IPython command line on the brick.

But what if we could run a Jupyter server on host, making notebooks available via a browser, but use a remote kernel running on the brick?

The remote-ikernel package makes setting up remote kernels that can be accessed from the Jupyter server relatively straightforward to do – install the package on your host machine (eg my laptop) and then run the remote-ikernel command with the settings required:

pip3 install remote_ikernel
remote_ikernel manage --add --kernel_cmd="ipython kernel -f {connection_file}" --name="Ev3dev" --interface=ssh --host=root@192.168.1.106

This creates a kernel.json file and places it in the correct location. (On my Mac this was in the directory /Users/MYUSER/Library/Jupyter/kernels/. )

The kernel.json file created for me (/Users/ajh59/Library/Jupyter/kernels/rik_ssh_root_192_168_1_106_ev3dev/kernel.json) was as follows:

{
  "argv": [
    "/usr/local/opt/python3/bin/python3.5",
    "-m",
    "remote_ikernel",
    "--interface",
    "ssh",
    "--host",
    "root@192.168.1.106",
    "--kernel_cmd",
    "ipython kernel -f {host_connection_file}",
    "{connection_file}"
  ],
  "display_name": "SSH root@192.168.1.106 Ev3dev",
  "remote_ikernel_argv": [
    "/usr/local/bin/remote_ikernel",
    "manage",
    "--add",
    "--kernel_cmd=ipython kernel -f {connection_file}",
    "--name=Ev3dev",
    "--interface=ssh",
    "--host=root@192.168.1.106"
  ]
}

Launching the notebook server using jupyter notebook in the normal way should result in the server picking up the remote kernel from the new kernel file.

To list the kernels available, I launched a Jupyter notebook with the normal (local) Python kernel and ran:

from jupyter_client.kernelspec import KernelSpecManager
list(KernelSpecManager().find_kernel_specs().keys())

My newly created one was there, albeit horribly named (the name was also the directory name the kernel.json file was created in): rik_ssh_root_192_168_1_106_ev3dev.

So here’s where we’re at now:

  • a single desktop or laptop computer with passwordless ssh access to a Lego EV3;
  • the desktop or laptop computer runs a Jupyter server;
  • Jupyter notebooks are available that will start up and run code on a remote IPython kernel on the brick.

Where do we need to be? The most immediate need is to have something that works for residential school. This means we need 8 desktop computers and 8 EV3s.

One thing we’d need to do is find a way of marrying computers and EV3s. Specifically, we need to have known IP addresses associated with specific bricks (can we assign a known IP based on the MAC address of the wifi dongle attached to each brick?) and we need to make sure that passwordless ssh works between the computers and the EV3s. The easiest way of doing the latter would be to use the same ssh keypair on every machine, but this would mean student group A might be able to mistakenly connect to student group B’s brick.

One way round it might be to set up a Jupyterhub server that is responsible for managing connections. The Jupyterhub server is a multiuser hub that will set up a Jupyter server for each logged in user. If we set up the Jupyterhub attached to the same wifi network as the EV3 bricks, with 8 users, one per student group, each user/student group could then presumably be assigned to one of the bricks. Students would login to the Jupyterhub, and their Jupyter server would be associated with the remote_ikernel kernel.json file for one of the bricks. Anyone who can connect to the local wifi network can then login to the Jupyterhub as one of the group users, launch a notebook, and run code on one of the bricks. This means that students could write notebooks from their own laptops, connected to the network. Wifi-less desktop machines provided for the residential school could presumably be added into the network via a local ethernet cable?
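
For example (purely a sketch – the group names and fixed IP addresses are made up), the per-group kernel specs might be registered on the Jupyterhub machine with something like:

# Hypothetical sketch: register one remote_ikernel kernel spec per student group,
# assuming each group's EV3 has been given a known, fixed IP address.
import subprocess

brick_ips = {'group{}'.format(i): '192.168.1.{}'.format(100 + i) for i in range(1, 9)}

for group, ip in sorted(brick_ips.items()):
    subprocess.check_call([
        'remote_ikernel', 'manage', '--add',
        '--kernel_cmd=ipython kernel -f {connection_file}',
        '--name={}'.format(group),
        '--interface=ssh',
        '--host=root@{}'.format(ip),
    ])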

So we’d have something like this:

Does that make sense?

Trawling the Companies House API to Generate Co-Director Networks

Somewhen ago (it’s always somewhen ago; most of the world never seems to catch up with what’s already happened!:-( I started dabbling with the OpenCorporates API to generate co-director corporate maps that showed companies linked by multiple directors. It must have been a bad idea because no-one could see any point in it, not even interestingness…  (Which suggests to me that boards made up of directors are similarly meaningless? In which case, how are companies supposed to hold themselves to account?)

I tend to disagree. If I hadn’t been looking at connected companies around food processing firms, I would never have learned that one way meat processors cope with animal fat waste is to feed it into the biodiesel raw material supply chain.

Anyway, if we ever get to see a beneficial ownership register, a similar approach should work to generate maps showing how companies sharing beneficial owners are linked. (The same approach also drives my emergent social positioning Twitter maps and the Wikipedia semantic maps I posted about again recently.)

As a possible precursor to that, I thought I’d try to reimplement the code (in part to see if a better approach came to mind) using data grabbed directly from Companies House via their API. I’d already started dabbling with the API (Chat Sketches with the Companies House API) so it didn’t take much more to get a grapher going…

But first, I realise in that earlier post I’d missed the function for actually calling the API – so here it is:

import urllib2, base64, json
from urllib import urlencode
from time import sleep

def url_nice_req(url,t=300):
    ''' Call the API, backing off and retrying if we hit the rate limit (HTTP 429) '''
    try:
        return urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        if e.code == 429:
            print("Overloaded API, resting for a bit...")
            sleep(t)
            return url_nice_req(url, t)

#Inspired by http://stackoverflow.com/a/2955687/454773
def ch_request(CH_API_TOKEN,url,args=None):
    if args is not None:
        url='{}?{}'.format(url,urlencode(args))
    request = urllib2.Request(url)
    # You need the replace to handle encodestring adding a trailing newline 
    # (https://docs.python.org/2/library/base64.html#base64.encodestring)
    base64string = base64.encodestring('%s:' % (CH_API_TOKEN)).replace('\n', '')
    request.add_header("Authorization", "Basic %s" % base64string)   
    result = url_nice_req(request)

    return json.loads(result.read())

CH_API_TOKEN='YOUR_API_TOKEN_FROM_COMPANIES_HOUSE'
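
As a quick sanity check of the wrapper (the company number below is just a placeholder – drop in a real registered company number), it can be pointed at, for example, the company profile endpoint:

# Quick test: look up the profile of a single company by its registered number
company_number = 'COMPANY_NUMBER'
profile = ch_request(CH_API_TOKEN,
                     'https://api.companieshouse.gov.uk/company/{}'.format(company_number))
print(profile['company_name'], profile['company_status'])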

In the original implementation, I stored the incremental search results in a dict; in the reimplementation, I thought I’d make use of a small SQLite database.

import sqlite3
memDB=":memory:"
tmpDB='example.db'
#Close any connection left open from a previous run
if 'db' in locals() and db is not None:
    db.close()

db = sqlite3.connect(tmpDB)
c = db.cursor()

for drop in ['directorslite','companieslite','codirs','coredirs','singlecos']:
    c.execute('''drop table if exists {}'''.format(drop))
              
c.execute('''create table directorslite
         (dirnum text primary key,
          dirdob integer,
          dirname text)''')

c.execute('''create table companieslite
         (conum text primary key,
          costatus text,
          coname text)''')

c.execute('''create table codirs
         (conum text,
          dirnum text,
          typ text,
          status text)''')

c.execute('''create table coredirs
         (dirnum text)''')

c.execute('''create table singlecos
         (conum text,
          coname text)''')

cosdone=[]
cosparsed=[]
dirsdone=[]
dirsparsed=[]
codirsdone=[]

The code itself runs in two passes. The first pass builds up a seed set of directors from a single company or set of companies using a simple harvester:

def updateOnCo(seed,typ='current',role='director'):
    print('harvesting {}'.format(seed))
    
    #apiNice()
    o=ch_getCompanyOfficers(seed,typ=typ,role=role)['items']
    x=[{'dirnum':p['links']['officer']['appointments'].strip('/').split('/')[1],
          'dirdob':p['date_of_birth']['year'] if 'date_of_birth' in p else None,
          'dirname':p['name']} for p in o]
    z=[]
    for y in x:
        if y['dirnum'] not in dirsdone:
            z.append(y)
            dirsdone.append(y['dirnum'])
        if isinstance(z, dict): z=[z]
    print('Adding {} directors'.format(len(z)))
    c.executemany('INSERT INTO directorslite (dirnum, dirdob,dirname)'
                     'VALUES (:dirnum,:dirdob,:dirname)', z)
    for oo in [i for i in o if i['links']['officer']['appointments'].strip('/').split('/')[1] not in dirsparsed]:
        oid=oo['links']['officer']['appointments'].strip('/').split('/')[1]
        print('New director: {}'.format(oid))
        #apiNice()
        ooo=ch_getAppointments(oid,typ=typ,role=role)
        #apiNice()
        #Play nice with the api
        sleep(0.5)
        #add company details
        x=[{'conum':p['appointed_to']['company_number'],
          'costatus':p['appointed_to']['company_status'] if 'company_status' in p['appointed_to'] else '',
          'coname':p['appointed_to']['company_name'] if 'company_name' in p['appointed_to'] else ''} for p in ooo['items']]
        z=[]
        for y in x:
            if y['conum'] not in cosdone:
                z.append(y)
                cosdone.append(y['conum'])
        if isinstance(z, dict): z=[z]
        print('Adding {} companies'.format(len(z)))
        c.executemany('INSERT INTO companieslite (conum, costatus,coname)'
                     'VALUES (:conum,:costatus,:coname)', z)
        for i in x:cosdone.append(i['conum'])
        #add company director links
        dirnum=ooo['links']['self'].strip('/').split('/')[1]
        x=[{'conum':p['appointed_to']['company_number'],'dirnum':dirnum,
            'typ':'current','status':'director'} for p in ooo['items']]
        c.executemany('INSERT INTO codirs (conum, dirnum,typ,status)'
                     'VALUES (:conum,:dirnum,:typ,:status)', x)
        print('Adding {} company-directorships'.format(len(x)))
        dirsparsed.append(oid)
    cosparsed.append(seed)

The set of seed companies may be companies associated with one or more specified seed directors, for example:

def dirCoSeeds(dirseeds,typ='all',role='all'):
    ''' Find companies associated with dirseeds '''
    coseeds=[]
    for d in dirseeds:
        for c in ch_getAppointments(d,typ=typ,role=role)['items']:
            coseeds.append(c['appointed_to']['company_number'])
    return coseeds

dirseeds=[]
for d in ch_searchOfficers('Bernard Ecclestone',n=10,exact='forename')['items']:
    dirseeds.append(d['links']['self'])
    
coseeds=dirCoSeeds(dirseeds,typ='current',role='director')

Then I call a first pass of the co-directed companies search with the set of company seeds:

typ='current'
#Need to handle director or LLP Designated Member
role='all'
for seed in coseeds:
    updateOnCo(seed,typ=typ,role=role)
c.executemany('INSERT INTO coredirs (dirnum) VALUES (?)', [[d] for d in dirsparsed])

seeder_roles=['Finance Director']
#for dirs in seeded_cos, if dir_role is in seeder_roles then do a second seeding based on their companies
#TO DO

depth=0

Then we go for a crawl for as many steps as required… The approach I’ve taken here is to search through the current database to find the companies heuristically defined as codirected, and then feed these back into the harvester.

seeder=True
oneDirSeed=True
#typ='current'
#role='director'
maxdepth=3
#relaxed=0
while depth<maxdepth:
    print('---------------\nFilling out level - {}...'.format(depth))
    if seeder and depth==0:
        #Another policy would be dive on all companies associated w/ dirs of seed
        #In which case set the above test to depth==0
        tofetch=[u[0] for u in c.execute(''' SELECT DISTINCT conum from codirs''')]
    else:
        duals=c.execute('''SELECT cd1.conum as c1,cd2.conum as c2, count(*) FROM codirs AS cd1
                        LEFT JOIN codirs AS cd2
                        ON cd1.dirnum = cd2.dirnum
                        WHERE cd1.conum < cd2.conum GROUP BY c1,c2 HAVING COUNT(*)>1
                        ''')
        tofetch=[x for t in duals for x in t[:2]]
        #The above has some issues. eg only 1 director is required, and secretary IDs are unique to company
        #Maybe need to change logic so if two directors OR company just has one director?
        #if relaxed>0:
        #    print('Being relaxed {} at depth {}...'.format(relaxed,depth))
        #    duals=c.execute('''SELECT cd.conum as c1,cl.coname as cn, count(*) FROM codirs as cd JOIN companieslite as cl 
        #                 WHERE cd.conum= cl.conum GROUP BY c1,cn HAVING COUNT(*)=1
        #                ''')
        #    tofetch=tofetch+[x[0] for x in duals]
        #    relaxed=relaxed-1
    if depth==0 and oneDirSeed:
        #add in companies with a single director first time round
        sco=[]
        for u in c.execute('''SELECT DISTINCT cd.conum, cl.coname FROM codirs cd  JOIN companieslite cl ON
                                cd.conum=cl.conum'''):
            #apiNice()
            o=ch_getCompanyOfficers(u[0],typ=typ,role=role)
            if len(o['items'])==1 or u[0]in coseeds:
                sco.append({'conum':u[0],'coname':u[1]})
                tofetch.append(u[0])
        c.executemany('INSERT INTO singlecos (conum,coname) VALUES (:conum,:coname)', sco)
    #TO DO: Another strategy might be to try to find the Finance Director or other named role and seed from them?
    
    #Get undone companies
    print('To fetch: ',[u for u in tofetch if u not in cosparsed])
    for u in [x for x in tofetch if x not in cosparsed]:
            updateOnCo(u,typ=typ,role=role)
            cosparsed.append(u)
            #play nice
            #apiNice()
    depth=depth+1
    #Parse companies

To visualise the data, I opted for Gephi, which meant having to export the data. I started off with a simple CSV edgelist exporter:

data=c.execute('''SELECT cl1.coname as Source,cl2.coname as Target, count(*) FROM codirs AS cd1
                        LEFT JOIN codirs AS cd2 JOIN companieslite as cl1 JOIN companieslite as cl2
                        ON cd1.dirnum = cd2.dirnum and cd1.conum=cl1.conum and cd2.conum=cl2.conum
                        WHERE cd1.conum < cd2.conum GROUP BY Source, Target HAVING COUNT(*) > 1''')
import csv
with open('output1.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(['Source', 'Target'])
    writer.writerows(data)
    
data= c.execute('''SELECT cl1.coname as c1,cl2.coname as c2 FROM codirs AS cd1
                        LEFT JOIN codirs AS cd2 JOIN singlecos as cl1 JOIN singlecos as cl2
                        ON cd1.dirnum = cd2.dirnum and cd1.conum=cl1.conum and cd2.conum=cl2.conum
                        WHERE cd1.conum < cd2.conum''')
with open('output1.csv', 'ab') as f:
    writer = csv.writer(f)
    writer.writerows(data)

but soon changed that to a proper graph file export, based on a graph built around the codirected companies using the networkx package:

import networkx as nx

G=nx.Graph()

data=c.execute('''SELECT cl.conum as cid, cl.coname as cn, dl.dirnum as did, dl.dirname as dn
FROM codirs AS cd JOIN companieslite as cl JOIN directorslite as dl ON cd.dirnum = dl.dirnum and cd.conum=cl.conum ''')
for d in data:
    G.add_node(d[0], Label=d[1])
    G.add_node(d[2], Label=d[3])
    G.add_edge(d[0],d[2])
nx.write_gexf(G, "test.gexf")

I then load the graph file into Gephi to visualise the data.

Here’s an example of the sort of thing we can get out for a search seeded on companies associated with the Bernie Ecclestone who directs at least one F1 related company:

On the to do list is to automate this a little bit more by adding some network statistics, and possibly a first pass layout, in the networkx step.
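
A first step along those lines might look something like the following sketch – annotate each node with a simple degree centrality score before exporting, so Gephi can size or colour nodes on it straight away:

# Sketch: add a degree centrality attribute to each node before the GEXF export
for node, cent in nx.degree_centrality(G).items():
    G.add_node(node, centrality=cent)  # add_node on an existing node just updates its attributes
nx.write_gexf(G, "test_annotated.gexf")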

In terms of time required to collect the data, the Companies House API (https://developer.companieshouse.gov.uk/api/docs/index/gettingStarted/rateLimiting.html) is rate limited to allow 600 requests within a five minute period. Many company networks can be mapped within the 600 call limit, but even for larger networks, the trawl doesn’t take too long even if two or three rest periods are required.

Semantic Cartography – Mapping Dodgy Goth Bands With Common Members Using Wikipedia Data

Several years ago I did some doodles using the Gephi network visualiser Semantic Web Import plugin to sketch out how various sorts of thing (philosophers, music genres, programming languages) were related in Wikipedia (or at least, DBpedia, the semantic web derivative of Wikipedia). A couple of days ago, I started sketching some new queries in a Jupyter IPython notebook to generate a wider range of maps, using the networkx package to analyse the results locally, as well as building and exporting a graph that I could then visualise in Gephi.

The following bit of code provides a simple function for running a SPARQL query against a SPARQL endpoint, such as the DBpedia endpoint. It also accepts a set of prefix definitions for the query.

from SPARQLWrapper import SPARQLWrapper, JSON

#Add some helper functions
def runQuery(endpoint,prefix,q):
    ''' Run a SPARQL query with a declared prefix over a specified endpoint '''
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(prefix+q)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()

endpoint='http://dbpedia.org/sparql'

prefix='''
prefix gephi:<http://gephi.org/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix dbo: <http://dbpedia.org/ontology/>
prefix dbp: <http://dbpedia.org/property/>
prefix dbr: <http://dbpedia.org/resource/>
prefix dbc: <http://dbpedia.org/resource/Category:>
prefix dct: <http://purl.org/dc/terms/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix yago: <http://dbpedia.org/class/yago/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
'''

Here’s an example of the style of query I explored a few years ago – it identifies a thing that’s a band in a particular genre, and then tries to find other genres associated with that band. Each combination of genres adds an edge to the resulting graph. The FILTER element makes sure that we make edges between different genres.

m='Gothic_rock'
q='''
SELECT DISTINCT ?a ?an ?b ?bn WHERE {{
?band dbp:genre dbr:{}.
?band <http://dbpedia.org/property/background> "group_or_band"@en.
?band dbp:genre ?a.
?band dbp:genre ?b.
?a dbp:name ?an.
?b dbp:name ?bn.
FILTER(?a != ?b && langMatches(lang(?an), "en")  && langMatches(lang(?bn), "en"))
}}'''.format(m)

r=runQuery(endpoint,prefix,q)

Another simple function takes the resulting edge list and creates a node labeled graph from it using the networkx library. We can then export a graph file from this network that can be visualised in Gephi. (On my to do list is using networkx to  calculate some simple network statistics and generate a first attempt at a preview layout automatically, rather than doing it by hand in Gephi, which is what I do at the moment…)

import networkx as nx

def nxGrapher_hack(response,config,typ='undirected'):
    ''' typ: forward | reverse | undirected'''
    if typ=='undirected':
        G = nx.Graph()
    else:
        G = nx.DiGraph()

    fr,fr_l=config['from']
    to,to_l=config['to']
    for r in response['results']['bindings']:
        G.add_node(r[fr]['value'], label=r[fr_l]['value'])
        G.add_node(r[to]['value'], label=r[to_l]['value'])
        if typ=='reverse':
            G.add_edge(r[to]['value'],r[fr]['value'])
        else:
            G.add_edge(r[fr]['value'],r[to]['value'])
    return G

G=nxGrapher_hack(r, {'from':('a','an'),'to':('b','bn')})
nx.write_gexf(G, "music_{}.gexf".format(m))

Here’s the sort of map/graph we can generate as a result:

As well as genre information, we can look up information about band members, such as the current or previous members of a particular band*.

*Since generating the data files last night, and running them again today, a whole raft of band membership details appear to have disappeared. WTF?! Now I remember another of the reasons I keep avoiding the semantic web – it’s as flakey as anything and you can never tell if the problem is yours, someone else’s or the result of an update (or downgrade) in the data!

What this means is that we can anchor a query on a band, and find the current or previous members. In the following snippet, the single braces (“{}”@en) are replaced by the value of the declared band name:

m="The Mission (band)"
q='''
SELECT DISTINCT ?a ?an ?b ?bn WHERE {{
?x <http://dbpedia.org/property/background> "group_or_band"@en.
?x rdfs:label "{}"@en.

?a <http://dbpedia.org/property/background> "group_or_band"@en.
?a rdfs:label ?an.

?b rdfs:label ?bn.
?b a dbo:Person.
{{?a dbp:pastMembers ?b.}} UNION
{{?a dbp:currentMembers ?b.}}.
{{?x dbp:pastMembers ?b.}} UNION
{{?x dbp:currentMembers ?b.}}

FILTER((lang(?an)="en") && (lang(?bn)="en") && !(STRSTARTS(?bn,"List of")) && !(STRSTARTS(?an,"List of")))
}}'''.format(m)

r=runQuery(endpoint,prefix,q)

G=nxGrapher_hack(r, {'from':('a','an'),'to':('b','bn')})
nx.write_gexf(G, "band_{}.gexf".format(m))

A slight tweak to the code lets us replace the anchoring (that is, the search) around a single band name to a set of band names. This allows us to get the current and previous members of all the declared bands.

m=['The Mission (band)','The Cult','The Sisters of Mercy','Fields of the Nephilim','All_About_Eve_(band)']

p='''
?x rdfs:label "{}"@en.
'''

ms=''' UNION
'''.join(['{'+p.format(i)+'}' for i in m])

#In the query, replace ?x rdfs:label "{}"@en. with {}
#In the format method, replace m with ms
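
Putting that together, the reworked query might look something like this (a sketch of the substitution described above, reusing the runQuery and nxGrapher_hack functions):

q_multi='''
SELECT DISTINCT ?a ?an ?b ?bn WHERE {{
?x <http://dbpedia.org/property/background> "group_or_band"@en.
{}

?a <http://dbpedia.org/property/background> "group_or_band"@en.
?a rdfs:label ?an.

?b rdfs:label ?bn.
?b a dbo:Person.
{{?a dbp:pastMembers ?b.}} UNION
{{?a dbp:currentMembers ?b.}}.
{{?x dbp:pastMembers ?b.}} UNION
{{?x dbp:currentMembers ?b.}}

FILTER((lang(?an)="en") && (lang(?bn)="en") && !(STRSTARTS(?bn,"List of")) && !(STRSTARTS(?an,"List of")))
}}'''.format(ms)

r=runQuery(endpoint,prefix,q_multi)
G=nxGrapher_hack(r, {'from':('a','an'),'to':('b','bn')})
nx.write_gexf(G, "bands_multi.gexf")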

Rather than searching around one or more bands, we could instead hook into bands associated with a particular genre. Rather than anchoring around ?x rdfs:label "{}"@en, for example, use ?x dbp:genre dbr:{}. This then lets us generate views of the following form:

As well as mapping the territory around particular musical genres, we can also generate maps for other contexts, such as around particular art movements. For example:

m='Surrealism'
q='''
SELECT DISTINCT ?a ?an ?b ?bn WHERE {{
?movement dct:subject dbc:Art_movements.
?movement dct:subject dbc:{}.
?artist dbp:movement ?movement.
?artist dbp:movement ?a.
?artist dbp:movement ?b.
?a rdfs:label ?an.
?b rdfs:label ?bn.
FILTER(?a != ?b && (lang(?an)="en") && (lang(?bn)="en"))
}}'''.format(m)

r=runQuery(endpoint,prefix,q)
G=nxGrapher_hack(r, {'from':('a','an'),'to':('b','bn')})
nx.write_gexf(G, "art_{}.gexf".format(m))

Or we can tap into other ontologies to limit our searches, and generate a range of influence maps:

y='Artist109812338'
#Artist109812338
#Painter110391653
#Potter110460806
#Sculptor110566072
#Philosopher110423589
#PhilosophersOfLanguage
#PhilosophersOfMathematics
#PhilosophersOfMind
q='''
SELECT ?a ?an ?b ?bn WHERE {{
  ?a a yago:{typ} .
  ?b a yago:{typ} .
  ?a rdfs:label ?an.
  ?b rdfs:label ?bn.
  {{?a <http://dbpedia.org/ontology/influencedBy> ?b.}}
   UNION {{
  ?b <http://dbpedia.org/ontology/influenced> ?a.
  }}
  }}'''.format(typ=y)
r=runQuery(endpoint,prefix,q)
G=nxGrapher_hack(r, {'from':('a','an'),'to':('b','bn')},typ='forward')
nx.write_gexf(G, "influence_{}.gexf".format(y))

So why bother?

Here are several reasons: first, because it’s interesting/fun/recreational; secondly, it allows us to compare our own mental model of the wider context around a particular genre or movement with the Wikipedia version; thirdly, if we’re expert, it might allow us to spot gaps or errors in the Wikipedia data, and fix it; fourthly, these sorts of data collections are used to make recommendations to you, so it helps to get a feel for the sorts of things they can represent, the relations they claim exist, and the ways they can go wrong, so you trust the machines a little bit less, or at least, a little bit more informedly.

PS One of the reasons for grabbing the data using Python was because Gephi has recently undergone an update, and the extensions developed for the earlier version are still being migrated. However, checking today, I notice that the SemanticWebImport plugin has made it across, so it should be possible to run variants of the queries directly in Gephi. See the previous posts for examples.

Chat Sketches with the Companies House API, Before the F***kWit UKGov Sell It Off

Ranty title a gut reaction response to news that the Land Registry faces privatisation.

Sketching around similar ideas to my Slack/slash conversational autoresponder around the Parliament data platform API, I thought I’d have a quick play with the UK Companies House API, which provides a simple interface to company registration data, director information and disqualified director information.

Bulk downloads are available for company registration information (here’s a quick howto about working with it; I’ll post a howto showing how to work with it using a containerised database when I get a chance, but for now, here are some clues) and from the API developer forums it looks as if bulk director’s information is available by request.

Working with your own bulk copy of the data is preferable, because it means you can write your own arbitrarily complex queries over any or all of the columns. The Companies House API, on the other hand, gives you a weak search over company and directors names, and the ability to look up individual known records. To do any sort of processing, you need to grab a large amount of search data, and/or make lots of individual known item records to build you own local datastore, and then search or filter across that.

So for example, here are the first fumblings of my own function for filtering down on a list of officers associated with a particular company based on director role or current status (which I called typ for some reason? Bah:-(:

def ch_getCompanyOfficers(cn,typ='all',role='all'):
    #typ: all, current, previous
    url="https://api.companieshouse.gov.uk/company/{}/officers".format(cn)
    co=ch_request(CH_API_TOKEN,url)
    if typ=='current':
        co['items']=[i for i in co['items'] if 'resigned_on' not in i]
        #should possibly check here that len(co['items'])==co['active_count'] ?
    elif typ=='previous':
        co['items']=[i for i in co['items'] if 'resigned_on' in i]
    if role!='all':
        co['items']=[i for i in co['items'] if role==i['officer_role']]
    return co

The next function runs a search over officers by name, but then also lets you filter down the responses to show just those directors who also match a particular search string as part of any company name they are associated with.

def ch_searchOfficers(q,n=50,start_index='',company=''):
    url= 'https://api.companieshouse.gov.uk/search/officers'
    properties={'q':q,'items_per_page':n,'start_index':start_index}
    o=ch_request(CH_API_TOKEN,url,properties)
    if company != '':
        for p in o['items']:
            p['items'] = [i for i in ch_getAppointments(p['links']['self'])['items'] if company.lower() in i['appointed_to']['company_name'].lower()]
        o['items'] = [i for i in o['items'] if len(i['items'])]
    return o
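
For example (a purely hypothetical query – the names are made up), to look for officers called John Smith who are linked to companies whose name contains “Widgets”:

# Hypothetical example call
res = ch_searchOfficers('John Smith', n=20, company='Widgets')
for officer in res['items']:
    print(officer['title'])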

You get the gist, hopefully. Run a crude API call, and then filter down the result according to particular data properties contained within the search result.

Anyway, as far as the chatting goes, here’s what I’ve started playing around with…

First, let’s just ask what companies a director with a particular name is associated with.

We can take this a bit further by filtering down on the directors associated with a particular company. (Actually, this is simplified now to call the reporting function simply as dirCompanies(c)).

Alternatively, we might try to narrow the search for directors associated with companies in a particular locality. (I’m still trying to get my head round the different logics of this, because companies as well as directors are associated with addresses. I really need to try some specific investigative tasks to get a better feel for how to tune this sort of filter…)
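
One crude way in (a sketch only – I’m assuming the officer search results include an address element with a locality field) might be a filter along these lines:

# Hypothetical helper: filter officer search results down to ones whose
# address locality contains a given string.
def filter_officers_by_locality(officers, locality):
    locality = locality.lower()
    return [o for o in officers
            if locality in o.get('address', {}).get('locality', '').lower()]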

I’ve also started trying to think around the currency of appointments, for example supporting the ability to filter down based on resigned appointments:

Associated with this sort of query (in the sense of exploring the past) are filters that let us search around dissolved companies, for example:

(I should probably also put some time filters in there, for example to search for companies that a particular person was a director of at a particular time…)

We can also search into the disqualified directors register. To try to reduce the sense of turning this into a fishing trip, searching by director name and then filtering by locality feels like it could be handy (though again, this needs some refinement on the way I apply the locality filter.)

Next step on this particular task is to tidy these up a little bit more and then pop them into a Slack responder.

But there are also some other Companies House goodies to come…such as revisiting the notion of co-director based company maps.

 

Calling an OData Service From Python – UK Parliament Members Data Platform

Whilst having a quick play producing Slack bots and slash commands around the UK Parliament APIs, I noticed (again) that the Members data platform has an OData endpoint.

OData is a data protocol for querying online data services via HTTP requests, although it never really seems to have caught the popular imagination, possibly because Microsoft thought it up, possibly because it seems really fiddly to use…

I had a quick look around for a Python client/handler for it, and the closest I came was the pyslet package. I’ve posted a notebook showing my investigations to date here: Handling the UK Parliament Members Data Platform OData Feed, but it seems really clunky and I’m not sure I’ve got it right! (There doesn’t seem to be a lot of tutorial support out there, either?)

Here’s an example of the sort of mess I got myself in:

To make the Parliament OData service more useful, I think it needs a higher level Python wrapper that abstracts a bit further and provides some function calls that make it a tad easier (and more natural) to get at the data. Or maybe I need to step back, have a read of the OData docs, properly get my head around the pyslet OData calls, and try again!