Pondering the Sunday Times Panama Papers directors/companies database yesterday (Panama Papers, Quick Start in SQLite3), I thought it was about time I got my head round using a graph database to store this sort of relational information.
Getting started with the neo4j database has been on my to-do list for some time, so when I was pointed to a post on the neo4j blog about Analyzing the Panama Papers with Neo4j: Data Models, Queries & More, I thought I should give it a go.
So here’s a quick start to the first part: getting a working environment up and running. A quick search turned up a set of examples by Nicole White showing how to get started with Neo4j in Jupyter notebooks, so I opted for a notebook/neo4j combination. The following docker-compose.yml file fires up a notebook server and a neo4j server in separate docker containers and links them together:
neo4j:
  image: kbastani/docker-neo4j:latest
  ports:
    - "7474:7474"
    - "1337:1337"
  volumes:
    - /opt/data

jupyterscipy:
  image: jupyter/scipy-notebook
  ports:
    - "8888:8888"
  links:
    - neo4j:neo4j
  volumes:
    - .:/home/jovyan/work
Launching the docker CLI from Kitematic, I can cd into the directory containing the docker-compose.yml file and run the command docker-compose up -d to download and launch the containers.
In the browser, launch a Jupyter terminal and pull down the example notebooks by running the command:
The notebooks will be downloaded into the folder neo4j-jupyter. Still in the terminal, create a figure directory, as required by the hello-world.ipynb notebook:
mkdir -p neo4j-jupyter/figure
You’ll also need to install the py2neo python package. By default, this will be installed into the Python 3 path:
pip install py2neo
The example notebooks run in a Python 2 kernel, so we need to install the package into that environment too:
source activate python2
pip install py2neo
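With two Python environments in play, it's easy to lose track of which one actually has the package. As a quick sanity check, something like the following can be run in a notebook cell under each kernel. It's only a sketch: the has_package helper is a name I've made up for illustration, not part of py2neo or Jupyter.

```python
import importlib
import sys


def has_package(name):
    """Return True if `name` can be imported in the current kernel.

    Hypothetical helper for illustration; it simply attempts the import.
    """
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Report which major Python version this kernel is running and whether
# py2neo is importable from it.
print("Python %d: py2neo importable = %s"
      % (sys.version_info[0], has_package("py2neo")))
```

If the Python 2 kernel reports False, the second pip install probably ran outside the activated python2 environment.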
Now you should be able to run the example notebooks. One thing to note, though: you will need to change the connection details for the neo4j database slightly. In the appropriate notebook code cells, change the default graph = Graph() connection to:
graph = Graph("http://neo4j:7474/db/data/")
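The hostname in that URL is the link alias from the docker-compose.yml file, so it only resolves from inside the notebook container. If the connection fails, a rough way to tell a networking problem from a py2neo problem is to poke the HTTP endpoint directly. The sketch below uses only the standard library and assumes a Python 3 kernel; neo4j_data_url and is_reachable are hypothetical helper names of my own, and the 3 second timeout is an arbitrary choice.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def neo4j_data_url(host="neo4j", port=7474):
    """Build the REST endpoint URL used in the Graph(...) call above.

    The defaults mirror the link alias and port in docker-compose.yml.
    """
    return "http://{}:{}/db/data/".format(host, port)


def is_reachable(url, timeout=3):
    """Return True if an HTTP server answers at `url`, False otherwise.

    Any HTTP response, including an error status such as 401, counts as
    reachable: it shows the container link and port mapping are working.
    """
    try:
        urlopen(url, timeout=timeout).close()
    except HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (URLError, OSError):
        # DNS failure, refused connection, or timeout.
        return False
    return True
```

Calling is_reachable(neo4j_data_url()) from a notebook cell should return True once both containers are up. A False result suggests checking the links entry in the compose file, or whether the neo4j container is actually running, before looking at the py2neo code.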
I’ve run out of time to do any more just now. In the next post on this topic, I’ll see if I can work out how to get the Sunday Times data into neo4j…