Pondering the Sunday Times Panama Papers directors/companies database yesterday (Panama Papers, Quick Start in SQLite3), I thought it was about time I got my head round using a graph database to store this sort of relational information.
Getting started with the neo4j database has been on my to-do list for some time, so when I was pointed to a post on the neo4j blog about Analyzing the Panama Papers with Neo4j: Data Models, Queries & More, I thought I should give it a go.
So here’s a quick start to the first part – getting a working environment up and running. A quick search turned up a set of examples by Nicole White showing how to get started with Neo4j using Jupyter notebooks, so I opted for a notebook/neo4j combination. The following docker-compose.yml file fires up a notebook server and a neo4j server in separate docker containers and links them together:
neo4j:
  image: kbastani/docker-neo4j:latest
  ports:
    - "7474:7474"
    - "1337:1337"
  volumes:
    - /opt/data
jupyterscipy:
  image: jupyter/scipy-notebook
  ports:
    - "8888:8888"
  links:
    - neo4j:neo4j
  volumes:
    - .:/home/jovyan/work
Launching the docker CLI from Kitematic, I can cd into the directory containing the docker-compose.yml file and run the command docker-compose up -d to download and launch the containers.
In the browser, launch a Jupyter terminal and pull down the example notebooks by running the command:
git clone https://github.com/nicolewhite/neo4j-jupyter.git
The notebooks will be downloaded into the folder neo4j-jupyter. Still in the terminal, create a figure directory, as required by the hello-world.ipynb notebook:
mkdir -p neo4j-jupyter/figure
You’ll also need to install the py2neo Python package. By default, this will be installed into the Python 3 path:
pip install py2neo
The example notebooks run in a Python 2 kernel, so we need to install the package into that environment too:
source activate python2
pip install py2neo
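To confirm that the package is visible to the kernel a notebook is actually using, a check along the following lines can be run in a notebook cell. This is only a minimal sketch: has_package is a hypothetical helper name, not part of py2neo or Jupyter.

```python
import sys


def has_package(name):
    """Return True if the named package can be imported by this interpreter."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


# Report the major Python version of the running kernel and whether
# py2neo can be imported from it.
print("Python major version:", sys.version_info[0])
print("py2neo importable:", has_package("py2neo"))
```

If the second line reports False in the Python 2 kernel, the package has probably only been installed into the Python 3 environment.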
Now you should be able to run the example notebooks. One thing to note though – you will need to change the connection details for the neo4j database slightly. In the appropriate notebook code cells, change the default graph = Graph() connection to:
graph = Graph("http://neo4j:7474/db/data/")
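Before debugging py2neo itself, it may help to check that the neo4j container is reachable from the notebook container under the neo4j alias. The following is a rough sketch using only the standard library. The helper name service_root is hypothetical, and the sketch assumes that the server answers unauthenticated HTTP requests at that URL with a JSON document, which may not hold if authentication is enabled.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def service_root(url="http://neo4j:7474/db/data/", timeout=5):
    """Return the decoded JSON served at url, or None if the request fails."""
    try:
        response = urlopen(url, timeout=timeout)
        try:
            body = response.read().decode("utf-8")
        finally:
            response.close()
        # Assumption: the endpoint responds with a JSON document.
        return json.loads(body)
    except (IOError, ValueError):
        # Covers an unknown host, a refused connection, a timeout,
        # an HTTP error status, or a response that is not valid JSON.
        return None
```

Calling service_root() in a notebook cell and getting None back suggests a networking or authentication problem rather than a py2neo one. Getting a dictionary back suggests the Graph(...) connection string above is pointing at a live server.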
I’ve run out of time to do any more just now. In the next post on this topic, I’ll see if I can work out how to get the Sunday Times data into neo4j…
Really interesting. I’ll have to put it down for today because of a localhost connection error that I can’t figure out, but I’ll definitely come back to it :)
Monette5 – don’t use localhost. If the problem is reaching neo4j from the notebook, use the /neo4j/ alias. If the problem is opening the notebook itself, use the IP address assigned to the container, as listed via e.g. Kitematic, or use something like /docker-machine env default/ to look up the environment variables (including the IP address) for the docker virtual machine.
Thanks, I’ll try again tomorrow :)
git clone https://github.com/nicolewhite/neo4j-jupyter.git failed with the error below:
fatal: could not create work tree dir ‘neo4j-jupyter’.: Permission denied
I tried to run the same command using sudo, but I don’t know how to find the password for the jovyan user.
Would you help?
Do you have write permissions for the directory you are trying to clone into?