Panama Papers, Quick Start in SQLite3

Via @Megan_Lucero, I notice that the Sunday Times data journalism team have published “a list of companies in Panama set up by Mossack Fonseca and its associates, as well the directors, shareholders and legal agents of those companies, as reported to the Panama companies registry”: Sunday Times Panama Papers Database.

Here’s a quick start to getting the data (which is available for download) into a form you can start to play with using SQLite3.

  • Download and install SQLite3
  • download the data from the Sunday Times and unzip it
  • on the command line/terminal, cd into the unzipped directory
  • create a new SQLite3 database: sqlite3 sundayTimesPanamaPapers.sqlite
  • you should now be presented with a SQLite console command line. Run the command: .mode csv
  • we’ll now create a table to put the data into: CREATE TABLE panama(company_url TEXT,company_name TEXT,officer_position_es TEXT,officer_position_en TEXT,officer_name TEXT,inc_date TEXT,dissolved_date TEXT,updated_date TEXT,company_type TEXT,mf_link TEXT);
  • We can now import the data – the header row will be included but this is quick’n’dirty, right? .import sunday_times_panama_data.csv panama
  • so let’s poke the data – preview the first few lines: SELECT * FROM panama LIMIT 5;
  • let’s see which officers are names the most: SELECT officer_name,COUNT(*) as c FROM panama GROUP BY officer_name ORDER BY c DESC LIMIT 10;
  • see what officer roles there are: SELECT DISTINCT officer_position_en FROM panama;
  • see what people have most : SELECT officer_name,officer_position_en, COUNT(*) as c FROM panama WHERE officer_position_en='Director/President' OR officer_position_en='President' GROUP BY officer_name,officer_position_en ORDER BY c DESC LIMIT 10;
  • exit SQLite console by running: .q
  • to start a new session from the command line: sqlite3 sundayTimesPanamaPapers.sqlite (you won’t need to load the data in again, you can get started with a SELECT straightaway).

sunday_times_panama_data_—_sqlite3_—_82×41_and_ajh59_—_jupyter_mac_command_—_bash_—_80×24

sunday_times_panama_data_—_sqlite3_—_82×41

Have fun…

PS FWIW, I’d consider the above to be a basic skill for anyone who calls themselves an information professional… Which includes the Library…;-) [To qualify that, here’s an example question: “I just found this data on the Panama Papers and want to see which people seemed to be directors of a lot of companies; can I do that?”]

7 comments

  1. gyges01 (@gyges01)

    Here’s a piece of R code that can be used to get the data:-

    temp <- tempfile()
    download.file(“https://cdn.rawgit.com/times/data/master/sunday_times_panama_data.zip”,temp)
    panama <- read.csv(unz(temp, “sunday_times_panama_data.csv”),stringsAsFactors = FALSE)
    unlink(temp)
    rm(temp)

    it first appeared on twitter but the above has some slight amendments.

  2. abe

    is there an easy way to cross reference with the names of Wikipedia entries (i.e. notable names)

    • Tony Hirst

      Abe

      Depends if you have a list of notable names? One way might be to run the names through a named entity recognition service (eg IBM Alchemy, Reuters OpenCalais, Azure Machine Learning).