SQLite is a really handy file based database engine. When you put data into a database, it can make it easier to search; it also provides a great tool for looking for stories or story leads hidden in the data. So here are seven ways of getting started with SQLite.
Querying SQLite Databases Using Rich User Interfaces
Whilst SQLite
comes with it’s own command line client, many people will find the easiest way of getting started with querying SQLite databases is to use an application, either on the desktop or accessed via a browser based UI.
I think this project has stalled; not sure if it still works? -> franchise is a browser based UI that you can access via the cloud or run locally (code). If you have a SQLite database file (make sure the suffix is .sql
) you can upload it and explore it using the franchise application. (If you have a CSV or Excel data file, you can upload that too and it will add it to its own temporary SQLite database). Here’s a review: Asking Questions of CSV Data, Using SQL In the Browser, With Franchise.
In the browser, you can try the official sql.js demo GUI or SQLite Viewer.
If you prefer something on the desktop, the cross-platform sqlitebrowser [repo] might suit your needs.
Another great way of making use of SQLite is bring it alive using datasette
. A single command allows you to publish an interactive, browser based user interface to one or more databases, either on your own computer or via an online host such as Zeit Now, or Heroku. For example, I’ve popped up three databases I scrape together on Heroku and pointed my own URL at them (unfortunately, I tend to run out of free Heroku credits in the last week of the month at which point the site goes down!)
datasette
allows you to query the databases through a customisable webpage and supports a range of plugins. For example, the datasette-cluster-map
will detect latitude and longitude columns in a datatable and present the results using an interactive map. I gave it a spin with this map of UK Food Hygiene Ratings. You can find several other examples of datasettes published in the wild on the datasette wiki.
Finding Data: SQLite Databases in the Wild
Whilst tools such as datasette
are great for quickly getting started with querying a SQLite database, one obvious question that arises is: what SQLite database?
Once you start poking around, however, you can start to find examples of SQLite databases working quietly behind the scenes on you own computer. (Searching your computer for files with a .sqlite
suffix is one way of finding them!)
As a specific example, the Safari, Chrome and Firefox web browsers all keep track of your browser history using a SQLite database on your computer (this gist – dropmeaword/browser_history.md
– tells you where you can find the files. You can then simply query them using datasette
. On a Mac, I can simply run:
datasette ~/Library/Application\ Support/Google/Chrome/Default/History
and I can then start to query my browser history using the datasette
browser based UI. Here’s an ‘inspect your browser history’ tutorial to get you started…
Ingesting Data
Applications such as franchise allow you to upload a CSV or Excel datafile and automatically import it into a SQLite database so that it can be queried using SQLite. The datasette ecosystem also includes and application for uploading CSV files and ingesting them into a SQlite database: datasette publish.
Behind the scenes of that application is a Python command line utility called csvs-to-sqlite
. A simple command lets yoiu convert a CSV file to a SQLite database:
csvs-to-sqlite myfile.csv mydatabase.db
Whilst csvs-to-sqlite
focusses on the conversion of CSV files into a SQLite database, the more general sqlitebiter
command line utility can convert CSV, Excel, HTML tables (eg from a URL), JSON, Jupyter Notebooks, Markdown, TSV and Google-Sheets to a SQLite database file using a similar command format.
Using SQLite from the Commandline
Natively, SQLite comes with its own command line shell that allows you to connect to and query a SQLite database from the command line.
As well as command line tools for converting data contained in various file formats into a SQLite database, several command line tools embed that conversion within a command line interface that allows you convert a data file to an in-memory SQLite database and query it directly.
For example, using the csvsql
command from csvkit
:
csvsql --query "select * from iris as i join irismeta as m on (i.species = m.species)" examples/iris.csv examples/irismeta.csv
Or textql
:
Or the simply named q
:
q -H "SELECT COUNT(DISTINCT(uuid)) FROM ./clicks.csv"
Querying SQLite Databases from Programming Languages Such as Python and R
If you are developing your own data-centric reproducible research pipelines, it’s likely that you will be using a programming language such as R or the Python pandas library.
There are two main ways for using SQLite in this context. The first is to connect to the database from the programming language and then query it from within the language. For example, in R, you might use the RSQLite
package. In Python, you can connect to a SQLite database using the base sqlite3
package.
The other approach is to use SQLite as an in-memory database that provides a SQL query interface to a dataframe. In R, this can be achieved using the sqldf
package:
library(sqldf) sqldf("select * from iris limit 5")
In Python/pandas, you can use the pandasql
package:
from pandasql import sqldf, load_births pysqldf = lambda q: sqldf(q, globals()) births = load_births() print(pysqldf("SELECT * FROM births LIMIT 10;").head())
In many respects, sqldf
and pandasql
behave like programming language native versions of command-line utilities such as csvsql
, textql
and q
, although rather than importing a data file into a SQLite database so that it can be queried, they import the contents of the referenced dataframe.
pandas also provides native support for adding dataframes as tables to a connected SQLite database, as well as reading results from queries onto the database back into a dataframe. Once you start getting into the swing of putting data into a database, and then running JOINed queries over multiple tables, you’ll start to wonder why you spent so much time getting hassled by VLOOKUP
. As an example, here’s a way of Making a Simple Database to Act as a Lookup for the ONS Register of Geographic Codes.
Using SQLite in Your Own Applications
If you are comfortable with using programming code to manipulate your data, then you may want to explore ways of using SQLite to create your own data driven applications.
One way to start is to use SQLite completely within the browser. Accessing desktop applications from a webpage is typically a no-no because of browser security restrictions, but SQLite is quite a light application, so it can – and has been – compiled to Javascript so that it can be imported as a Javascript library and run from within a webpage: sql.js. The aforementioned sql.js GUI and SQLite Viewer both use sql.js
under the hood…
As well as running SQLite in a browser, SQLite can also be used to power an API. One of the advantages of running a datasette
service is that it also exposes a datasette API. This means you can publish your datasette to a web host then other computers can querying it programmatically.
If you are working in a Python Jupyter environment, it’s simple enough to use the Jupyer kernel gateway to create your own APIs. Here’s an example of building a service to allow the lookup of ONS codes from a simple SQLite database: Building a JSON API Using Jupyter Notebooks in Under 5 Minutes.
Another way of using SQLite databases in a Jupyter environment is to use Scripted Forms
to For example, here’s one of my own recipes for Creating Simple Interactive Forms Using Python + Markdown Using ScriptedForms + Jupyter that shows how to create a simple interactive form for querying a SQLite database containing descriptions of images used in OpenLearn courses.
SQLite Database Admin Tools
As well as providing a simple explorer and query interface, the sqlitebrowser tool also supports a range of other SQLite database administration functions such as the ability to create, define, modify and delete tables and indexes, or edit, add and delete individual records.
The browser based sqlite-web
application provides a similar range of utilities via a browser based, rather than desktop client, UI.
Summary
SQLite is lightweight, in-memory and file based database that allows you to run SQL queries over a wide range of tabular datasets. If you work with data, knowing how to write even simple SQL queries can add a powerful tool to your toolbox. SQLite, and the associated tools created around it, is almost certainly one of the easiest ways in to using this most versatile, portable, and personal data power tool.
PS Here are some more ways of using SQLite:
- as a spatial database using [
SpatiaLite
](https://www.gaia-gis.it/fossil/libspatialite/index)