SQLite is a really handy file based database engine. When you put data into a database, it can make it easier to search; it also provides a great tool for looking for stories or story leads hidden in the data. So here are seven ways of getting started with SQLite.
Querying SQLite Databases Using Rich User Interfaces
SQLite comes with it’s own command line client, many people will find the easiest way of getting started with querying SQLite databases is to use an application, either on the desktop or accessed via a browser based UI.
I think this project has stalled; not sure if it still works? -> franchise is a browser based UI that you can access via the cloud or run locally (code). If you have a SQLite database file (make sure the suffix is
.sql) you can upload it and explore it using the franchise application. (If you have a CSV or Excel data file, you can upload that too and it will add it to its own temporary SQLite database). Here’s a review: Asking Questions of CSV Data, Using SQL In the Browser, With Franchise.
Another great way of making use of SQLite is bring it alive using
datasette. A single command allows you to publish an interactive, browser based user interface to one or more databases, either on your own computer or via an online host such as Zeit Now, or Heroku. For example, I’ve popped up three databases I scrape together on Heroku and pointed my own URL at them (unfortunately, I tend to run out of free Heroku credits in the last week of the month at which point the site goes down!)
datasette allows you to query the databases through a customisable webpage and supports a range of plugins. For example, the
datasette-cluster-map will detect latitude and longitude columns in a datatable and present the results using an interactive map. I gave it a spin with this map of UK Food Hygiene Ratings. You can find several other examples of datasettes published in the wild on the datasette wiki.
Finding Data: SQLite Databases in the Wild
Whilst tools such as
datasette are great for quickly getting started with querying a SQLite database, one obvious question that arises is: what SQLite database?
Once you start poking around, however, you can start to find examples of SQLite databases working quietly behind the scenes on you own computer. (Searching your computer for files with a
.sqlite suffix is one way of finding them!)
As a specific example, the Safari, Chrome and Firefox web browsers all keep track of your browser history using a SQLite database on your computer (this gist –
dropmeaword/browser_history.md – tells you where you can find the files. You can then simply query them using
datasette. On a Mac, I can simply run:
datasette ~/Library/Application\ Support/Google/Chrome/Default/History
and I can then start to query my browser history using the
datasette browser based UI. Here’s an ‘inspect your browser history’ tutorial to get you started…
Applications such as franchise allow you to upload a CSV or Excel datafile and automatically import it into a SQLite database so that it can be queried using SQLite. The datasette ecosystem also includes and application for uploading CSV files and ingesting them into a SQlite database: datasette publish.
Behind the scenes of that application is a Python command line utility called
csvs-to-sqlite. A simple command lets yoiu convert a CSV file to a SQLite database:
csvs-to-sqlite myfile.csv mydatabase.db
csvs-to-sqlite focusses on the conversion of CSV files into a SQLite database, the more general
sqlitebiter command line utility can convert CSV, Excel, HTML tables (eg from a URL), JSON, Jupyter Notebooks, Markdown, TSV and Google-Sheets to a SQLite database file using a similar command format.
Using SQLite from the Commandline
Natively, SQLite comes with its own command line shell that allows you to connect to and query a SQLite database from the command line.
As well as command line tools for converting data contained in various file formats into a SQLite database, several command line tools embed that conversion within a command line interface that allows you convert a data file to an in-memory SQLite database and query it directly.
For example, using the
csvsql command from
csvsql --query "select * from iris as i join irismeta as m on (i.species = m.species)" examples/iris.csv examples/irismeta.csv
Or the simply named
q -H "SELECT COUNT(DISTINCT(uuid)) FROM ./clicks.csv"
Querying SQLite Databases from Programming Languages Such as Python and R
If you are developing your own data-centric reproducible research pipelines, it’s likely that you will be using a programming language such as R or the Python pandas library.
There are two main ways for using SQLite in this context. The first is to connect to the database from the programming language and then query it from within the language. For example, in R, you might use the
RSQLite package. In Python, you can connect to a SQLite database using the base
The other approach is to use SQLite as an in-memory database that provides a SQL query interface to a dataframe. In R, this can be achieved using the
library(sqldf) sqldf("select * from iris limit 5")
In Python/pandas, you can use the
from pandasql import sqldf, load_births pysqldf = lambda q: sqldf(q, globals()) births = load_births() print(pysqldf("SELECT * FROM births LIMIT 10;").head())
In many respects,
pandasql behave like programming language native versions of command-line utilities such as
q, although rather than importing a data file into a SQLite database so that it can be queried, they import the contents of the referenced dataframe.
pandas also provides native support for adding dataframes as tables to a connected SQLite database, as well as reading results from queries onto the database back into a dataframe. Once you start getting into the swing of putting data into a database, and then running JOINed queries over multiple tables, you’ll start to wonder why you spent so much time getting hassled by
VLOOKUP. As an example, here’s a way of Making a Simple Database to Act as a Lookup for the ONS Register of Geographic Codes.
Using SQLite in Your Own Applications
If you are comfortable with using programming code to manipulate your data, then you may want to explore ways of using SQLite to create your own data driven applications.
sql.js under the hood…
As well as running SQLite in a browser, SQLite can also be used to power an API. One of the advantages of running a
datasette service is that it also exposes a datasette API. This means you can publish your datasette to a web host then other computers can querying it programmatically.
If you are working in a Python Jupyter environment, it’s simple enough to use the Jupyer kernel gateway to create your own APIs. Here’s an example of building a service to allow the lookup of ONS codes from a simple SQLite database: Building a JSON API Using Jupyter Notebooks in Under 5 Minutes.
Another way of using SQLite databases in a Jupyter environment is to use
Scripted Forms to For example, here’s one of my own recipes for Creating Simple Interactive Forms Using Python + Markdown Using ScriptedForms + Jupyter that shows how to create a simple interactive form for querying a SQLite database containing descriptions of images used in OpenLearn courses.
SQLite Database Admin Tools
As well as providing a simple explorer and query interface, the sqlitebrowser tool also supports a range of other SQLite database administration functions such as the ability to create, define, modify and delete tables and indexes, or edit, add and delete individual records.
The browser based
sqlite-web application provides a similar range of utilities via a browser based, rather than desktop client, UI.
SQLite is lightweight, in-memory and file based database that allows you to run SQL queries over a wide range of tabular datasets. If you work with data, knowing how to write even simple SQL queries can add a powerful tool to your toolbox. SQLite, and the associated tools created around it, is almost certainly one of the easiest ways in to using this most versatile, portable, and personal data power tool.
PS Here are some more ways of using SQLite:
- as a spatial database using [