SQL Databases in the Browser, via WASM: SQLite and DuckDB

Several years ago, in a post on Asking Questions of CSV Data, Using SQL In the Browser, I described a rather nifty application called franchise, (from the repo, it seems to have been deprecated for some time, or at least, is no longer actively maintained there). It worked as a general SQL database client, or could be used to manage and query a SQLite database powered by sql.js.

In this post, I’ll review a couple of WASM compiled databses that you can run purely within a browser: SQLite and Duck.db.

SQLite

Several ways of working with SQLite in the browser exist; the following does not claim to be a complete list, just a representative sample, and includes:

  • sql.js and sql.js-httpvfs;
  • SQLime SQLite Playground;
  • JupyterLite xeus-sqlite-kernel.

See also: Seven Ways of Making Use of SQLite.

sql.js

The sql.js package provides a WASM compiled version of SQLite that runs purely in the browser. Databases can be loaded from an uploaded file, or retrieved from a remote URL. If you want to try it out, there’s a simple demo page here:

Barebones sql.js UI

Applications moving off-the-server and into the browser is really handy in many educational contexts because it means:

  • there is no requirement on the user to install any software on their desktop: all they need is a web browser, a network connection to download the environment, and a powerful enough computer to run the application in the browser;
  • there is no requirement on the provider to provide a code execution server: a static web server is all that is required to provide the environment to the user.

A Brief Aside – Serving Webpages Locally Without a Webserver

For purely local running, there may be requirement for the the user to run a local webserver to serve the environment: if you double click an HTML file on your desktop and open it in a broweser with a URL starting file://, the browser may well throw CORS (cross-origin) security errors as it tries to load the page. Fortunately, applications such as servefolder.dev make it possible to run a webserver in your web browser to serve content held locally by uploading it to the browser and serving it from there:

Unfortunately, you can’t run that application locally from a file:// addressed location.

Fortunately, you can install the application as a Chrome app that does work when you are offline:

Install serverfolder.dev for offline running as a Chrome App

Essentially, Chrome will trust apps you have installed from the web into Chrome, but not things you want to run from you own desktop…

Serve folder app

For running simple applications, installing something like servefolder.dev as an app so you can run it offline is really handy when it comes to not requiring a user to run their own webserver. But there is another huge issue that is perhaps currently a blocker for using in-browser apps in an educational setting: if you edit content in the app – for example, a SQL query you spent ages crafting – you lose it when you close the web page: there is nowhere you can save it to and then reload it, nowhere you can persist it, unless you export it / downlad it to the desktop, and then import it / upload it from the desktop next time you run the application.

In the case of sql.js, the minimal demo UI does not persist the database, nor provide any means for saving and reusing queries: it is just a minimal, temporary UI. IndexedDB is a NoSQL storage solution that is supported inside contemporary browsers. Packages such as dexie.js provide a convenient wrapper around it and tools such as manics/jupyter-offlinenotebook can use it to persist items in the browser from otherwise transient web application sessions, such as MyBinder powered Jupyter notebook sessions. It would be handy if a simple template for creating sql.js apps with custom queries and database and query persistence using IndexedDB were available. If you know of such an example, please let me know via the comments.

sql.js-httpvfs

If the database you want to query is hosted via a web URL, sql.js can load the database from the URL. If the database is large, however, this could cause a delay as the database is downloaded, or knock your browser over completely if the database file is very large. Via @simonw, I learned of the phiresky/sql.js-httpvfs package, a fork of sql.js that “provide[s] a read-only HTTP-Range-request based virtual file system for SQLite”. This means that you can host an arbitrarily large SQLite database on a (static) file hoster and then query the database from the browser without needing to download the whole database. The techique used is described in Phiresky’s blog post Hosting SQLite databases on Github Pages (or any static file hoster) (for a full demo, use the netlify hosted version of the post).

The Phiresky blog post also includes a rather nifty web component for embedding, editing and executing SQL queries over the (remote) database, as well as displaying the results.

The results can be displayed in the context of that component, or embedded into the DOM (that is, anywhere in the web page) as inline content. This raises the intriguing prospect of having a web page is essentially a set of SQL queries over a remote database that are then rendered as the web page content.

Also notably, the source code for the original blog post is a markdown file that is rendered to HTML via a custom pandoc filter. I tried to make sense of the source code repository to see how easy it would be to just copy and paste the bits I’d need to run my own minimal demo of rendering a simple markdown file to HTML, with SQL editor components embedded, and querying my own online database, but it’s too web-dev uber-foo for me to be able to even rebuild from the source repository. (Again, if you know of, or create, a minimal example repo that shows how to build a Github Page from the repo that lets you query a database hosted in the repo using the above components – assuming there’s a license or implied license that allows their reuse – please let me know via the comments…)

In passing, I was also prompted to this fascinating interview with Richard Hipp, creator of SQLite, via @simonw. It’s well worth a listen.

SQLime SQLite Playground

Another way of exploring SQLite in the browser that does provide a means of persistence if you have a network connection available is the SQLime SQLite Playground [repo]:

This client can upload a locally available database file, or download and then connect to a database from a remote location:

SQLime “remote” database access is a download rather than a connect…

It would be interesting if SQLime wrapped phiresky/sql.js-httpvfs and also offered its remote query support, methinks (issue)…

File persistence is supported by connecting to Github:

If you then “share” a query, it is saved to a gist and you are provided with the gist link:

The content of the database and the query are then saved as a private gist:

This is obviously quite an expensive operation – each time you save a query to new gist you also save the contents of the database there – so it would be neater if you could alternatively just save the URL location of the database, or be able to support remote connections as per sql.js-httpvfs (issue).

JupyterLite xeus-sqlite-kernel

JupyterLite is Jupyter distribution that runs purely in the browser. One of the available kernels is the jupyterlite/xeus-sqlite-kernel (based on the jupyter-xeus/xeus-sqlite Jupyter kernel). Code cells in this kernel run SQL queries against an in-memory SQLite database, with additional magics defined to support loading of data and rendering of query results as Vega charts.

An example notebook demonstrates how databases can be created and queried.

Example RetroLab notebook running with a xeus-sqlite-kernel .

You can try a live example of the notebook running, via JupyterLite in the browser, here: https://jupyterlite.github.io/demo/retro/notebooks/?path=xeus-sqlite/simple-operations.ipynb

Currently, databases must be created and populated within the notebook. However, a repo issue comment claims a recent PR to the upstream xeus/xeus-sqlite supports magics for downloading a database from a URL (although I haven’t been able to get it to work as yet [issue]) and then opening a database connection to it, so this support will hopefully soon make its way into the xeus-sqlite-kernel at some point.

Notes are persisted in browser storage, and the previously mentioned PR looks like it will also allow the database to be persisted in storage too (though I’m not sure whether browser permissions will allow it to be accessed from other JupyterLite notebooks with a different filename/URL?)

DuckDB

DuckDB is available as a WebAssembly compiled application that can run purely in the browser (Chrome, Firefox, Safari) that claims to offer full support for Apache Arrow.

The intial release blog post – DuckDB-Wasm: Efficient Analytical SQL in the Browser – suggests that DuckDB “reads Parquet, CSV and JSON files from either your local filesystem or HTTP servers”, but whilst it was easy enough to figure out how to download a file from a web URL to my desktop from the DuckDB CLI (.files download $URL) I couldn’t work out how to open a file from my desktop (I thoought .files add might open a desktop file picker, but it just seems to hang the browser without any error messages I could see anywhere…). And whilst I had no trouble querying web-hosted parquet or CSV files, I didn’t manage to query a JSON file. (I also note that it would be hand to query a remote sqlite databse, especially one configured to best support efficient remote sql.js-httpvfs style connections).

The WASM client is fronted by a simple terminal, shell.duckdb.org. (I found that if I did anything wrong it would just hang and I had to reload the page; no ctrl-c etc to get out of trouble.)

DuckDB WASM terminal

The client appears to support queries onto remote files. For example, queries can be run over remotely hosted CSV files found randomly on Github:

The first query appears to download the whole file into memory, and the second query then essentially runs against that.

Queries can also be run over remote parquet hosted files, which I assume could be of an arbitrary size, without the need to download them in full:

Queries also seemed to run against arbitrary parquet data files I found on Github:

I couldn’t find a way to load or query a JSON file, though. Or persist and save and retrieve queries? If you know of a simple DuckDb tutorial that gives a complete set of worked CLI examples, please… let me know via the comments:-)

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

2 thoughts on “SQL Databases in the Browser, via WASM: SQLite and DuckDB”

Comments are closed.

%d bloggers like this: