Prompted by a conversation with Rufus Pollock over lunch today, in part about data containerisation and the notion of “frictionless” data (data that can be easily discovered and is packaged along with metadata that helps you import it into other tools or applications, such as a database), I’ve been confusing myself about what it might be like to have a frictionless data analysis working environment. In such an environment I could write something like fda --datapackage http://example.com/DATAPACKAGE --db postgres --client rstudio ipynb and that would then:
- generate a fig script (e.g. along the lines of Using Docker to Build Linked Container Course VMs);
- download the data package from the specified URL, unbundle it, generate an SQL init file appropriate to the specified database, fire up the database and use the generated SQL file to configure it by creating any necessary tables and loading the data in;
- fire up any specified client applications (IPython notebook and RStudio server in this example) and ideally seed them with SQL magic or database connection statements, for example, that automatically define an appropriate data connection to the database that’s just been configured;
- launch browser tabs that contain the clients;
- it might also be handy to be able to mount local directories against directory paths in the client applications, so I could have my R scripts in one directory of my own desktop, IPython notebooks in another, and then have visibility of those analysis scripts from the actual client applications.
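As a rough illustration of the first step in that list, here is a minimal Python sketch of how the hypothetical fda command might parse its arguments and emit a fig configuration that links each client container to the database container. Everything here is an assumption for illustration: the fda tool does not exist, the image names (postgres, rocker/rstudio, ipython/notebook) and ports are just plausible defaults, and the mounts argument is a made-up way of expressing the local directory mounts mentioned in the last bullet.

```python
import argparse

# Assumed image names and ports; swap in whatever images you actually use.
DB_IMAGES = {"postgres": ("postgres", 5432)}
CLIENT_IMAGES = {
    "rstudio": ("rocker/rstudio", 8787),
    "ipynb": ("ipython/notebook", 8888),
}


def parse_args(argv=None):
    """Parse the hypothetical fda command line."""
    parser = argparse.ArgumentParser(prog="fda")
    parser.add_argument("--datapackage", required=True,
                        help="URL of the data package to load")
    parser.add_argument("--db", choices=sorted(DB_IMAGES), default="postgres")
    parser.add_argument("--client", nargs="+", default=[],
                        choices=sorted(CLIENT_IMAGES))
    return parser.parse_args(argv)


def fig_yaml(db, clients, mounts=None):
    """Return fig.yml text: one db service plus one linked service per client.

    mounts maps a client name to a "hostdir:containerdir" string (hypothetical).
    """
    mounts = mounts or {}
    db_image, _db_port = DB_IMAGES[db]
    lines = ["db:", "  image: " + db_image]
    for name in clients:
        image, port = CLIENT_IMAGES[name]
        lines += [
            name + ":",
            "  image: " + image,
            "  links:",
            "    - db",
            "  ports:",
            '    - "{0}:{0}"'.format(port),
        ]
        if name in mounts:
            lines += ["  volumes:", "    - " + mounts[name]]
    return "\n".join(lines) + "\n"


# Example: the command line from the post, passed in as an explicit list.
args = parse_args(["--datapackage", "http://example.com/DATAPACKAGE",
                   "--db", "postgres", "--client", "rstudio", "ipynb"])
print(fig_yaml(args.db, args.client,
               mounts={"ipynb": "./notebooks:/notebooks"}))
```

The generated text could be written out as fig.yml and handed to fig up; downloading the package, seeding the clients with connection details and opening browser tabs would all need further steps that this sketch leaves out.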
The idea is that from a single command I can pull down a datafile, ingest it into a database, fire up one or more clients that are connected to that database, and start working with the data immediately. It’s not so different to double clicking on a file on your desktop and launching it into an application to start working on it, right?!
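The ingest step might look something like the following Python sketch, which turns a data package descriptor into the SQL needed to create and load a PostgreSQL table. The descriptor here is a made-up example, loosely following the datapackage.json convention of resources that each carry a name, a path and a schema of typed fields; the type mapping, the /data directory and the function names are all assumptions rather than part of any existing tool.

```python
# Assumed mapping from data package field types to PostgreSQL column types.
TYPE_MAP = {
    "string": "TEXT",
    "integer": "INTEGER",
    "number": "NUMERIC",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "datetime": "TIMESTAMP",
}


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def resource_to_sql(resource, data_dir="/data"):
    """Build CREATE TABLE and COPY statements for one CSV resource."""
    table = quote_ident(resource["name"])
    columns = []
    for field in resource["schema"]["fields"]:
        sql_type = TYPE_MAP.get(field.get("type", "string"), "TEXT")
        columns.append("  " + quote_ident(field["name"]) + " " + sql_type)
    create = "CREATE TABLE " + table + " (\n" + ",\n".join(columns) + "\n);"
    csv_path = data_dir + "/" + resource["path"]
    copy = ("COPY " + table + " FROM '" + csv_path
            + "' WITH (FORMAT csv, HEADER true);")
    return create + "\n" + copy


def init_sql(descriptor, data_dir="/data"):
    """Concatenate the SQL for every resource in the descriptor."""
    return "\n\n".join(resource_to_sql(r, data_dir)
                       for r in descriptor["resources"]) + "\n"


# A hypothetical descriptor, standing in for a downloaded datapackage.json.
descriptor = {
    "name": "example",
    "resources": [{
        "name": "gdp",
        "path": "data/gdp.csv",
        "schema": {"fields": [
            {"name": "country", "type": "string"},
            {"name": "year", "type": "integer"},
            {"name": "value", "type": "number"},
        ]},
    }],
}

print(init_sql(descriptor))
```

The COPY statement assumes the unbundled CSV files are visible to the database server under data_dir, for example via a mounted volume; otherwise the data would need to be loaded from the client side instead.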
Can’t be that hard to wire up, surely?!;-) But would it be useful?
PS See also a further riff on this idea: Data Analysis Packages…?