Following on from Frictionless Data Analysis – Trying to Clarify My Thoughts, I wonder: what if, in addition to the datapackage.json specification, there were a data analysis package or data analysis toolkit package specification? Perhaps the latter might be something that unpacks rather like the fig.yml file described in Using Docker to Build Linked Container Course VMs. The former would then combine a datapackage and a data analysis toolkit package: it would download a datapackage and open it into the toolkit configuration specified by the data analysis toolkit package. We’d perhaps also want to be able to define a set of data analysis scripts (a data analysis script package?) relevant to working with a particular datapackage in the specified tools, for example some baseline IPython notebooks or R/Rmd scripts.
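To make the idea a little more concrete, here is a minimal sketch of what such a combined descriptor *might* look like, and how its toolkit section could be unpacked into fig.yml-style service definitions. None of this comes from an existing specification: the descriptor fields (`datapackage`, `toolkit`, `scripts`), the placeholder URL, the placeholder image names and the `/data` mount point are all hypothetical, chosen only for illustration. Only the per-service keys (`image`, `ports`, `volumes`) mirror those used in fig.yml files.

```python
import json

# HYPOTHETICAL "data analysis package" descriptor. The field names are
# illustrative assumptions, not part of datapackage.json or any other spec.
analysis_package = {
    "name": "example-analysis",
    # Placeholder URL for the datapackage the analysis works on.
    "datapackage": "https://example.com/data/datapackage.json",
    # The "toolkit package": one entry per tool, in fig.yml-like form.
    # Image names are placeholders, not real published images.
    "toolkit": {
        "notebook": {"image": "example/ipython-notebook", "ports": ["8888:8888"]},
        "rstudio": {"image": "example/rstudio", "ports": ["8787:8787"]},
    },
    # The "script package": baseline notebooks/scripts shipped alongside.
    "scripts": ["notebooks/baseline.ipynb", "scripts/baseline.Rmd"],
}


def to_fig_services(pkg, data_dir="./data"):
    """Turn the toolkit section into fig.yml-style service definitions.

    Each service gets the same (assumed) local data directory mounted at
    /data, so every tool can open the downloaded datapackage. The input
    descriptor is copied, not modified.
    """
    services = {}
    for name, spec in pkg["toolkit"].items():
        service = dict(spec)  # shallow copy so the descriptor is untouched
        service["volumes"] = [f"{data_dir}:/data"]
        services[name] = service
    return services


services = to_fig_services(analysis_package)
print(json.dumps(services, indent=2))
```

Keeping the data location as a single shared volume is just one design choice: it lets each tool in the toolkit see the same unpacked datapackage without the descriptor needing to know anything about the tools' internals.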

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

  1. John Tukey was here before, as usual. :-) See his Styles of Data Analysis (p. 11 of “Modern Data Analysis”, Academic Press, 1982). It has a very provocative diagram showing “input” plugging into “automatic data expanders”, which in turn plug into other exotic Tukey coinages.

