FutureLearn Data Doodles Notebook and a Reflection on unLearning Analytics

With LAK16 (this year’s Learning Analytics and Knowledge Conference) upon us, not that I’m there, I thought I’d post an (updated) notebook showing some of my latest data sketches’n’doodles around the data made available to FutureLearn partners arising from courses they have presented on FutureLearn. You can find the notebook in the notebooks folder of this Github repository: psychemedia/futurelearnStatsSketches.

Recalling the takedown notice I received for posting “unauthorised” screenshots around some of the data from a FutureLearn course I’d worked on, the notebook doesn’t actually demonstrate the analysis of any data at all. Instead, it provides a set of recipes that can be applied to your own FutureLearn course data to try to help you make sense of it.

In contrast to many learning analytics approaches, where the focus is on building models of learners so that you can adaptively force them to do particular things to make your metrics look better, my interest is in whether the course materials themselves appear to be working.

My thinking hasn’t really moved on that much since my original take on course analytics in 2007, or in a presentation I gave in 2008 (Course Analytics in Context presentation) and it can (still) be summarised pretty much as follows:

Insofar as we are producers of online courses for delivery “at scale” (that is, to large numbers of learners), our first duty is to ensure that the course materials appear to be working. That is, we should regard the online materials in the same way as the publisher of any content focussed website, as pages that can be optimised in terms of their own performance against any expectations we place on them.

So, if a page has links on it, we should keep track of whether folk click on the links in the volumes we expect. If we expect a person to spend a certain amount of time on a page, we should be concerned if, en masse, they appear to be spending a much shorter or longer period of time on the page. In short, we should be catering to the mass behaviour of the visitors, to try to ensure that the page appears to be delivering (albeit at a surface level) the sort of experience we expect of it for the majority of visitors. (Unless the page has been designed to target a very particular audience, in which case we need to segment our visitor stats to ensure that for that particular audience, the page meets our expectations of it in terms of crude user engagement metrics.) This is not about dynamically trying to manage the flow of users through the course materials, it’s about making sure the static site content is behaving as we expect it to.
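As a rough illustration of that sort of check, here is a minimal pandas sketch that compares the median time spent on each page with the time the author expected, and flags pages that are a long way off. The table, the column names (`page`, `seconds_on_page`), the expected times and the `tolerance` threshold are all made up for the example; none of them comes from the FutureLearn data files or from the notebook.

```python
import pandas as pd

# Hypothetical visit log: one row per page view (not a FutureLearn schema).
visits = pd.DataFrame({
    "page": ["1.1", "1.1", "1.1", "1.2", "1.2", "1.2"],
    "seconds_on_page": [110, 130, 120, 20, 30, 25],
})

# Hypothetical author expectations, in seconds, for each page.
expected_seconds = pd.Series({"1.1": 120, "1.2": 300}, name="expected")


def flag_pages(visits, expected, tolerance=0.5):
    """Flag pages whose median dwell time differs from the expected
    time by more than `tolerance`, expressed as a fraction of the
    expected time."""
    summary = (
        visits.groupby("page")["seconds_on_page"]
        .median()
        .to_frame("median_seconds")
        .join(expected)
    )
    summary["ratio"] = summary["median_seconds"] / summary["expected"]
    summary["flag"] = (summary["ratio"] - 1).abs() > tolerance
    return summary


report = flag_pages(visits, expected_seconds)
print(report)
```

The median is used rather than the mean so that a handful of visitors who leave a tab open all afternoon don’t swamp the en masse picture; the same pattern would work for click counts on links.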

This is possibly naive, and could be seen as showing a certain level of disinterest in users’ individual learning behaviours, but I think it reflects how we tend to write static materials. In the case of the OU, this tends to be with a very strong, single narrative line, almost as if the materials were presented as a set of short books. I suspect that material that is intended to be dynamically served up in response to an algorithm’s perceived model of the user needs to be authored differently, using chunks that can be connected in ways that allow for multiple narrative pathways through them.

In certain respects, this is a complementary approach to learning design, where educators are encouraged in advance of writing a course to identify various sorts of structural activity that I suspect LD advocates would then like to see being used as the template for an automated course production process; templated steps conforming to the structural design elements could then be dropped into the academic workflow for authors to fill out. (At the same time, my experience of authoring workflows for online material delivery is that they generally suck, despite my best efforts…. See also: here.)


The notebook is presented as a Jupyter notebook with code written using Python3. It requires pandas and seaborn but no other special libraries and should work on a variety of notebook hosting workbenches (see for example Seven Ways of Running IPython / Jupyter Notebooks). I’ve also tested it against the psychemedia/ou-tm351-pystack container image on Github, which is a clone of the Jupyter set-up we’re using in the current presentation of the OU course TM351 Data management and analysis. My original FutureLearn data analysis notebook only used techniques developed in the FutureLearn course Learn to Code for Data Analysis, but the current one goes a little bit further than that…

The notebook includes recipes that analyse all four FutureLearn data file types, both individually and in combination with each other. It also demonstrates a few interactive widgets. Aside from a few settings (identifying the location and name of the data files), and providing some key information such as course enrolment opening date and start date, and any distinguished steps (specific social activity or exercise steps, for example, that you may want to highlight), the analyses should all run themselves. (At least, they ran okay on the dataset I had access to. If you run them and get errors, please send me a copy of any error messages and/or fixes you come up with.) All the code is provided though, so if you want to edit or otherwise play with any of it, you are free to do so. The code is provided without warranty and may not actually do what I claim for it (though if you find any such examples, please let me know).
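To give a flavour of what “a few settings, then the analyses run themselves” might look like, here is a minimal sketch. It is not code from the notebook: the setting names, the column names (`learner_id`, `enrolled_at`, `step`) and the inline data are all assumptions for illustration, standing in for tables that would normally be loaded from your own data files with `pd.read_csv()`.

```python
import pandas as pd

# Hypothetical settings; names are illustrative, not the notebook's own.
settings = {
    "enrolment_open": pd.Timestamp("2016-01-04"),
    "course_start": pd.Timestamp("2016-02-01"),
    "distinguished_steps": ["1.3", "2.5"],
}

# Stand-in for an enrolments file (column names are assumptions).
enrolments = pd.DataFrame({
    "learner_id": ["a", "b", "c", "d"],
    "enrolled_at": pd.to_datetime(
        ["2016-01-10", "2016-01-31", "2016-02-01", "2016-02-03"]
    ),
})

# Days between enrolment and course start (negative = before the start).
enrolments["days_from_start"] = (
    enrolments["enrolled_at"] - settings["course_start"]
).dt.days

# Stand-in for a step activity file (column names are assumptions).
steps = pd.DataFrame({
    "learner_id": ["a", "a", "b", "c"],
    "step": ["1.1", "1.3", "1.3", "2.5"],
})

# Count distinct learners per step and mark the distinguished steps.
learners_per_step = (
    steps.groupby("step")["learner_id"].nunique().to_frame("learners")
)
learners_per_step["distinguished"] = learners_per_step.index.isin(
    settings["distinguished_steps"]
)

# Combine the two tables: step visits by learners who enrolled early.
early = enrolments[enrolments["days_from_start"] < 0]
early_visits = steps.merge(early, on="learner_id")

print(learners_per_step)
print(early_visits)
```

Keeping the dates and distinguished steps in one place means the recipes further down only ever refer to `settings`, so pointing them at a different course should just be a matter of editing that one cell.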

The notebook and code are licensed as attribution required works. I thought about an additional clause expressly forbidding commercial use or financial gain from the content by FutureLearn Ltd, but on reflection I thought that might get me into more grief than it was worth!;-) (It could also come over as perhaps a bit arrogant and I’m not sure the notebooks have anything that novel or interesting in them; they’re more of a travel diary that records my initial meanderings around the data, as well as a few sketches and doodles as I tried to work out how to wrangle the data.)

As I was about to post the notebooks, I happened to come across a report of a recent investigation on the financial flows in academic publishing (It’s time to stand up to greedy academic publishers) which raises questions about “the issue of how research is communicated in society … that cut to the heart of what academics do, and what academia is about”; this resonated with a couple of recent quotes from Downes that made me smile, an off-the-cuff remark from Martin Weller last week mooting whether there was, or wasn’t, a book around the idea of guerrilla research as compared to digital scholarship, and the observation that I wasn’t the only person who took holiday and covered my own expenses to attend the OER conference in Edinburgh last week (though I am most grateful to the organisers for letting me in and giving me the opportunity to bounce a few geeky ideas around with Jim Groom, Brian Lamb, Grant Potter, Martin Hawksey and David Kernohan that I still need to think through. I need to start pondering the data driven stand-up and panel show games too…!:-)

Arising from that confused melee of ideas around what I guess is the economics of gonzo academia, I decided to post a version of the notebook on Leanpub as the first part of a possible work in progress: Course Analytics – Wrangling FutureLearn Data With Python and R. (I’ve been pondering a version of the notebook recast as an R shiny app, a Jupyter dashboard, and an RMarkdown report, and I think that title will accommodate such ramblings under the same cover.) So if you find value in the notebook and feel as if you should pay for it, you now have an opportunity to do so. (Any monies generated will be used to cover costs of activities related to the topic of the work, along with the progression and dissemination of ideas related to it. Receipts and expenditure arising therefrom will be itemised in full in the repository.) And if you don’t think it’s worth anything, the book is flexibly priced with a starting price of free.

PS the notebook is a Jupyter notebook, as used in the OU/FutureLearn course Learn to Code for Data Analysis. My original FutureLearn data analysis notebook used only techniques developed in that course, although the current version uses a few more, including interactive widgets that let you analyse the data interactively within the notebook. If you need any further reasons as to why you should take the course, here’s a marketing pitch…