Fragmentary Thoughts on Data (and “Analytics”) in Online Distance Education

A recent episode of TWiT Triangulation features Shoshana Zuboff, author of the newly released The Age of Surveillance Capitalism (which I’ve still to get, let alone read).

Watching the first ten minutes reminds me of Google’s early reluctance to engage in advertising. For example, in Appendix A of their 1998 paper The Anatomy of a Large-Scale Hypertextual Web Search Engine, Brin and Page (the founders of Google) write the following:

Currently, the predominant business model for commercial search engines is advertising. The goals of the advertising business model do not always correspond to providing quality search to users. For example, in our prototype search engine one of the top results for cellular phone is “The Effect of Cellular Phone Use Upon Driver Attention”, a study which explains in great detail the distractions and risk associated with conversing on a cell phone while driving. This search result came up first because of its high importance as judged by the PageRank algorithm, an approximation of citation importance on the web [Page, 98]. It is clear that a search engine which was taking money for showing cellular phone ads would have difficulty justifying the page that our system returned to its paying advertisers. For this type of reason and historical experience with other media [Bagdikian 83], we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.

Since it is very difficult even for experts to evaluate search engines, search engine bias is particularly insidious. A good example was OpenText, which was reported to be selling companies the right to be listed at the top of the search results for particular queries [Marchiori 97]. This type of bias is much more insidious than advertising, because it is not clear who “deserves” to be there, and who is willing to pay money to be listed. This business model resulted in an uproar, and OpenText has ceased to be a viable search engine. But less blatant bias are likely to be tolerated by the market. For example, a search engine could add a small factor to search results from “friendly” companies, and subtract a factor from results from competitors. This type of bias is very difficult to detect but could still have a significant effect on the market. Furthermore, advertising income often provides an incentive to provide poor quality search results. For example, we noticed a major search engine would not return a large airline’s homepage when the airline’s name was given as a query. It so happened that the airline had placed an expensive ad, linked to the query that was its name. A better search engine would not have required this ad, and possibly resulted in the loss of the revenue from the airline to the search engine. In general, it could be argued from the consumer point of view that the better the search engine is, the fewer advertisements will be needed for the consumer to find what they want. This of course erodes the advertising supported business model of the existing search engines. However, there will always be money from advertisers who want a customer to switch products, or have something that is genuinely new. But we believe the issue of advertising causes enough mixed incentives that it is crucial to have a competitive search engine that is transparent and in the academic realm.

How times change.

Back to the start of the Triangulation episode, and Leo Laporte reminisces about how in the early days of Google the focus was on using data to optimise the performance of the search engine — that is, to optimise the way in which search results were presented on a page in response to a user query. Indeed, the first design goal listed in the Anatomy of a Search Engine paper is to “improve the quality of web search engines”.

In contrast, today’s webcos seek to maximise revenues by modelling, predicting, and even influencing user behaviours in order to encourage users to enter into financial transactions. Google takes an early cut, in the form of advertising revenue, of the transaction revenues others hope to earn.

At which point, let’s introduce learning analytics. I think the above maps well on to how I see the role of analytics in education. I am still firmly in the camp of Appendix A. I think we should use data to improve the performance of the things we control and use data to inform changes to the things we control. I see learning analytics as a bastard child of a Surveillance Capitalism worldview.

Looking back to the early OUseful.info archives, here and in my original (partially complete) blog archive, I’ve posted several times over the years about how we might make use of “analytics” data to maintain and improve the things we control.

Treating our VLE course pages as a website

In the OU, a significant portion of the content of an increasing number of courses is delivered as VLE website content. Look at an OpenLearn course to get a feel for what this content looks like. In the OU, the VLE is not used as a place to dump lecture notes: it is the lecture.

The VLE content is under our control. We should use website performance data to improve the quality of our web pages (which is to say, our module content). During module production (in some modules at least) a lot of design effort is put into limiting and chunking content so as not to overload students (word limits in the content we produce; guidance about how much time to spend on a particular activity).

So do we make use of simple (basic) web analytics to track this? To track how long students spend on a particular web page, to track whether they ever click on links to external resources, to track the sorts of study patterns students appear to have so we can better chunk our content (e.g. from the web stats, do they study in one hour blocks, two hour blocks, four hour blocks?) or better advise online forum moderators as to when students are online, so we can maybe even provide a bit of realtime interaction/support?

If students appear to spend far longer on a page than the design budgeted for it, is that ever flagged up to us?
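As a sketch of what that flagging could look like: the following assumes a hypothetical page view log export (vle_pageviews.csv, with user, page and timestamp columns) and a hypothetical per-page design budget table; the filenames and columns are made up for illustration, not real OU exports.

```python
# A minimal sketch, assuming a hypothetical page view log (user, page, timestamp)
# and a hypothetical per-page design budget table.
import pandas as pd

views = pd.read_csv("vle_pageviews.csv", parse_dates=["timestamp"])
views = views.sort_values(["user", "timestamp"])

# Approximate time-on-page as the gap to the same user's next page view,
# capped at 30 minutes to discount abandoned sessions.
next_view = views.groupby("user")["timestamp"].shift(-1)
views["mins_on_page"] = (
    (next_view - views["timestamp"]).dt.total_seconds().div(60).clip(upper=30)
)

# Median observed time per page versus the time the design budgeted for it.
observed = views.groupby("page")["mins_on_page"].median()
budget = pd.read_csv("design_budget.csv", index_col="page")["budget_mins"]

report = pd.DataFrame({"observed_mins": observed, "budget_mins": budget})
print(report[report["observed_mins"] > 1.5 * report["budget_mins"]])
```

Pages that consistently overrun their budget would then be a prompt to revisit the content, not the students.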

From my perspective, I don’t get to see that data, nor do I get the opportunity to make changes based on it.

(There’s “too much data” to try to collect it all, apparently. (By the by, was that a terabyte SD card I saw has recently gone on sale?) At one point crude stats for daily(?) page usage were available to us in the VLE, but I haven’t checked recently to see what stats I can download from there easily (pointers would be very welcome…). Even crude data might be useful to module teams (e.g. see the heatmap in this post on Teaching Material Analytics).)
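Even at that crude level, a simple heatmap of page views by study week would give a module team an at-a-glance view of pacing. A sketch, over the same hypothetical log file as above:

```python
# A quick heatmap of page views by ISO week, from the same hypothetical
# vle_pageviews.csv log as in the earlier sketch.
import pandas as pd
import matplotlib.pyplot as plt

views = pd.read_csv("vle_pageviews.csv", parse_dates=["timestamp"])
views["week"] = views["timestamp"].dt.isocalendar().week

# Rows are pages, columns are weeks, cells are view counts.
heat = views.pivot_table(index="page", columns="week", aggfunc="size", fill_value=0)

plt.imshow(heat, aspect="auto", cmap="viridis")
plt.xlabel("ISO week")
plt.ylabel("page")
plt.colorbar(label="views")
plt.show()
```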

I’ve posted similar rants before. See also rants on things like not doing A/B testing. I also did a series of posts on Library web analytics, and have a scraggy script for analysing FutureLearn data, as it was available a couple of years ago, here.

Note that there is one area where I know we do use stats to improve materials, or modify internal behaviour, and that’s in assessment. Looking at data from online quiz questions can identify whether questions are too easy or too hard, or whether we maybe need to teach something better if one of the distractors is being selected too often.
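By way of illustration, classic item analysis is straightforward to run over a response log; the following sketch assumes a hypothetical export with one row per (student, question, option, correct), and the thresholds are illustrative rather than anything we actually use:

```python
# A minimal item analysis sketch over a hypothetical quiz response export:
# columns student, question, option, correct (1 if answered correctly).
import pandas as pd

resp = pd.read_csv("quiz_responses.csv")

# Facility index: proportion answering correctly. Very high suggests the
# question is too easy; very low, too hard. Thresholds are illustrative.
facility = resp.groupby("question")["correct"].mean()
print(facility[(facility > 0.9) | (facility < 0.3)])

# Distractor analysis: how often each option is chosen per question. A
# distractor picked nearly as often as the key may flag a teaching problem.
picks = resp.groupby(["question", "option"]).size().unstack(fill_value=0)
print(picks.div(picks.sum(axis=1), axis=0).round(2))
```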

In tutor marked and end of course assessment, we also use stats to review question level performance and to moderate individual tutors’ marks (the numbers are such that excessively harsh or generous markers can often be identified, and their awarded marks statistically tweaked to bring them into line with the other markers as a whole).
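I don’t know exactly what procedure is used internally, but the simplest version of that sort of adjustment is a rescaling of each tutor’s marks onto the cohort distribution, on the strong simplifying assumption that scripts are allocated to tutors more or less at random:

```python
# A minimal marker moderation sketch: rescale each tutor's marks to the cohort
# mean and spread. tma_marks.csv (tutor, student, mark) is hypothetical, and
# this is not a claim about the actual procedure used.
import pandas as pd

marks = pd.read_csv("tma_marks.csv")

cohort_mean = marks["mark"].mean()
cohort_std = marks["mark"].std()

# z-score each mark within its tutor's distribution, then map it back
# onto the cohort distribution.
tutor_mean = marks.groupby("tutor")["mark"].transform("mean")
tutor_std = marks.groupby("tutor")["mark"].transform("std")
marks["moderated"] = cohort_mean + cohort_std * (marks["mark"] - tutor_mean) / tutor_std

print(marks.groupby("tutor")[["mark", "moderated"]].mean().round(1))
```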

In both those cases, we do use data to modify OUr behaviour and things we control.

Search Data

This is something we don’t get to see, either from within course materials or, conveniently, at new module/curriculum planning time.

For example, what subject related terms are (new) students searching for on the OU website? (I used to get quite het up about the way we wrote course descriptions in course listings on the OU website, arguing that it’s all very well putting in words describing the course that students will understand once they’ve finished the course, but it doesn’t help folk find that page when they don’t yet have the vocabulary and won’t be using those search terms…) Or what subjects are folk searching for on OpenLearn or FutureLearn (the OU owns FutureLearn, though I’m not sure what benefits accrue back to the OU from it)?

In terms of within-course searching, what terms are students searching for, and how might we use that information to improve navigation, glossary items, and within-module “SEO”? Again, how might we use data that is available, or that can be collected, to improve the thing we control (the course content)?
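Even a crude frequency count would be a start. A sketch, assuming a hypothetical within-module search log and a hypothetical glossary export (neither of which I have access to):

```python
# A minimal sketch over a hypothetical within-module search log (student, query)
# and a hypothetical glossary term list.
import pandas as pd

queries = pd.read_csv("module_search_log.csv")
glossary = set(pd.read_csv("glossary.csv")["term"].str.lower())

counts = queries["query"].str.lower().str.strip().value_counts()

# Frequently searched terms with no glossary entry are candidates for new
# glossary items, better headings, or within-module "SEO" tweaks.
print(counts[~counts.index.isin(glossary)].head(20))
```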

UPDATE — Okay, So Maybe We Do Run the Numbers

Via a blog post in my feeds, a tweet chaser from me to the author, and a near immediate response, maybe I was wrong: maybe we are closing the loop (at least, in a small part of the OU): see here: So I was Wrong… Someone Does Look at the Webstats….

I know I live on the Isle of Wight, but for years it’s felt like I’ve been sent to Coventry.

Learning Analytics

The previous two sections correspond to my Appendix A world view, and the original design goal of “improving the quality of module content web pages”, a view that never got traction because… I don’t know. I really don’t know. Too mundane, maybe?

That approach also stands in marked contrast to the learning analytics view, which is more akin to the current dystopia being developed by Google et al. In this world, data is collected not to improve the thing we control (the course content, structure and navigation) but to control the user so they better meet our metrics. Data is collected not so that we can make interventions in the thing we control (the course content, structure and navigation) but in “the product”: the student. Interventions are there so we can tell students where they are going wrong, where they are not performing.

The fact that we spend £loads on electronic resources that (perhaps) no-one ever uses (I don’t know – they may do? I don’t see the click stats) is irrelevant.

The fact that students do or don’t watch videos, or bail out of watching videos after 3 minutes (so maybe we shouldn’t make four minute videos?), is not something that gets back to the course team. I can imagine that what’s more likely is an intervention email to a student saying “we notice you don’t seem to be watching the videos…”

But in such a case, IT’S NOT A STUDENT PROBLEM, IT’S A CONTENT DESIGN PROBLEM. Which is to say, it’s OUr problem, and something we can do something about.
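And it’s tractable: the same playback data that drives the intervention email could just as easily feed back to the course team. A sketch, assuming a hypothetical log recording the furthest point each student reached in each video:

```python
# A minimal sketch over a hypothetical playback log: video_plays.csv with
# columns student, video, duration_s (video length), watched_s (furthest point).
import pandas as pd

plays = pd.read_csv("video_plays.csv")
plays["fraction_watched"] = plays["watched_s"] / plays["duration_s"]

summary = plays.groupby("video").agg(
    duration_s=("duration_s", "first"),
    median_fraction=("fraction_watched", "median"),
)
summary["duration_min"] = summary["duration_s"] / 60

# Long videos that most students abandon early are a content design signal:
# maybe split them, rather than chase the students.
print(summary[(summary["duration_min"] > 3) & (summary["median_fraction"] < 0.75)])
```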

Conclusion

It would be so refreshing to have a chance to explore a data driven course maintenance model on a short course presented a couple of times a year for a couple of years. We could use this as a testbed to explore setting up feedback loops to monitor intended design goals (time on activity, for example, or designed pacing of materials compared to actual use pacing) and maybe even engage in a bit of A/B testing.
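For what it’s worth, the analysis side of such a testbed is trivial; the hard part is institutional. A sketch of the comparison an A/B test would boil down to, using a two-proportion z-test from statsmodels and numbers that are entirely made up:

```python
# A minimal A/B test sketch: did variant A or B of a page lead to more
# students completing an activity? All numbers are made up for illustration.
from statsmodels.stats.proportion import proportions_ztest

completed = [412, 380]  # completions under variant A, variant B
shown = [500, 500]      # students shown each variant

stat, pvalue = proportions_ztest(completed, shown)
print(f"z = {stat:.2f}, p = {pvalue:.3f}")
```

Setting up the feedback loop so that numbers like these actually reach the module team is the bit that needs the testbed.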
