OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for the ‘Anything you want’ Category

Intellectual Talisman, Baudelaire

leave a comment »

During tumultuous times there is often an individual, an intellectual talisman if you like, who watches events unfold and extracts the essence of what is happening into a text, which then provides a handbook for the oppressed. For the frustrated Paris-based artists battling with the Academy during the second half of the nineteenth century, Baudelaire was that individual, his essay, The Painter of Modern Life, the text.

… He claimed that ‘for the sketch of manners, the depiction of bourgeois life … there is a rapidity of movement which calls for an equal speed of execution of the artist’. Sound familiar? The essay goes on to feature several references to the word ‘flâneur’, the concept of a man-about-town, which Baudelaire was responsible for bringing to the public’s attention, describing the role thus: ‘Observer, philosopher, flâneur – call him what you will… the crowd is his element, as the air is that of the birds and water of fishes. His passion and his profession are to become one flesh with the crowd. For the perfect flâneur, for the passionate spectator, it is an immense joy to set up house in the heart of the multitude, amid the ebb and flow of movement, in the midst of the fugitive and the infinite.’

There was no better provocation for the Impressionists to go out and paint en plein air. Baudelaire passionately believed that it was incumbent upon living artists to document their time…

And the way to do that was by immersing oneself in the day-to-day of metropolitan living: watching, thinking, feeling and finally recording. …”

Will Gompertz, What Are You Looking At? 150 Years of Modern Art in the Blink of an Eye pp. 28-29

Written by Tony Hirst

December 19, 2014 at 11:01 am

Posted in Anything you want

A Briefest of Looks at the REF 2014 Results Data – Reflection on a Data Exercise

leave a comment »

At a course module team meeting for the new OU data course […REDACTED…], which will involve students exploring data sets that we’ve given them, as well as ones that they will hopefully find for themselves, it was mentioned that we really should get an idea of how long the exercises we’ve written, are writing, and have yet to write will take students to complete.

In that context, I noticed that the UK Higher Education Research Excellence Framework (REF) 2014 results were out today, so I gave myself an hour to explore the data, see what’s there, and get an idea of some of the more obvious stories we might try to pull out.

Here’s as far as I got: an hour-long conversation from a standing start with the REF 2014 data.

Although I did have a couple of very minor interruptions, I didn’t get as far as I’d expected/hoped.

So here are a few reflections, as well as some comments on the constraints I put myself under:

  • the students will be working in a headless virtual machine we have provided them with; we don’t make any requirements of students to have access to a spreadsheet application; OpenRefine runs on the VM, so that could be used to get a preview of the spreadsheet (I’m not sure how well OpenRefine copes with a spreadsheet that contains multiple sheets?); given all that, I thought I’d try to explore the data purely within an IPython notebook, without (as @andysc put it) eyeballing it in a spreadsheet first;
  • I didn’t really read any of the REF docs, so I wasn’t really sure how the data would be reported. I’m not sure how much time it would have taken to read up on the reporting data used, or what sort of explanatory notes and/or external metadata are provided?
  • I had no real idea what questions to ask or reports to generate. “League tables” was an obvious one, but calculated how? You need to know what numbers are available in the data set and how they may (or may not) relate to each other to start down that track. I guess I could have looked at distributions down a column, and then grouped in different ways, and then started to look for and identify the outliers, at least as visually revealed.
  • I didn’t do any charts at all. I had it half in mind to do some dodged bar charts, eg to show how the different profiles were scored within each unit of assessment for a given institution, but ran out of time before I tried that. (I couldn’t remember offhand what sort of shape the data needs to be in to plot that sort of chart, and wasted a minute or two, gardener’s foot on fork, staring into the distance, pondering what we could do if I cast (unmelted) the separate profile data into different columns for each return, but then decided it’d use up too much of my hour trying to remember/look up how to do that, let alone then trying to make up stuff to do with the data once it was in that form.)
  • The exploration/conversation demonstrated grouping, sorting and filtering, though I didn’t limit the range of columns displayed. I did use a few cribs, both from the pandas online documentation and from other notebooks we have drafted for student use (eg on how to generate sorted group/aggregate reports on a dataframe).
  • our assessment will probably mark folk down for not doing graphical stuff… so I’d have lost marks for not putting in even a simple chart, such as a bar chart counting numbers of institutions by unit of assessment;
  • I didn’t generate any derived data – again, this is something we’d maybe mark students down on; an example I saw just now in the OU internal report on the REF results is GPA – grade point average. I’m not sure what it means, but while in the data I was wondering whether I should explore some function of the points (eg 4 x (num x 4*) + 3 x (num x 3*) … etc) or some function of the number of FTEs and the star rating results (see the sketch after this list).
  • Mid-way through my hour, I noticed that Chris Gutteridge had posted the data as Linked Data; Linked Data and SPARQL querying is another part of the course, so maybe I should spend an hour seeing what I can do with that endpoint from a standing start? (Hmm.. I wonder – does the Gateway to Research also have a SPARQL endpoint?)
  • The course is about database management systems, in part, but I didn’t put the data into either PostgreSQL or MongoDB, the two systems we introduce students to, or discuss the rationale for which database might have been a useful store for the data, or the extent to which normalisation was required (eg taking the data to third normal form or wherever, and perhaps actually demonstrating that). (In the course, we’ll probably also show students how to generate RDF triples they can run their own SPARQL queries against.) Nor did I throw the dataframe into SQLite using pandasql, which would perhaps have made it easier (and quicker?) to write some of the queries using SQL rather than the pandas syntax?
  • I didn’t link in to any other datasets, which again is something we’d like students to be able to do. At the more elaborate end might have been pulling in data from something like Gateway to Research? A quicker hack might have been to try annotating the data with administrative information, which I guess can be pulled from one of the datasets on data.ac.uk?
  • I didn’t do any data joining or merging; again, I expect we’ll want students to be able to demonstrate this sort of thing in an appropriate way, eg as a result of merging data in from another source.
  • writing filler text (setting the context, explaining what you’re going to do, commenting on results etc) in the notebook takes time… (That is, the hour is not just taken up by writing the code/queries; there is also time spent, but not seen, in coming up with questions to ask, as well as then converting them to queries and then reading, checking and mentally interpreting the results.)
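
By way of illustration (and not something I actually did in the hour), here’s a minimal pandas sketch of the sort of GPA-style derived measure I was wondering about – the file name, skiprows value, column headings and weighting are all assumptions about the shape of the results spreadsheet rather than things I’ve checked:

# Sketch only: a GPA-style weighted score per submission, plus a simple
# institution-level "league table". Column names, the header offset and the
# assumption that star ratings are reported as percentages are all guesses.
import pandas as pd

df = pd.read_excel("REF2014_Results.xlsx", skiprows=7)  # hypothetical file/offset

weights = {"4*": 4, "3*": 3, "2*": 2, "1*": 1}
df["gpa"] = sum(df[col] * w for col, w in weights.items()) / 100

# Rank institutions by mean GPA on the (assumed) 'Overall' profile rows
overall = df[df["Profile"] == "Overall"]
league = overall.groupby("Institution name")["gpa"].mean().sort_values(ascending=False)
print(league.head(10))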

One thing I suggested to the course team was that we all spend an hour on the data and see what we come up with. Another thing that comes to mind is what I might now be able to achieve in a second hour, and then a third. (This post has taken maybe half an hour?)

Another approach might have been to hand off notebooks to each other, the second person building on the first’s notebook etc. (We’d need to think how that would work for time: would the second person’s hour start before – or after – reading the first person’s notebook?) This would in some way model providing the student with an overview of a dataset and then getting them to explore it further, giving us an estimate of timings based on how well we can build on work started by someone else, but still getting us to work under a time limit.

Hmmm.. does this also raise the possibility of some group exercises? Eg one person has to normalise the data and get it into PostgreSQL, someone else get some additional linkable data into the mix, someone to start generating summary textual reports and derived data elements, someone generating charts/graphical reports, someone exploring the Linked Data approach?

PS One other thing I didn’t look at, but that is a good candidate for all sorts of activity, would be to try to make comparisons to previous years. Requires finding and ordering previous results, comparing rankings, deciding whether rankings actually refer to similar things (and extent to which we can compare them at all). Also, data protection issues: could we identify folk likely to have been included in a return just from the results data, annotated with data from Gateway to Research, perhaps, or institutional repositories?

Written by Tony Hirst

December 18, 2014 at 1:59 pm

Sketching Scatterplots to Demonstrate Different Correlations

with 8 comments

Looking just now for an openly licensed graphic showing a set of scatterplots that demonstrate different correlations between X and Y values, I couldn’t find one.

[UPDATE: following a comment, Rich Seiter has posted a much cleaner – and general – method here: NORTA Algorithm Examples; refer to that post – rather than this – for the method…(my archival copy of rseiter’s algorithm)]

So here’s a quick R script for constructing one, based on a Cross Validated question/answer (Generate two variables with precise pre-specified correlation):

library(MASS)

corrdata=function(samples=200,r=0){
  data = mvrnorm(n=samples, mu=c(0, 0), Sigma=matrix(c(1, r, r, 1), nrow=2), empirical=TRUE)
  X = data[, 1]  # standard normal (mu=0, sd=1)
  Y = data[, 2]  # standard normal (mu=0, sd=1)
  data.frame(x=X,y=Y)
}

df=data.frame()
for (i in c(1,0.8,0.5,0.2,0,-0.2,-0.5,-0.8,-1)){
  tmp=corrdata(200,i)
  tmp['corr']=i
  df=rbind(df,tmp)
}

library(ggplot2)

g=ggplot(df,aes(x=x,y=y))+geom_point(size=1)
g+facet_wrap(~corr)+ stat_smooth(method='lm',se=FALSE,color='red')

And here’s an example of the result:

[Image: scatterCorr – faceted scatterplots for the different correlation values]

It’s actually a little tidier if we also add in + coord_fixed() to fix up the geometry/aspect ratio of the chart so the axes are of the same length:

[Image: scatterCorrSquare – the same chart with coord_fixed() applied]

So what sort of OER does that make this post?!;-)

PS methinks it would be nice to be able to use different distributions, such as a uniform distribution across x. Is there a similarly straightforward way of doing that?

UPDATE: via comments, rseiter (maybe Rich Seiter?) suggests the NORmal-To-Anything (NORTA) algorithm (about, also here). I have no idea what it does, but here’s what it looks like!;-)

#based on http://blog.ouseful.info/2014/12/17/sketching-scatterplots-to-demonstrate-different-correlations/#comment-69184
#The NORmal-To-Anything (NORTA) algorithm
library(MASS)
library(ggplot2)

#NORTA - h/t rseiter
corrdata2=function(samples, r){
  mu <- rep(0,4)
  Sigma <- matrix(r, nrow=4, ncol=4) + diag(4)*(1-r)
  rawvars <- mvrnorm(n=samples, mu=mu, Sigma=Sigma)
  #unifvars <- pnorm(rawvars)
  unifvars <- qunif(pnorm(rawvars)) # qunif not needed, but shows how to convert to other distributions
  print(cor(unifvars))
  unifvars
}

df2=data.frame()
for (i in c(1,0.9,0.6,0.4,0)){
  tmp=data.frame(corrdata2(200,i)[,1:2])
  tmp['corr']=i
  df2=rbind(df2,tmp)
}
g=ggplot(df2,aes(x=X1,y=X2))+geom_point(size=1)+facet_wrap(~corr)
g+ stat_smooth(method='lm',se=FALSE,color='red')+ coord_fixed()

Here’s what it looks like with 1000 points:

[Image: unifromScatterCorr – NORTA uniform scatterplots, 1000 points]

Note that with smaller samples, for the correlation at zero, the best fit line may wobble and may not have zero gradient, though in the following case, with 200 points, it looks okay…

[Image: unifscattercorrsmall – the 200-point case]

The method breaks if I set the correlation (r parameter) values to less than zero – Error in mvrnorm(n = samples, mu = mu, Sigma = Sigma) : ‘Sigma’ is not positive definite – but we can just negate the y-values (unifvars[,2]=-unifvars[,2]) and it seems to work…

If in the corrdata2 function we stick with the pnorm(rawvars) distribution rather than the uniform (qunif(pnorm(rawvars))) one, we get something that looks like this:

[Image: corrnorm1000 – the pnorm(rawvars) version]

Hmmm. Not sure about that…?

Written by Tony Hirst

December 17, 2014 at 1:24 pm

Posted in Anything you want, Rstats

Thoroughly Confused About Student VMs & Docker

with 11 comments

The story so far… We’re looking at using a virtual machine (VM) preconfigured with all sorts of software and services for a distance education course. The VM runs in headless mode (no graphical desktop) and exposes the applications we want students to be able to run as services accessed through a web browser. The VM is built from a vagrant script using puppet, in order to support maintenance and as a demonstration of a potentially generic production model. It also means that we should be able to build VMs for different VM runners (VirtualBox, VMware etc), as well as generate machine images for cloud-hosted VMs and get them up and running. The activities to be run inside the VM include a demonstration of a distributed MongoDB network into which network partitions are introduced. The separate database instances run inside individual docker containers, and firewall rules are used to network partition them.

The set-up looks something like this – green blocks are services running in the VM, orange blocks are containers:

[Image: ouvm – VM set-up diagram: services in green, containers in orange]

The containers are started up from within the IPython notebook using a python wrapper to docker.io.

One problem with this approach is that we have two Mongo DB downloads – one in the VM and one for use in containers. This makes me think that it might make more sense to run all the applications in containers of their own. For example, the notebook server in a container of its own, the original MongoDB instance in a container of its own, PostgreSQL in a container of its own, and finally a data container or other area of the VM that can be used to persist data within the VM such that it can be accessed by one or more of the other services as and when required.

I’m not sure what the best strategy would be for persisting state, for example as used by the database services. If we mount a database’s datastore in a volume within the VM, we can destroy the container that is used to run the database service and the datastore will be preserved. If the mounted volume is located in the VM/host share area, the data will also persist on the host even if the VM itself is destroyed. This is perhaps a bit scrappy, because it means that we might nominally take up a large amount of space “on the host”, compared with the situation of providing students with a pre-populated database in which the data volume was mounted “inside” the VM proper (i.e. away from the share area). In such a case, it would be nice if any data tables that students built were mounted into the host share area (so the students can clearly “retain ownership” of those tables), but I suspect that DBMSs don’t like putting different data tables or databases into different volumes..?
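
For what it’s worth, here’s a rough sketch of the sort of thing I mean, using the docker SDK for Python rather than the wrapper we currently use – the image tag, container name and share-area path are made up for illustration:

# Sketch: run a MongoDB container with its datastore mounted on a path in the
# VM/host share area, so the data survives the container being destroyed.
# Assumes the docker SDK for Python; names and paths are illustrative only.
import docker

client = docker.from_env()

client.containers.run(
    "mongo",                      # illustrative image tag
    name="coursedb",              # invented container name
    detach=True,
    ports={"27017/tcp": 27017},   # expose Mongo on the VM's localhost
    volumes={
        "/vagrant/data/mongo": {  # e.g. somewhere in the VM/host share area
            "bind": "/data/db",   # Mongo's default datastore location
            "mode": "rw",
        }
    },
)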

This approach to using docker is perhaps at odds with the typical way of thinking about how we might make use of it. It is very much in the style of a VM acting as an app runner for a host, and docker containers being used to run the individual apps.

One problem with this approach is that we need a control panel that:

  • is always running;
  • is exposed via an http/HTML service that can be accessed on the host machine;
  • allows students to bring up and shut down containers/services/”apps” as required.

The container approach is nice because if something breaks with one of the databases, for example, it should be easy for a student to switch it off and switch it back on again… That said, if we are popping up and ripping down containers, we need to work out a connection manager so that eg IPython knows where to find a particular database (I think that docker starts services up on essentially arbitrary internal IP addresses?). This could possibly be done by naming docker processes sensibly and then creating a little python library to look up the docker processes by name so that we know where to find them?
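
Something along these lines is what I have in mind for the lookup helper – again, just a sketch against the docker SDK for Python, with the container name invented for the example:

# Sketch: look up the internal IP address of a named container, so that (say)
# a notebook can connect to a Mongo instance without hard-coding an address.
# Assumes the docker SDK for Python; "coursedb" is an invented name.
import docker

def container_ip(name):
    """Return the internal IP address docker has assigned to the named container."""
    client = docker.from_env()
    container = client.containers.get(name)
    return container.attrs["NetworkSettings"]["IPAddress"]

# e.g. point pymongo at wherever the "coursedb" container came up:
# from pymongo import MongoClient
# conn = MongoClient(container_ip("coursedb"), 27017)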

As ever, there are tradeoffs in terms of making this approach easy for us (as production engineers), easy for students, and easy for the helpdesk to support…

I don’t know enough about any of this stuff to know whether it makes sense to rebuild the VM we currently have, in which services run loose in the base VM, as a completely containerised version. The core requirement from the user perspective is that a student should be able to download a base box, fire it up as easily as possible in VirtualBox, and then access a control panel (ideally in a browser) that allows them to start up and shut down applications-in-containers, as well as seeing a clear dashboard view of what services are up and running and what (localhost) ports they can be accessed on through their browser.

If you can help talk me through what the issues are or might be, and whether any of the above makes sense or is complete nonsense, I would be most grateful…

Written by Tony Hirst

December 10, 2014 at 2:34 pm

Posted in Anything you want, OU2.0

Crowd-Sourced Social Media Subtitling of the F1 Video Archive – Or Not…

with 2 comments

Together with Martin Hawksey, I put in a team entry to the third Tata F1 Connectivity Prize Challenge, which was to catalogue the F1 video archive. We didn’t win (prizewinning entries here), but here’s the gist of our entry… You may recognise it as something we’d bounced ideas around before…

Social Media Subtitling of F1 Races

Every race weekend, a multitude of F1 fans informally index live race coverage. Along with broadcasters and F1 teams, audiences use Twitter and other social media platforms to generate real-time metadata which could be used to index video footage. The same approach can be used to index the 60,000 hours of footage dating back to 1981.

We propose an approach, referred to as social media subtitling, that combines the collection of race commentaries with the promotion of the race being watched. Social media style updates collected while a race is being watched are harvested and used to provide commentary-like subtitles for each race. These subtitles can then be used to index and search into each race video.

[Image: f1tata_1]

Annotating Archival Footage

Audiences log in to an authenticated area using an account linked to one or more of their social media profiles. They select a video to watch and are presented with a DRM-enabled embedded streaming video player. An associated text editor can be used to create the social media subtitles. On starting to type into the text editor, a timestamp is grabbed from the video from a few seconds before the typing started (so a replay of the commented-upon event can be seen) and associated with the text entry. On posting the subtitle, it is placed into a timestamped comment database. Optionally, the comment can be published via a public social media account with an appropriate hashtag and a link to the timestamped part of the video. The link could lead to an authentication page to gain access to the video and the commentary client, or it may lead to a teaser video clip containing a second or two of the commented-upon scene.

[Image: f1tata_2]

Examples of the evolution of the original iTitle Twitter subtitler by M. Hawksey, showing: timestamped social media subtitle editor linked to video player; searching into a video using timestamped social media updates; video transcript from social media harvested updates collected in realtime.

The subtitles can be searched and used to act as a timestamped text index describing the video. Subtitles can also be used to generate commentary transcripts.

If a fan watches a replay of a race and comments on it using their own social media account, a video start time tweet could be sent from the player when they start watching the video (“I’ve just started watching the XXXX #F1 race on #TataF1.tv [LINK]”). This tweet then acts to timestamp their social media updates relative to the corresponding video timestamp, as well as publicising the video.
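
To make that concrete, the reconciliation is essentially just an offset calculation – something like the following sketch, in which the function name and the replay lead-in allowance are invented for illustration (a real implementation would also need to allow for broadcast delays, as noted later):

# Sketch: convert the wall-clock time of a fan's update into a video-relative
# timestamp, given the time of their "I've just started watching…" tweet.
# Names and the replay lead-in value are illustrative assumptions.
from datetime import datetime, timedelta

def video_timestamp(update_time, start_tweet_time, replay_lead_in=timedelta(seconds=5)):
    """Seconds into the video an update refers to, backed off by a few seconds
    so a replay of the commented-upon event can be shown."""
    offset = update_time - start_tweet_time - replay_lead_in
    return max(offset.total_seconds(), 0)

start = datetime(2014, 11, 16, 13, 0, 5)     # time of the video start tweet
update = datetime(2014, 11, 16, 13, 12, 42)  # time of a later race comment
print(video_timestamp(update, start))        # 752.0 seconds into the video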

Subtitle Quality Control

The quality of subtitles can be controlled in several ways:

  • a stream of subtitles can be played alongside the video and liked (positively scored) or disliked (negatively scored) by a viewer. (There is a further opportunity here for liked comments to be shared to public social media (along with a timestamped link into the video).) This feedback can also be used to generate trust ratings for commenters (someone whose comments are “liked” by a wide variety of people may be seen as providing trusted commentary);
  • text mining / topic modelling of aggregated comments around the same time can be used to identify crowd consensus topics or keywords.

If available, historical race timing data may be used to confirm certain sorts of information. For example, from timing sheets we can get data about pitstops, or the laps on which cars exited a race through an accident or mechanical failure. This information can be matched against the racetime timestamp of a comment; if comment topics match events identified from the timing data at about the right time, those comments can automatically be rated positively.
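
As a sketch of how that automatic cross-check might work – with the comment and timing-event structures, and the tolerance window, all invented for illustration – the matching only needs a simple time-window comparison:

# Sketch: up-rate comments whose text mentions an event type known from the
# timing sheets (pit stop, retirement etc.) at roughly the same race time.
# Data structures and the tolerance value are illustrative assumptions.

def auto_rate(comments, timing_events, tolerance=30):
    """Give a comment a +1 auto-rating if it mentions a timing-sheet event type
    within `tolerance` seconds of that event's race time."""
    rated = []
    for c in comments:
        score = 0
        for ev in timing_events:
            if (abs(c["race_time"] - ev["race_time"]) <= tolerance
                    and ev["type"] in c["text"].lower()):
                score = 1
                break
        rated.append({**c, "auto_rating": score})
    return rated

comments = [{"race_time": 1825, "text": "Hamilton in for a pit stop"}]
events = [{"race_time": 1820, "type": "pit stop"}]
print(auto_rate(comments, events))  # the single comment gets auto_rating 1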

Making Use of Public Social Media Updates

For current and future races, logging social media updates around live races provides a way of bootstrapping the comment database. (Timestamps would be taken from the real-time updates, although offsets to account for the several-second delays in digital TV feeds, for example, would be needed.) Feeds from known F1 journalists, race teams etc. would be taken as trusted feeds. Harvesting hashtagged feeds from the wider F1 audience would allow race comment social media updates to be collected more widely.

[Image: f1tata_3]

Social media updates can also be harvested in real time around live races or replayed races if we know the video start time.

For recent historical races, archived social media updates, as for example collected by Datasift, could be purchased and used to bootstrap the social media subtitle database.

Race Club

Social media subtitling provides a great opportunity for social activity. Groups of individuals can choose to watch a race at the same time, commenting to each other either through the bespoke subtitler client or by using public social media updates and an appropriate hashtag. If a user logs in to the video playback area, timestamps of concurrent updates from their linked public social media accounts can be reconciled with timestamps associated with the streamed video they are watching in the authenticated race video area.

In the off-season, or in the days leading up to a particular race “Historic Race Weekend” videos could be shown, perhaps according to a streamed broadcast model. That is, a race is streamed from within the authenticated area at a particular set time. Fans watch this scheduled event (under authentication) but comment on it in public using social media. These updates are harvested and the timestamps reconciled with the streamed video.

Summary

Social media subtitling draws on the idea that social media updates can be used to provide race commentary. Live social media comments collected around live events can be used to bootstrap a social media commentary database. Replayed streamed events can be annotated by associating social media update timestamps with known start/stop times of video replays. A custom client tied to a video player can be used to enter commentary directly to the database as well as issuing it as a social media update.

Team entry: Tony Hirst and Martin Hawksey

PS Rather than referring to social media subtitles and social media subtitling, I think social media captions and social media captioning may be the more generic terms?

Written by Tony Hirst

December 10, 2014 at 11:24 am

Posted in Anything you want

Looking for an Alternative to Twitter – and Vodafone…

with 4 comments

Whilst at an event over the weekend – from which I would generally have tweeted once or twice – I got the following two-part message in what I regard as my personal Twitter feed to my phone:

1/2: Starting Nov 15, Vodafone UK will no longer support some Twitter SMS notifications. You will still be able to use 2-factor authentication and reset your pa
2/2: ssword over SMS.However, Tweet notifications and activity updates will cease. We are very sorry for the service disruption.

The notification appeared from a mobile number I have listed as “Twitter” in my contacts book. This makes both Twitter and Vodafone very much less useful to me – direct messages and mentions used to come direct to my phone as SMS text messages, and I used to be able to send tweets and direct messages to the same number (Docs: Twitter SMS commands). The service connected an SMS channel on my personal/private phone, with a public address/communication channel that lives on the web.

Why not use a Twitter client? A good question that deserves an answer that may sound foolish: Vodafone offers crappy coverage in the places I need it most (at home, at the in-laws) where data connections are as good as useless. At home there’s wifi – and other screens running Twitter apps – elsewhere there typically isn’t. I also use my phone at events in the middle of fields, where coverage is often poor and even getting a connection can be chancy. (Client apps also draw on power – which is a factor on a weekend away at an event where there may be few recharging options; and they’re often keen to access contact book details, as well as all sorts of other permissions over your phone.)

So SMS works – it’s low power, typically on, low bandwidth, personal and public (via my Twitter ID).

But now both my phone contract and Twitter are worth very much less to me. One reason I kept my Vodafone contract was because of the Twitter connection. And one reason I stick with Twitter is the SMS route I have – I had – to it.

So now I’m looking for an alternative. To both.

I thought about rolling my own service using an SMS channel on IFTTT, but I don’t think it supports/is supported by Vodafone in the UK? (Do any mobile operators support it? If so, I think I may have to change to them…)

[Image: ifft – the IFTTT SMS channel]

If I do change contract, though, I hope it’s easier than the last contract we tried – are still trying – to kill. After several years it seems a direct debit on an old contract is still going out; after letters – and a phone call to Vodafone where they promised the direct debit was cancelled – it pops up again, paying out again: the direct debit that will never die. This week we’ll try again. Next month, if it pops up again, I guess we need to call on the ombudsman.

I guess what I’d really like is a mobile operator that offers me an SMS gateway so that I can call arbitrary webhooks in response to text messages I send, and field web requests that can then forward messages to my phone. (Support for the IFTTT SMS channel would be almost as good.)
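
Purely to illustrate the kind of gateway I mean (as far as I know no such Vodafone service exists, and the gateway URL, payload fields and phone number here are invented), the glue code needed at my end would be trivial – a couple of webhook endpoints, say in Flask:

# Sketch: the web-side glue for a hypothetical operator SMS gateway. One
# endpoint receives inbound texts POSTed by the gateway; another accepts web
# requests and asks the gateway to text my phone. Flask and requests are real
# libraries; the gateway URL, fields and phone number are placeholders.
import requests
from flask import Flask, request

app = Flask(__name__)
GATEWAY_SEND_URL = "https://sms-gateway.example.com/send"  # hypothetical

@app.route("/inbound-sms", methods=["POST"])
def inbound_sms():
    # The gateway POSTs each text I send; relay it wherever (here, just log it)
    print("SMS received:", request.form.get("message", ""))
    return "ok"

@app.route("/notify", methods=["POST"])
def notify():
    # A web request (eg a mention or DM) gets forwarded to my phone as an SMS
    requests.post(GATEWAY_SEND_URL,
                  data={"to": "+44XXXXXXXXXX",  # placeholder number
                        "message": request.form.get("message", "")})
    return "sent"

if __name__ == "__main__":
    app.run(port=8080)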

From what I know of Twitter’s origins (as twttr), the mobile SMS context was an important part of it – “I want to have a dispatch service that connects us on our phones using text” @Jack [Dorsey, Twitter founder] possibly once said (How Twitter Was Born). Text still works for me in the many places I go where wifi isn’t available and data connections are unreliable and slow, if indeed they’re available at all. (If wifi is available, I don’t need my phone contract…)

The founding story continues: I remember that @Jack’s first use case was city-related: telling people that the club he’s at is happening. If Vodafone and Twitter hadn’t stopped playing over the weekend, I’d have tweeted that I was watching the Wales Rally (WRC/FIA World Rally Championship) Rallyfest stages in North Wales at Chirk Castle on Saturday, and Kinmel Park on Sunday. As it was, I didn’t – so WRC also lost out on the deal. And I don’t have a personal tweet record of the event I was at.

If I’m going to have to make use of a web client and data connection to make use of Twitter messaging, it’s probably time to look for a service that does it better. What’s WhatsApp like in this respect?

Or if I’m going to have to make use of a web client and data connection to make use of Twitter messaging, I need to find a mobile operator that offers reliable data connections in places where I need it, because Vodafone doesn’t.

Either way, this cessation of the service has made me realise where I get most value from Twitter, and where I get most value from Vodafone – and it was in a combination of those services. With them now separated, the value of both to me is significantly reduced. Reduced to such an extent that I am looking for alternatives – to both.

Written by Tony Hirst

November 17, 2014 at 10:43 am

Posted in Anything you want

Teaching Material Analytics

leave a comment »

A couple of weeks ago, I had a little poke around some of the standard reports that we can get out of the OU VLE. OU course materials are generated from a structured document format – OU XML – that generates one or more HTML pages bound to a particular Moodle resource id. Additional Moodle resources are associated with forums, admin pages, library resource pages, and so on.

One of the standard reports provides a count of how many times each resource has been accessed within a given time period, such as a weekly block. Data can only be exported for so many weeks at a time, so to get stats for course materials over the presentation of a course (which may be up to 9 months long) requires multiple exports and the aggregation of the data.

We can then generate simple visual summaries over the data such as the following heatmap.

[Image: course_material_usage – heatmap of course material usage by week]

Usage is indicated by colour density; time, in weeks, is organised along the horizontal x-axis. From the chart, we can clearly see waves of activity over the course of the module as students access resources associated with particular study weeks. We can also see when materials aren’t being accessed, or are only accessed a small number of times (that is, necessarily by a low proportion of students; if we get data about unique user accesses, or unique user first-use activity, we can get a better idea of the proportion of students in a cohort as a whole accessing a resource).
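
For the record, the heatmap is just a pivot of the aggregated weekly exports – something like the following sketch, in which the file names and column headings are assumptions about the export format rather than a faithful description of it:

# Sketch: aggregate several weekly VLE export files and plot resource usage as
# a heatmap (resources down the side, study weeks along the x-axis).
# File and column names are assumptions about the export format.
import glob
import pandas as pd
import matplotlib.pyplot as plt

# Each export only covers a limited number of weeks, so concatenate them all
exports = [pd.read_csv(f) for f in glob.glob("vle_export_*.csv")]
usage = pd.concat(exports, ignore_index=True)

# Pivot to a resource x week matrix of access counts
matrix = usage.pivot_table(index="resource_name", columns="week",
                           values="views", aggfunc="sum").fillna(0)

plt.figure(figsize=(10, 12))
plt.imshow(matrix.values, aspect="auto", cmap="Blues")
plt.xticks(range(len(matrix.columns)), matrix.columns)
plt.yticks(range(len(matrix.index)), matrix.index, fontsize=6)
plt.xlabel("Week")
plt.colorbar(label="Accesses")
plt.tight_layout()
plt.show()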

This sort of reporting – about material usage rather than student attainment – was what originally attracted me to thinking about data in the context of OU courses (eg Course Analytics in Context). That is, I wasn’t that interested in how well students were doing, per se, or interested in trying to find ways of spying on individual students to build clever algorithms behind experimental personalisation and recommender systems that would never make it out of the research context.

That could come later.

What I originally just wanted to know was whether this resource was ever looked at, whether that resource was accessed when I expected (eg if an end of course assessment page was accessed when students were prompted to start thinking about it during an exercise two thirds of the way in to the course), whether students tended to study for half an hour or three hours (so I could design the materials accordingly), how (and when) students searched the course materials – and for what (keyphrase searches copied wholesale out of the continuous assessment materials) and so on.

Nothing very personal in there – everything aggregate. Nothing about students, particularly, everything about course materials. As a member of the course team, asking how are the course materials working rather than how is that student performing?

There’s nothing very clever about this – it’s just basic web stats run with an eye to looking for patterns of behaviour over the life of a course to check that the materials appear to be being worked in the way we expected. (At the OU, course team members are often a step removed from supporting students.)

But what it is, I think, is an important complement to the “student centred” learning analytics. It’s analytics about the usage and utilisation of the course materials, the things we actually spend a couple of years developing but don’t really seem to track the performance of?

It’s data that can be used to inform and check on “learning designs”. Stats that act as indicators about whether the design is being followed – that is, used as expected, or planned.

As a course material designer, I may want to know how well students perform based on how they engage with the materials, but I really should know how the materials are being utilised, because they’re designed to be utilised in a particular way? And if they’re not being used in that way, maybe I need to have a rethink?

Written by Tony Hirst

November 14, 2014 at 12:57 pm

Posted in Analytics, Anything you want
