OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Thoroughly Confused About Student VMs & Docker

with 11 comments

The story so far… We’re looking at using a virtual machine (VM) preconfigured with all sorts of software and services for a distance education course. The VM runs in headless mode (no graphical desktop) and exposes the applications we want students to be able to run as services accessed through a web browser. The VM is built from a Vagrant script using Puppet, in order to support maintenance and as a demonstration of a potentially generic production model. It also means that we should be able to build VMs for different VM runners (VirtualBox, VMware etc.), as well as generate machine images for cloud-hosted VMs and get them up and running there. The activities to be run inside the VM include a demonstration of a distributed MongoDB network into which network partitions are introduced. The separate database instances run inside individual docker containers, and firewall rules are used to network partition them.

The set-up looks something like this – green blocks are services running in the VM, orange blocks are containers:

[Figure: ouvm – services running in the VM (green) and docker containers (orange)]

The containers are started up from within the IPython notebook using a python wrapper to docker.io.

One problem with this approach is that we have two MongoDB downloads – one in the VM and one for use in containers. This makes me think that it might make more sense to run all the applications in containers of their own: the notebook server in one container, the original MongoDB instance in another, PostgreSQL in a third, and finally a data container or other area of the VM that can be used to persist data within the VM so that it can be accessed by one or more of the other services as and when required.

I’m not sure what the best strategy would be for persisting state, for example as used by the database services. If we mount a database’s datastore in a volume within the VM, we can destroy the container that is used to run the database service and the datastore will be preserved. If the mounted volume is located in the VM/host share area, the data will persist in a volume on the host even if the VM itself is destroyed. This is perhaps a bit scrappy, because it means that we might nominally take up a large amount of space “on the host”, compared with the situation of providing students with a pre-populated database whose data volume is mounted “inside” the VM proper (i.e. away from the share area). In such a case, it would be nice if any data tables that students built were mounted into the host share area (so that students can clearly “retain ownership” of those tables), but I suspect that DBMSs don’t like putting different data tables or databases into different volumes..?

This approach to using docker is perhaps at odds with the typical way of thinking about how we might make use of it. It is very much in the style of a VM acting as an app runner for a host, and docker containers being used to run the individual apps.

One problem with this approach is that we need a control panel that:

  • is always running;
  • is exposed via an http/HTML service that can be accessed on the host machine;
  • allows students to bring up and shut down containers/services/”apps” as required.

The container approach is nice because if something breaks with one of the databases, for example, it should be easy for a student to switch it off and switch it back on again… That said, if we are popping up and ripping down containers, we need to work out a connection manager so that eg IPython knows where to find a particular database (I think that docker starts services up on essentially arbitrary internal IP addresses?). This could possibly be done by naming docker processes sensibly and then creating a little python library to look up the docker processes by name so that we know where to find them?

As ever, there are tradeoffs in terms of making this approach easy for us (as production engineers), easy for students, and easy for the helpdesk to support…

I don’t know enough about any of this stuff to know whether it makes sense to rebuild the VM we currently have, in which services run directly in the base VM, as a completely containerised version. The core requirement from the user perspective is that a student should be able to download a base box, fire it up as easily as possible in VirtualBox, and then access a control panel (ideally in a browser) that allows them to start up and shut down applications-in-containers, as well as seeing a clear dashboard view of what services are up and running and what (localhost) ports they can be accessed on through their browser.

If you can help talk me through what the issues are or might be, and whether any of the above makes sense or is complete nonsense, I would be most grateful…

Written by Tony Hirst

December 10, 2014 at 2:34 pm

Posted in Anything you want, OU2.0

Crowd-Sourced Social Media Subtitling of the F1 Video Archive – Or Not…

with 2 comments

Working as a team with Martin Hawksey, I put in an entry to the third Tata F1 Connectivity Prize challenge, which was to catalogue the F1 video archive. We didn’t win (prizewinning entries here), but here’s the gist of our entry… You may recognise it as something we’d bounced ideas around before…

Social Media Subtitling of F1 Races

Every race weekend, a multitude of F1 fans informally index live race coverage. Along with broadcasters and F1 teams, audiences use Twitter and other social media platforms to generate real-time metadata which could be used to index video footage. The same approach can be used to index the 60,000 hours of footage dating back to 1981.

We propose an approach – social media subtitling – that combines the collection of race commentaries with the promotion of the race being watched. Social media style updates collected while a race is being watched are harvested and used to provide commentary-like subtitles for each race. These subtitles can then be used to index and search into each race video.

[Figure: f1tata_1]

Annotating Archival Footage

Audiences log in to an authenticated area using an account linked to one or more of their social media profiles. They select a video to watch and are presented with a DRM-enabled embedded streaming video player. An associated text editor can be used to create the social media subtitles. On starting to type into the text editor, a timestamp is grabbed from the video for a few seconds before the typing started (so a replay of the commented-on event can be seen) and associated with the text entry. On posting the subtitle, it is placed into a timestamped comment database. Optionally, the comment can be published via a public social media account with an appropriate hashtag and a link to the timestamped part of the video. The link could lead to an authentication page to gain access to the video and the commentary client, or to a teaser video clip containing a second or two of the commented-upon scene.

[Figure: f1tata_2]

Examples of the evolution of the original iTitle Twitter subtitler by M. Hawksey, showing: timestamped social media subtitle editor linked to video player; searching into a video using timestamped social media updates; video transcript from social media harvested updates collected in realtime.

The subtitles can be searched and used to act as a timestamped text index describing the video. Subtitles can also be used to generate commentary transcripts.

If a fan watches a replay of a race and comments on it using their own social media account, a video start time tweet could be sent from the player when they start watching the video (“I’ve just started watching the XXXX #F1 race on #TataF1.tv [LINK]”). This tweet then acts to timestamp their social media updates relative to the corresponding video timestamp, as well as publicising the video.

Subtitle Quality Control

The quality of subtitles can be controlled in several ways:

  • a stream of subtitles can be played alongside the video and liked (positively scored) or disliked (negatively scored) by a viewer. There is a further opportunity here for liked comments to be shared to public social media, along with a timestamped link into the video. This feedback can also be used to generate trust ratings for commenters (someone whose comments are “liked” by a wide variety of people may be seen as providing trusted commentary);
  • text mining / topic modelling of aggregated comments around the same time can be used to identify crowd consensus topics or keywords.

If available, historical race timing data may be used to confirm certain sorts of information. For example, from the timing sheets we can get data about pitstops, or the laps on which cars exited a race through accident or mechanical failure. This information can be matched to the racetime timestamp of a comment; if comment topics match events identified from the timing data at about the right time, those comments can automatically be rated positively.

Making Use of Public Social Media Updates

For current and future races, logging social media updates around live races provides a way of bootstrapping the comment database. (Timestamps would be taken as the updates arrive in realtime, although offsets would be needed to account for the several-second delays in digital TV feeds, for example.) Feeds from known F1 journalists, race teams etc. would be taken as trusted feeds. Harvesting hashtagged feeds from the wider F1 audience would allow race comment social media updates to be collected more widely.

[Figure: f1tata_3]

Social media updates can also be harvested in real time around live races or replayed races if we know the video start time.

For recent historical races, archived social media updates, as for example collected by Datasift, could be purchased and used to bootstrap the social media subtitle database.

Race Club

Social media subtitling provides a great opportunity for social activity. Groups of individuals can choose to watch a race at the same time, commenting to each other either through the bespoke subtitler client or by using public social media updates and an appropriate hashtag. If a user logs in to the video playback area, timestamps of concurrent updates from their linked public social media accounts can be reconciled with timestamps associated with the streamed video they are watching in the authenticated race video area.

In the off-season, or in the days leading up to a particular race, “Historic Race Weekend” videos could be shown, perhaps according to a streamed broadcast model. That is, a race is streamed from within the authenticated area at a particular set time. Fans watch this scheduled event (under authentication) but comment on it in public using social media. These updates are harvested and the timestamps reconciled with the streamed video.

Summary

Social media subtitling draws on the idea that social media updates can be used to provide race commentary. Live social media comments collected around live events can be used to bootstrap a social media commentary database. Replayed streamed events can be annotated by associating social media update timestamps with known start/stop times of video replays. A custom client tied to a video player can be used to enter commentary directly to the database as well as issuing it as a social media update.

Team entry: Tony Hirst and Martin Hawksey

PS Rather than referring to social media subtitles and social media subtitling, I think social media captions and social media captioning are perhaps the more generic terms?

Written by Tony Hirst

December 10, 2014 at 11:24 am

Posted in Anything you want

Identifying Position Change Groupings in Rank Ordered Lists

with one comment

The title says it all, doesn’t it?!

Take the following example – it happens to show race positions by driver for each lap of a particular F1 grand prix, but it could be the evolution over time of any rank-based population.

[Figure: poschanges – race position by driver for each lap of an F1 grand prix]

The question I had in mind was – how can I identify positions that are being contested during a particular window of time, where by contested I mean that the particular position was held by more than one person in a particular window of time?

Let’s zoom in to look at a couple of particular steps.

[Figure: poschangeGroup – race positions over two consecutive laps, showing groups of drivers swapping positions]

We see distinct groups of individuals who swap positions with each other between those two consecutive steps, so how can we automatically detect the positions that these drivers are fighting over?

A solution given to a Stack Overflow question on how to get disjoint sets from a list in R gives what I thought was a really nice solution: treat it as a graph, and then grab the connected components.

Here’s my working of it. Start by getting a list of results that show a particular driver held different positions in the window selected – each row in the original dataframe identifies the position held by a particular driver at the end of a particular lap:

library(DBI)
ergastdb = dbConnect(RSQLite::SQLite(), './ergastdb13.sqlite')

#Get a race identifier for a specific race
raceId=dbGetQuery(ergastdb,
                  'SELECT raceId FROM races WHERE year="2012" AND round="1"')

q=paste('SELECT * FROM lapTimes WHERE raceId=',raceId[[1]])

lapTimes=dbGetQuery(ergastdb,q)
lapTimes$position=as.integer(lapTimes$position)

library(plyr)

#Sort by lap first just in case
lapTimes=arrange(lapTimes,driverId,lap)

#Create a couple of new columns
#pre is previous lap position held by a driver given their current lap
#ch is position change between the current and previous lap
tlx=ddply(lapTimes,.(driverId),transform,pre=(c(0,position[-length(position)])),ch=diff(c(0,position)))

#Find rows where there is a change between a given lap and its previous lap
#In particular, focus on lap 17
llx=tlx[tlx['ch']!=0 & tlx['lap']==17,c("position","pre")]

llx

This filters the complete set of data to just those rows where there is a difference between a driver’s current position and previous position, focusing on lap 17 in particular (the first column in the result just shows row numbers and can be ignored).

##      position pre
## 17          2   1
## 191        17  18
## 390         9  10
## 448         1   2
## 506         6   4
## 719        10   9
## 834         4   5
## 892        18  19
## 950         5   6
## 1008       19  17

We can now create a graph in which nodes represent positions (position or pre values) and edges connect a current and previous position.

#install.packages("igraph")
#http://stackoverflow.com/a/25130575/454773
library(igraph)

posGraph = graph.data.frame(llx)

plot(posGraph)

The resulting graph is split into several components:

[Figure: posgraph – the position change graph, split into several connected components]

We can then identify the connected components:

posBattles=split(V(posGraph)$name, clusters(posGraph)$membership)
#Find the position change battles
for (i in 1:length(posBattles)) print(posBattles[[i]])

This gives the following clusters, and their corresponding members:

## [1] "2" "1"
## [1] "17" "18" "19"
## [1] "9"  "10"
## [1] "6" "4" "5"

To generalise this approach, I think we need to do a couple of things:

  • allow a wider window within which to identify battles (so look over groups of three or more consecutive laps);
  • simplify the way we detect position changes for a particular driver: for example, if we take the set of positions held by a driver within the desired window, and the cardinality of that set (that is, its size) is greater than one, then that driver has had at least one position change within the window. Each such set of unique positions can be used to generate a set of distinct, unordered pairs that connect the positions (I think it only matters that they are connected, not that a driver specifically went from position x to position y from one lap to the next?). If we generate the graph from the set of distinct unordered pairs taken across all drivers, we should then be able to identify the contested position clusters. (A rough sketch of this idea follows.)
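
To make that concrete, here’s a rough, untested sketch of the sort of function I have in mind, reusing the tlx dataframe generated above; the function name and the window arguments are just placeholders:

#Sketch only - assumes tlx as generated above
library(plyr)
library(igraph)

battlesInWindow = function(tlx, startLap, endLap){
  w = tlx[tlx$lap >= startLap & tlx$lap <= endLap,]
  #For each driver, the set of distinct positions held within the window
  posSets = dlply(w, .(driverId), function(d) unique(d$position))
  #Sets of size > 1 mean at least one position change for that driver;
  #generate the distinct, unordered pairs of positions for each such set
  edges = do.call(rbind,
                  lapply(posSets,
                         function(p) if (length(p) > 1) t(combn(sort(p), 2)) else NULL))
  if (is.null(edges)) return(list())
  edges = unique(edges)
  #Positions linked by a shared driver end up in the same connected component
  posGraph = graph.data.frame(as.data.frame(edges), directed=FALSE)
  split(V(posGraph)$name, clusters(posGraph)$membership)
}

#For example, positions contested at some point between laps 15 and 20:
#battlesInWindow(tlx, 15, 20)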

Hmm… I need to try that out… And when I do, if and when it works(?!), I’ll add a complete demonstration of it – and how we might make use of it – to the Wrangling F1 Data With R book.

Written by Tony Hirst

December 9, 2014 at 10:44 am

Posted in f1stats, Rstats

Exporting Markdown and XML From Google Docs

leave a comment »

Just over a year ago, we started production of a new OU course using Google docs as the medium within which we’d share draft course materials. This was something of an experiment to see whether the online social document medium encouraged sharing and discussion of ideas, resources, and ongoing feedback and comment on the work in progress amongst the course team, rather than – or at least, in addition to – the traditional handover of significant chunks of the course at set handover dates. (In case you’re wondering, it didn’t…)

The question we’re now faced with is: how do we get the content that’s in Google docs into the next stage of the OU document workflow?

The actual HTML based course materials that appear in the VLE (or the ebook versions of them, etc) are generated automatically from an “OU Structured Content” XML document. The XML documents are prepared using the <oXygen/> XML editor, extended with an OU structured content framework that includes the requisite DTDs/schema files, hooks for rendering, previewing and publishing into the OU environment and so on. (Details about the available tags can be found here: tag guide [OU internal link].)

Whilst the preferred authoring route is presumably that authors use the <oXygen/> editor from the start, the guidance also suggests that many authors use Word (and an appropriate OU style sheet) and then copy and paste content over into the XML editor, at which point some amount of tidying and retagging may be required.

As Google docs doesn’t seem to support the addition of custom style elements or tags (users are limited to customising the visual style of provided style elements), we need to find another way of getting content from Google docs and into <oXygen/>. One approach would be to grab a copy of the Google doc into Microsoft Word, apply an OU template to mark up the content using appropriate custom style elements, and then copy the content over to the XML editor, at which point it will probably need further tidying.

Another approach is to try to export the data as an XML document. Looking around, I found a Google Apps script script (sic) that allows you to export the content in a Google doc as markdown. Whilst markdown documents don’t have the same tree-like document structure as XML, it did provide an example of how to parse a Google docs document. My first attempt at a script to export a Google doc in the OU XML format can be found here: export Google doc as OU SC-XML. (Note: the script is subject to change; now I have a basic operational/functional spec, I can start to try to tidy up the code and try to parse out more structure…)

Having got a minimal exporter working, the question now arises as to where effort needs to be spent next. The exporter produces a minimal form of OU-XML that sometimes validates and (in early testing) sometimes doesn’t. (If the script is working properly it should produce output that always validates as XML; then it should produce output that always validates as OU-XML.) Should time be spent improving the script to produce better XML, or can we live with the fact that the exporter gets the document some way into <oXygen/>, but further work will be required to fix a few validation breaks?

Another issue that arises is how rich a form of OU-XML we try to export. When working in a Microsoft Word environment, a document style can be defined using elements that map onto the elements in OU-XML. When working in Google docs, we need to define a convention that the parser can respond to.

At the moment, the parser is sensitive to:

  • HEADING1: treated as Session;
  • HEADING2: treated as Section;
  • HEADING3: treated as SubSection;
  • LIST_ITEM: numbered and unnumbered lists are treated as BulletedList; sublists to depth 1 are supported as BulletedSubsidiaryList
  • coloured text: treated as AuthorComment; at the moment this may incorrectly grab title elements as such;
  • INLINE_IMAGE: images are rendered as relatively referenced Figure elements with an empty Description element. A copy of the image is locally stored. (Note: INLINE_DRAWING elements are unsupported – there’s no way of exporting them; maybe I should export an empty Figure with a Description saying there’s a missing INLINE_DRAWING?)
  • TABLE: rendered as Table with empty TableHead;
  • bold, italic, LinkUrl: rendered as b, i and a tags respectively;
  • font.COURIER_NEW: rendered as ComputerCode.
By convention, we should also be able to detect and parse things like activities, exercises and SAQs. Some mechanism needs to be supported for identifying the block elements in such cases. For example, one convention might take the form:

    Exercise N

    Discussion

    End

The ^Exercise and ^End$ elements denote the block; heading styles (eg HEADING4 for the ^Exercise line, and perhaps HEADING5 for the ^End$ line?) could further aid detection.

Another approach would be to use horizontal lines to denote the start and stop of a block. For example:

    __________

    SAQ

    My Answer

    __________

where a horizontal line (the rule of underscores above) denotes the start and end of a block. Again, heading styles within the block could either identify or reinforce a particular block element type.

Rendering a preview of the OU-XML as it would appear in the OU VLE is possible by uploading the OU-XML file, or a zip file containing it and related assets, to an OU URL that sits behind OU authentication. The <oXygen/> editor handles previews by using the default web client – your default browser – to post a selected XML document to the appropriate OU upload/preview URL. Unfortunately, the functions that allow HTTP POST operations from Google Apps Script run on Google servers, which means that we can’t just create a button in Google docs that posts an XML export of the current Google doc to the authentication-required OU URL. (I don’t know if this would be possible in an OU/Google Apps domain?)

I’m not sure whether a workaround would be to launch a preview window in the browser from Google docs containing a copy of the OU-XML version of a document, highlight the XML, then use a bookmarklet to post the highlighted XML to the OU preview service URL from within the browser context, using a browser in which the user has logged in to the OU web domain? Alternatively, could a Chrome application access content from Google Drive and then post to the authenticated OU preview URL using browser permissions? (That is, can a Chrome app access Google Drive using a user’s Google permissions, or machine access permissions, and then post content grabbed from that source to the OU URL using permissions granted to the browser?) As ever, my ignorance about browser security policies on the one hand, and the Google Apps/Chrome apps security model on the other, makes it hard to know what workarounds might be possible.

If any members of OU staff would like to try out the exporter, please get in touch for hints, or let me know how you get on :-) In addition, if any members of OU staff are using Google docs for course production, I’d love to know how you’re using them and how you’re getting on :-)

PS Via @mahawksey, I see that we can associate metadata with a doc using PropertiesService.getDocumentProperties(). Could be handy for adding things like course code, author metadata, publishing route, template etc. to a doc. I’m not sure if we can also associate metadata with a folder, though I guess we could include a file in a folder that contains metadata relating to the files held more generally within the same folder?

Written by Tony Hirst

December 4, 2014 at 10:27 pm

Posted in OU2.0, Tinkering

Information Density and Custom Chart Designs

leave a comment »

I’ve been doodling today with some charts for the Wrangling F1 Data With R living book, trying to see how much information I can pack into a single chart.

The initial impetus came simply from thinking about a count of laps led in a particular race by each driver; this morphed into charting the number of laps each driver spent in each position, and then into a more comprehensive race summary chart (see More Shiny Goodness – Tinkering With the Ergast Motor Racing Data API for an earlier graphical attempt at producing a race summary chart).

[Figure: lapPosChart – single-chart race summary]

The chart shows:

  • grid position: identified using an empty grey square;
  • race position after the first lap: identified using an empty grey circle;
  • race position on each driver’s last lap: the y-value (position) of the corresponding pink circle;
  • points cutoff line: a faint grey dotted line to show which positions are inside – or outside – the points;
  • number of laps completed by each driver: the size of the pink circle;
  • total laps completed by each driver: greyed annotation at the bottom of the chart;
  • whether a driver was classified or not: the total lap count is displayed using a bold font for classified drivers, and in italics for unclassified drivers;
  • finishing status of each driver: classification statuses other than *Finished* are also recorded at the bottom of the chart.

The chart also shows drivers who started the race but did not complete the first lap.

What the chart doesn’t show is what stage of the race the driver was in each position, and how long for. But I have an idea for another chart that could help there, as well as being able to reuse elements used in the chart shown here.

FWIW, the following fragment of R code shows the ggplot function used to create the chart. The data came from the ergast API, though it did require a bit of wrangling to get it into a shape that I could use to power the chart.

#Reorder the drivers according to a final ranked position
g=ggplot(finalPos,aes(x=reorder(driverRef,finalPos)))
#Highlight the points cutoff
g=g+geom_hline(yintercept=10.5,colour='lightgrey',linetype='dotted')
#Highlight the position each driver was in on their final lap
g=g+geom_point(aes(y=position,size=lap),colour='red',alpha=0.15)
#Highlight the grid position of each driver
g=g+geom_point(aes(y=grid),shape=0,size=7,alpha=0.2)
#Highlight the position of each driver at the end of the first lap
g=g+geom_point(aes(y=lap1pos),shape=1,size=7,alpha=0.2)
#Provide a count of how many laps each driver held each position for
g=g+geom_text(data=posCounts,
              aes(x=driverRef,y=position,label=poscount,alpha=alpha(poscount)),
              size=4)
#Number of laps completed by driver
g=g+geom_text(aes(x=driverRef,y=-1,label=lap,fontface=ifelse(is.na(classification), 'italic' , 'bold')),size=3,colour='grey')
#Record the status of each driver
g=g+geom_text(aes(x=driverRef,y=-2,label=ifelse(status!='Finished', status,'')),size=2,angle=30,colour='grey')
#Styling - tidy the chart by removing the transparency legend
g+theme_bw()+xRotn()+xlab(NULL)+ylab("Race Position")+guides(alpha=FALSE)
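
As far as that wrangling goes, the posCounts dataframe used in the text layer is essentially a count of how many laps each driver spent in each position. A minimal, untested sketch of how something like it might be derived (the lapTimes column names here are assumptions based on the chart code above):

library(plyr)
#Count the number of laps each driver held each race position for
posCounts = ddply(lapTimes, .(driverRef, position), summarise, poscount=length(lap))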

The fully worked code can be found in a forthcoming update to the Wrangling F1 Data With R living book.

Written by Tony Hirst

November 21, 2014 at 6:21 pm

Posted in Rstats

Looking for an Alternative to Twitter – and Vodafone…

with 5 comments

Whilst at an event over the weekend – from which I would generally have tweeted once or twice – I got the following two part message in what I regard as my personal Twitter feed to my phone:

1/2: Starting Nov 15, Vodafone UK will no longer support some Twitter SMS notifications. You will still be able to use 2-factor authentication and reset your pa
2/2: ssword over SMS.However, Tweet notifications and activity updates will cease. We are very sorry for the service disruption.

The notification appeared from a mobile number I have listed as “Twitter” in my contacts book. This makes both Twitter and Vodafone very much less useful to me – direct messages and mentions used to come direct to my phone as SMS text messages, and I used to be able to send tweets and direct messages to the same number (Docs: Twitter SMS commands). The service connected an SMS channel on my personal/private phone, with a public address/communication channel that lives on the web.

Why not use a Twitter client? A good question that deserves an answer, which may sound foolish: Vodafone offers crappy coverage in the places I need it most (at home, at the in-laws’), where data connections are as good as useless. At home there’s wifi – and other screens running Twitter apps – elsewhere there typically isn’t. I also use my phone at events in the middle of fields, where coverage is often poor and even getting a connection can be chancy. (Client apps also draw on power – which is a factor on a weekend away at an event where there may be few recharging options; and they’re often keen to access contact book details, as well as all sorts of other permissions over your phone.)

So SMS works – it’s low power, typically on, low bandwidth, personal and public (via my Twitter ID).

But now both my phone contract and Twitter are worth very much less to me. One reason I kept my Vodafone contract was because of the Twitter connection. And one reason I stick with Twitter is the SMS route I have – I had – to it.

So now I’m looking for an alternative. To both.

I thought about rolling my own service using an SMS channel on IFTTT, but I don’t think it supports/is supported by Vodafone in the UK? (Do any mobile operators support it? If so, I think I may have to change to them…)

[Figure: ifft]

If I do change contract though, I hope it’s easier than the last contract we tried – are still trying – to kill. After several years, it seems a direct debit on an old contract is still going out; after letters – and a phone call to Vodafone where they promised the direct debit was cancelled – it pops up again, paying out again, the direct debit that will never die. This week we’ll try again. Next month, if it pops up again, I guess we need to call on the ombudsman.

I guess what I’d really like is a mobile operator that offers me an SMS gateway so that I can call arbitrary webhooks in response to text messages I send, and field web requests that can then forward messages to my phone. (Support for the IFTTT SMS channel would be almost as good.)

From what I know of Twitter’s origins (as twttr), the mobile SMS context was an important part of it – “I want to have a dispatch service that connects us on our phones using text” @Jack [Dorsey, Twitter founder] possibly once said (How Twitter Was Born). Text still works for me in the many places I go where wifi isn’t available and data connections are unreliable and slow, if indeed they’re available at all. (If wifi is available, I don’t need my phone contract…)

The founding story continues: I remember that @Jack’s first use case was city-related: telling people that the club he’s at is happening. If Vodafone and Twitter hadn’t stopped playing over the weekend, I’d have tweeted that I was watching the Wales Rally (WRC/FIA World Rally Championship) Rallyfest stages in North Wales at Chirk Castle on Saturday, and Kinmel Park on Sunday. As it was, I didn’t – so WRC also lost out on the deal. And I don’t have a personal tweet record of the event I was at.

If I’m going to have to make use of a web client and data connection to make use of Twitter messaging, it’s probably time to look for a service that does it better. What’s WhatsApp like in this respect?

Or if I’m going to have to make use of a web client and data connection to make use of Twitter messaging, I need to find a mobile operator that offers reliable data connections in places where I need it, because Vodafone doesn’t.

Either way, this cessation of the service has made me realise where I get most value from Twitter, and where I get most value from Vodafone, and it was in a combination of those services. With them now separated, the value of both to me is significantly reduced. Reduced to such an extent that I am looking for alternatives – to both.

Written by Tony Hirst

November 17, 2014 at 10:43 am

Posted in Anything you want

Teaching Material Analytics

leave a comment »

A couple of weeks ago, I had a little poke around some of the standard reports that we can get out of the OU VLE. OU course materials are generated from a structured document format – OU XML – that generates one or more HTML pages bound to a particular Moodle resource id. Additional Moodle resources are associated with forums, admin pages, library resource pages, and so on.

One of the standard reports provides a count of how many times each resource has been accessed within a given time period, such as a weekly block. Data can only be exported for so many weeks at a time, so to get stats for course materials over the presentation of a course (which may be up to 9 months long) requires multiple exports and the aggregation of the data.
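
By way of illustration, if each weekly export is saved off as a CSV file with a consistent layout, stitching the exports together is trivial; the following is a sketch only, with a made-up directory name and no claim about the actual report format:

library(plyr)
#Grab all the weekly VLE report exports saved into a single directory...
files = list.files("vle_exports", pattern="\\.csv$", full.names=TRUE)
#...and bind them into a single dataframe
usage = ldply(files, read.csv, stringsAsFactors=FALSE)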

We can then generate simple visual summaries over the data such as the following heatmap.

[Figure: course_material_usage – heatmap of weekly accesses for each course material resource]

Usage is indicated by colour density, and time in weeks is organised along the horizontal x-axis. From the chart, we can clearly see waves of activity over the course of the module as students access resources associated with particular study weeks. We can also see when materials aren’t being accessed, or are only being accessed a small number of times (that is, necessarily, by a low proportion of students; if we get data about unique user accesses, or unique user first use activity, we can get a better idea of the proportion of students in a cohort as a whole accessing a resource).
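
A heatmap along those lines is straightforward to generate in R; the following ggplot sketch assumes a long format usage dataframe with one row per resource per week and illustrative column names (resource, week, count):

library(ggplot2)
#One tile per resource per week, with usage mapped onto colour density
g = ggplot(usage, aes(x=week, y=resource, fill=count))
g = g + geom_tile()
g = g + scale_fill_gradient(low="white", high="darkblue")
g + xlab("Week of presentation") + ylab(NULL) + theme_bw()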

This sort of reporting – about material usage rather than student attainment – was what originally attracted me to thinking about data in the context of OU courses (eg Course Analytics in Context). That is, I wasn’t that interested in how well students were doing, per se, or interested in trying to find ways of spying on individual students to build clever algorithms behind experimental personalisation and recommender systems that would never make it out of the research context.

That could come later.

What I originally just wanted to know was whether this resource was ever looked at, whether that resource was accessed when I expected (eg if an end of course assessment page was accessed when students were prompted to start thinking about it during an exercise two thirds of the way into the course), whether students tended to study for half an hour or three hours (so I could design the materials accordingly), how (and when) students searched the course materials – and for what (keyphrase searches copied wholesale out of the continuous assessment materials) – and so on.

Nothing very personal in there – everything aggregate. Nothing about students, particularly, everything about course materials. As a member of the course team, asking how are the course materials working rather than how is that student performing?

There’s nothing very clever about this – it’s just basic web stats run with an eye to looking for patterns of behaviour over the life of a course to check that the materials appear to be being worked in the way we expected. (At the OU, course team members are often a step removed from supporting students.)

But what it is, I think, is an important complement to the “student centred” learning analytics. It’s analytics about the usage and utilisation of the course materials, the things we actually spend a couple of years developing but don’t really seem to track the performance of?

It’s data that can be used to inform and check on “learning designs”. Stats that act as indicators about whether the design is being followed – that is, used as expected, or planned.

As a course material designer, I may want to know how well students perform based on how they engage with the materials, but I really need to know how the materials are being utilised, because they’re designed to be utilised in a particular way. And if they’re not being used in that way, maybe I need to have a rethink?

Written by Tony Hirst

November 14, 2014 at 12:57 pm

Posted in Analytics, Anything you want
