Fragmentary Course Data Thoughts… From XCRI-CAP1.2 to CourseData API?

A ten-minute post, not thought through, just collecting together some thoughts I don’t want to forget in the midst of doing other unrelated stuff today. They arose from a Twitter exchange with @jmahoney127, @jacksonj04, @cgutteridge and @mhawksey, which followed on from my picking up on a comment by Alan Paull about the latest demo OU XCRI feed. (There are additional fragmentary thoughts in that comment thread.)

Looking at the latest OU XCRI-CAP 1.2 feed, it seems to be a little bit richer than last time I looked:

OU xcri-cp1.2 feed

Here’s a snapshot of the University of Lincoln feed:

Uni of lincoln xcricap2

This represents a partial view of data available via the ONCourse API, I think?

Which leads to a few questions:

1) is all the data required to publish an XCRI-CAP1.2 feed available via the ONCourse API? That is, could I generate the XCRI feed from calls to the API?

2) what is available via the API that isn’t in the XCRI feed?

3) could we create an API over the XCRI feed data that resembles (in part) the ONCourse API? That is, for an institution that has an XCRI-CAP 1.2 feed but no additional #coursedata development resources, could we create a simple JSON API layer that offers at least some of the ONCourse API functionality? And would the functionality/data that such an API could make available be rich enough, or complete enough, to do anything useful with?

Note: one of the advantages of the XCRI-CAP1.2 feed is that it can be used as a bulk transport format, a convenient source for a crude database of all the current (or future?) course offerings provided by an institution. It can also be thought of as an XML database of an institution’s course offerings.
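As a rough illustration of the third question above, here is a minimal Python sketch of the first step: flattening an XCRI-CAP feed into a JSON-friendly index that a simple API layer could serve. The inline feed is a made-up and heavily simplified sample. I believe http://xcri.org/profiles/1.2/catalog is the XCRI-CAP 1.2 catalog namespace and that course titles and identifiers use Dublin Core elements. Even so, treat the element layout as an assumption to check against a real feed and the schema. The same goes for the hypothetical get_course lookup, which loosely mimics an ONCourse-style /id/ call.

```python
import json
import xml.etree.ElementTree as ET

# Namespaces: XCRI-CAP 1.2 catalog and Dublin Core elements.
# The element layout of SAMPLE_FEED is a simplified assumption.
NS = {
    "xcri": "http://xcri.org/profiles/1.2/catalog",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample feed, for illustration only.
SAMPLE_FEED = """<catalog xmlns="http://xcri.org/profiles/1.2/catalog"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <provider>
    <dc:title>Example University</dc:title>
    <course>
      <dc:identifier>EX101</dc:identifier>
      <dc:title>Introduction to Examples</dc:title>
      <presentation><dc:identifier>EX101-2013</dc:identifier></presentation>
    </course>
  </provider>
</catalog>"""


def text(elem, path):
    """Return the stripped text of the first direct match for path, or None."""
    found = elem.find(path, NS)
    return found.text.strip() if found is not None and found.text else None


def courses_from_xcri(xml_string):
    """Flatten every course in the feed into a dict keyed by course identifier."""
    root = ET.fromstring(xml_string)
    index = {}
    for course in root.iter("{%s}course" % NS["xcri"]):
        ident = text(course, "dc:identifier")
        index[ident] = {
            "identifier": ident,
            "title": text(course, "dc:title"),
            "presentations": [
                text(p, "dc:identifier")
                for p in course.findall("xcri:presentation", NS)
            ],
        }
    return index


def get_course(index, identifier):
    """Emulate a hypothetical /courses/id/<identifier> call: JSON string, or None."""
    record = index.get(identifier)
    return json.dumps({"result": record}) if record else None
```

So get_course(courses_from_xcri(SAMPLE_FEED), 'EX101') returns a JSON string that wraps the course record in a "result" element, mirroring the shape of the ONCourse responses.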

Via @mhawksey, a link to a current CottageLabs XCRI demonstrator project, which in turn leads to a GitHub repo that includes an XCRI aggregator scraper (there is some sort of irony there…?) and an XCRI2JSON converter.

Something else that also comes to mind: A Developers’ Guide to the Linked Data APIs, which demonstrated the creation of RESTful, JSON-producing APIs over Linked Data datasets.

Right… now for that other stuff…

PS via @alanepaull, JISC #coursedata programme blogs

Getting ONCourse Data Flowing into Google Spreadsheets

One of the challenges that needs to be addressed when developing data-driven applications is how to get data from a database to the point of use. Another relates to time-relevance. For example, in one document you might want to look at figures or data relating to a specific, fixed date or period (March 2012, for example, or the academic year 2012-13). At other times, you might want current data: values of some indicator over the last 6 weeks, for example. Where live data feeds are available, you might want them to be displayed on a dashboard. In other circumstances, you might want a more traditional printed report or PowerPoint presentation, but one that contains absolutely up-to-the-minute information. Ideally, the report or presentation would be “self-updating”, so that each time you printed it off, it contained the latest data values.

One common user environment for data related activities is a spreadsheet. When developing an API that produces data that might be usefully manipulated within a spreadsheet context, it can be useful to provide a set of connectors that allow data to be pulled directly from the API and inserted into the spreadsheet.

Here’s a quick example of how we might start to pull data into a Google Spreadsheets context from the University of Lincoln course data API. The example calls broadly reproduce those described in Getting ONCourse With Course Data.

Although the new script management tools in the Google Apps environment confuse the process of defining spreadsheet-specific scripts, at heart, defining custom spreadsheet functions for Google Spreadsheets is a trivial exercise: create a custom function (for example, myFunction(val)), pass in a value (such as a cell value) and return a value or array. You can then call the function using a formula, =myFunction(A3), for example. If a single value is returned, it will be placed in the calling cell. If a one-dimensional list is returned, it will add values to the current row. If a two-dimensional array is returned, it will populate rows and columns.

Here are a few helper functions for pulling the ONCourse data into a Google Spreadsheet:

function signed(url){
  //Use this to add the API key, for example:
  //return url+'?key='+KEYVAL
  return url;
}

function gurl(url){
  //Fetch the JSON data and parse it with the built-in JSON object
  //(the older Utilities.jsonParse helper is deprecated)
  var raw = UrlFetchApp.fetch(signed(url));
  return JSON.parse(raw.getContentText());
}

function jgrabber(u){
  var root = 'https://n2.online.lincoln.ac.uk';
  return gurl(root + u);
}

function grabModulesId(id){
  //Full record for a module presentation, by internal (n2) ID
  return jgrabber('/modules/id/' + id)['result'];
}

function grabProgrammesId(id){
  //Full record for a programme, by internal (n2) ID
  return jgrabber('/programmes/id/' + id)['result'];
}

function assessmentByModuleID(moduleID){
  return grabModulesId(moduleID)['assessments'];
}

Let’s now see if we can pull data into the spreadsheet, and try to get a feel for whether this sort of approach looks as if it may be useful…

First up, how about getting a list of programmes associated with a module?

function printProgsFromModuleID(moduleID){
  var d = grabModulesId(moduleID);
  var arr = [];
  //Header row: module title, code, level and credit points
  arr.push([ d['title'], d['module_code']['code'], d['level']['description'], d['credit_rating'] + ' points' ]);
  arr.push([]);
  arr.push(['Programme', 'Course', 'Course Code']);
  //One row per programme the module is linked to
  for (var i = 0; i < d['module_links'].length; i++){
    var r = d['module_links'][i]['programme'];
    arr.push([ r['programme_title'], r['course_title'], r['course_code']['code'] ]);
  }
  return arr;
}

Here’s the result:

printProgsFromModuleID

How about learning outcomes per module (via the assessments associated with each module)?

function learningOutcomesByModuleID(moduleID){
  var d = assessmentByModuleID(moduleID);
  var arr = [];
  for (var i = 0; i < d.length; i++){
    //Follow the link to the full assessment record, which holds the learning outcomes
    var assessment = gurl(d[i]['nucleus_url'])['result'];
    arr.push([ assessment['module']['title'], assessment['module']['module_code']['code'], assessment['assessment_method'] ]);

    var learningOutcomes = assessment['learning_outcomes'];
    for (var j = 0; j < learningOutcomes.length; j++){
      arr.push([ '', '', learningOutcomes[j]['description'] ]);
    }
  }
  return arr;
}

Which gives us:

learningOutcomesByModule

And how about programme outcomes for a particular programme, broken out into outcome types:

function programmeOutcomesByProgrammeID(id){
  var d = grabProgrammesId(id);
  var arr = [];
  //Group outcome descriptions by their category title
  var types = {};
  for (var i = 0; i < d['programme_outcomes'].length; i++){
    var r = d['programme_outcomes'][i];
    var category = r['category']['title'];
    if (types[category] == undefined){
      types[category] = [];
    }
    types[category].push(r['description']);
  }

  for (var j in types){
    arr.push([ j, types[j] ]);
  }
  return arr;
}

The result? A handy table:

programmeOutcomesByProgrammeID

Okay, so that all seems easy enough. I think that Google docs are also scriptable, so it should be possible to populate a templated “word” document using data pulled from the API (for example, using a snippet like this one of Martin Hawksey’s: Google Apps Script to fill in a Document template with Spreadsheet data).

One thing I realised about the API from playing with this recipe was that it is defined very much as a little-l, little-d “linked data” API that works well for browsing the data. The well-defined URI set and use of internal unique identifiers make it easy to traverse the data space once you are in it. However, it’s not immediately obvious to me how I could either search my way into the data, or access it via natural identifiers such as programme codes or module codes.

[Seems I missed a trick on the API front… As described in APMS -> Nucleus -> APIs. How? Why?, it seems that you can get details of all the assessments that are the final assessment for their module and that contain group work by calling: https://n2.online.lincoln.ac.uk/assessments?access_token=ACCESSTOKEN&group_work=1&final_assessment=1]
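Here is a minimal Python sketch of building such filtered queries programmatically. It assumes, as the URL above suggests, that attribute filters are plain query parameters alongside access_token. Only group_work and final_assessment come from that example; any other filter name would be a guess.

```python
from urllib.parse import urlencode

API_ROOT = "https://n2.online.lincoln.ac.uk"


def assessments_query_url(access_token, **filters):
    """Build an /assessments query URL with an access token and attribute filters.

    Booleans are sent as 1/0 to match the example URL; filter names other
    than group_work and final_assessment are assumptions about the API.
    """
    params = {"access_token": access_token}
    for name, value in filters.items():
        params[name] = int(value) if isinstance(value, bool) else value
    return API_ROOT + "/assessments?" + urlencode(params)
```

For example, assessments_query_url('ACCESSTOKEN', group_work=True, final_assessment=True) reproduces the URL above.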

For example, to print the programmes associated with a module, I might use the above formula =printProgsFromModuleID(97), but the identifier I need to pass in (97) is an internal, machine-generated ID. This is all well and good if you are working within the machine-ID space, but that is not the human user space. For that, it would be more natural to make use of a module code, and call something like printProgsFromModuleCode('FRS2002M'). There are issues with this approach, of course: module codes are carried over from presentation to presentation, which breaks any simple bijective (one-to-one and onto) relationship between internal module IDs and module codes. As a result, we might need to further qualify these calls with a presentation year (or by default assume the presentation from the current academic year), or whatever other arguments are required to recapture the bijectivity.
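Here is one way of recovering a code-based lookup, sketched in Python under some assumptions:

1) the /module_codes listing returns records with a code and a list of module presentations, as the presentation report in Getting ONCourse With Course Data suggests;
2) each presentation carries its internal id;
3) presentation titles end with the academic year, as in “Advanced Crime Scene Investigation and Analysis 2012-13”.

The function name is hypothetical.

```python
import re

# Presentation titles are assumed to end in an academic year, e.g. "... 2012-13".
YEAR_SUFFIX = re.compile(r"(\d{4}-\d{2})\s*$")


def module_id_for_code(module_code_records, code, academic_year):
    """Resolve a module code plus academic year to an internal module ID.

    module_code_records is assumed to be a list of /module_codes records, each
    with a 'code' and a 'modules' list of {'id', 'title'} presentations.
    Returns None if nothing matches; raises ValueError if the (code, year)
    pair is still ambiguous, i.e. bijectivity has not been recovered.
    """
    matches = []
    for record in module_code_records:
        if record["code"] != code:
            continue
        for module in record["modules"]:
            m = YEAR_SUFFIX.search(module["title"])
            if m and m.group(1) == academic_year:
                matches.append(module["id"])
    if len(matches) > 1:
        raise ValueError("ambiguous: %s %s -> %r" % (code, academic_year, matches))
    return matches[0] if matches else None
```

If the records hold the FRS2002M module code data, asking for '2012-13' should resolve to module 97. A printProgsFromModuleCode() wrapper could then hand that ID on to printProgsFromModuleID().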

PS in passing, here’s another view over modules by programme, broken down into core and option modules.

modules by programme

There are a few problems with this view: for example, the levels need ordering properly; where there are no core or no optional modules, the display is not as good as it could be; the label sizing is a bit small; and there is no information relating to prerequisites for optional modules. But it’s a start, and it’s reasonably clean to look at.

It should also be easy enough to tweak the data generator script to allow us to use the same display script to show assessment types for each module in a programme, as demonstrated using enclosure charts in Visually Revealing Gaps in ONCourse Data, and maybe even learning outcomes too.

If I get a chance, I’ll look through the sorts of things requested in the ONCourse Focus Group – 14th March 2012 and try to pull out some views that match some of the requested ones. It would also be interesting to have a list of use cases from people who work with the data…

Visually Revealing Gaps in ONCourse Data

One of the things that I never really, fully, truly appreciated until I saw Martin Hawksey mention it (I think in an off the cuff comment) at a JISC conference session a year or two ago (I think?!) was how we can use visualisations to quickly spot gaps, or errors in a data set.

Following on from my previous University of Lincoln course data explorations (see Getting ONCourse With Course Data and Exploring ONCourse Data a Little More… for more details), I had another little play. This time I cast the course data around a programme into a hierarchical tree format that can be used to feed visualisations in d3.js or Gephi (code). The data can be called using URLs rooted on https://views.scraperwiki.com/run/uni_of_lincoln_oncourse_tree_json/. Parameters include:

progID: the programme ID, using the n2 ID for a programme;
full=true|contact|assessment: true generates a size attribute for each module relative to the credit points associated with it; assessment breaks out the assessment components for each module in the programme; contact breaks out the contact hours for each module;
format=gexf|json: gives the output in GEXF format or as d3.js tree JSON.

(Note: it should be easy enough to tweak the code to accept a moduleID and just display the assessment or contact time breakdown for that module. Other options would be to accept a programme ID and level and just display the contact or assessment details for the modules in that level, or to add a core/option attribute so that only core or only optional modules are displayed.)
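Since the same tree feeds both d3.js and Gephi, here is a rough Python sketch of the conversion step. It walks a d3-style tree of name/children/size nodes and emits a minimal GEXF 1.2 document, with one node per tree element and one directed edge per parent-child link. This is not the ScraperWiki view’s actual code. It drops the size attribute (carrying that across would need GEXF’s viz extension or an attribute declaration), and it assigns node IDs in depth-first order.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def tree_to_gexf(tree):
    """Flatten a d3-style {'name', 'children', 'size'} tree into a GEXF graph string.

    Each tree node becomes a GEXF node; each parent-child link becomes a
    directed edge from parent to child.
    """
    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(gexf, "graph", {"defaultedgetype": "directed"})
    nodes = ET.SubElement(graph, "nodes")
    edges = ET.SubElement(graph, "edges")
    counter = {"node": 0, "edge": 0}

    def visit(node, parent_id):
        node_id = str(counter["node"])
        counter["node"] += 1
        ET.SubElement(nodes, "node", {"id": node_id, "label": node["name"]})
        if parent_id is not None:
            ET.SubElement(edges, "edge", {
                "id": str(counter["edge"]),
                "source": parent_id,
                "target": node_id,
            })
            counter["edge"] += 1
        for child in node.get("children", []):
            visit(child, node_id)

    visit(tree, None)
    return ET.tostring(gexf, encoding="unicode")
```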

I also did a quick demo of how to view the data using a d3.js enclosure layout. For example, here’s a peek at how core and option modules are distributed in a programme, by level, with the contact times broken out:

oncouerse contact time

The outermost bubble is a programme; the next largest bubbles represent levels (Level 1, Level 2, etc., corresponding to the first year, second year and so on of an undergraduate degree programme). The structure then groups modules within a level as either core or option (option modules are not shown here; I think all the modules are core for this programme…?). Then, for each module, it breaks out the contact time in hours (which proportionally sets the size of the contact circles) and the contact type.

The labelling is lacking in a flat image view. If you hover over elements, tooltips pop up that identify module codes, whether a set of modules are core or option, the level of a set of modules, and so on, but in places the layout is hit and miss. For example, in the above image one module must have 100% lectures, so we don’t see a module bubble around it. (Maybe I need to seed each module bubble with a dummy/null bubble of size 0, if that’s possible, to force a contact bubble to be visibly enclosed by its module bubble?)

Here’s another example, this time breaking out assessment types:

oncourse assessment enclosure

Hovering over these two images (click through them to see the live version), I noticed that for this programme there don’t appear to be any optional modules, which may or may not be “interesting”?

Looking at the first image, which displays contact time, we also notice that several modules do not have any contacts broken out. Going back to the data, the corresponding element for these modules is actually an empty list, suggesting the data is not available. What these views give us, then, is a quick way of exploring not only how a programme is structured, but which modules are lacking data. This is something that the treemaps in Exploring ONCourse Data a Little More… did not show. They displayed contact or assessment breakdowns across a course only for modules where that data was available, which could be misleading. (Note that I should try to force the display of a small empty circle showing the lack of core or option modules if a programme has no modules in that category?)
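The same gap-spotting can be done directly on the data, as a check to run alongside the visualisation. Here is a minimal Python sketch, assuming a list of full /modules/id records for a programme. The assessments field is the one used in the code elsewhere in these posts. The name of the contact-time field isn’t given here, so the field to check is passed in as a parameter.

```python
def modules_missing_data(programme_modules, field):
    """Return (module code, title) pairs for modules whose `field` is empty or absent.

    programme_modules is assumed to be a list of full /modules/id records.
    An empty list, a missing key or a null value all count as missing data.
    """
    missing = []
    for module in programme_modules:
        if not module.get(field):
            missing.append((module["module_code"]["code"], module["title"]))
    return missing
```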

Something else that’s lacking with the visualisations is labelling regarding which programme is being displayed, etc. But it should also be noted that my intention here is not to generate end-user reports. Rather, it’s to explore what sorts of views over the course data we might get using off-the-shelf visualisation components (testing the API along the way). It’s also to see how this might help us conceptualise the sorts of structures, stories and insights that might be locked up in the data. And it’s to see how visual approaches might help us open up new questions to ask over, or more informative reports to draw down from, the API-served data itself.

Anyway, enough for now. The last thing I wanted to try was pulling some of the API called data into an R environment and demonstrate the generation of a handful of reports in that context, but I’ve run out of time just now…

Exploring ONCourse Data a Little More…

Following on from my preliminary tinkering with the University of Lincoln course data API, I had a bit more of a play last night…

In particular, I started looking at the programme level to see how we might be able to represent the modules contained within it. Modules are associated with individual programmes, with a core attribute specifying whether a module is “core” (presumably meaning required).

oncourse programme core module

I thought it might be interesting to try to pull out a “core module” profile for each programme. Looking at the data available for each module, there were a couple of things that immediately jumped out at me as being “profileable” (sp?!) – assessment data (assessment method, weighting, whether it was group work or not):

oncourse assessment

and contact times (that is, hours of contact and contact type):

oncourse contact

I had some code kicking around for doing treemap views so I did a couple of quick demos [code]. First up, a view over the assessment strategy for core modules in a particular programme:

oncourse 0 assessment treemap

(I really need to make explicit the programme…)

Blocks are coloured according to level and sized according to the product of the module’s credit points value and the assessment type weighting.

We can also zoom in:

oncourse assesssment treemap zoom

I took the decision to use colour to represent level, but it would also be possible to pull the level out into the hierarchy used to configure the tree (for example having Level 1, Level 2, Level 3 groupings at the top?) I initially used the weighting to set the block size, but then tweaked it to show the product of the weighting and the number of credit points for the module, so for example a 60 point module with exam weighting 50% would have size 60*0.5 = 30, whereas a 15 point module with exam weighting 80% would have size 15*0.8 = 12.
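The sizing rule is simple enough to write down. Here is a Python sketch, assuming each module record carries its credit points in credit_rating (as used in the helper code in these posts). It also assumes each assessment has a percentage weighting under a key I have guessed as weighting. Values are coerced with float() in case the API returns them as strings.

```python
def block_size(credit_points, weighting_percent):
    """Treemap block size: module credit points scaled by the assessment weighting."""
    return float(credit_points) * float(weighting_percent) / 100


def assessment_blocks(module, weighting_key="weighting"):
    """Yield (assessment method, block size) pairs for one module record.

    weighting_key is a guess at the API's field name for the percentage weighting.
    """
    points = module["credit_rating"]
    for a in module["assessments"]:
        yield a["assessment_method"], block_size(points, a[weighting_key])


print(block_size(60, 50))  # 30.0: a 60 point module, exam weighted 50%
print(block_size(15, 80))  # 12.0: a 15 point module, exam weighted 80%
```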

Note that there are other ways we could present these hierarchies. For example, another view might structure the tree as: programme – level – module – assessment type. Different arrangements can tell different stories, be differently meaningful to different audiences, and be usable in different ways… Part of the art comes from picking the view that is most appropriate for addressing a particular problem, question or intention.
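Reordering the hierarchy is mostly a matter of grouping the same flat records in a different key order. Here is a minimal Python sketch of a d3.nest-style grouping into d3 tree JSON. It assumes hypothetical flattened records with level, module, assessment and size fields, for example one record per module assessment, sized as described above.

```python
def nest(records, keys, size_key="size"):
    """Group flat records into d3-style children, one tree level per key in `keys`.

    Leaves carry the summed size of the records that land on them.
    """
    if not keys:
        return sum(r[size_key] for r in records)
    groups = {}
    for r in records:
        groups.setdefault(r[keys[0]], []).append(r)
    children = []
    for name, group in groups.items():
        sub = nest(group, keys[1:], size_key)
        if isinstance(sub, list):
            children.append({"name": name, "children": sub})
        else:
            children.append({"name": name, "size": sub})
    return children


def programme_tree(programme_name, records, keys):
    """Wrap the nested groups under a root node named after the programme."""
    return {"name": programme_name, "children": nest(records, keys)}
```

Calling programme_tree(name, records, ["level", "module", "assessment"]) gives the programme – level – module – assessment type view. Calling it with ["assessment"] collapses everything into one block per assessment type across the programme.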

Here’s an example of a view over the contact hours associated with core modules in the same programme:

oncourse contact hours treemap

(Note that I didn’t make use of the group work attribute which should probably also be added in to the mix?).

Looking at different programmes, we can spot different sorts of profile. Note that there is a lot wrong with these visualisations, but I think they do act as a starting point for helping us think about what sorts of view we might be able to start pulling out of the data now it’s available. For example, how are programmes balanced in terms of assessment or contact over their core modules? One thing developing the above charts got me thinking about was how to step up a level to allow comparison of core module assessment and contact profiles across programmes leading to a particular qualification, or across schools? (I suspect a treemap is not the answer!)

It’s also worth noting that different sorts of view might be appropriate for different sorts of “customer”: potential student choosing a programme, student on a programme, programme manager, programme quality committee, and so on.

And it’s also worth noting that different visualisation types might give a more informative view over the same data structure. On my to do list is to have a play with viewing the data used above in some sort of circular enclosure diagram (or whatever that chart type is called!) for example. (See the next post in this series, which amongst other things made me realise how partial/fragmentary the data displayed in the above treemaps actually is…)

Having had a play, a couple more observations came to mind about the API. Firstly, it could be useful to annotate modules with a numerical (integer) attribute relating to a standardised educational level, such as the FHEQ levels. (At the moment, modules are given level descriptors along the lines of “Level 3”, relating to a third year course, akin to FHEQ level 6). Secondly, relating to assessment, it might be useful (down the line) to know how the grade achieved in a module at a particular level contributes to the final grade achieved at the end of the programme.
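Here is a sketch of the first suggestion, as a client-side stopgap until the API carries an integer level itself. It assumes undergraduate Levels 1 to 3 map onto FHEQ levels 4 to 6, so that “Level 3” gives 6, as above. It deliberately returns None for any other descriptor rather than guessing.

```python
import re


def fheq_level(level_description):
    """Map an ONCourse-style 'Level N' descriptor (N = 1..3) to an FHEQ level (4..6).

    Any other descriptor returns None rather than a guessed level.
    """
    m = re.fullmatch(r"\s*Level\s+([1-3])\s*", level_description)
    return int(m.group(1)) + 3 if m else None
```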

Playing with the data, I also found a little bug: in resources of the form https://n2.online.lincoln.ac.uk/module_links/id/273, the nucleus_url is incorrectly given as https://n2.online.lincoln.ac.uk/module_link/id/273 (i.e. with the path element module_link, which should be module_links).
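Bugs like this can be caught mechanically by checking that every record’s nucleus_url points back at the collection it was fetched from. Here is a Python sketch, assuming listing records carry id and nucleus_url fields, as in the examples in these posts.

```python
def check_self_links(records, collection, root="https://n2.online.lincoln.ac.uk"):
    """Return (id, actual, expected) for records whose nucleus_url is not root/collection/id/<id>."""
    bad = []
    for r in records:
        expected = "%s/%s/id/%s" % (root, collection, r["id"])
        if r.get("nucleus_url") != expected:
            bad.append((r["id"], r.get("nucleus_url"), expected))
    return bad
```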

Getting ONCourse With Course Data

In a post a couple of weeks ago – COuRsE Data – I highlighted several example questions that might usefully be asked of course data, things like “which modules are associated with any particular qualification?” or “which modules deliver which qualification level learning outcomes?”.

As the University of Lincoln ONCourse project comes to an end [disclosure: I am contracted on the project to do some evaluation], I thought it might be interesting to explore the API that’s been produced to see just what sorts of question we might be able to ask, out of the can.

Note that I don’t have privileged access to the database (though I could possibly request read access, or maybe even a copy of the database, or at least its schema), but that’s not how I tend to work. I play with things that are in principle publicly available, ideally things via openly published URLs and without authentication (no VPN); using URL parameter keys is about as locked down as I can usually cope with;-)

So what’s available? A quick skim of the Objects returned via the API turns up some potentially interesting sounding ones in a course data context, such as: Accrediting Body, Assessment, Contact Type and Time, Course Code, Delivery Mode, Learning Outcome, Module Code, Programme Outcome.

So let’s just pick one and see how far we can get… We can start at the bottom maybe, which is presumably Module.

A call to the API on /modules returns listings of the form:

{
  "id": 97,
  "nucleus_url": "https://n2.online.lincoln.ac.uk/modules/id/97",
  "module_code": {
    "id": 28,
    "nucleus_url": "https://n2.online.lincoln.ac.uk/module_codes/id/28",
    "code": "FRS2002M"
  },
  "title": "Advanced Crime Scene Investigation and Analysis 2012-13",
  "level": {
    "id": 2,
    "nucleus_url": "https://n2.online.lincoln.ac.uk/levels/id/2",
    "description": "Level 2"
  }
}

Looking at the record for an actual module gives us a wealth of data, including: the module code, level and number of credit points; a synopsis and marketing synopsis; an outline syllabus (unstructured text, which could cause layout problems?); the learning and teaching strategy and assessment strategy; a set of module links to programmes the module is associated with, along with core programme data; a breakdown of contact time and assessments; and prerequisites, co-requisites and excluded combinations.

There does appear to be some redundancy in the data provided for a module, though this is not necessarily a bad thing in pragmatic terms (it can make life convenient). For example, the top level of a modules/id record looks like this:

n2 modules-id top level

and lower down the record, in the /assessments element we get duplication of the data:

n2 assessments module data dupe

As something of a tinkerer, this sort of thing works for me – I can grab the /assessments object out of a modules/id result and pass it around with the corresponding module data neatly bundled up too. But a puritan might take issue with the repetition…

Out of the box then, I can already start to write queries on a module if I have the module ID. I’ll give some code snippets of example Python routines as I play my way through the API…

import json
from urllib.request import urlopen

def gurl(url):
  #Fetch a URL and parse the response as JSON
  return json.load(urlopen(url))

def jgrabber(u):
  root='https://n2.online.lincoln.ac.uk'
  return gurl(root+u)

#Note that API calls are actually signed with a key (not shown in URLs)
def dd(): print("-----")

#Can I look up the programmes associated with a module given its ID?
def grabModulesId(id):
  return jgrabber( '/modules/id/'+str(id) )['result']

def printProgsFromModuleID(moduleID,d=None):
  #Pass in an already-fetched module record as d to avoid a second API call
  if d is None: d=grabModulesId(moduleID)
  dd()
  print('Module:', d['title'], '(', d['module_code']['code'], d['level']['description'], ',', d['credit_rating'], 'points)')
  dd()
  for r in d['module_links']:
    print('Programme:', r['programme']['programme_title'])
    print('Course:', r['programme']['course_title'], "(",r['programme']['course_code']['code'],")")
    dd()

'''Example result:

>>> printProgsFromModuleID(97)
-----
Module: Advanced Crime Scene Investigation and Analysis 2012-13 ( FRS2002M Level 2 , 30 points)
-----
Programme: Forensic Science
Course: Forensic Science Bachelor of Science with Honours (BSc (Hons)) 2011-12 ( FRSFRSUB )
-----
Programme: Criminology and Forensic Invesitgation
Course: Criminology and Forensic Invesitgation Bachelor of Science with Honours (BSc (Hons)) 2011-12 ( CRIFRSUB )
'''

The /module_codes call then allows us to see when this module was presented:

#When was a module presented?
#Note: this takes the internal module_code ID (e.g. 28), not the code itself (e.g. FRS2002M)
def grabModuleCodesId(id):
  return jgrabber( '/module_codes/id/'+str(id) )['result']

def printPresentationsFromModuleCode(moduleCode,d=None):
  if d is None: d=grabModuleCodesId(moduleCode)
  dd()
  print('Presentations of',d['code'])
  dd()
  for r in d['modules']:
    print(r['title'])
  dd()

'''Example result:

>>> printPresentationsFromModuleCode(28)
-----
Presentations of FRS2002M
-----
Advanced Crime Scene Investigation and Analysis 2010-11
Advanced Crime Scene Investigation and Analysis 2011-12
Advanced Crime Scene Investigation and Analysis 2012-13
Advanced Crime Scene Investigation and Analysis 2013-14
-----
'''

We can also tweak the first query to build a bigger report. For example:

#Combination effect
def getModuleCodeFromModuleID(moduleID,d=''):
  if d=='':d=grabModulesId(moduleID)
  return d['module_code']['id']

moduleID=97

d=grabModulesId(moduleID)
printProgsFromModuleID(moduleID,d)

moduleCode = getModuleCodeFromModuleID(moduleID,d)
printPresentationsFromModuleCode(moduleCode)

One thing I’m not sure about is the way in to a moduleID in the first instance? For example, I’d like to be able to straightforwardly get the IDs for any upcoming presentations of FRS2002M in the current academic year?

So what else might we be able to do around a module? How about check out its learning outcomes? My first thought was the learning outcomes might be available from the /modules data (i.e. actual module presentations), but they don’t seem to be there. Next thought was to look for them in the abstracted module definition (from the module_codes), but again: no. Hmmm… Assessments are associated with a module, so maybe that’s where the learning outcomes come in (as subservient to assessment rather than a module?)

The assessment record in the module data looks like this:

n2 modules assessement

And if we click through to an assessment record, we get something like this, which does include Learning Outcomes:

n2 assessment LOs

Philosophically, I’m not sure about this? I know that assessment is supposed to be tied back to LOs, and quality assurance around a course and its assessment is typically a driver for the use of LOs. But if we can only find the learning outcomes associated with a module via its assessment..? Hmmm… Data modelling is often fraught with philosophical problems, and I think this is one such case?

Anyway, how might we report on the learning outcomes associated with a particular module presentation? Here’s one possible way:

#Look up assessments
def assessmentByModuleID(moduleID,d=None):
  if d is None: d=grabModulesId(moduleID)
  return d['assessments']

#Look up learning outcomes via each assessment's full record
def learningOutcomesByModuleID(moduleID,d=None):
  if d is None: d=assessmentByModuleID(moduleID)
  dd()
  for r in d:
    assessment=gurl(r['nucleus_url'])['result']
    print(assessment['module']['title'], '(', assessment['module']['module_code']['code'], ')', assessment['assessment_method'],'Learning Outcomes:')
    for learningOutcome in assessment['learning_outcomes']:
      print('\t- ',learningOutcome['description'])
    dd()

#Example call
learningOutcomesByModuleID(97)

'''Example output

Advanced Crime Scene Investigation and Analysis 2012-13 ( FRS2002M ) Exam Learning Outcomes:
	-  Identify factors which affect persistence and transfer of trace evidence materials and demonstrate practical skills in recovery and physical and chemical analysis of trace evidence using a variety of specialised techniques
	-  Discriminate different types of hairs and fibres on the basis of morphology and optical (e.g. birefringence) and chemical properties (e.g. dye composition)
...
'''

We can also query the API to get a list of learning outcomes directly, and this turns up a list of assessments as well as the single (?) module associated with the learning outcome. Does this mean that a particular learning outcome can’t be associated with two modules? I’m not sure that’s right? Presumably, database access would allow us to query learning outcomes by moduleID?
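Without database access, the learning-outcome-to-module relationship can at least be rebuilt client side from the full assessment records (the ones fetched via nucleus_url in learningOutcomesByModuleID). Here is a Python sketch that uses only fields seen in those records. Matching outcomes by description text is my assumption; an outcome ID would be a sturdier key if one is available.

```python
from collections import defaultdict


def modules_by_learning_outcome(assessments):
    """Return {learning outcome description: sorted list of module codes}.

    Each assessment record is assumed to have 'module' -> 'module_code' -> 'code'
    and a 'learning_outcomes' list of {'description': ...} entries.
    """
    index = defaultdict(set)
    for a in assessments:
        code = a["module"]["module_code"]["code"]
        for lo in a["learning_outcomes"]:
            index[lo["description"]].add(code)
    return {desc: sorted(codes) for desc, codes in index.items()}
```

Filtering the result for entries with more than one module code, {d: c for d, c in index.items() if len(c) > 1}, would show whether any outcome text is in fact shared across modules.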

Okay – that’s probably enough for now. The API seems to be easy enough to use, and I guess it wouldn’t be too hard to come up with some Google Spreadsheet formulae to demonstrate how they could be used to pull the course data into that sort of workspace (eg along the lines of Using Data From Linked Data Datastores the Easy Way (i.e. in a spreadsheet, via a formula)).

In the next post, I’ll take a peek at some of the subject keywords, and maybe have a play at walking through the data in order to generate some graphs (eg maybe along the lines of what Jamie has tried out previously: In Search of Similar Courses). There are also programme level outcomes, so it might be interesting trying to do some sort of comparison between those and the learning outcomes associated with assessment in modules on a programme. And all this, of course, without (too much) privileged access…

PS Sod’s Law from Lincoln’s side means that even though I only touched a fragment of the data, I turned up some errors. So for example, in the documentation on Bitbucket the award.md “limit” var was classed as a “bool” rather than an “int” (which suggests a careful eye maybe needs casting over all the documentation. Or is that supposed to be my job?! Erm…;-) In programme/id=961 there are a couple of typos: programme_title: “Criminology and Forensic Invesitgation”, course_title: “Criminology and Forensic Invesitgation Bachelor of Science with Honours (BSc (Hons)) 2011-12”. Doing a bit of text mining on the data and pulling out unique words can help spot some typos, though in the above case Invesitgation would appear at least twice. Hmm…
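Here is a crude version of that text-mining check in Python: count every word across a set of titles or descriptions and list the rarest. As noted above, a repeated typo (as Invesitgation is, across the programme and course titles) only shows up if the threshold is raised, at the cost of more noise.

```python
import re
from collections import Counter


def rare_words(texts, max_count=1):
    """Return, sorted, the lower-cased words appearing at most max_count times across texts."""
    counts = Counter(
        word.lower()
        for text in texts
        for word in re.findall(r"[A-Za-z]+", text)
    )
    return sorted(w for w, n in counts.items() if n <= max_count)
```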

Looking at some of the API calls, it would be generally useful to have something along the lines of offset=N to skip the first N results, as well as returning “total-Results=” in the response. Some APIs provide helper data along the lines of “next:”, where eg next=offset+num_results, that can be plugged in as the offset in the next call if you want to roll your own paging. (This may be in the API, but I didn’t spot it?) When scripting this, care just needs to be taken to check that a fence post error doesn’t sneak in.
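Here is a minimal Python sketch of rolling your own paging. It assumes a hypothetical fetch_page(offset, limit) function that returns one page of results plus the total result count. The offset parameter and the total count are the suggestions above, not confirmed features of the API. Advancing the offset by the number of results actually returned sidesteps the usual fence-post slip.

```python
def fetch_all(fetch_page, page_size=50):
    """Collect every result from an offset/limit paged source.

    fetch_page(offset, limit) is caller-supplied (hypothetical here) and must
    return (results, total_results).
    """
    results = []
    offset = 0
    while True:
        page, total = fetch_page(offset, page_size)
        results.extend(page)
        # Advance by what was actually received; adding an extra 1, or assuming
        # every page is full, is where fence-post errors tend to creep in.
        offset += len(page)
        if not page or offset >= total:
            return results
```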

Okay – enough for now. Time to go and play in the snow before it all melts…

eSTEeM Project: Library Website Tracking For VLE Referrals

Assuming my projects haven’t been cut out at the final acceptance stage because I haven’t yet submitted a revised project plan,

Preamble
As OU courses are increasingly presented through the VLE, many of them opt to have one or more “Library Resources” pages that contain links to course related resources either hosted on the OU Library website or made available through a Library operated web service. Links to Library hosted or moderated resources may also appear inline in course content on the VLE. However, at the current time, it is difficult to get much idea about the extent to which any of these resources are ever accessed, or how students on a course make use of other Library resources.

With the state of the collection and reporting of activity data from the VLE still evolving, this project will explore the extent to which we can make use of data I do know exists, and to which I do have access, specifically Google Analytics data for the library.open.ac.uk domain.

The intention is to produce a three-way reporting framework using Google Analytics for visitors to the OU Library website and Library managed resources from the VLE. The reports will be targeted at: subject librarians who liaise with course teams; course teams; subscription managers.

Google Analytics (to which I have access) is already running on the library website, and the matter just(?!) arises now of:

1) Identifying appropriate filters and segments to capture visits from different courses;

2) Developing Google Analytics API wrapper calls to capture data by course or resource based segments, and to enable analysis, visualisation and reporting not supported within the Google Analytics environment.

3) Providing a meaningful reporting format for the three audience types. (Note: we might also explore whether a view over the activity data may be appropriate for presenting back to students on a course.)
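As a sketch of step 1, the code below does the classification outside Google Analytics on exported rows rather than as a GA filter or segment. Given rows of (full referrer URL, visits), it attributes each visit to the course it came from. Purely for illustration, it assumes that VLE pages live on a host `vle.example.ac.uk` and carry the course code in a `course` query parameter. The real VLE URL structure would need checking, and the course code pattern is likewise a guess.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple
from urllib.parse import parse_qs, urlparse

# Hypothetical values: the real VLE host and URL scheme would need to be checked.
VLE_HOST = "vle.example.ac.uk"
COURSE_PARAM = "course"
# Assumed course code shape: one to three letters followed by three digits, e.g. T151.
COURSE_CODE = re.compile(r"[A-Z]{1,3}[0-9]{3}")
UNKNOWN = "(no course identified)"


def course_from_referrer(referrer: str) -> Optional[str]:
    """Return the course code carried by a VLE referrer URL, or None."""
    parts = urlparse(referrer)
    if parts.hostname != VLE_HOST:
        return None
    for value in parse_qs(parts.query).get(COURSE_PARAM, []):
        code = value.upper()
        if COURSE_CODE.fullmatch(code):
            return code
    return None


def visits_by_course(rows: Iterable[Tuple[str, int]]) -> Counter:
    """Sum visits per course from (referrer URL, visits) rows."""
    counts: Counter = Counter()
    for referrer, visits in rows:
        counts[course_from_referrer(referrer) or UNKNOWN] += visits
    return counts


# Made-up example rows, not real analytics data.
sample = [
    ("https://vle.example.ac.uk/mod/page/view.php?course=t151", 5),
    ("https://vle.example.ac.uk/mod/page/view.php?course=T151", 3),
    ("https://vle.example.ac.uk/mod/page/view.php?course=M150", 2),
    ("https://www.google.co.uk/search?q=library", 7),
]
print(visits_by_course(sample))
```

The same rule, expressed as a regular expression on the referral path, is roughly what a GA filter or advanced segment per course would have to encode.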

The Project
The OU Library has been running Google Analytics for several years, but to my knowledge has not started to exploit the data being collected as part of a reporting strategy on the usage of library resources resulting from referrals from the VLE. (Whenever a user clicks on a link in the VLE that leads to the Library website, the Google Analytics on the Library website can capture that fact.)

At the moment, we do not tend to work on optimising our online courses as websites so that they deliver the sorts of behaviour we want to encourage. If we were a web company, we would regularly analyse user behaviour on our course websites and modify them as a result.

This project represents the first step in a web analytics approach to understanding how our students access Library resources from the VLE: reporting. The project will then provide the basis for a follow-on project that can look at how we can take insight from those reports and act on it, for example in the redesign of the way links to library resources are presented or used in the VLE, or how visitors from the VLE are handled when they hit the Library website.

The project complements work that has just started in the Library on a JISC-funded project to make journal recommendations to students based on previous user actions.

The first outcome will be a set of Google Analytics filters and advanced segments tuned to the VLE visitor traffic and resource usage on the Library website. The second will be a set of Google Analytics API wrappers that allow us to export this data and use it outside the Google Analytics environment.

The final deliverables are three report types in two possible flavours:

1) a report to subject librarians about the usage of library resources from visitors referred from the VLE for courses they look after

2) a report to librarians responsible for particular subscription databases showing how that resource is accessed by visitors referred from the VLE, broken down by course

3) a report to course teams showing how library resources linked to from the VLE for their course are used by visitors referred to those resources from the VLE.

The two flavours are:

a) Google Analytics reports

b) custom dashboard with data accessed via the Google Analytics API
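For the custom dashboard flavour, the three report types are essentially different pivots of the same data. The minimal sketch below assumes the Google Analytics export has already been reduced to hypothetical (course code, library resource, visits) rows. The resource names and figures are invented for illustration.

```python
from collections import defaultdict
from typing import Collection, Dict, Iterable, Tuple

Row = Tuple[str, str, int]  # (course code, library resource, visits): a hypothetical export shape
Report = Dict[str, Dict[str, int]]


def pivot(rows: Iterable[Row], outer: int, inner: int) -> Report:
    """Sum visits into a nested {outer key: {inner key: visits}} table."""
    table: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[outer]][row[inner]] += row[2]
    return {key: dict(inner_counts) for key, inner_counts in table.items()}


def course_team_report(rows: Iterable[Row]) -> Report:
    """Per course: which library resources were used by visitors referred from the VLE."""
    return pivot(rows, outer=0, inner=1)


def subscription_report(rows: Iterable[Row]) -> Report:
    """Per resource: usage by VLE-referred visitors, broken down by course."""
    return pivot(rows, outer=1, inner=0)


def subject_librarian_report(rows: Iterable[Row], courses: Collection[str]) -> Report:
    """Like the course team report, restricted to the courses a librarian looks after."""
    return pivot((row for row in rows if row[0] in courses), outer=0, inner=1)


sample_rows = [
    ("T151", "Database A", 12),
    ("T151", "E-journal B", 3),
    ("M150", "Database A", 5),
    ("T151", "Database A", 1),
]
print(subscription_report(sample_rows))
# {'Database A': {'T151': 13, 'M150': 5}, 'E-journal B': {'T151': 3}}
```

Keeping a single flat table and pivoting it per audience means the three reports stay consistent with each other.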

Recommendations will also be made based on the extent to which Library website usage by anonymous students on particular OU courses may be tracked by other means, such as affinity strings in the SAMS cookie, and the benefits that may accrue from this more comprehensive form of tracking.

If course team members on any OU courses presenting over the next 9 months are interested in how students are using the library website following a referral from the VLE, please get in touch. If academics on courses outside the OU would like to discuss the use of Google Analytics in an educational context, I’d love to hear from you too:-)

eSTEeM is a joint initiative between the Open University’s Faculty of Science and Faculty of Maths, Computing and Technology to develop new approaches to teaching and learning both within existing and new programmes.

TSO OpenUP Competition – Opening Up UCAS Data

Here’s the presentation I gave to the judging panel at the TSO OpenUp competition final yesterday. As ever, it doesn’t make sense with[out] (doh!) me talking, though I did add some notes into the PowerPoint deck: Opening up UCAS Course Code Data

(I had hoped Slideshare would be able to use the notes as a transcript, but it doesn’t seem to do that, and I can’t see how to cut and paste the notes in by hand?:-(

A quick summary:

The “Big Idea” behind my entry to the TSO competition was a simple one – make UCAS course data (course code, title and institution) available as data. By opening up the data we make it possible for third parties to construct services and applications based around a complete data skeleton of all the courses offered for undergraduate entry through clearing in a particular year across UK higher education.
The data acts as scaffolding that can be used to develop consumer facing applications across HE (e.g. improved course choice applications) as well as support internal “vertical” activities within HEIs that may also be transferable across HEIs.
Primary value is generated from taking the course code scaffolding and annotating it with related data. Access to this dataset may be sold on in a B2B context via data platform services. Consumer facing applications with their own revenue streams may also be built on top of the data platform.
This idea makes available data that could disrupt the current model of course discovery and selection in UK Higher Education (though, in its current form, not university application or enrollment).

Here are the notes I doodled to myself in preparation for the pitch. Now the idea has been picked up, it will need tightening up and may change significantly! ;-) Which is to say – in this form, it is just my original personal opinion on the idea, and all ‘facts’ need checking…

  1. I thought the competition was as much about opening up the data as anything… So the original idea was simply that it would be really handy to have machine readable access to course code and course name information for UK HE courses from UCAS – which is presumably the closest thing we have to a national catalogue of higher education courses.

    But when selected to pitch the idea, it became clear that an application or two were also required, or at least some good business reasons for opening up this data…

    So here we go…

  2. UCAS is the clearing house for applying to university in the UK. It maintains a comprehensive directory of HE courses available in the UK.

    Postgraduate students and Open University students do not go through UCAS. Other direct entry routes to higher education courses may also be available.

    According to UCAS, in 2010, there were 697,351 applicants with 487,329 acceptances, compared with 639,860 applications and 481,854 acceptances in 2009. [ Slightly different figures in end of cycle report 2009/10? ]

    For convenience, hold in mind the thought that course codes could be to course marketing what postcodes are to geo-related applications… They provide a natural identifier that other things can be associated with.

    Associated with each degree course is a course code. UCAS course codes are also associated with JACS codes – Joint Academic Coding System identifiers – that relate to particular topics of study. “The UCAS course codes have no meaning other than “this course is offered by this institution for this application cycle”.” [link]

    “UCAS course code is 4 character reference which can be any combination of letters and numbers.

    Each course is also assigned up to three JACS (Joint Academic Coding System) codes in order to classify the course for *J purposes. The JACS system was introduced for 2002 entry, and replaced UCAS Standard Classification of Academic Subjects (SCAS). Each JACS code consists of a single letter followed by 3 numbers. JACS is divided into subject areas, with a related initial letter for each. JACS codes are allocated to courses for the *J return.

    The JACS system is used by the Higher Education Statistics Agency (HESA), and is the result of a joint UCAS-HESA subject code harmonization project.

    JACS is also used by UK institutions to identify the subject matter of programmes and modules. These institutions include the Department for Innovation, Universities and Skills (DIUS), the Home Office and the Higher Education Funding Council for England (HEFCE).”

    Keywords: up to 10 keywords are allocated to each course from a restricted list of just over 4,500 valid keywords.
    “Main keyword: This is generally a broad subject category, usually expressed as a single word, for example ‘Business’.
    Suggested keyword (SUG): Where a search on a main keyword identifies more than 200 courses, the Course Search user is prompted to select from a set of secondary keywords or phrases. These are the more specific ‘Suggested keywords’ attached to the courses identified. For example, ‘Business Administration’ is one of a range of ‘Suggested keywords’ which could be attached to a Business course (there are more than 60 others to choose from). A course in Business Administration would typically have this as the ‘Suggested keyword’, with ‘Business’ as the main keyword.
    However, if a course only has a ‘Suggested keyword’ and not a related ‘Main keyword’, the course will not be displayed in any search under the ‘Main keyword’ alone.

    Single subject: Main keywords can be ticked as ‘Single subject’. This means that the course will be displayed by a keyword search on the subject, when the user chooses the ‘single subject’ option below. You may have a maximum of two keywords indicated as single subjects per course.”

    “Between January and March 2010, approximately 600,000 unique IP addresses access the UCAS course code search function. During the same time period, almost 5 million unique IP addresses accessed the UCAS subject search function.” [link]

    “New courses from 2012 will be given UCAS codes that should not be used for subject classification purposes. However, all courses will still be assigned up to three individual JACS3 codes based on the subject content of the course.

    An analysis of unique IP address activity on the UCAS Course Search has shown that very few searches are conducted using the course code, compared to the subject search function. UCAS Courses Data Team will be working to improve the subject search and course keywords over the coming year to enable potential applicants to accurately find suitable courses.” [link]

    Course code identifiers have an important role to play within university administrations, for example in marshalling resources around a course, although they are not used by students. (On the other hand, students may have a familiarity with module codes.) Course codes identify courses that are the subject of quality assessment by the QAA. To a certain extent, a complete catalogue of course codes allows third parties to organise offerings based around UK higher education degrees in a comprehensive way and link in to the UCAS application procedure.

  3. If released as open data, and particularly as Linked Open Data, the course data can be used to support:
    – the release of horizontal data across the UK HE sector by HEIs, such as course catalogue information;
    – vertical scaffolding within an institution for elaboration by module codes, which in turn may be associated with module descriptions, reading lists, educational resources, etc.
    – the development across HE of services supporting student choice – for example “compare the uni” type services
  4. At the moment the data is siloed inside UCAS behind a search engine with unfriendly session based URLs and a poor results UI. Whilst it is possible to scrape or crowd-source course code information, such ad hoc collection mechanisms run the danger of being far from complete, which means that bias may be introduced into the collection as a side effect of the collection method.
  5. Making the data available via an API or Linked Data store makes it easier for third parties to build course-related services of whatever flavour – course comparison sites, badging services, resource recommendation services. The availability of the data also makes it easier for developers within an institution to develop services around course codes that might be directly transferable to, or scalable across, other institutions.
  6. What happens if the API becomes writeable? An appropriately designed data store, and corresponding ingest routes, might encourage HEIs to start releasing the course data themselves in a more structured way.

    XCRI is JISC’s preferred way of doing this, and I think there has been some lobbying of HEFCE from various JISC projects, but I’m not sure how successful it’s been?

  7. Ultimately, we might be able to aggregate data from locally maintained data stores. Course marketing becomes a feature of the Linked Data cloud.

    There is also the context of the data burden on HEIs, such as reporting to Professional, Statutory and Regulatory Bodies (PSRBs).

    Course codes would also need reconciling with HESA institution and campus identifiers, as well as with the JISCMU API and the Guardian Datablog Rosetta Stone spreadsheet.

    By hosting course code data, and using it as scaffolding within a Linked Data cloud around HE courses, a valuable platform service can be made available to HEIs as well as commercial operations designed to support student choice when it comes to selecting an appropriate course and university.

  8. Several recent JISC projects have started to explore the release of course related activity data on the one hand, and Linked Data approaches to enterprise-wide data management on the other. What is currently lacking is a national data-centric view over all HEI course offerings. UCAS has that data.

    Opening up the data facilitates rapid innovation projects within HEIs, and makes it possible for innovators within an HEI to make progress on projects that span across course offerings even if they don’t have easy access to that data from their own institution.

  9. Consumer services are also a possibility. As HEIs become more businesslike, treating students as customers, and paying customers at that, we might expect to see the appearance of university course comparison sites.

    CompareTheUni has had a holding page up for months – but will it ever launch? Uni&Books crowd sources module codes and associated reading links. Talis Aspire is a commercial reading list system that associates resources with module codes.

  10. Last year, I pulled together a few separate published datasets and threw them into Google Fusion Tables, then plotted the results. The idea was that you could chart research ratings against student satisfaction, or drop-out rates against academic pay. [link]

    The Guardian Datablog picked up the post, and I still get traffic from there on a daily basis… [link]

  11. The JISC MOSAIC Library data challenge saw Huddersfield University open up book loans data associated with course codes – so you could map between courses and books, and vice versa (“People who studied this course borrowed this book”, “this book was referred to by students on this course”)

    One demonstrator I built used a bookmarklet to annotate UCAS course pages with a link to a resource page showing what books had been borrowed by students on that course at Huddersfield University. [link]

  12. Enter something like CompareTheUni, but data-driven, providing aggregated views over data from universities and courses.
  13. To set the scene, the site needs to be designed with a user in mind. I see a 16-17 year old, slouching on the sofa, TV on with the most partial of attention being paid to it, laptop or tablet to hand and the main focus of attention. Facebook chat and a browser are grabbing attention on screen, with occasional distractions from the TV and mobile phone.
  14. The key is course data – this provides a natural set of identifiers that span the full range of clearing-based HE course offerings in the UK and allows third parties to build services on this basis.

    The course codes also provide hooks against which it may be possible to deploy mappings across skills frameworks, e.g. SFIA in the IT world. The course codes will also have associated JACS subject code mappings and UCAS search terms, which in turn may provide weak links into other domains, such as the world of books, using vocabularies such as the Library of Congress Subject Headings and Dewey classification codes.

  15. Further down the line, if we can start to associate module codes with course codes, we can start to develop services to support current students, or informal learners, by hooking in educational resources at the module level.
  16. Marketing can go several ways. For the data platform, evangelism into the HE developer community may spark innovation from within HEIs, most likely under the auspices of JISC projects. Platform data services may also be marketed to third-party developers and innovators/entrepreneurs.

    Services built on top of the data platform will need to be marketed to the target audience through appropriate channels. Specialist marketers such as Campus Group may be appropriate partners here.

  17. The idea pitched is disruptive in that, at first, one of the major competitors is UCAS. However, if UCAS retains its unique role in university application and clearing, then UCAS will still play an essential, and heavily trafficked, role in undergraduate student applications to university. Course discovery and selection will, however, move away from the UCAS site towards services that better meet the needs of potential applicants. One might then imagine UCAS becoming a B2B service that acts as an intermediary between student choice websites and universities, or even entertain a scenario in which UCAS is disintermediated and some other clearing mechanism is instituted between universities and potential-student-facing course choice portals.
  18. According to UCAS, between January and March 2010 “almost 5 million unique IP addresses accessed the UCAS subject search function” [link]. In each of the last couple of years, annual application/acceptance numbers have been of the order of 500,000 students intake per year from around 600,000 applicants. If 10% of applicants generate £5 each, that’s £300k pa. £10 from 20% of intake is £1M pa. £50 each from 40% of intake is £10M pa. I haven’t managed to find out what the acquisition cost of a successful applicant is, or the advertising budget allocated to an undergraduate recruitment marketing campaign, but there are 200 or so HE institutions (going by the number of allocated HESA institution codes).

    For a platform business – e.g. a business model based around selling queries on linked/aggregated/mapped datasets. If you imagine a query returning results with several attributes, each result is a row and each attribute is a column. If you allow free access to x thousand query cells returned a day, and then charge for cells above that limit, you:
    – encourage wider innovation around your platform, letting people run narrow queries or broad queries;
    – generate revenue that scales on a metered basis according to usage;
    – can license use of the data for folk to use in their own datastores, augmented with their own triples;
    – can offer additional analytics that get your tracking script in third-party web pages, helping train your learning classifiers, which makes the platform more valuable.

    For a consumer-facing application – e.g. a course choice site for potential applicants is the easiest to imagine:
    – Short term model would be advertising (e.g. course/uni ads), affiliate fees on book sales for first-year books? A second-hand books market, e.g. via Facebook Marketplace?
    – Medium term – affiliate fee for prospectus application/fulfilment
    – Long term – affiliate fee for course registration

  19. At the end of the day, if the data describing all the HE courses available in the UK is available as data, folk will be able to start to build interesting things around it…
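The format rules quoted in item 2 are concrete enough to check mechanically, which is the first thing anyone building on an opened-up dataset would want to do. The minimal sketch below takes its rules from that UCAS text: four alphanumeric characters for a UCAS code, and up to three JACS codes, each a letter followed by three digits. It also enforces up to 10 keywords, with at most two marked as single subject. The `CourseRecord` structure is hypothetical. Because a UCAS code only means "this course, at this institution, in this application cycle", the sketch keys records on all three.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Rules as described in the UCAS text quoted in item 2.
UCAS_CODE = re.compile(r"[A-Za-z0-9]{4}")  # 4 characters, any mix of letters and numbers
JACS_CODE = re.compile(r"[A-Z][0-9]{3}")   # a single letter followed by 3 numbers
MAX_JACS, MAX_KEYWORDS, MAX_SINGLE_SUBJECT = 3, 10, 2


@dataclass
class CourseRecord:
    """Hypothetical shape for one course in an opened-up UCAS dataset."""
    ucas_code: str
    title: str
    institution: str
    cycle: int  # application cycle, e.g. year of entry
    jacs_codes: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    single_subject: List[str] = field(default_factory=list)


def course_key(record: CourseRecord) -> Tuple[str, int, str]:
    """A UCAS code is only meaningful per institution and cycle, so key on all three."""
    return (record.institution, record.cycle, record.ucas_code.upper())


def validate(record: CourseRecord) -> List[str]:
    """Return a list of rule violations (empty if the record looks well formed)."""
    problems = []
    if not UCAS_CODE.fullmatch(record.ucas_code):
        problems.append(f"bad UCAS code {record.ucas_code!r}")
    if len(record.jacs_codes) > MAX_JACS:
        problems.append(f"{len(record.jacs_codes)} JACS codes (max {MAX_JACS})")
    problems += [f"bad JACS code {c!r}" for c in record.jacs_codes if not JACS_CODE.fullmatch(c)]
    if len(record.keywords) > MAX_KEYWORDS:
        problems.append(f"{len(record.keywords)} keywords (max {MAX_KEYWORDS})")
    if len(record.single_subject) > MAX_SINGLE_SUBJECT:
        problems.append(f"{len(record.single_subject)} single-subject keywords (max {MAX_SINGLE_SUBJECT})")
    # Single subject is a flag on a keyword, so each one must also be among the course's keywords.
    problems += [
        f"single-subject keyword {k!r} not in keywords"
        for k in record.single_subject
        if k not in record.keywords
    ]
    return problems


example = CourseRecord("AB12", "Computing", "Example University", 2012,
                       jacs_codes=["G400"], keywords=["Computing"], single_subject=["Computing"])
print(validate(example), course_key(example))
```

The record and institution name are made up. The useful design point is the composite key, since the same four-character code can recur across institutions and years.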
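The back-of-an-envelope revenue figures and the metered query model in item 18 can be made explicit in a few lines. The scenario inputs are the post's own rough numbers (600,000 applicants, 500,000 intake). The post leaves the free allowance as "x thousand", so the free cell allowance and price per thousand cells in the metering function are made-up placeholders.

```python
from typing import Iterable, Tuple

APPLICANTS = 600_000  # rough annual applicant figure used in item 18
INTAKE = 500_000      # rough annual intake figure used in item 18


def scenario_revenue_gbp(population: int, percent: int, fee_gbp: int) -> int:
    """Revenue if `percent` of `population` each generate `fee_gbp` (integer arithmetic)."""
    return population * percent // 100 * fee_gbp


def daily_charge_pence(queries: Iterable[Tuple[int, int]], free_cells: int, pence_per_1000: int) -> int:
    """Metered charge for one day's queries, each given as (rows, columns).

    Cells returned = rows x columns, summed over the day; cells beyond the free
    allowance are billed per started block of 1,000 cells.
    """
    cells = sum(rows * columns for rows, columns in queries)
    billable = max(0, cells - free_cells)
    blocks = -(-billable // 1000)  # ceiling division
    return blocks * pence_per_1000


print(scenario_revenue_gbp(APPLICANTS, 10, 5))  # 300000
print(scenario_revenue_gbp(INTAKE, 20, 10))     # 1000000
print(scenario_revenue_gbp(INTAKE, 40, 50))     # 10000000
# Hypothetical pricing: 2,000 free cells a day, then 10p per 1,000 cells.
print(daily_charge_pence([(200, 5), (1000, 4)], free_cells=2000, pence_per_1000=10))  # 30
```

Counting cells (rows times columns) rather than queries is what lets the same tariff treat one broad query and many narrow ones consistently.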

eSTEeM Project: Custom Course Search Engines

Preamble
If the desire for OU courses to make increased use of third party materials and open educational resources is realised, we are likely to see a shift in the pedagogy to one that is more resource based. This project seeks to explore the extent to which custom search engines tuned to particular courses may be used to support the discovery, on any given course, of appropriate resources published on the public web and indexed by Google.

Many courses now include links to third party resources that have been published on the public web. Discovering appropriate resources in terms of relevance and quality can be a time consuming affair. The Google Custom Search Engine service allows users to define custom search engines (CSEs) that search over a limited set of domains or web pages, rather than the whole web.

(Topic based links can be discovered in a wide variety of places. For example, it is possible to create custom search engines based around the homepages of people added to a Twitter list, or the nominated blogs in annual award listings.)

The ranking of particular resources may also be boosted in the definition of the CSE via a custom ranking configuration. For example, open educational resources published in support of the course may be boosted in the search result rankings.

Alternatively, CSEs may be used to exclude results from particular domains, or return resources from the whole web with the ranking of results from specified pages or domains boosted as required. By opening up results to the whole of the web, if recent, relevant resources from an unspecified domain are identified in response to a particular search query, they stand a chance of being presented to the user in the results listing.

Synonyms for common terms may also be explicitly declared and refinement labels used to offer facet based search limits. This might be used to limit results to resources identified as particularly relevant for a particular unit, or block within a course, for example, or to particular topic areas spread across a course.

“Promoted” results may also be used to emphasise particular results in response to particular queries. A good example here might be to display promoted results relating to resources explicitly referenced in an exercise, assignment or activity.

If any of the indexed pages are marked up with structured data, it may be possible to expose this data using a rich snippet/enhanced search listing. Whilst there are few examples to date, enhanced listings that display document types or media types might be appropriate.

Examples of Google CSEs in action can be found here:

Digital Worlds Custom Search Engine (created by hand; as used in T151).

Faceted “HE CSE” metasearch engine over UK Higher Education Library websites, UK Parliamentary pages, OERs, and video protocols for science experiments. This example demonstrates how the search engine may be embedded in a web page.

The Project
The project proposes the automated generation of custom search engines on a per-course basis, built from the resources linked to from each course.

The deliverables will be:

1) an automated way of generating Google CSE definition files through link scraping of Structured Authoring/XML versions of online course materials. If necessary, additional scraping of non-SA, VLE published resources may be required.

2) a resource template page and/or widget in the VLE providing access to the customised course search engine
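Here is a minimal sketch of deliverable 1. It assumes that links in the Structured Authoring XML appear as elements carrying an `href` attribute; the element names in the sample are made up. The code collects the linked hosts and writes them out in the shape of a Google CSE annotations file, with one `Annotation` per site pattern and a `Label` tying it to the engine. The label value is a placeholder. The element and attribute names, including the `score` attribute used here to boost OER sites, reflect my understanding of the CSE annotations format and should be checked against Google's current documentation.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Set
from urllib.parse import urlparse


def linked_hosts(course_xml: str) -> Set[str]:
    """Collect host names from every element with an http(s) href attribute."""
    root = ET.fromstring(course_xml)
    hosts: Set[str] = set()
    for element in root.iter():
        href = element.get("href")
        if not href:
            continue
        parts = urlparse(href)
        if parts.scheme in ("http", "https") and parts.hostname:
            hosts.add(parts.hostname)
    return hosts


def cse_annotations(hosts: Iterable[str], label: str, boosted: Iterable[str] = ()) -> str:
    """Build an annotations document listing (and optionally boosting) sites for one CSE."""
    boosted_hosts = set(boosted)
    root = ET.Element("Annotations")
    for host in sorted(hosts):
        annotation = ET.SubElement(root, "Annotation", about=f"{host}/*")
        if host in boosted_hosts:
            annotation.set("score", "1")  # assumed boost weight
        ET.SubElement(annotation, "Label", name=label)
    return ET.tostring(root, encoding="unicode")


# Made-up course fragment; real SA element names may differ.
sample = (
    '<Session><Paragraph>See <a href="https://www.open.edu/openlearn/">OpenLearn</a>, '
    '<a href="http://example.org/page">this page</a> and <a href="#intro">the intro</a>.'
    "</Paragraph></Session>"
)
hosts = linked_hosts(sample)
print(cse_annotations(hosts, label="_cse_PLACEHOLDER", boosted={"www.open.edu"}))
```

Working at the host level keeps the definition short. Per-page patterns would be a straightforward change if whole-site inclusion turns out to be too broad.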

Success will be judged on:

1) the extent to which students on pilot courses use the search engine;
2) a survey of students on courses using the search engine about how useful they found it.

Search engine metrics will also form part of the reporting chain. If appropriate, we will also explore the extent to which search engine analytics can be used to enhance the performance of the search engine (for example, by tuning custom ranking configurations), as well as offering “recent searches” information to students.

The placement of the search box for the CSE will be an important factor and any evaluation should take this into account, e.g. through A/B testing on course web pages.

Another variable relating to the extent to which a CSE is used by students is whether the CSE performs a whole web search with declared resources prioritised, or whether it just searches over declared resources. Again, an A/B test may be appropriate.
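For both the placement and the whole-web comparisons, one standard way to analyse an A/B test is a two-proportion z-test on the share of visitors who use the search box under each variant. This is a textbook technique suggested here for illustration, not something specified in the plan, and the counts below are invented.

```python
from math import erfc, sqrt
from typing import Tuple


def two_proportion_z_test(used_a: int, visitors_a: int, used_b: int, visitors_b: int) -> Tuple[float, float]:
    """Compare the proportion of visitors who used the search box under variants A and B.

    Returns (z, two-sided p-value) using the pooled-proportion normal approximation,
    which is reasonable when each group has a fair number of users and non-users.
    """
    p_a = used_a / visitors_a
    p_b = used_b / visitors_b
    pooled = (used_a + used_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if se == 0:
        raise ValueError("No variation: everyone (or no one) used the search box in both groups")
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Illustrative counts only: 60 of 100 students used the box in position A, 40 of 100 in position B.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.83, p = 0.0047
```

With course cohorts of a few hundred students, differences need to be fairly large before they show up as significant, which is worth bearing in mind when choosing pilot courses.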

For activities that include a resource discovery component, it would be interesting to explore what effect embedding the search engine in the activity description page might have.

If course team members on any OU courses presenting over the next 9 months are interested in trying out a course based custom search engine, please get in touch. If academics on courses outside the OU would like to discuss the creation and use of course search engines for use on their own courses, I’d love to hear from you too:-)

eSTEeM is a joint initiative between the Open University’s Faculty of Science and Faculty of Maths, Computing and Technology to develop new approaches to teaching and learning both within existing and new programmes.