OpenLearn OER (Re)Publishing the Text Way

In response to a provocation, I built a thing that will let you grab an OpenLearn unit, convert it to a simple text format, and publish it on your own website.

[For the next step in this journey, see: Appropriating OpenLearn Content and Republishing Edited Versions Of It Via a “Simple” Automated Text Blogging Workflow.]

It doesn’t require much:

  • if you haven’t got one already, create a Github account (just don’t “ooh, Github, that’s really hard, so I won’t be able to do it…”; just f***ing get an account);
  • visit my repo and read down the page to see what to do…

And what to do essentially boils down to:

  • press a BIG GREEN BUTTON to grab your own copy of the repo;
  • raise an issue, which is to say: click a BIG GREEN button, copy and paste Fetch https://www.open.edu/openlearn as the title, and an OpenLearn course unit URL (if it ends in content-section-overview-0 or content-section-overview-0?SOMETHING it should work) as the first line in the issue body; for example: https://www.open.edu/openlearn/history-the-arts/visions-protest-graffiti/content-section-0?active-tab=description-tab
  • PRESS THE BIG GREEN BUTTON TO SUBMIT THE ISSUE;
  • go to your repo’s Github Pages website. For a repo at https://github.com/YOUR_GITHUB_USERNAME/YOUR_REPO, this will be https://YOUR_GITHUB_USERNAME.github.io/YOUR_REPO and, after a few minutes and a page refresh or two, you should see your website there. If it doesn’t appear, check the README for a possible fix.

As for changing the content – it’s not that hard once you’ve done it a few times; just go with the flow of writing what feels natural… Easy-to-edit text files are in the content directory, and you can edit them via the Github website.

Open Education Versions of Open Source Software: Adding Lightness and Accessibility to User Interfaces?

In a meeting a couple of days ago discussing some of the issues around what sort of resources we might want to provide students to support GIS (geographical information system) related activities, I started chasing the following idea…

The OU has, for a long time, developed software applications in-house that are provided to students to support one or more courses. More often than not, the code is developed and maintained in-house, and not released / published as open source software.

There are a couple of reasons for this. Firstly, the applications typically offer a clean, custom UI that minimises clutter and is designed to support usability for learners studying a particular topic. Secondly, we require software provided to students to be accessible.

For example, the RobotLab software, originally developed, and still maintained, by my colleague Jon Rosewell, was created to support a first year undergrad short course, T184 Robotics and the Meaning of Life, elements of which are still used in one of our level 1 courses today. The simulator was also used for many years to support first year undergrad residential schools, as well as a short “build a robot fairground” activity in the masters level team engineering course.

As well as the clean design, and features that support learning (such as a code stepper button in RobotLab that lets students step through code a line at a time), the interfaces also pay great attention to accessibility requirements. Whilst these features are essential for students with particular accessibility needs, they also benefit all our students by improving the usability of the software as a whole.

So those are two very good reasons for developing software in-house. But as a downside, it means that we limit the exposure of students to “real” software.

That’s not to say all our courses use in-house software: many courses also provide industry standard software as part of the course offering. But this can present problems too: third party software may come with complex user interfaces, or interfaces that suffer from accessibility issues. And the software versions used in a course may drift from the latest releases if the software version is fixed for the life of the course. (In fact, the software version may be adopted a year before the start of the course and then be expected to last for 5 years of course presentation.) Or if the software is updated, this may require significant updates to the course material wrapping the software.

Another issue with professional software is that much of it is mature, and has added features over its life. This is fine for early adopters: the initial versions of the software are probably feature light, and add features slowly over time, allowing the user to grow with them. Indeed, many of the features added later may have been introduced to address a lack of functionality, power or “expressiveness” identified by, and frustrating to, the early users, particularly as they became more expert in using the application.

For a novice coming to the fully featured application, however, the wide range of features of varying levels of sophistication, from elementary, to super-power user, can be bewildering.

So what can be done about this, particularly if we want to avail ourselves of some of the powerful (and perhaps, hard to develop) features of a third party application?

To steal from a motorsport engineering design principle, maybe we can add lightness?

For example, QGIS is a powerful, cross-platform GIS application. (We have a requirement for platform neutrality; some of us also think we should be browser first, but let’s for now accept the use of an application that needs to be run on a computer with a “desktop” operating system (Windows, OS/X, Linux) rather than one running a mobile operating system (iOS, Android) or one developed for use on a netbook (Chrome OS).)

The interface is quite busy, and arguably hard to quickly teach around from a standing start:

However, as well as being cross-platform, QGIS also happens to be open source.

That is, the source code is available [github: qgis/QGIS].

 

Which means that as well as the code that does all the clever geo-number crunching stuff, we have access to the code that defines the user interface.

[UPDATE: in this case, we don’t need to customise the UI by forking the code and changing the UI definition files – QGIS provides a user interface configuration / customisation tool.]

For example, if we look for some menu labels in the UI:

we can then search the source code to find the files that contribute to building the UI:

In turn, this means we can take that code, strip out all the menu options and buttons we don’t need for a particular course, and rebuild QGIS with the simplified UI. Simples. (Or maybe not that simples when you actually start getting into the detail, depending on how the software is designed!)
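To make that idea a little more concrete without going anywhere near a rebuild, here’s a minimal sketch using the QGIS Python console (the iface object is provided by QGIS itself; the menu names are purely illustrative examples of the sort of thing a course might want to hide, and this is no substitute for the customisation tool mentioned in the update above):

# Sketch only: hide selected top-level menus in a running QGIS session.
# iface is predefined in the QGIS Python console; the menu names are just examples.
unwanted = ('Raster', 'Database', 'Web')

for action in iface.mainWindow().menuBar().actions():
    label = action.text().replace('&', '')  # strip the keyboard accelerator marker
    if label in unwanted:
        action.setVisible(False)            # the menu disappears; nothing is uninstalled

Whether that counts as “adding lightness” or just hiding the flab is a fair question, but it does suggest how much of the simplification could live in configuration rather than in a fork.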

And if the user interface isn’t as accessible as we’d like it, we can try to improve that, and contribute the improvements back to the parent project. The advantage there is that if students go on to use the full QGIS application outside of the course, they can continue to benefit from the accessibility improvements. As can every other user, whether they have accessibility needs or not.

So here’s what I’m wondering: if we’re faced with a choice between using an open source, third party “real” application with usability and accessibility issues, and building a custom learning app, why build the custom app, especially if we’re going to keep the code closed and have to maintain it ourselves? Why not join the developer community and produce a simplified, accessible skin for the “real” application, and feed accessibility improvements at least back to the core?

On reflection, I realised we do, of course, do the first part of this already (forking and customising), but we’re perhaps not so good at the latter (contributing accessibility or alt-UI patterns back to the community).

For operational systems, OU developers have worked extensively on Moodle, for example (and have, I think, committed back to the parent project)… And in courses, the recent level 1 computing course uses an OU fork of Scratch called OUBuild, a cross-platform Adobe Air application (as is the original), to teach basic programming, but I’m not sure if any of the code changes have been openly published anywhere, or whether design notes exist on why the original was not appropriate as a direct/redistributed download?

Looking at the Scratch open source repos, Scratch looks to be licensed under the BSD 3-clause “New” or “Revised” License (“a permissive license similar to the BSD 2-Clause License, but with a 3rd clause that prohibits others from using the name of the project or its contributors to promote derived products without written consent”). Although it doesn’t have to be, I’m not sure whether the OUBuild source code has been released anywhere, or whether commits were made back to the original project? (If you know differently, please let me know:-)) At the very least, it’d be really handy if there was a public document somewhere that identifies the changes that were made to the original and why, which could be useful from a “design learning” perspective. (Maybe there is a paper being worked up somewhere about the software development for the course?) By sharing this information, we could perhaps influence future software design, for example by encouraging developers to produce UIs that are defined by configuration files that can be easily customised and selected from (in the way that users can often select language packs).

I can think of a handful of flippant, really negative reasons why we might not want to release code, but they’re rather churlish… So they’re hopefully not the reasons…

But there are good reasons too (for some definition of “good”..): getting code into a state that is of “public release quality”; the overheads of having to support an open code repository (though there are benefits: other people adding suggestions, finding bugs, maybe even suggesting fixes); and legal copyright and licensing issues. Plus the ever present: if we give X away, we’re giving part of the value of doing our courses away.

At the end of the day, seeing open education in part as open and shared practice, I wonder what the real challenges are to working on custom educational software in a more open and collaborative way?

Sketching Scatterplots to Demonstrate Different Correlations

Looking just now for an openly licensed graphic showing a set of scatterplots that illustrate different correlation coefficients between X and Y values, I couldn’t find one.

[UPDATE: following a comment, Rich Seiter has posted a much cleaner – and general – method here: NORTA Algorithm Examples; refer to that post – rather than this – for the method…(my archival copy of rseiter’s algorithm)]

So here’s a quick R script for constructing one, based on a Cross Validated question/answer (Generate two variables with precise pre-specified correlation):

library(MASS)

corrdata=function(samples=200,r=0){
  # draw samples from a bivariate standard normal with correlation r;
  # empirical=TRUE forces the sample correlation to be exactly r
  data = mvrnorm(n=samples, mu=c(0, 0), Sigma=matrix(c(1, r, r, 1), nrow=2), empirical=TRUE)
  X = data[, 1]  # standard normal (mu=0, sd=1)
  Y = data[, 2]  # standard normal (mu=0, sd=1)
  data.frame(x=X,y=Y)
}

df=data.frame()
for (i in c(1,0.8,0.5,0.2,0,-0.2,-0.5,-0.8,-1)){
  tmp=corrdata(200,i)
  tmp['corr']=i
  df=rbind(df,tmp)
}

library(ggplot2)

g=ggplot(df,aes(x=x,y=y))+geom_point(size=1)
g+facet_wrap(~corr)+ stat_smooth(method='lm',se=FALSE,color='red')

And here’s an example of the result:

scatterCorr

It’s actually a little tidier if we also add in + coord_fixed() to fix up the geometry/aspect ratio of the chart so the axes are of the same length:

scatterCorrSquare

So what sort of OER does that make this post?!;-)

PS methinks it would be nice to be able to use different distributions, such as a uniform distribution across x. Is there a similarly straightforward way of doing that?

UPDATE: via comments, rseiter (maybe Rich Seiter?) suggests the NORmal-To-Anything (NORTA) algorithm (about, also here). I have no idea what it does, but here’s what it looks like!;-)

# based on https://blog.ouseful.info/2014/12/17/sketching-scatterplots-to-demonstrate-different-correlations/#comment-69184
#The NORmal-To-Anything (NORTA) algorithm
library(MASS)
library(ggplot2)

#NORTA - h/t rseiter
corrdata2=function(samples, r){
  mu <- rep(0,4)
  # 4x4 equicorrelation matrix: 1s on the diagonal, r everywhere else
  Sigma <- matrix(r, nrow=4, ncol=4) + diag(4)*(1-r)
  rawvars <- mvrnorm(n=samples, mu=mu, Sigma=Sigma)
  #unifvars <- pnorm(rawvars)
  unifvars <- qunif(pnorm(rawvars)) # qunif not needed, but shows how to convert to other distributions
  print(cor(unifvars))
  unifvars
}

df2=data.frame()
for (i in c(1,0.9,0.6,0.4,0)){
  tmp=data.frame(corrdata2(200,i)[,1:2])
  tmp['corr']=i
  df2=rbind(df2,tmp)
}
g=ggplot(df2,aes(x=X1,y=X2))+geom_point(size=1)+facet_wrap(~corr)
g+ stat_smooth(method='lm',se=FALSE,color='red')+ coord_fixed()

Here’s what it looks like with 1000 points:

unifromScatterCorr

Note that with smaller samples, for the correlation at zero, the best fit line may wobble and may not have zero gradient, though in the following case, with 200 points, it looks okay…

unifscattercorrsmall

The method breaks if I set the correlation (r parameter) values to less than zero – Error in mvrnorm(n = samples, mu = mu, Sigma = Sigma) : ‘Sigma’ is not positive definite – but we can just negate the y-values (unifvars[,2]=-unifvars[,2]) and it seems to work…

If in the corrdata2 function we stick with the pnorm(rawvars) distribution rather than the uniform (qunif(pnorm(rawvars))) one, we get something that looks like this:

corrnorm1000

Hmmm. Not sure about that…?

PS see also this Anscombe’s Quartet notebook and this recipe for creating datasets with the same summary statistics

PPS For a Python equivalent: https://stackoverflow.com/questions/18683821/generating-random-correlated-x-and-y-points-using-numpy

PEERing at Education…

I just had a “doh!” moment in the context of OERs – Open Educational Resources, typically so called because they are Resources produced by an Educator under an Open content license (which to all intents and purposes is a copyright waiver). One of the things that appeals to me about OERs is that there is no reason for them not to be publicly discoverable which makes them the ideal focus for PEER – Public Engagement with Educational Resources. Which is what the OU traditionally offered through 6am TV broadcasts of not-quite-lectures…

Or how about this one?

And which the OU is now doing through iTunesU and several Youtube Channels, such as OU Learn:


(Also check out some of the other OU playlists…or OU/BBC co-pros currently on iPlayer;-)

PS It also seems to me that users tend not to get too hung up about how things are licensed, particularly educational ones, because education is about public benefit and putting constraints on education is just plain stoopid. Discovery is nine tenths of the law, as it were. The important thing about having something licensed as an OER is that no-one can stop you from sharing it… (which, even if you’re the creator of a resource, you may not be able to do; academics, for example, often hand over the copyright of their teaching materials to their employer, and their employer’s copyright over their research output (similarly transferred as a condition of employment) to commercial publishers, who then sell the content back to their employers).

The Learning Journey Starts Here: Youtube.edu and OpenLearn Resource Linkage

Mulling over the OU’s OULearn pages on Youtube a week or two ago, colleague Bernie Clark pointed out to me how the links from the OU clip descriptions could be rather hit or miss:

Via @lauradee, I see that the OU has a new offering on YouTube.com/edu that is far more supportive of links to related content, links that can represent the start of a learning journey through OU educational – and commentary – content on the OU website.

Here’s a way in to the first bit of OU content that seems to have appeared:

This links through to a playlist page with a couple of different sorts of opportunity for linking to resources collated at the “Course materials” or “Lecture materials” level:

(The language gives something away, I think, about the expectation of what sort of content is likely to be uploaded here…)

So here, for example, are links at the level of the course/playlist:

And here are links associated with each lecture, erm, clip:

In this first example, several types of content are being linked to, although from the link itself it’s not immediately obvious what sort of resource a link points to? For example, some of the links lead through to course units on OpenLearn/Learning Zone:

Others link through to “articles” posted on the OpenLearn “news” site (I’m not ever really sure how to refer to that site, or the content posts that appear on it?)

The placing of content links into the Assignments and Others tabs seems a little arbitrary to me from this single example, but I suspect that once a few more lists have been posted, some sort of feeling will emerge about what sorts of resources should go where (i.e. what folk might expect from “Assignment” or “Other” resource links). If there’s enough traffic generated through these links, a bit of A/B testing might even be in order relating to the positioning of links within tabs and the behaviour of students once they click through (assuming you can track which link they clicked through, of course…)?

The transcript link is unambiguous though! And, in this case at least, resolves to a PDF hosted somewhere on the OU podcasts/media filestore:

(I’m not sure if caption files are also available?)

Anyway – it’ll be interesting to hear back about whether this enriched linking experience drives more traffic to the OpenLearn resources, as well as whether the positioning of links in the different tab areas has any effect on engagement with materials following a click…

And as far as the linkage itself goes, I’m wondering: how are the links to OpenLearn course units and articles generated/identified, and are those links captured in one of the data.open.ac.uk stores? Or is the process that manages what resource links get associated with lists and list items on Youtube/edu one that doesn’t leave (or readily support the automated creation of) public data traces?

PS How much (if any) of the linked resource goodness is grabbable via the Youtube API, I wonder? If anyone finds out before me, please post details in the comments below:-)

Deconstructing OpenLearn Units – Glossary Items, Learning Outcomes and Image Search

It turns out that part of the grief I encountered here in trying to access OpenLearn XML content was easily resolved (check the comments: mechanize did the trick…). I’ve still to sort out a workaround for accessing OpenLearn images (a problem described here), but at least now I have another stepping stone: a database of some deconstructed OpenLearn content.

Using Scraperwiki to pull down and parse the OpenLearn XML files, I’ve created some database tables that contain the following elements scraped from across the OpenLearn units by this OpenLearn XML Processor:

  • glossary items;
  • learning objectives;
  • figure captions and descriptions.

You can download CSV data files corresponding to the tables, or the whole SQLite database. (Note that there is also an “errors” table that identifies units that threw an error when I tried to grab, or parse, the OpenLearn XML.)

Unfortunately, I haven’t had a chance yet to pop up a view over the data (I tried, briefly, but today was another of those days where something that’s probably very simple and obvious prevented me from getting the code I wanted to write working; if anyone has an example Scraperwiki view that chucks data into a sortable HTML table or a Simile Exhibit searchable table, please post a link below; or even better, add a view to the scraper:-)

So in the meantime, if you want to have a play, you need to make use of the Scraperwiki API wizard.

Here are some example queries:

  • a search for figure descriptions containing the word “communication” – select * from `figures` where desc like '%communication%': try it
  • a search over learning outcomes that include the phrase how to followed at some point by the word data – select * from `learningoutcomes` where lo like '%how to%data%': try it
  • a search of glossary items for glossary terms that contain the word “period” or a definition that contains the word “ancient” – select * from `glossary` where definition like '%ancient%' or term like '%period%': try it
  • find figures with empty captions – select * from `figures` where caption='': try it

I’ll try to add some more examples when I get a chance, as well as knocking up a more friendly search interface. Unless you want to try…?!;-)
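If you’d rather poke at the data locally rather than via the API, the same sorts of query can be run directly against the downloaded SQLite database. Here’s a minimal sketch (openlearn.sqlite is just an example name for wherever you saved the download; table and column names are as in the example queries above):

# Sketch: run one of the example queries above against the downloaded SQLite file.
# 'openlearn.sqlite' is an example filename for the downloaded database.
import sqlite3

conn = sqlite3.connect('openlearn.sqlite')
query = "select * from glossary where definition like '%ancient%' or term like '%period%'"
for row in conn.execute(query):
    print(row)
conn.close()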

A Tracking Inspired Hack That Breaks the Web…? Naughty OpenLearn…

So it’s not just me who wonders Why Open Data Sucks Right Now and comes to this conclusion:

What will make open data better? What will make it usable and useful? What will push people to care about the open data they produce?
SOMEONE USING IT!
Simply that. If we start using the data, we can email, write, text and punch people until their data is in a standard, useful and usable format. How do I know if my data is correct until someone tries to put pins on a map for ever meal I’ve eaten? I simply don’t. And this is the rock/hard place that open data lies in at the moment:

It’s all so moon-hoveringly bad because no-one uses it.
No-one uses it because what is out there is moon-hoveringly bad

Or broken…

Earlier today, I posted some, erm, observations about OpenLearn XML, and in doing so appear to have logged, in a roundabout and indirect way, a couple of bugs. (I did think about raising the issues internally within the OU, but as the above quote suggests, the iteration has to start somewhere, and I figured it may be instructive to start it in the open…)

So here’s another, erm, issue I found relating to accessing OpenLearn xml content. It’s actually something I have a vague memory of colliding with before, but I don’t seem to have blogged it, and since moving to an institutional mail server that limits mailbox size, I can’t check back with my old email messages to recap on the conversation around the matter from last time…

The issue started with this error message that was raised when I tried to parse an OU XML document via Scraperwiki:

Line 85 - tree = etree.parse(cr)
lxml.etree.pyx:2957 -- lxml.etree.parse (src/lxml/lxml.etree.c:56230)(())
parser.pxi:1533 -- lxml.etree._parseDocument (src/lxml/lxml.etree.c:82313)(())
parser.pxi:1562 -- lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:82606)(())
parser.pxi:1462 -- lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:81645)(())
parser.pxi:1002 -- lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:78554)(())
parser.pxi:569 -- lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:74498)(())
parser.pxi:650 -- lxml.etree._handleParseResult (src/lxml/lxml.etree.c:75389)(())
parser.pxi:590 -- lxml.etree._raiseParseError (src/lxml/lxml.etree.c:74722)(())
XMLSyntaxError: Entity 'nbsp' not defined, line 155, column 34

nbsp is an HTML entity that shouldn’t appear untreated in an arbitrary XML doc. So I assumed this was a fault of the OU XML doc, and huffed and puffed and sighed for a bit and tried with another XML doc; and got the same result. A trawl around the web looking for whether there were workarounds for the lxml Python library I was using to parse the “XML” turned up nothing… Then I thought I should check…

A command line call to an OU XML URL using curl:

curl "http://openlearn.open.ac.uk/mod/oucontent/view.php?id=397313&content=1"

returned the following:

<meta http-equiv="refresh" content="0; url=http://openlearn.open.ac.uk/login/index.php?loginguest=true" /><script type="text/javascript">
//<![CDATA[
location.replace('http://openlearn.open.ac.uk/login/index.php?loginguest=true');
//]]></script>

Ah… vague memories… there’s some sort of handshake goes on when you first try to access OpenLearn content (maybe something to do with tracking?), before the actual resource that was called is returned to the calling party. Browsers handle this handshake automatically, but the etree.parse(URL) function I was calling to load in and parse the XML document doesn’t. It just sees the HTML response and chokes, raising the error that first alerted me to the problem.

[Seems the redirect is a craptastic Moodle fudge /via @ostephens]

So now it’s two hours later than it was when I started a script, full of joy and light and happy intentions, that would generate an aggregated glossary of glossary items from across OpenLearn and allow users to look up terms, link to associated units, and so on (the OU-XML document schema that OpenLearn uses has markup for explicitly describing glossary items). Then I got the error message, ran round in circles for a bit, got ranty and angry and developed a really foul mood, probably tweeted some things that I may regret, one day, figured out what the issue was, but not how to solve it, thus driving my mood fouler and darker… (If anyone has a workaround that lets me get an XML file back directly from OpenLearn (or hides the workaround handshake in a Python script I can simply cut and paste), please enlighten me in the comments.)
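For the record, the workaround that eventually did the trick was mechanize (as mentioned in the “Deconstructing OpenLearn Units” post above): let mechanize carry the cookies and follow the guest-login refresh, then hand the response body to lxml, rather than pointing etree.parse() at the URL directly. A rough sketch of that sort of approach (details untested here; the URL is just the example from above):

# Rough sketch: let mechanize handle the guest-login handshake, then parse with lxml.
import mechanize
from lxml import etree

url = 'http://openlearn.open.ac.uk/mod/oucontent/view.php?id=397313&content=1'

br = mechanize.Browser()
br.set_handle_robots(False)
br.set_handle_refresh(True)   # follow the meta refresh to the loginguest page
br.open(url)                  # first request does the handshake and picks up the cookie
response = br.open(url)       # asking again should now return the actual XML
tree = etree.fromstring(response.read())
print(tree.tag)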

I also found at least one OpenLearn unit that has glossary items, but just dumps them in paragraph tags and doesn’t use the glossary markup. Sigh…;-)

So… how was your day?! I’ve given up on mine…

Do We Need an OpenLearn Content Liberation Front?

For me, one of the defining attributes of openness relates to accessibility of the machine kind: if I can’t write a script to handle the repetitive stuff for me, or can’t automate the embedding of image and/or video resources, then whatever the content is, it’s not open enough in a practical sense for me to do what I want with it.

So here’s an, erm, how can I put this politely, little niggle I have with OpenLearn XML. (For those of you not keeping up, one of the many OpenLearn sites is the OU’s open course materials site; the materials published there as contentful course unit HTML pages are also available as structured XML documents. (When I say “structured”, I mean that certain elements of the materials are marked up in a semantically meaningful way; lots of elements aren’t, but we have to start somewhere ;-))

The context is this: following on from my presentation on Making More of Structured Course Materials at the eSTeEM conference last week, I left a chat with Jonathan Fine with the intention of seeing what sorts of secondary product I could easily generate from the OpenLearn content. I’m in the middle of building a scraper and structured content extractor at the moment, grabbing things like learning outcomes, glossary items, references and images, but I almost immediately hit a couple of problems, first with actually locating the OU XML docs, and secondly locating the images…

Getting hold of a machine readable list of OpenLearn units is easy enough via the OpenLearn OPML feed (much easier to work with than the “all units” HTML index page). Units are organised by topic and are listed using the following format:

<outline type="rss" text="Unit content for Water use and the water cycle" htmlUrl="http://openlearn.open.ac.uk/course/view.php?name=S278_12" xmlUrl="http://openlearn.open.ac.uk/rss/file.php/stdfeed/4307/S278_12_rss.xml"/>

URLs of the form http://openlearn.open.ac.uk/course/view.php?name=S278_12 link to a “homepage” for each unit, which then links to the first page of actual content, content which is also available in XML form. The content page URLs have the form http://openlearn.open.ac.uk/mod/oucontent/view.php?id=398820&direct=1, where the ID is uniquely (one-one) mapped to the course name identifier. The XML version of the page can then be accessed by changing direct=1 in the URL to content=1. Only, we don’t know the mapping from course unit name to page id. The easiest way I’ve found of doing that is to load in the RSS feed for each unit and grab the first link URL, which points to the first HTML content page view of the unit.
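That last rewrite is just a string substitution; a trivial sketch (the content page URL being whatever the first link in a unit’s RSS feed gives us):

# Sketch: derive the OU XML URL from the first content page URL found in a unit's RSS feed
contenturl = 'http://openlearn.open.ac.uk/mod/oucontent/view.php?id=398820&direct=1'
xmlurl = contenturl.replace('direct=1', 'content=1')
print(xmlurl)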

I’ve popped a scraper up on Scraperwiki to build the lookup for XML URLs for OpenLearn units – OpenLearn XML Processor:

import scraperwiki

from lxml import etree

#===
#via http://stackoverflow.com/questions/5757201/help-or-advice-me-get-started-with-lxml/5899005#5899005
def flatten(el):           
    result = [ (el.text or "") ]
    for sel in el:
        result.append(flatten(sel))
        result.append(sel.tail or "")
    return "".join(result)
#===

def getcontenturl(srcUrl):
    #Grab the first item link from a unit's RSS feed - this points to the first
    #HTML content page of the unit (from which the XML URL can be derived)
    rss= etree.parse(srcUrl)
    rssroot=rss.getroot()
    try:
        contenturl= flatten(rssroot.find('./channel/item/link'))
    except:
        contenturl=''
    return contenturl

def getUnitLocations():
    #The OPML file lists all OpenLearn units by topic area
    srcUrl='http://openlearn.open.ac.uk/rss/file.php/stdfeed/1/full_opml.xml'
    tree = etree.parse(srcUrl)
    root = tree.getroot()
    topics=root.findall('.//body/outline')
    #Handle each topic area separately?
    for topic in topics:
        tt = topic.get('text')
        print tt
        for item in topic.findall('./outline'):
            it=item.get('text')
            if it.startswith('Unit content for'):
                it=it.replace('Unit content for','')
                url=item.get('htmlUrl')
                rssurl=item.get('xmlUrl')
                #Course unit identifier (eg S278_12) taken from the unit homepage URL
                ccu=url.split('=')[1]
                cctmp=ccu.split('_')
                cc=cctmp[0] #parent course code (eg S278)
                if len(cctmp)>1: ccpart=cctmp[1]
                else: ccpart=1
                #Numeric feed id taken from the RSS URL path
                slug=rssurl.replace('http://openlearn.open.ac.uk/rss/file.php/stdfeed/','')
                slug=slug.split('/')[0]
                contenturl=getcontenturl(rssurl)
                print tt,it,slug,ccu,cc,ccpart,url,contenturl
                scraperwiki.sqlite.save(unique_keys=['ccu'], table_name='unitsHome', data={'ccu':ccu, 'uname':it,'topic':tt,'slug':slug,'cc':cc,'ccpart':ccpart,'url':url,'rssurl':rssurl,'ccurl':contenturl})

getUnitLocations()

The next step in the plan (because I usually do have a plan; it’s hard to play effectively without some sort of direction in mind…) as far as images go was to grab the figure elements out of the XML documents and generate an image gallery that allows you to search through OpenLearn images by title/caption and/or description, and preview them. Getting the caption and description from the XML is easy enough, but getting the image URLs is not…

Here’s an example of a figure element from an OpenLearn XML document:

<Figure id="fig001">
<Image src="\\DCTM_FSS\content\Teaching and curriculum\Modules\Shared Resources\OpenLearn\S278_5\1.0\s278_5_f001hi.jpg" height="" webthumbnail="false" x_imagesrc="s278_5_f001hi.jpg" x_imagewidth="478" x_imageheight="522"/>
<Caption>Figure 1 The geothermal gradient beneath a continent, showing how temperature increases more rapidly with depth in the lithosphere than it does in the deep mantle.</Caption>
<Alternative>Figure 1</Alternative>
<Description>Figure 1</Description>
</Figure>
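Getting at the caption, description and raw image attributes from elements like that is the easy part; here’s a minimal sketch using lxml (assuming the OU XML document has already been fetched and saved locally as unit.xml – the filename is just an example; element and attribute names are as in the Figure element above):

# Sketch: list figure metadata from an OU XML document saved locally as unit.xml
from lxml import etree

tree = etree.parse('unit.xml')
for fig in tree.findall('.//Figure'):
    img = fig.find('Image')
    print('%s | %s | %s | %s' % (
        fig.get('id'),
        img.get('x_imagesrc') if img is not None else '',   # just the image filename
        fig.findtext('Caption') or '',
        fig.findtext('Description') or ''))

The src attribute itself is the internal DCTM path, which is exactly the bit that can’t be turned into a public URL, as described next.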

Looking at the HTML page for the corresponding unit on OpenLearn, we see it points to the image resource file at http://openlearn.open.ac.uk/file.php/4178/!via/oucontent/course/476/s278_5_f001hi.jpg:

So how can we generate that image URL from the resource link in the XML document? The filename is the same, but how can we generate what are presumably the contextually relevant path elements: http://openlearn.open.ac.uk/file.php/4178/!via/oucontent/course/476/ ?

If we look at the OpenLearn OPML file that lists all current OpenLearn units, we can find the first identifier in the path to the RSS file:

<outline type="rss" text="Unit content for Energy resources: Geothermal energy" htmlUrl="http://openlearn.open.ac.uk/course/view.php?name=S278_5" xmlUrl="http://openlearn.open.ac.uk/rss/file.php/stdfeed/4178/S278_5_rss.xml"/>

But I can’t seem to find a crib for the second identifier – 476 – anywhere? Which means I can’t mechanise the creation of links to actual OpenLearn image assets from the XML source. Also note that there are no credits, acknowledgements or license conditions associated with the image contained within the figure description. Which also makes it hard to reuse the image in a legal, rights recognising sense.

[Doh – I can surely just look at the URL for an image in an OpenLearn unit RSS feed and pick the path up from there, can’t I? Only I can’t, because the image links in the RSS feeds are: a) relative links, without path information, and b) broken as a result…]

Reusing images on the basis of the OpenLearn XML “sourcecode” document is therefore: NOT OBVIOUSLY POSSIBLE.

What this suggests to me is that if you release “source code” documents, they may actually need some processing in terms of asset resolution that generates publicly resolvable locators to assets if they are encoded within the source code document as “private” assets/non-resolvable identifiers.

Where necessary, acknowledgements/credits are provided in the backmatter using elements of the form:

<Paragraph>Figure 7 Willes-Richards, J., et al. (1990) ; HDR Resource/Economics’ in Baria, R. (ed.) <i>Hot Dry Rock Geothermal Energy</i>, Copyright CSM Associates Limited</Paragraph>

Whilst OU-XML does support the ability to make a meaningful link to a resource within the XML document, using an element of the form:

<CrossRef idref="fig007">Figure 7</CrossRef>

(which uses the figure element id (eg fig007) – presumably unique within any particular XML document – as the cross-referenced identifier, with the link text echoing the Alternative label), this identifier is not used to link the informally stated figure credit back to the uniquely identified figure element?

If the same image asset is used in several course units, there is presumably no way of telling from the element data (or even, necessarily, the credit data?) whether the images are in fact one and the same. That is, we can’t audit the OpenLearn materials in a mechanised way to see whether or not particular images are reused across two or more OpenLearn units.

Just in passing, it’s maybe also worth noting that in the above case at least, a description for the image is missing. In actual OU course materials, the description element is used to capture a textual description of the image that explicates the image in the context of the surrounding text. This is a partial fulfilment of accessibility requirements surrounding images and represents, even if not best, at least effective practice.

Where else might content need liberating within OpenLearn content? At the end of the course unit XML documents, in the “backmatter” element, there is often a list of references. References have the form:

<Reference>Sheldon, P. (2005) Earth’s Physical Resources: An Introduction (Book 1 of S278 Earth’s Physical Resources: Origin, Use and Environmental Impact), The Open University, Milton Keynes</Reference>

Hmmm… no structure there… so how easy would it be to reliably generate a link to an authoritative record for that item? (Note that other records occasionally use presentational markup such as italics (or emphasis) tags to style certain parts of some references (confusing presentation with semantics…).)

Finally, just a quick note on why I’m blogging this publicly rather than raising it, erm, quietly within the OU. My reasoning is similar to the reasoning we use when we tell students to not be afraid of asking questions, because it’s likely that others will also have the same question… I’m asking a question about the structure of an open educational resource, because I don’t quite understand it; by asking the question in public, it may be the case that others can use the same questioning strategy to review the way they present their materials, so when I find those, I don’t have to ask similar sorts of question again;-)

PS sort of related to this, see TechDis’ Terry McAndrew’s Accessible courses need an accessibility-friendly schema standard.

PPS see also another take on ways of trying to reduce cognitive waste – Joss Winn’s latest bid in progress, which will examine how the OAuth 2.0 specification can be integrated into a single sign on environment alongside Microsoft’s Unified Access Gateway. If that’s an issue or matter of interest in your institution, why not fork the bid and work it up yourself, or maybe even fork it and contribute elements back?;-) (Hmm, if what was essentially the same bid was submitted from multiple institutions, how would the funders cope during the marking process?!;-)

Tune Your Feeds…

I’m so glad we’re at year’s end: I’m completely bored of the web, my feeds contain little of interest, I’m drastically in need of a personal reboot, and I’m starting to find myself stuck in a “seen-it-all-before” rut…

Take the “new” Google+ Circles volume slider, for example… Ooh.. shiny… ooh, new feature…

Yawn… Slider widgets have been around for ages, of course (e.g. Slider Widgets Around the Web) and didn’t Facebook allow you to do the volume control thing on your Facebook news feeds way back when, when Facebook’s feeds were themselves news (Facebook News Mixing Desk)?

Facebook Mixing desk

Does Facebook still offer this service I wonder?

On the other hand, there is the new Google Zeitgeist Scrapbook… I’m still trying to decide whether this is interesting or not… The premise is a series of half completed straplines that you can fill in with subheadings that interest you, and reveal a short info paragraph as a result.

Google scrapbook

Google scrapbook

The finished thing is part scrapbook, part sticker book.

Google scrapbook

The reason I’m not sure whether this is interesting or not is that I can’t decide whether it may actually hint at a mechanic for customising your own newspaper out of content from your favoured news provider. For example, what would it look like if we tried to build something similar around content from the Guardian Platform API? Might different tag combinations be dragged into the story panels to hook up a feed from that tag or section of the “paper”? And once we’ve acted as editor of our own newspaper, might advanced users then make use of mixing desk sliders to tune the volume of content in each section?

This builds on the idea that newspapers provide you with content and story types you wouldn’t necessarily see, whilst still allowing some degree of control over how weighted the “paper” is to different news sections (something we always had some element of control over before, albeit at a different level of granularity, for example by choosing to buy newspapers only on certain days because they came with a supplement you were interested in, though you were also happy to read the rest of the paper since you had it…)

(It also reminds me that I never could decide about Google’s Living Stories either…)

PS in other news, MIT hints at an innovation in the open educational field, in particular with respect to certification… It seems you may soon be able to claim some sort of academic credit, for a fee, if you’ve been tracked through an MITx open course (MIT’s new online courses target students worldwide). Here’s the original news release: MIT launches online learning initiative and FAQ.

So I wonder: a “proven” online strategy is to grab as big an audience as you can as quickly as you can, then worry about how to make the money back. Could MIT’s large online course offerings from earlier this year be seen in retrospect as MIT testing the waters to see whether or not they could grow an audience around online courses quickly?

I just wonder what would have happened if we’d managed to convert a Relevant Knowledge course to an open course accreditation container for a start date earlier this year, and used it to offer credit around the MIT courses ourselves?!;-) As for what other innovations there might be around open online education: I suspect the OU still has high hopes for SocialLearn… but I’m still of the mind that there’s far more interesting stuff to be done in the area of open course production…

OERs: Public Service Education and Open Production

I suspect that most people over a certain age have some vague memory of OU programmes broadcast in support of OU courses taking over BBC2 at various “off-peak” hours of the day (including Saturday mornings, if I recall correctly…)

These courses formed an important part of OU courses, and were also freely available to anyone who wanted to watch them. In certain respects, they allowed the OU to operate as a public service educator, bringing ideas from higher education to a wider audience. (A lot has been said about the role of the UK’s personal computer culture in the days of the ZX Spectrum and the BBC Micro in bootstrapping software skills development, and in particular the UK computer games industry; but we don’t hear much about the role the OU played in raising aspiration and introducing the very idea of what might be involved in higher education through free-to-air broadcasts of OU course materials, which I’m convinced it must have played. I certainly remember watching OU maths and physics programmes as a child, and wanting to know more about “that stuff” even if I couldn’t properly follow it at the time.)

The OU’s broadcast strategy has evolved since then, of course, moving into prime time broadcasts (Child of Our Time, Coast, various outings with James May, The Money Programme, and so on) as well as “online media”: podcasts on iTunes and video content on Youtube, for example.

The original OpenLearn experiment, which saw 10-20hr extracts of OU course material being released for free, continues, but as I understand it, is now thought of in the context of a wider OpenLearn engagement strategy that will aggregate all the OU’s public output (from open courseware and OU podcasts to support for OU/BBC co-produced content) under a single banner: OpenLearn.

I suspect there will continue to be forays into the world of “social media”, too:

A great benefit of the early days of OU programming on the BBC was that you couldn’t help but stumble across it. You can still stumble across OU co-produced broadcasts on the BBC now, of course, but they don’t fulfil the same role: they aren’t produced as academic programming designed to support particular learning outcomes and aren’t delivered in a particularly academic way. They’re more about entertainment. (This isn’t necessarily a bad thing, but I think it does influence the stance you take towards viewing the material.)

If we think of the originally produced TV programmes as “OERs”, open educational resources, what might we say about them?

– they were publicly available;
– they were authentic, relating to the delivery of actual OU courses;
– the material was viewed by OU students enrolled on the associated course, as well as viewers following a particular series out of general interest, and those who just happened to stumble by the programme;
– they provided pacing, and the opportunity for a continued level of engagement over a period of weeks, on a single academic topic;
– they provided a way of delivering lifelong higher education as part of the national conversation, albeit in the background. But it was always there…

In a sense, the broadcasts offered a way for the world to “follow along” parts of a higher education as it was being delivered.

In many ways, the “Massive Open Online Courses” (MOOCs), in which a for-credit course is also opened up to informal participants, and the various Stanford open courses that are about to start (Free computer science courses, new teaching technology reinvent online education), use a similar approach.

I generally see this as a Good Thing, with universities engaging in public service education whilst at the same time delivering additional support, resources, feedback, assessment and credit to students formally enrolled on the course.

What I’m not sure about is whether initiatives like OpenLearn succeed in the “public service education” role, in part because of the discovery problem: you couldn’t help but stumble across OU/BBC Two broadcasts at certain times of the day. Nowadays, I’d be surprised if you ever stumbled across OpenLearn content while searching the web…

A recent JISC report on OER Impact focussed on the (re)use of OERs in higher education, identifying a major use case of OERs as enhancing teaching practice.

(NB I would have embedded the OER Impact project video here, but WordPress.com doesn’t seem to support embeds from Blip…; openness is not just about the licensing, it’s also about the practical ease of (re)use;-)

However, from my quick reading of the OER impact report, it doesn’t really seem to consider the “open course” use case demonstrated by MOOCs, the Stanford courses, or mid-70s OU course broadcasts. (Maybe this was out of scope…!;-)

Nor does it consider the production of OERs (I think that was definitely out of scope).

For the JISC OER3 funding call, I was hoping to put in a bid for a project based around an open “production-in-presentation” model of resource development targeted at a specific community. For a variety of reasons (not least, I suspect, my lack of project management skills…) that’s unlikely to be submitted in time, so I thought I’d post the main chunk of the bid here as a way of trying to open up the debate a little more widely about the role of OERs, the utility of open production models, and the extent to which they can be used to support cross-sector curriculum innovation/discovery as well as co-creation of resources and resource reuse (both within HE and into a target user community).

Outline
Rapid Resource Discovery and Development via Open Production Pair Teaching (ReDOPT) seeks to draft a set of openly licensed resources for potential (re)use in courses in two different institutions … through the real-time production and delivery of an open online short-course in the area of data handling and visualisation. This approach subverts the more traditional technique of developing materials for a course and then retrospectively making them open, by creating the materials in public and in an openly licensed way, in a way that makes them immediately available for informal study as well as open web discovery, embedding them in a target community, and then bringing them back into the closed setting for formal (re)use. The course will be promoted to the data journalism and open data communities as a free “MOOC” (Massive Online Open Course)/P2PU style course, with a view to establishing an immediate direct use by a practitioner community. The project will proceed as follows: over a 10-12 week period, the core project team will use a variant of the Pair Teaching approach to develop and publish an informal open, online course hosted on an .ac.uk domain via a set of narrative linked resources (each one about the length of a blog post and representing 10 minutes to 1 hour of learner activity) mapping out the project team’s own exploration/learning journey through the topic area. The course scope will be guided by a skeleton curriculum determined in advance from a review of current literature, informal interviews/questionnaires and perceived skills and knowledge gaps in the area. The created resources will contain openly licensed custom written/bespoke material, embedded third party content (audio, video, graphical, data), and selected links to relevant third party material. A public custom search engine in the topic area will also be curated during the course. Additional resources created by course participants (some of whom may themselves be part of the project team), will be integrated into the core course and added to the custom search engine by the project team. Part-time, hourly paid staff will also be funded to contribute additional resources into the evolving course. A second phase of the project will embed the resources as learning resources in the target community through the delivery of workshops based around and referring out to the created resources, as well as community building around the resources. Because of timescales involved, this proposal is limited to the production of the draft materials and embedding them as valuable and appropriate resources in the target community, and does not extend as far as the reuse/first formal use case. Success metrics will therefore be limited to impact evaluation, volume and reach of resources produced, community engagement with the live production of the materials, the extent to which project team members intend to directly reuse the materials produced as a result.

The Proposal
1. The aim of the project is to produce a set of educational resources in a practical topic area (data handling and visualisation), that are reusable by both teachers (as teaching resources) and independent learners (as learning resources), through the development of an openly produced online course in the style of an uncourse created in real time using a Pair Teaching approach as opposed to a traditional sole author or OU style course team production process, and to establish those materials as core reusable educational resources in the target community.

3. … : Extend OER through collaborations beyond HE: the proposal represents a collaboration between two HEIs in the production and anticipated formal (re)use of the materials created, as well as directly serving the needs of the fledgling data-driven journalism community and the open public data communities.

4. … : Addressing sector challenges (ii Involving academics on part-time, hourly-paid contracts): the open production model will seek to engage part-time, hourly paid staff in creating additional resources around the course themes that they can contribute back to the course under an open license and that cover a specific issue identified by the course lead or that the part-time staff themselves believe will add value to the course. (Note that the course model will also encourage participants in the course to create and share relevant resources without any financial recompense.) Paying hourly rate staff for the creation of additional resources (which may include quizzes or other informal assessment/feedback related resources), or in the role of editors of community produced resources, represents a middle ground between the centrally produced core resources and any freely submitted resources from the community. Incorporating the hourly paid contributor role is based on the assumption that payment may be appropriate for sourcing course enhancing contributions that are of a higher quality (and may take longer to produce) than community sourced contributions, as well as requiring the open licensing of materials so produced. It also explores an arrangement under which hourly staff can contribute to the shaping of the course on an ad hoc basis if they see opportunities to do so.

5. … Enhancing the student experience (ii Drawing on student-produced materials): The open production model will seek to engage with the community following the course and encourage them to develop and contribute resources back into the community under an open license. For example, the use of problem based exercises and activities will result in the production of resources that can be (re)used within the context of the uncourse itself as an output of the actual exercise or activity.

6. … The project seeks to explore practical solutions to two issues relating to the wider adoption of OERs by producers and consumers, and provide a case study that other projects may draw on. In the first case, how to improve the discoverability and direct use of resources on the web by “learners” who do not know they are looking for OERs, or even what OERs are, through creating resources that are published as contributions to the development and support of a particular community and as such are likely to benefit from “implicit” search engine optimisation (SEO) resulting from this approach. In the second case, to explore a mechanism that identifies what resources a community might find useful through curriculum negotiation during presentation, and the extent to which “draft” resources might actually encourage reuse and revision.

7. Rather than publishing an open version of a predetermined, fixed set of resources that have already been produced as part of a closed process and then delivered in a formal setting, the intention is thus to develop an openly licensed set of “draft” resources through the “production in presentation” delivery of an informal open “uncourse” (in-project scope), and at a later date reuse those resources in a formally offered closed/for-credit course (out-of-project scope). The uncourse will not incorporate assessment elements, although community engagement and feedback in that context will be in scope. The uncourse approach draws on the idea of “teacher as learner”, with the “teacher” capturing and reflecting on meaningful learning episodes as they explore a topic area and then communicate these through the development of materials that others can learn from, as well as demonstrating authentic problem solving and self-directed learning behaviours that model the independent learning behaviours we are trying to develop in our students.

8. The quality of the resources will be assured at least to the level of fit-for-purpose at the time of release by combining the uncourse production style with a Pair Teaching approach. A quality improvement process will also operate through responding to any issues identified via the community based peer-review and developmental testing process that results from developing the materials in public.

9. The topic area was chosen based on several factors: a) the experience and expertise of the project team; b) the observation that there are no public education programmes around the increasing amounts of open public data; c) the observation that very few journalism academics have expertise in data journalism; d) the observation that practitioners engaged in data journalism do not have time or interest in to become academics, but do appear willing to share their knowledge.

10. The first uncourse will run over a 6-8 week period and result in the central/core development of circa 5 to 10 blog post styled resources a week, each requiring 20-45 minutes of “student” activity (approx. 2-6 hours study time per week equivalent), plus additional directed reading/media consumption time (ideally referencing free and openly licensed content). A second presentation of the uncourse will reuse and extend materials produced during the first presentation, as well as integrating resources, where possible, developed by the community in the first phase and monitoring the amount of time taken to revise/reversion them, as required, compared to the time taken to prepare resources from scratch centrally. Examples of real-time, interactive and graphical representations of data will be recorded as video screencasts and made available online. Participants will be encouraged to consider the information design merits of comparative visualisation methods for publication on different media platforms: print, video, interactive and mobile. In all, we hope to deliver up to 50 hours of centrally produced, openly licensed materials by the end of the course. The uncourse will also develop a custom search engine offering coverage of openly licensed and freely accessible resources related to the course topic area.

11. The course approach is inspired to a certain extent by the Massive Online Open Course (MOOC) style courses pioneered by George Siemens, Stephen Downes, Dave Cormier, Jim Groom et al. The MOOC approach encourages learners to explore a given topic space with the help of some wayfinders. Much of the benefit is derived from the connections participants make between each other and the content by sharing, reflecting, and building on the contributions of others across different media spaces, like blogs, Twitter, forums, YouTube, etc.

12. The course model also draws upon the idea of an uncourse, as demonstrated by Hirst in the creation of the Digital Worlds game development blog [ http://digitalworlds.wordpress.com ] that produced a series of resources as part of an openly blogged learning journey that have since been reused directly in an OU course (T151 Digital Worlds); and the Visual Gadgets blog ( http://visualgadgets.blogspot.com ) that drafted materials that later came to be reused in the OU course T215 Communication and information technologies, and then made available under open license as the OpenLearn unit Visualisation: Visual representations of data and information [ http://openlearn.open.ac.uk/course/view.php?id=4442 ]

13. A second phase of the project will explore ways of improving the discovery of resources in an online context, as well as establishing them as important and relevant resources within the target community. Through face-to-face workshops and hack days, we will run a series of workshops at community events that draw on and extend the activities developed during the initial uncourse, and refer participants to the materials. A second presentation of the uncourse will be offered as a way of testing and demonstrating reuse of the resources, as well as providing an exit path from workshop activities. One possible exit path from the uncourse would be entry into formal academic courses.

14. Establishing the resources within the target community is an important aspect of the project. Participation in community events plays an important role in this, and also helps to prove the resources produced. Attendance at events such as the Open Government Data camp will allow us to promote the availability of the resources to the appropriate European community, further identify community needs, and also provide a backdrop for the development of a promotional video with vox pops from the community hopefully expressing support for the resources being produced. The extent to which materials do become adopted and used within the community will form an important part of the project evaluation.

15. … By embedding resources in the target community, we aim to enhance the practical utility of the resources within that community as well as providing an academic consideration of the issues involved. A key part of the evaluation workpackage, …, will be to rate the quality of the materials produced and the level of engagement with and reuse of them by both educators and members of the target community.

Note that I am still keen on working this bid up a bit more for submission somewhere else…;-)

[Note that the opinions expressed herein are very much my own personal ones…]

PS see also COL-UNESCO consultation: Guidelines for OER in Higher Education – Request for comments: OER Guidelines for Higher Education Stakeholders