OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for the ‘Open Content’ Category

My ILI2012 Presentation – Derived Products from OpenLearn/OU XML Documents

FWIW, a copy of the slides I used in my ILI2012 presentation earlier this week – Making the most of structured content:data products from OpenLearn XML:

I guess this counts as a dissemination activity for my related eSTEeM project on course related custom search engines, since the work(?!) sort of evolved out of that idea…

The thesis is this:

  1. Course Units on OpenLearn are available as XML docs – a URL pointing to the XML version of a unit can be derived from the Moodle URL for the HTML version of the course; (the same is true of “closed” OU course materials). The OU machine uses the XML docs as a feedstock for a publication process that generates HTML views, ebook views, etc, etc of a course.
  2. We can treat XML docs as if they were database records; sets of structured XML elements can be viewed as if they define database tables; the values taken by the structured elements are like database table entries. Which is to say, we can treat each XML docs as a mini-database, or we we can trivially extract the data and pop it into a “proper”/”real” database.
  3. given a list of courses we can grab all the corresponding XML docs and build a big database of their contents; that is, a single database that contains records pulled from course XML docs.
  4. the sorts of things that we can pull out of a course include: links, images, glossary items, learning objectives, section and subsection headings;
  5. if we mine the (sub)section structure of a course from the XML, we can easily provide an interactive treemap version of the sections and subsections in a course; generating a Freemind mindmap document type, we can automatically generate course-section mindmap files that students can view – and annotate – in Freemind. We can also generate bespoke mindmaps, for example based on sections across OpenLearn courses that contain a particular search term.
  6. By disaggregating individual course units into “typed” elements or faceted components, and then reaggreating items of a similar class or type across all course units, we can provide faceted search across, as well as university wide “meta” view over, different classes of content. For example:
    • by aggregating learning objectives from across OpenLearn units, we can trivially create a search tool that provides a faceted search over just the learning objectives associated with each unit; the search returns learning outcomes associated with a search term and links to course units associated with those learning objectives; this might help in identifying reusable course elements based around reuse or extension of learning outcomes;
    • by aggregating glossary items from across OpenLearn units, we can trivially create a meta glossary for the whole of OpenLearn (or similarly across all OU courses). That is, we could produce a monolithic OpenLearn, or even OU wide, glossary; or maybe it’s useful to have redefine the same glossary terms using different definitions, rather than reuse the same definition(s) consistently across different courses? As with learning objectives, we can also create a search tool that provides a faceted search over just the glossary items associated with each unit; the search returns glossary items associated with a search term and links to course units associated with those glossary items;
    • by aggregating images from across OpenLearn units, we can trivially create a search tool that provides a faceted search over just the descriptions/captions of images associated with each unit; the search returns the images whose description/captions are associated with the search term and links to course units associated with those images. This disaggregation provides a direct way of search for images that have been published through OpenLearn. Rights information may also be available, allowing users to search for images that have been rights cleared, as well as openly licensed images.
  7. the original route in was the extraction of links from course units that could be used to seed custom search engines that search over resources referenced from a course. This could in principle also include books using Google book search.

I also briefly described an approach for appropriating Google custom search engine promotions as the basis for a search engine mediated course, something I think could be used in a sMoocH (search mediated MOOC hack). But then MOOCs as popularised have f**k all to do with innovation, don’t they, other than in a marketing sense for people with very little imagination.

During questions, @briankelly asked if any of the reported dabblings/demos (and there are several working demo) were just OUseful experiments or whether they could in principle be adopted within the OU, or even more widely across HE. The answers are ‘yes’ and ‘yes’ but in reality ‘yes’ and ‘no’. I haven’t even been able to get round to writing up (or persuading someone else to write up) any of my dabblings as ‘proper’ research, let alone fight the interminable rounds of lobbying and stakeholder acquisition it takes to get anything adopted as a rolled out as adopted innovation. If any of the ideas were/are useful, they’re Googleable and folk are free to run with them…but because they had no big budget holding champion associated with their creation, and hence no stake (even defensively) in seeing some sort of use from them, they unlikely to register anywhere.

Written by Tony Hirst

November 2, 2012 at 6:32 pm

MOOC Reflections

A trackback a week or two ago to my blog from this personal blog post: #SNAc week 1: what are networks and what use is it to study them? highlighted me to a MOOC currently running on Coursera on social network analysis. The link was contextualised in the post as follows: The recommended readings look interesting, but it’s the curse of the netbook again – there’s no way I’m going to read a 20 page PDF on a screen. Some highlighted resources from Twitter and the forum look a bit more possible: … Some nice ‘how to’ posts: … (my linked to post was in the ‘howto’ section).

The whole MOOC hype thing at the moment seems to be dominated by references to the things like Coursera, Udacity and edX (“xMOOCs”). Coursera in particularly is a new sort of intermediary, a website that offers some sort of applied marketing platform to universities, allowing them to publish sample courses in a centralised, browsable, location and in a strange sense legitimising them. I suspect there is some element of Emperor’s New Clothes thinking going on in the universities who have opted in and those who may be considering it: “is this for real?”; “can we afford not to be a part of it?”

Whilst Coursera has an obvious possible business model – charge the universities for hosting their marketing material courses – Udacity’s model appears more pragmatic: provide courses with the option of formal assessment via Pearson VUE assessment centres, and then advertise your achievements to employers on the Udacity site; presumably, the potential employers and recruiters (which got me thinking about what role LinkedIn might possibly play in this space?) are seen as the initial revenue stream for Udacity. Note that Udacity’s “credit” awarding powers are informal – in the first instance, credibility is based on the reputation of the academics who put together the course; in contrast, for courses on Coursera, and the rival edX partnership (which also offers assessment through Pearson VUE assessment centres), credibility comes from the institution that is responsible for putting together the course. (It’s not hard to imagine a model where institutions might even badge courses that someone else has put together…)

Note that Coursera, Udacity and edX are all making an offering based on quite a traditional course model idea and are born out of particular subject disciplines. Contrast this in the first part with something like Khan Academy, which is providing learning opportunities at a finer level of granularity/much smaller “learning chunks” in the form of short video tutorials. Khan Academy also provides the opportunity for Q&A based discussion around each video resource.

Also by way of contrast are the “cMOOC” style offerings inspired by the likes of George Siemens, Stephen Downes, et al., where a looser curriculum based around a set of topics and initially suggested resources is used to bootstrap a set of loosely co-ordinated personal learning journeys: learners are encouraged to discover, share and create resources and feed them into the course network in a far more organic way than the didactic, rigidly structured approach taken by the xMOOC platforms. The cMOOC style also offeres the possibility of breaking down subject disciplines through accepting shared resources contributed because they are relevant to the topic being explored, rather than because they are part of the canon for a particular discipline.

The course without boundaries approach of Jim Groom’s ds106, as recently aided and abetted by Alan Levine, also softens the edges of a traditionally offered course with its problem based syllabus and open assignment bank (particpants are encouraged to submit their own assignment ideas) and turns learning into something of a lifestyle choice… (Disclaimer: regular readers will know that I count the cMOOC/ds106 “renegades” as key forces in developing my own thinking…;-)

Something worth considering about the evolution of open education from early open content/open educational resource (OER) repositories and courseware into the “Massive Open Online Course” thing is just what caused the recent upsurge in interest? Both MIT opencourseware and the OU’s OpenLearn offerings provided “anytime start”, self-directed course units; but my recollection is that it was Thrun & Norvig’s first open course on AI (before Thrun launched Udacity), that captured the popular (i.e. media) imagination because of the huge number of students that enrolled. Rather than the ‘on-demand’ offering of OpenLearn, it seems that the broadcast model, and linear course schedule, along with the cachet of the instructors, were what appealed to a large population of demonstrably self-directed learners (i.e. geeks and programmers, who spend their time learning how to weave machines from ideas).

I also wonder whether the engagement of universities with intermediary online course delivery platforms will legitimise online courses run by other organisations; for example, the Knight Centre Massive Open Online Courses portal (a Moodle environment) is currently advertising it’s first MOOC on infographics and data visualisation:

Similar to other Knight Center online courses, this MOOC is divided into weekly modules. But unlike regular offerings, there will be no application or selection process. Anyone can sign up online and, once registered, participants will receive instructions on how to enroll in the course. Enrollees will have immediate access to the syllabus and introductory information.

The course will include video lectures, tutorials, readings, exercises and quizzes. Forums will be available for discussion topics related to each module. Because of the “massive” aspect of the course, participants will be encouraged to provide feedback on classmates’ exercises while the instructor will provide general responses based on chosen exercises from a student or group of students.

Cairo will focus on how to work with graphics to communicate and analyze data. Previous experience in information graphics and visualization is not needed to take this course. With the readings, video lectures and tutorials available, participants will acquire enough skills to start producing compelling, simple infographics almost immediately. Participants can expect to spend 4-6 hours per week on the course.

Although the course will be free, if participants need to receive a certificate, there will be a $20 administrative fee, paid online via credit card, for those who meet the certificate requirements. The certificate will be issued only to students who actively participated in the course and who complied with most of the course requirements, such as quizzes and exercises. The certificates will be sent via email as a PDF document. No formal course credit of any kind is associated with the certificate.

Another of the things that I’ve been pondering is the role that “content” may or not play a role in this open course thing. Certainly, where participants are encouraged to discover and share resources, or where instructors seek to construct courses around “found resources”, an approach espoused by the OU’s new postgraduate strategy, it seems to me that there is an opportunity to contribute to the wider open learning idea by producing resources that can be “found”. For resources to be available as found resources, we need the following:

  1. Somebody needs to have already created them…
  2. They need to be discoverable by whoever is doing the finding
  3. They need to be appropriately licensed (if we have to go through a painful rights clearnance and rights payment model, the cost benefits of drawing on and freely reusing those resources are severely curtailed).

Whilst the running of a one shot MOOC may attract however many participants, the production of finer grained (and branded) resources that can be used within those courses means that a provider can repeatedly, and effortlessly, contribute to other peoples courses through course participants pulling the resources into those coure contexts. (It also strikes me that educators in one institution could sign up for a course offered by another, and then drop in links to their own applied marketing learning materials.)

One thing I’ve realised from looking at Digital Worlds uncourse blog stats is that some of the posts attract consistent levels of traffic, possibly because they have been embedded to from other course syllabuses. I also occasionally see flurries of downloads of tutorial files, which makes me wonder whether another course has linked to resources I originally produced. If we think of the web in it’s dynamic and static modes (static being the background links that are part of the long term fabric of the web, dynamic as the conversation and link sharing that goes on in social networks, as well as the publication of “alerts” about new fabric (for example, the publication of a new blog post into the static fabric of the web is announced through RSS feeds and social sharing as part of the dynamic conversation)), then the MOOCs appear to be trying to run in a dynamic, broadcast mode. Whereas what interests me is how we can contribute to the static structure of the web, and how we can make better use of it in a learning context?

PS a final thought – running scheduled MOOCs is like a primetime broadcast; anytime independent start is like on-demand video. Or how about this: MOOCs are like blockbuster books, published to great fanfare and selling millions of first day, pre-ordered copies. But there’s also long tail over time consumption of the same books… and maybe also books that sell steadily over time without great fanfare. Running a course once is all well and good; but it feels too ephemeral, and too linear rather than networked thinking to me?

Written by Tony Hirst

October 9, 2012 at 9:04 am

Generating OpenLearn Navigation Mindmaps Automagically

I’ve posted before about using mindmaps as a navigation surface for course materials, or as way of bootstrapping the generation of user annotatable mindmaps around course topics or study weeks. The OU’s XML document format that underpins OU course materials, including the free course units that appear on OpenLearn, makes for easy automated generation of secondary publication products.

So here’s the next step in my exploration of this idea, a data sketch that generates a Freemind .mm format mindmap file for a range of OpenLearn offerings using metadata puled into Scraperwiki. The file can be downloaded to your desktop (save it with a .mm suffix), and then opened – and annotated – within Freemind.

You can find the code here: OpenLearn mindmaps.

By default, the mindmap will describe the learning outcomes associated with each course unit published on the Open University OpenLearn learning zone site.

By hacking the view URL, other mindmaps are possible. For example, we ca make the following additions to the actual mindmap file URL (reached by opening the Scraperwiki view) as follows:

  • ?unit=UNITCODE, where UNITCODE= something like T180_5 or K100_2 and you will get a view over section headings and learning outcomes that appear in the corresponding course unit.
  • ?unitset=UNITSET where UNITSET= something like T180 or K100 – ie the parent course code from which a specific unit was derived. This view will give a map showing headings and Learning Outcomes for all the units derived from a given UNITSET/course code.
  • ?keywordsearch=KEYWORD where KEYWORD= something like: physics This will identify all unit codes marked up with the keyword in the RSS version of the unit and generate a map showing headings and Learning Outcomes for all the units associated with the keyword. (This view is still a little buggy…)

In the first iteration, I haven’t added links to actual course units, so the mindmap doesn’t yet act as a clickable navigation surface, but that it is on the timeline…

It’s also worth noting that there is a flash browser available for simple Freemind mindmaps, which means we could have an online, in-browser service that displays the mindmap as such. (I seem to have a few permissions problems with getting new files onto ouseful.open.ac.uk at the moment – Mac side, I think? – so I haven’t yet been able to demo this. I suspect that browser security policies will require the .mm file to be served from the same server as the flash component, which means a proxy will be required if the data file is pulled from the Scraperwiki view.)

What would be really nice, of course, would be an HTML5 route to rendering a JSONified version of the .mm XML format… (I’m not sure how straightforward it would be to port the original Freemind flash browser Actionscript source code?)

Written by Tony Hirst

May 4, 2012 at 2:14 pm

Posted in Open Content, OU2.0

Tagged with , ,

The Learning Journey Starts Here: Youtube.edu and OpenLearn Resource Linkage

Mulling over the OU’s OULearn pages on Youtube a week or two ago, colleague Bernie Clark pointed out to me how the links from the OU clip descriptions could be rather hit or miss:

Via @lauradee, I see that the OU has a new offering on YouTube.com/edu is far more supportive of links to related content, links that can represent the start of a learning journey through OU educational – and commentary – content on the OU website.

Here’s a way in to the first bit of OU content that seems to have appeared:

This links through to a playlist page with a couple of different sorts of opportunity for linking to resources collated at the “Course materials” or “Lecture materials” level:

(The language gives something away, I think, about the expectation of what sort of content is likely to be uploaded here…)

So here, for example, are links at the level of the course/playlist:

And here are links associated with each lecture, erm, clip:

In this first example, several types of content are being linked to, although from the link itself it’s not immediately obvious what sort of resource a link points to? For example, some of the links lead through to course units on OpenLearn/Learning Zone:

Others link through to “articles” posted on the OpenLearn “news” site (I’m not ever really sure how to refer to that site, or the content posts that appear on it?)

The placing of content links into the Assignments and Others tabs always seems a little arbitrary to me from this single example, but I suspect that when a few more lists have been posted some sort of feeling about what sorts of resources should go where (i.e. what folk might expect by “Assignment” or “Other” resource links). If there’s enough traffic generated through these links, a bit of A/B testing might even be in order relating to the positioning of links within tabs and the behaviour of students once they click through (assuming you can track which link they clicked through, of course…)?

The transcript link is unambiguous though! And, in this case at least), resolves to a PDF hosted somewhere on the OU podcasts/media filestore:

(I’m not sure if caption files are also available?)

Anyway – it’ll be interesting to hear back about whether this enriched linking experience drives more traffic to the OpenLearn resources, as well as whether the positioning of links in the different tab areas has any effect on engagement with materials following a click…

And as far as the linkage itself goes, I’m wondering: how are the links to OpenLearn course units and articles generated/identified, and are those links captured in one of the data.open.ac.uk stores? Or is the process that manages what resource links get associated with lists and list items on Youtube/edu one that doesn’t leave (or readily support the automated creation of) public data traces?

PS How much (if any( of the linked resource goodness is grabbable via the Youtube API, I wonder? If anyone finds out before me, please post details in the comments below:-)

Written by Tony Hirst

April 27, 2012 at 1:53 pm

Asset Stripping OpenLearn – Images

A long time ago, I tinkered with various ways of disaggregating OpenLearn course units into various components – images, audio files, videos, etc. (OpenLearn_XML Asset stripper (long since rotted)). Over the last few weeks, I’ve returned to the idea, using Scraperwiki to trash through the OpenLearn XML (and RSS) in order to build collections out of various different parts of the OpenLearn materials. So for example, a searchable OpenLearn meta-glossary, that generates one big glossary out of all the separate glossary entries in different OpenLearn units, and an OpenLearn learning outcomes explorer, that allows you to search through learning outcomes as described in different OpenLearn courses.

I’ve also been pulling out figure captions and descriptions, so last night I added a view that allows you to preview images used across OpenLearn: OpenLearn image viewer.

There’s a bit of niggle in using the viewer at the moment (as Jenny Gray puts it, “it’ll be a session cookie called MoodleSession in the openlearn.open.ac.uk domain (if you can grab it?)”) which, if you don’t have a current OpenLearn session cookie, requires you to click on one of the broken images in the righthand-most column and then go back to the gallery viewer (at which point, the images should load okay…unless you have some cookie blocking or anti-tracking features in place, which may well break things further:-( )

(If anyone can demonstrate a workaround for me for how to set the cookie before displaying the images, that’s be appreciated…)

To limit the viewed images, you can filter results according to terms appearing in the captions or descriptions or by course unit number.

One thing to note is that although the OpenLearn units are CC licensed, some of the images used in the units (particularly third party images) may not be so liberally licensed. At the moment, there is a disconnect in the OU XML between images and any additional rights information (typically a set of unstructured acknowledgements at the end of the unit XML), which makes a fully automated “open images from OpenLearn” gallery/previewer tricky to knock together. (When I get a chance, I’ll put together a few thoughts about what would be required to support such a service. It probably won’t be much, just an appropriate metadata filed or two…)

PS here’s an example of why the ‘need a cookie to get the image’ thing is really rather crap… I embedded an image from OpenLearn, via a link/url, in a post (making sure to link back to the original page). Good for me – I get a relevant image, I don’t have to upload it anywhere – good for OpenLearn, they get a link back, good for OpenLearn, they get a loggable server hit when anyone views the image (although bad for them in that it’s their server and bandwidth that has to deliver the image).

However, as it stands, it’s bad for OpenLearn because all the users see is a broken link, rather than the image, unless you have a current OpenLearn cookie session already set. The fix for me is more work: download the image, upload it to my own server, and then embed my copy of the image. OpenLearn no longer gets any of the “paradata” surrounding the views on that image, and indeed may never even know that I’m reusing it…

Written by Tony Hirst

April 20, 2012 at 10:09 am

Posted in Open Content, OU2.0

Tagged with

Deconstructing OpenLearn Units – Glossary Items, Learning Outcomes and Image Search

It turns out that part of the grief I encountered here in trying to access OpenLearn XML content was easily resolved (check the comments: mechanise did the trick…), though I’ve still to try to sort out a workaround for accessing OpenLearn images (a problem described here)), but at least now I have another stepping stone: a database of some deconstructed OpenLearn content.

Using Scraperwiki to pull down and parse the OpenLearn XML files, I’ve created some database tables that contain the following elements scraped from across the OpenLearn units by this OpenLearn XML Processor:

  • glossary items;
  • learning objectives;
  • figure captions and descriptions.

You can download CSV data files corresponding to the tables, or the whole SQLite database. (Note that there is also an “errors” table that identifies units that threw an error when I tried to grab, or parse, the OpenLearn XML.)

Unfortunately, I haven’t had a chance yet to pop up a view over the data (I tried, briefly, but today was another of those days where something that’s probably very simple and obvious prevented me from getting the code I wanted to write working; if anyone has an example Scraperwiki view that chucks data into a sortable HTML table or a Simile Exhibit searchable table, please post a link below; or even better, add a view to the scraper:-)

So in the meantime, if ypu want to have a play, you need to make use of the Scraperwiki API wizard.

Here are some example queries:

  • a search for figure descriptions containing the word “communication” – select * from `figures` where desc like ‘%communication%’: try it
  • a search over learning outcomes that include the phrase how to followed at some point by the word dataselect * from `learningoutcomes` where lo like ‘%how to%data%’: try it
  • a search of glossary items for glossary terms that contain the word “period” or a definition that contains the word “ancient” – select * from `glossary` where definition like ‘%ancient%’ or term like ‘%period%’: try it
  • find figures with empty captions – select * from `figures` where caption==”: try it

I’ll try to add some more examples when I get a chance, as well as knocking up a more friendly search interface. Unless you want to try…?!;-)

Written by Tony Hirst

March 15, 2012 at 10:59 am

Do We Need an OpenLearn Content Liberation Front?

For me, one of the defining attributes of openness relates to accessibility of the machine kind: if I can’t write a script to handle the repetitive stuff for me, or can’t automate the embedding of image and/or video resources, then whatever the content is, it’s not open enough in a practical sense for me to do what I want with it.

So here’s an, erm, how can I put this politely, little niggle I have with OpenLearn XML. (For those of you not keeping up, one of the many OpenLearn sites is the OU’s open course materials site; the materials published on the site as course unit contentful HTML pages are also available as structured XML documents. (When I say “structured”, I mean that certain elements of the materials are marked up in a semantically meaningful way; lots of elements aren’t, but we have to start somewhere ;-))

The context is this: following on from my presentation on Making More of Structured Course Materials at the eSTeEM conference last week, I left a chat with Jonathan Fine with the intention of seeing what sorts of secondary product I could easily generate from the OpenLearn content. I’m in the middle of building a scraper and structured content extractor at the moment, grabbing things like learning outcomes, glossary items, references and images, but I almost immediately hit a couple of problems, first with actually locating the OU XML docs, and secondly locating the images…

Getting hold of a machine readable list of OpenLearn units is easy enough via the OpenLearn OPML feed (much easier to work with than the “all units” HTML index page). Units are organised by topic and are listed using the following format:

<outline type="rss" text="Unit content for Water use and the water cycle" htmlUrl="http://openlearn.open.ac.uk/course/view.php?name=S278_12" xmlUrl="http://openlearn.open.ac.uk/rss/file.php/stdfeed/4307/S278_12_rss.xml"/>

URLs of the form http://openlearn.open.ac.uk/course/view.php?name=S278_12 link to a ‘homepage” for each unit, which then links to the first page of actual content, content which is also available in XML form. The content page URLs have the form http://openlearn.open.ac.uk/mod/oucontent/view.php?id=398820&direct=1, where the ID is one-one uniquely mapped to the course name identifier. The XML version of the page can then be accessed by changing direct=1 in the URL to content=1. Only, we don’t know the mapping from course unit name to page id. The easiest way I’ve found of doing that is to load in the RSS feed for each unit and grab the first link URL, which points the first HTML content page view of the unit.

I’ve popped a scraper up on Scraperwiki to build the lookup for XML URLs for OpenLearn units – OpenLearn XML Processor:

import scraperwiki

from lxml import etree

#===
#via http://stackoverflow.com/questions/5757201/help-or-advice-me-get-started-with-lxml/5899005#5899005
def flatten(el):           
    result = [ (el.text or "") ]
    for sel in el:
        result.append(flatten(sel))
        result.append(sel.tail or "")
    return "".join(result)
#===

def getcontenturl(srcUrl):
    rss= etree.parse(srcUrl)
    rssroot=rss.getroot()
    try:
        contenturl= flatten(rssroot.find('./channel/item/link'))
    except:
        contenturl=''
    return contenturl

def getUnitLocations():
    #The OPML file lists all OpenLearn units by topic area
    srcUrl='http://openlearn.open.ac.uk/rss/file.php/stdfeed/1/full_opml.xml'
    tree = etree.parse(srcUrl)
    root = tree.getroot()
    topics=root.findall('.//body/outline')
    #Handle each topic area separately?
    for topic in topics:
        tt = topic.get('text')
        print tt
        for item in topic.findall('./outline'):
            it=item.get('text')
            if it.startswith('Unit content for'):
                it=it.replace('Unit content for','')
                url=item.get('htmlUrl')
                rssurl=item.get('xmlUrl')
                ccu=url.split('=')[1]
                cctmp=ccu.split('_')
                cc=cctmp[0]
                if len(cctmp)>1: ccpart=cctmp[1]
                else: ccpart=1
                slug=rssurl.replace('http://openlearn.open.ac.uk/rss/file.php/stdfeed/','')
                slug=slug.split('/')[0]
                contenturl=getcontenturl(rssurl)
                print tt,it,slug,ccu,cc,ccpart,url,contenturl
                scraperwiki.sqlite.save(unique_keys=['ccu'], table_name='unitsHome', data={'ccu':ccu, 'uname':it,'topic':tt,'slug':slug,'cc':cc,'ccpart':ccpart,'url':url,'rssurl':rssurl,'ccurl':contenturl})

getUnitLocations()

The next step in the plan (because I usually do have a plan; it’s hard to play effectively without some sort of direction in mind…) as far as images goes was to grab the figure elements out of the XML documents and generate an image gallery that allows you to search through OpenLearn images by title/caption and/or description, and preview them. Getting the caption and description from the XML is easy enough, but getting the image URLs is not

Here’s an example of a figure element from an OpenLearn XML document:

<Figure id="fig001">
<Image src="\\DCTM_FSS\content\Teaching and curriculum\Modules\Shared Resources\OpenLearn\S278_5\1.0\s278_5_f001hi.jpg" height="" webthumbnail="false" x_imagesrc="s278_5_f001hi.jpg" x_imagewidth="478" x_imageheight="522"/>
<Caption>Figure 1 The geothermal gradient beneath a continent, showing how temperature increases more rapidly with depth in the lithosphere than it does in the deep mantle.</Caption>
<Alternative>Figure 1</Alternative>
<Description>Figure 1</Description>
</Figure>

Looking at the HTML page for the corresponding unit on OpenLearn, we see it points to the image resource file at http://openlearn.open.ac.uk/file.php/4178/!via/oucontent/course/476/s278_5_f001hi.jpg:

So how can we generate that image URL from the resource link in the XML document? The filename is the same, but how can we generate what are presumably contextually relevant path elements: http://openlearn.open.ac.uk/file.php/4178/!via/oucontent/course/476/

If we look at the OpenLearn OPML file that lists all current OpenLearn units, we can find the first identifier in the path to the RSS file:

<outline type="rss" text="Unit content for Energy resources: Geothermal energy" htmlUrl="http://openlearn.open.ac.uk/course/view.php?name=S278_5" xmlUrl="http://openlearn.open.ac.uk/rss/file.php/stdfeed/4178/S278_5_rss.xml"/>

But I can’t seem to find a crib for the second identifier – 476 – anywhere? Which means I can’t mechanise the creation of links to actually OpenLearn image assets from the XML source. Also note that there are no credits, acknowledgements or license conditions associated with the image contained within the figure description. Which also makes it hard to reuse the image in a legal, rights recognising sense.

[Doh - I can surely just look at URL for an image in an OpenLearn unit RSS feed and pick the path up from there, can't I? Only I can't, because the image links in the RSS feeds are: a) relative links, without path information, and b) broken as a result...]

Reusing images on the basis of the OpenLearn XML “sourcecode” document is therefore: NOT OBVIOUSLY POSSIBLE.

What this suggests to me is that if you release “source code” documents, they may actually need some processing in terms of asset resolution that generates publicly resolvable locators to assets if they are encoded within the source code document as “private” assets/non-resolvable identifiers.

Where necessary, acknowledgements/credits are provided in the backmatter using elements of the form:

<Paragraph>Figure 7 Willes-Richards, J., et al. (1990) ; HDR Resource/Economics’ in Baria, R. (ed.) <i>Hot Dry Rock Geothermal Energy</i>, Copyright CSM Associates Limited</Paragraph>

Whilst OU-XML does support the ability to make a meaningful link to a resource within the XML document, using an element of the form:

<CrossRef idref="fig007">Figure 7</CrossRef>

(which presumably uses the Alternative label as the cross-referenced identifier, although not the figure element id (eg fig007) which is presumably unique within any particular XML document?), this identifier is not used to link the informally stated figure credit back to the uniquely identified figure element?

If the same image asset is used in several course units, there is presumably no way of telling from the element data (or even, necessarily, the credit data?) whether the images are in fact one and the same. That is, we can’t audit the OpenLearn materials in a text mechanised way to see whether or not particular images are reused across two or more OpenLearn units.

Just in passing, it’s maybe also worth noting that in the above case at least, a description for the image is missing. In actual OU course materials, the description element is used to capture a textual description of the image that explicates the image in the context of the surrounding text. This represents a partial fulfilment of accessibility requirements surrounding images and represents, even if not best, at least effective practice.

Where else might content need liberating within OpenLearn content? At the end of the course unit XML documents, in the “backmatter” element, there is often a list of references. References have the form:

<Reference>Sheldon, P. (2005) Earth’s Physical Resources: An Introduction (Book 1 of S278 Earth’s Physical Resources: Origin, Use and Environmental Impact), The Open University, Milton Keynes</Reference>

Hmmm… no structure there… so how easy would it be to reliably generate a link to an authoritative record for that item? (Note that other records occasionally use presentational markup such as italics (or emphasis) tags to presentationally style certain parts of some references (confusing presentation with semantics…).)

Finally, just a quick note on why I’m blogging this publicly rather than raising it, erm, quietly within the OU. My reasoning is similar to the reasoning we use when we tell students to not be afraid of asking questions, because it’s likely that others will also have the same question… I’m asking a question about the structure of an open educational resource, because I don’t quite understand it; by asking the question in public, it may be the case that others can use the same questioning strategy to review the way they present their materials, so when I find those, I don’t have to ask similar sorts of question again;-)

PS sort of related to this, see TechDis’ Terry McAndrew’s Accessible courses need and accessibilty-friendly schema standard.

PPS see also another take on ways of trying to reduce cognitive waste – Joss Winn’s latest bid in progress, which will examine how the OAuth 2.0 specification can be integrated into a single sign on environment alongside Microsoft’s Unified Access Gateway. If that’s an issue or matter of interest in your institution, why not fork the bid and work it up yourself, or maybe even fork it and contribute elements back?;-) (Hmm, if several institutions submitted what was essentially the same bid from multiple institutions, how would they cope during the marking process?!;-)

Written by Tony Hirst

March 13, 2012 at 1:10 pm

Follow

Get every new post delivered to your Inbox.

Join 800 other followers