OERs: Public Service Education and Open Production

I suspect that most people over a certain age have some vague memory of OU programmes broadcast in support of OU courses taking over BBC2 at various “off-peak” hours of the day (including Saturday mornings, if I recall correctly…)

These programmes formed an important part of OU courses, and were also freely available to anyone who wanted to watch them. In certain respects, they allowed the OU to operate as a public service educator, bringing ideas from higher education to a wider audience. (A lot has been said about the role of the UK’s personal computer culture in the days of the ZX Spectrum and the BBC Micro in bootstrapping software skills development, and in particular the UK computer games industry; but we don’t hear much about the role the OU played in raising aspiration and introducing the very idea of what might be involved in higher education through free-to-air broadcasts of OU course materials, a role which I’m convinced it must have played. I certainly remember watching OU maths and physics programmes as a child, and wanting to know more about “that stuff” even if I couldn’t properly follow it at the time.)

The OU’s broadcast strategy has evolved since then, of course, moving into prime time broadcasts (Child of Our Time, Coast, various outings with James May, The Money Programme, and so on) as well as “online media”: podcasts on iTunes and video content on Youtube, for example.

The original OpenLearn experiment, which saw 10-20 hour extracts of OU course material released for free, continues; but as I understand it, it is now thought of in the context of a wider OpenLearn engagement strategy that will aggregate all the OU’s public output (from open courseware and OU podcasts to support for OU/BBC co-produced content) under a single banner: OpenLearn.

I suspect there will continue to be forays into the world of “social media”, too:

A great benefit of the early days of OU programming on the BBC was that you couldn’t help but stumble across it. You can still stumble across OU co-produced broadcasts on the BBC now, of course, but they don’t fulfil the same role: they aren’t produced as academic programming designed to support particular learning outcomes and aren’t delivered in a particularly academic way. They’re more about entertainment. (This isn’t necessarily a bad thing, but I think it does influence the stance you take towards viewing the material.)

If we think of the originally produced TV programmes as “OERs”, open educational resources, what might we say about them?

– they were publicly available;
– they were authentic, relating to the delivery of actual OU courses;
– the material was viewed by OU students enrolled on the associated course, as well as viewers following a particular series out of general interest, and those who just happened to stumble by the programme;
– they provided pacing, and the opportunity for a continued level of engagement over a period of weeks, on a single academic topic;
– they provided a way of delivering lifelong higher education as part of the national conversation, albeit in the background. But it was always there…

In a sense, the broadcasts offered a way for the world to “follow along” parts of a higher education as it was being delivered.

In many ways, the “Massive Open Online Courses” (MOOCs), in which a for-credit course is also opened up to informal participants, and the various Stanford open courses that are about to start (Free computer science courses, new teaching technology reinvent online education), use a similar approach.

I generally see this as a Good Thing, with universities engaging in public service education whilst at the same time delivering additional support, resources, feedback, assessment and credit to students formally enrolled on the course.

What I’m not sure about is whether initiatives like OpenLearn succeed in the “public service education” role, in part because of the discovery problem: you couldn’t help but stumble across OU/BBC2 broadcasts at certain times of the day. Nowadays, I’d be surprised if you ever stumbled across OpenLearn content while searching the web…

A recent JISC report on OER Impact focussed on the (re)use of OERs in higher education, identifying a major use case of OERs as enhancing teaching practice.

(NB I would have embedded the OER Impact project video here, but WordPress.com doesn’t seem to support embeds from Blip…; openness is not just about the licensing, it’s also about the practical ease of (re)use;-)

However, from my quick reading of the OER impact report, it doesn’t really seem to consider the “open course” use case demonstrated by MOOCs, the Stanford courses, or mid-70s OU course broadcasts. (Maybe this was out of scope…!;-)

Nor does it consider the production of OERs (I think that was definitely out of scope).

For the JISC OER3 funding call, I was hoping to put in a bid for a project based around an open “production-in-presentation” model of resource development targeted to a specific community. For a variety of reasons (not least, I suspect, my lack of project management skills…), that’s unlikely to be submitted in time, so I thought I’d post the main chunk of the bid here as a way of trying to open up the debate a little more widely about the role of OERs, the utility of open production models, and the extent to which they can be used to support cross-sector curriculum innovation/discovery as well as co-creation of resources and resource reuse (both within HE and into a target user community).

Outline
Rapid Resource Discovery and Development via Open Production Pair Teaching (ReDOPT) seeks to draft a set of openly licensed resources for potential (re)use in courses in two different institutions … through the real-time production and delivery of an open online short-course in the area of data handling and visualisation. This approach subverts the more traditional technique of developing materials for a course and then retrospectively making them open: instead, the materials are created in public, under an open license, in a way that makes them immediately available for informal study as well as open web discovery, embeds them in a target community, and then brings them back into the closed setting for formal (re)use. The course will be promoted to the data journalism and open data communities as a free “MOOC” (Massive Online Open Course)/P2PU style course, with a view to establishing an immediate direct use by a practitioner community.

The project will proceed as follows: over a 10-12 week period, the core project team will use a variant of the Pair Teaching approach to develop and publish an informal open, online course hosted on an .ac.uk domain via a set of narrative linked resources (each one about the length of a blog post and representing 10 minutes to 1 hour of learner activity) mapping out the project team’s own exploration/learning journey through the topic area. The course scope will be guided by a skeleton curriculum determined in advance from a review of current literature, informal interviews/questionnaires and perceived skills and knowledge gaps in the area. The created resources will contain openly licensed custom written/bespoke material, embedded third party content (audio, video, graphical, data), and selected links to relevant third party material. A public custom search engine in the topic area will also be curated during the course. Additional resources created by course participants (some of whom may themselves be part of the project team) will be integrated into the core course and added to the custom search engine by the project team. Part-time, hourly paid staff will also be funded to contribute additional resources into the evolving course.

A second phase of the project will embed the resources as learning resources in the target community through the delivery of workshops based around and referring out to the created resources, as well as community building around the resources. Because of the timescales involved, this proposal is limited to the production of the draft materials and embedding them as valuable and appropriate resources in the target community, and does not extend as far as the reuse/first formal use case. Success metrics will therefore be limited to impact evaluation, volume and reach of resources produced, community engagement with the live production of the materials, and the extent to which project team members intend to directly reuse the materials produced as a result.

The Proposal
1. The aim of the project is to produce a set of educational resources in a practical topic area (data handling and visualisation), that are reusable by both teachers (as teaching resources) and independent learners (as learning resources), through the development of an openly produced online course in the style of an uncourse created in real time using a Pair Teaching approach as opposed to a traditional sole author or OU style course team production process, and to establish those materials as core reusable educational resources in the target community.

3. … : Extend OER through collaborations beyond HE: the proposal represents a collaboration between two HEIs in the production and anticipated formal (re)use of the materials created, as well as directly serving the needs of the fledgling data-driven journalism community and the open public data communities.

4. … : Addressing sector challenges (ii Involving academics on part-time, hourly-paid contracts): the open production model will seek to engage part-time, hourly-paid staff in creating additional resources around the course themes that they can contribute back to the course under an open license, and that cover a specific issue identified by the course lead or that the part-time staff themselves believe will add value to the course. (Note that the course model will also encourage participants in the course to create and share relevant resources without any financial recompense.) Paying hourly-rate staff for the creation of additional resources (which may include quizzes or other informal assessment/feedback related resources), or in the role of editors of community produced resources, represents a middle ground between the centrally produced core resources and any freely submitted resources from the community. Incorporating the hourly paid contributor role is based on the assumption that payment may be appropriate for sourcing course enhancing contributions that are of a higher quality (and may take longer to produce) than community sourced contributions, as well as requiring the open licensing of materials so produced. The approach also explores a model under which hourly-paid staff can contribute to the shaping of the course on an ad hoc basis if they see opportunities to do so.

5. … Enhancing the student experience (ii Drawing on student-produced materials): The open production model will seek to engage with the community following the course and encourage them to develop and contribute resources back into the community under an open license. For example, the use of problem based exercises and activities will result in the production of resources that can be (re)used within the context of the uncourse itself as an output of the actual exercise or activity.

6. … The project seeks to explore practical solutions to two issues relating to the wider adoption of OERs by producers and consumers, and provide a case study that other projects may draw on. In the first case, how to improve the discoverability and direct use of resources on the web by “learners” who do not know they are looking for OERs, or even what OERs are, through creating resources that are published as contributions to the development and support of a particular community and as such are likely to benefit from “implicit” search engine optimisation (SEO) resulting from this approach. In the second case, to explore a mechanism that identifies what resources a community might find useful through curriculum negotiation during presentation, and the extent to which “draft” resources might actually encourage reuse and revision.

7. Rather than publishing an open version of a predetermined, fixed set of resources that have already been produced as part of a closed process and then delivered in a formal setting, the intention is thus to develop an openly licensed set of “draft” resources through the “production in presentation” delivery of an informal open “uncourse” (in-project scope), and at a later date reuse those resources in a formally offered closed/for-credit course (out-of-project scope). The uncourse will not incorporate assessment elements, although community engagement and feedback in that context will be in scope. The uncourse approach draws on the idea of “teacher as learner”, with the “teacher” capturing and reflecting on meaningful learning episodes as they explore a topic area and then communicating these through the development of materials that others can learn from, as well as demonstrating authentic problem solving and self-directed learning behaviours that model the independent learning behaviours we are trying to develop in our students.

8. The quality of the resources will be assured at least to the level of fit-for-purpose at the time of release by combining the uncourse production style with a Pair Teaching approach. A quality improvement process will also operate through responding to any issues identified via the community based peer-review and developmental testing process that results from developing the materials in public.

9. The topic area was chosen based on several factors: a) the experience and expertise of the project team; b) the observation that there are no public education programmes around the increasing amounts of open public data; c) the observation that very few journalism academics have expertise in data journalism; d) the observation that practitioners engaged in data journalism do not have the time or interest to become academics, but do appear willing to share their knowledge.

10. The first uncourse will run over a 6-8 week period and result in the central/core development of circa 5 to 10 blog-post-styled resources a week, each requiring 20-45 minutes of “student” activity (approx. 2-6 hours of study time per week equivalent), plus additional directed reading/media consumption time (ideally referencing free and openly licensed content). A second presentation of the uncourse will reuse and extend materials produced during the first presentation, as well as integrating resources, where possible, developed by the community in the first phase and monitoring the amount of time taken to revise/reversion them, as required, compared to the time taken to prepare resources from scratch centrally. Examples of real-time, interactive and graphical representations of data will be recorded as video screencasts and made available online. Participants will be encouraged to consider the information design merits of comparative visualisation methods for publication on different media platforms: print, video, interactive and mobile. In all, we hope to deliver up to 50 hours of centrally produced, openly licensed materials by the end of the course. The uncourse will also develop a custom search engine offering coverage of openly licensed and freely accessible resources related to the course topic area.

11. The course approach is inspired to a certain extent by the Massive Online Open Course (MOOC) style courses pioneered by George Siemens, Stephen Downes, Dave Cormier, Jim Groom et al. The MOOC approach encourages learners to explore a given topic space with the help of some wayfinders. Much of the benefit is derived from the connections participants make between each other and the content by sharing, reflecting, and building on the contributions of others across different media spaces, like blogs, Twitter, forums, YouTube, etc.

12. The course model also draws upon the idea of an uncourse, as demonstrated by Hirst in the creation of the Digital Worlds game development blog [ http://digitalworlds.wordpress.com ], which produced a series of resources as part of an openly blogged learning journey that have since been reused directly in an OU course (T151 Digital Worlds); and the Visual Gadgets blog ( http://visualgadgets.blogspot.com ), which drafted materials that later came to be reused in the OU course T215 Communication and information technologies, and were then made available under open license as the OpenLearn unit Visualisation: Visual representations of data and information [ http://openlearn.open.ac.uk/course/view.php?id=4442 ].

13. A second phase of the project will explore ways of improving the discovery of resources in an online context, as well as establishing them as important and relevant resources within the target community. Through face-to-face workshops and hack days at community events, we will run a series of sessions that draw on and extend the activities developed during the initial uncourse, and refer participants to the materials. A second presentation of the uncourse will be offered as a way of testing and demonstrating reuse of the resources, as well as providing an exit path from workshop activities. One possible exit path from the uncourse would be entry into formal academic courses.

14. Establishing the resources within the target community is an important aspect of the project. Participation in community events plays an important role in this, and also helps to prove the resources produced. Attendance at events such as the Open Government Data Camp will allow us to promote the availability of the resources to the appropriate European community, further identify community needs, and also provide a backdrop for the development of a promotional video with vox pops from the community, hopefully expressing support for the resources being produced. The extent to which materials do become adopted and used within the community will form an important part of the project evaluation.

15. … By embedding resources in the target community, we aim to enhance the practical utility of the resources within that community as well as providing an academic consideration of the issues involved. A key part of the evaluation workpackage, …, will be to rate the quality of the materials produced and the level of engagement with and reuse of them by both educators and members of the target community.

Note that I am still keen on working this bid up a bit more for submission somewhere else…;-)

[Note that the opinions expressed herein are very much my own personal ones…]

PS see also COL-UNESCO consultation: Guidelines for OER in Higher Education – Request for comments: OER Guidelines for Higher Education Stakeholders

eSTeEM Conference Presentation – Making More of Structured Course Materials

A copy of the presentation I gave at the OU-eSTeEM conference (no event URL?) on generating custom course search engines and mining OU XML documents to generate course mindmaps (Making More of Structured Documents presentation; delicious stack/bookmark list of related resources):

Chatting to Jonathan Fine after the event, he gave me the phrase secondary products to describe things like course mindmaps that can be generated from XML source files of OU course materials. From what I can tell, there isn’t much if any work going on in the way of finding novel ways of exploiting the structure of OU structured course materials, other than using them simply as a way of generating different presentational views of the course materials as a whole (that is, HTML versions, maybe mobile friendly versions, PDF versions). (If that’s not the case, please feel free to put me right in the comments:-)

One thing Jonathan has been scouring the documents for is evidence of mathematical content across the courses; he also mentioned a couple of ideas relating to access audits over the content itself, such as extracting figure headings, or image captions. (This reminded me of the OpenLearn XML processor (and redux) I first played with 4 years ago (sigh… and nothing’s changed… sigh….), which stripped assets by type from the first generation of OU XML docs). So on my to do list is to have a deeper look at the structure of OU XML, have a peek at what sorts of things might meaningfully (and easily;-) be extracted, and figure out two or three secondary products that can be generated as a result. Note that these products might be products for different audiences, at different times of the course lifecycle: tools for use by course teams or LTS during production (such as accessibility checks), products to support maintenance (there is already a link checker, but maybe there is more that can be done here?), products for students (such as the mindmap), products for alumni, products for OpenLearn views over the content, products to support “learning analytics”, and so on. (If you have any ideas of what forms the secondary products might take, or what structures/elements/entities you’d like to see mined from OU XML, please let me know via the comments. For an example of an OU XML doc, see here.)

Do We Need an OpenLearn Content Liberation Front?

For me, one of the defining attributes of openness relates to accessibility of the machine kind: if I can’t write a script to handle the repetitive stuff for me, or can’t automate the embedding of image and/or video resources, then whatever the content is, it’s not open enough in a practical sense for me to do what I want with it.

So here’s an, erm, how can I put this politely, little niggle I have with OpenLearn XML. (For those of you not keeping up, OpenLearn, one of the OU’s many sites, is the OU’s open course materials site; the materials published on the site as course unit contentful HTML pages are also available as structured XML documents. (When I say “structured”, I mean that certain elements of the materials are marked up in a semantically meaningful way; lots of elements aren’t, but we have to start somewhere ;-))

The context is this: following on from my presentation on Making More of Structured Course Materials at the eSTeEM conference last week, I left a chat with Jonathan Fine with the intention of seeing what sorts of secondary product I could easily generate from the OpenLearn content. I’m in the middle of building a scraper and structured content extractor at the moment, grabbing things like learning outcomes, glossary items, references and images, but I almost immediately hit a couple of problems: first, actually locating the OU XML docs; and secondly, locating the images…

Getting hold of a machine readable list of OpenLearn units is easy enough via the OpenLearn OPML feed (much easier to work with than the “all units” HTML index page). Units are organised by topic and are listed using the following format:

<outline type="rss" text="Unit content for Water use and the water cycle" htmlUrl="http://openlearn.open.ac.uk/course/view.php?name=S278_12" xmlUrl="http://openlearn.open.ac.uk/rss/file.php/stdfeed/4307/S278_12_rss.xml"/>

URLs of the form http://openlearn.open.ac.uk/course/view.php?name=S278_12 link to a “homepage” for each unit, which then links to the first page of actual content, content which is also available in XML form. The content page URLs have the form http://openlearn.open.ac.uk/mod/oucontent/view.php?id=398820&direct=1, where the ID is uniquely (one-one) mapped to the course name identifier. The XML version of the page can then be accessed by changing direct=1 in the URL to content=1. Only, we don’t know the mapping from course unit name to page id. The easiest way I’ve found of doing that is to load in the RSS feed for each unit and grab the first link URL, which points to the first HTML content page view of the unit.
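In other words, once you have that first content page URL, getting at the XML view is just a string substitution; a minimal sketch (a hypothetical helper of my own, assuming the direct=1/content=1 convention holds across units):

#Minimal sketch: derive the XML view of a unit from the first content page URL
#found in its RSS feed (assumes the direct=1 -> content=1 convention described above)
def xmlurl(contenturl):
    #e.g. ...view.php?id=398820&direct=1 becomes ...view.php?id=398820&content=1
    return contenturl.replace('direct=1', 'content=1')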

I’ve popped a scraper up on Scraperwiki to build the lookup for XML URLs for OpenLearn units – OpenLearn XML Processor:

import scraperwiki
from lxml import etree

#===
#via http://stackoverflow.com/questions/5757201/help-or-advice-me-get-started-with-lxml/5899005#5899005
#Flatten an element: concatenate all the text it contains, including text in child elements
def flatten(el):
    result = [(el.text or "")]
    for sel in el:
        result.append(flatten(sel))
        result.append(sel.tail or "")
    return "".join(result)
#===

#Grab the first item link from a unit's RSS feed - this points to the
#first HTML content page of the unit
def getcontenturl(srcUrl):
    rss = etree.parse(srcUrl)
    rssroot = rss.getroot()
    try:
        contenturl = flatten(rssroot.find('./channel/item/link'))
    except:
        contenturl = ''
    return contenturl

def getUnitLocations():
    #The OPML file lists all OpenLearn units by topic area
    srcUrl = 'http://openlearn.open.ac.uk/rss/file.php/stdfeed/1/full_opml.xml'
    tree = etree.parse(srcUrl)
    root = tree.getroot()
    topics = root.findall('.//body/outline')
    #Handle each topic area separately?
    for topic in topics:
        tt = topic.get('text')
        print tt
        for item in topic.findall('./outline'):
            it = item.get('text')
            if it.startswith('Unit content for'):
                it = it.replace('Unit content for', '')
                url = item.get('htmlUrl')
                rssurl = item.get('xmlUrl')
                #Unit name identifier, e.g. S278_12, from the unit homepage URL
                ccu = url.split('=')[1]
                cctmp = ccu.split('_')
                cc = cctmp[0]
                if len(cctmp) > 1: ccpart = cctmp[1]
                else: ccpart = '1'
                #First identifier in the path to the RSS file, e.g. 4307
                slug = rssurl.replace('http://openlearn.open.ac.uk/rss/file.php/stdfeed/', '')
                slug = slug.split('/')[0]
                contenturl = getcontenturl(rssurl)
                print tt, it, slug, ccu, cc, ccpart, url, contenturl
                scraperwiki.sqlite.save(unique_keys=['ccu'], table_name='unitsHome',
                                        data={'ccu': ccu, 'uname': it, 'topic': tt, 'slug': slug,
                                              'cc': cc, 'ccpart': ccpart, 'url': url,
                                              'rssurl': rssurl, 'ccurl': contenturl})

getUnitLocations()

The next step in the plan (because I usually do have a plan; it’s hard to play effectively without some sort of direction in mind…), as far as images go, was to grab the figure elements out of the XML documents and generate an image gallery that allows you to search through OpenLearn images by title/caption and/or description, and preview them. Getting the caption and description from the XML is easy enough (there’s a quick sketch after the example figure element below), but getting the image URLs is not…

Here’s an example of a figure element from an OpenLearn XML document:

<Figure id="fig001">
<Image src="\\DCTM_FSS\content\Teaching and curriculum\Modules\Shared Resources\OpenLearn\S278_5\1.0\s278_5_f001hi.jpg" height="" webthumbnail="false" x_imagesrc="s278_5_f001hi.jpg" x_imagewidth="478" x_imageheight="522"/>
<Caption>Figure 1 The geothermal gradient beneath a continent, showing how temperature increases more rapidly with depth in the lithosphere than it does in the deep mantle.</Caption>
<Alternative>Figure 1</Alternative>
<Description>Figure 1</Description>
</Figure>
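Just to show what I mean by “easy enough”, here’s a minimal lxml sketch (an illustrative fragment of my own, based only on the Figure structure shown above) for pulling out the captions and descriptions:

#Sketch: pull figure ids, filenames, captions and descriptions from an OU XML doc
#(assumes the Figure element structure shown above holds across documents)
from lxml import etree

def getfigures(xmlurl):
    root = etree.parse(xmlurl).getroot()
    figures = []
    for fig in root.findall('.//Figure'):
        img = fig.find('Image')
        figures.append({
            'id': fig.get('id'),
            #x_imagesrc gives the bare filename only - resolving it to a
            #public URL is the problem discussed below...
            'filename': img.get('x_imagesrc') if img is not None else '',
            'caption': fig.findtext('Caption') or '',
            'description': fig.findtext('Description') or ''
        })
    return figures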

Looking at the HTML page for the corresponding unit on OpenLearn, we see it points to the image resource file at http://openlearn.open.ac.uk/file.php/4178/!via/oucontent/course/476/s278_5_f001hi.jpg:

So how can we generate that image URL from the resource link in the XML document? The filename is the same, but how can we generate what are presumably contextually relevant path elements: http://openlearn.open.ac.uk/file.php/4178/!via/oucontent/course/476/

If we look at the OpenLearn OPML file that lists all current OpenLearn units, we can find the first identifier in the path to the RSS file:

<outline type="rss" text="Unit content for Energy resources: Geothermal energy" htmlUrl="http://openlearn.open.ac.uk/course/view.php?name=S278_5" xmlUrl="http://openlearn.open.ac.uk/rss/file.php/stdfeed/4178/S278_5_rss.xml"/>

But I can’t seem to find a crib for the second identifier – 476 – anywhere? Which means I can’t mechanise the creation of links to actual OpenLearn image assets from the XML source. Also note that there are no credits, acknowledgements or license conditions associated with the image contained within the figure description. Which also makes it hard to reuse the image in a legal, rights-recognising sense.

[Doh – I can surely just look at the URL for an image in an OpenLearn unit RSS feed and pick the path up from there, can’t I? Only I can’t, because the image links in the RSS feeds are: a) relative links, without path information, and b) broken as a result…]

Reusing images on the basis of the OpenLearn XML “sourcecode” document is therefore: NOT OBVIOUSLY POSSIBLE.

What this suggests to me is that if you release “source code” documents, they may actually need some processing in terms of asset resolution: something that generates publicly resolvable locators to assets, if those assets are encoded within the source code document as “private”/non-resolvable identifiers.

Where necessary, acknowledgements/credits are provided in the backmatter using elements of the form:

<Paragraph>Figure 7 Willes-Richards, J., et al. (1990) ; HDR Resource/Economics’ in Baria, R. (ed.) <i>Hot Dry Rock Geothermal Energy</i>, Copyright CSM Associates Limited</Paragraph>

Whilst OU-XML does support the ability to make a meaningful link to a resource within the XML document, using an element of the form:

<CrossRef idref="fig007">Figure 7</CrossRef>

(which presumably uses the Alternative label as the cross-referenced identifier, although not the figure element id (eg fig007) which is presumably unique within any particular XML document?), this identifier is not used to link the informally stated figure credit back to the uniquely identified figure element?

If the same image asset is used in several course units, there is presumably no way of telling from the element data (or even, necessarily, the credit data?) whether the images are in fact one and the same. That is, we can’t audit the OpenLearn materials in a mechanised way to see whether or not particular images are reused across two or more OpenLearn units.

Just in passing, it’s maybe also worth noting that in the above case at least, a description for the image is missing. In actual OU course materials, the description element is used to capture a textual description of the image that explicates the image in the context of the surrounding text. This represents a partial fulfilment of accessibility requirements surrounding images and is, even if not best practice, at least effective practice.

Where else might content need liberating within OpenLearn content? At the end of the course unit XML documents, in the “backmatter” element, there is often a list of references. References have the form:

<Reference>Sheldon, P. (2005) Earth’s Physical Resources: An Introduction (Book 1 of S278 Earth’s Physical Resources: Origin, Use and Environmental Impact), The Open University, Milton Keynes</Reference>

Hmmm… no structure there… so how easy would it be to reliably generate a link to an authoritative record for that item? (Note that other records occasionally use presentational markup such as italics (or emphasis) tags to style certain parts of some references, confusing presentation with semantics…)
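To make the point: about the best a machine can do with a flat reference string like the one above is guess. A crude regex sketch (my own illustrative hack, not anything OpenLearn provides) might pull out an author string and a year, but it’s a fragile heuristic, not an identifier:

#Crude sketch: guess author(s) and year from an unstructured reference string
#(for illustration only - it will misparse plenty of real records)
import re

def crudeparse(reftext):
    m = re.match(r'(?P<authors>.+?)\s*\((?P<year>\d{4})\)\s*(?P<rest>.*)', reftext)
    return m.groupdict() if m else {'authors': '', 'year': '', 'rest': reftext}

#e.g. crudeparse("Sheldon, P. (2005) Earth's Physical Resources: An Introduction")
#gives {'authors': 'Sheldon, P.', 'year': '2005', 'rest': "Earth's Physical Resources: An Introduction"}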

Finally, just a quick note on why I’m blogging this publicly rather than raising it, erm, quietly within the OU. My reasoning is similar to the reasoning we use when we tell students to not be afraid of asking questions, because it’s likely that others will also have the same question… I’m asking a question about the structure of an open educational resource, because I don’t quite understand it; by asking the question in public, it may be the case that others can use the same questioning strategy to review the way they present their materials, so when I find those, I don’t have to ask similar sorts of question again;-)

PS sort of related to this, see TechDis’ Terry McAndrew’s Accessible courses need an accessibility-friendly schema standard.

PPS see also another take on ways of trying to reduce cognitive waste – Joss Winn’s latest bid in progress, which will examine how the OAuth 2.0 specification can be integrated into a single sign-on environment alongside Microsoft’s Unified Access Gateway. If that’s an issue or matter of interest in your institution, why not fork the bid and work it up yourself, or maybe even fork it and contribute elements back?;-) (Hmm, if several institutions submitted what was essentially the same bid, how would they cope during the marking process?!;-)

Deconstructing OpenLearn Units – Glossary Items, Learning Outcomes and Image Search

It turns out that part of the grief I encountered here in trying to access OpenLearn XML content was easily resolved (check the comments: mechanize did the trick…), though I’ve still to sort out a workaround for accessing OpenLearn images (a problem described here), but at least now I have another stepping stone: a database of some deconstructed OpenLearn content.

Using Scraperwiki to pull down and parse the OpenLearn XML files, I’ve created some database tables that contain the following elements scraped from across the OpenLearn units by this OpenLearn XML Processor:

  • glossary items;
  • learning objectives;
  • figure captions and descriptions.

You can download CSV data files corresponding to the tables, or the whole SQLite database. (Note that there is also an “errors” table that identifies units that threw an error when I tried to grab, or parse, the OpenLearn XML.)

Unfortunately, I haven’t had a chance yet to pop up a view over the data (I tried, briefly, but today was another of those days where something that’s probably very simple and obvious prevented me from getting the code I wanted to write working; if anyone has an example Scraperwiki view that chucks data into a sortable HTML table or a Simile Exhibit searchable table, please post a link below; or even better, add a view to the scraper:-)

So in the meantime, if you want to have a play, you need to make use of the Scraperwiki API wizard.

Here are some example queries:

  • a search for figure descriptions containing the word “communication” – select * from `figures` where desc like '%communication%': try it
  • a search over learning outcomes that include the phrase how to followed at some point by the word data – select * from `learningoutcomes` where lo like '%how to%data%': try it
  • a search of glossary items for glossary terms that contain the word “period” or a definition that contains the word “ancient” – select * from `glossary` where definition like '%ancient%' or term like '%period%': try it
  • find figures with empty captions – select * from `figures` where caption='': try it
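If you’d rather poke around offline, the same sort of query runs directly over the downloaded SQLite database; a minimal sketch (the filename here is an assumption – use whatever you saved the database download as):

#Sketch: run one of the example queries over a local copy of the database
#(assumes you've saved the SQLite download locally as 'openlearn.sqlite')
import sqlite3

conn = sqlite3.connect('openlearn.sqlite')
for row in conn.execute("select * from glossary where term like '%period%'"):
    print row
conn.close()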

I’ll try to add some more examples when I get a chance, as well as knocking up a more friendly search interface. Unless you want to try…?!;-)

Infoskills 2.012 – Practical Exercises in Social Media Network Analysis #change11

As ever, it seems the longer I have to prepare something, the less likely I am to do it. I was supposed to be running a #change11 MOOC session this week – Infoskills 2.012 How to do a lot with a little – but having had it in the diary for six months or so, I have, of course, done nothing to prepare for it… (I didn’t come up with the 2.012 – not sure who did?)

Anyway… over the weekend, I gave a presentation (Social Media Visualisation Hacks) that, typically, bewildered the audience with a blizzard of things that are possible when it comes to looking at social networks but that are still alien to most:

As ever, the presentation is not complete (i.e. the slides really need to be complemented by a commentary), but that’s something I hope to start working on improving – maybe starting this week…

The deck is a review – of sorts – of some of the various ways we can look at social networks and the activity that takes place within them. The slides are prompts, keys, search phrase suggestions that provide a starting point for finding out more. Many of the slides contain screenshots – and if you peer closely enough, you can often see the URL. For posts on my blog, searching with the word ouseful followed by key terms from the post title will often turn up the result on major search engines. Many of the slides identify a “hack” that is described in pseudo-tutorial form on this blog, or on Martin Hawksey’s MASHe blog.

I put together a delicious stack of links relating to the presentation here: #drg12 – Visualising Social Networks (Tutorial Posts)

For a tutorial stack that focusses more on Yahoo Pipes (though who knows how long it is for this world, given the perilous state of Yahoo at the moment), see: Twitter Pipes

My #change11 week was supposed to be about new info skills, with a practical emphasis. A couple of other presentations relating to how we might appropriate (a-pro-pre-eight) tools and applications can be found here: Appropriate IT – My ILI2011 Presentation and Just Do IT Yourself… My UKSG Presentation.

If all you do in Google is 2.3 keyword searches, this deck – Making the Most of Google – (though possibly a little dated by now), may give you some new ideas.

For a more formal take on infoskills for the new age (though I need to write a critique of this from my own left-field position), see the Cambridge University Library/Arcadia Project “New Curriculum for Information Literacy (ANCIL)” project via the Arcadia project.

If you want to do some formal reading in the visualisation space, check out 7 Classic Foundational Vis Papers You Might not Want to Publicly Confess you Don’t Know.

Via @cogdog/Alan Levine, I am reminded of Jon Udell’s Seven ways to think like the web. You do think like that, right?!;-)

Mulling over @downes’ half hour post on Education as Platform: The MOOC Experience and what we can do to make it better, I see the MOOC framework as providing an element of co-ordination, pacing and a legacy resource package. For my week, I was expected(?!) to put together some readings and exercises and maybe a webinar. But I haven’t prepared anything (I tried giving a talk at Dev8D earlier this year completely unprepared (though I did have an old presentation I could have used) and it felt to me like a car crash/a disaster, so I know I do need to prep things (even though it may not seem like it if you’ve heard me speak before!;-))

But maybe that’s okay, for one week of this MOOC? The OUseful.info blog, where you’re maybe reading this, is an ongoing presentation, with a post typically every day or so. When I learn something related to the general themes of this blog, I post it here, often as a partial tutorial (partial in the sense that you often have to work through the tutorial for the words to make sense – they complement the things you should be seeing on screen – if you look – as you work through the tutorial; in a sense, the tutorial posts are often second screen complements to, and drivers of, an activity on another screen).

I’ve personally tried registering with MOOCs a couple of times, but never completed any of the activities. Some of the MOOC related readings or activities pass my way through blogs I follow, or tweeted links that pique my interest, and sometimes I try them out. I guess I’m creating my own unstructured uncourse* daily anyway… (*uncourses complement MOOCs, sort of… They’re courses created live in partial response to feedback, but also reflecting the “teacher”‘s learning journey through a topic. Here’s an example that led to a formal OU course: Digital Worlds uncourse blog experiment. The philosophy is based on the “teacher” modelling – and documenting – a learning journey. Uncourses fully expect the “teacher” not to be totally knowledgeable about the subject area, but to be happy to demonstrate how they go about making sense of a topic that may well be new to them).

So… this is my #change11 offering. It’s not part of the “formal” course (how weird does that sound?!), unless it is… As the MOOC is now in week 29, if its principles have been taken on board, you should be starting to figure out your own distributed co-ordination mechanisms by now. Because what else will you do when the course is over? Or will it be a course that never ends, yet ceases to have a central co-ordination point?

PS if you want to chat, this blog is open to comments; you can also find me on Twitter as @psychemedia

PPS seems like I’ve had at least one critical response (via trackbacks) to my lackadaisical “contribution” towards my “teaching” week on the #Change11 MOOC. True. Sorry. But not. I should have kept it simple, posted my motto – identify a problem, then hack a solution to it, every day – and left it at that… It’s how I learn about this stuff… (And any teaching I receive tends to be indirect, by virtue of stuff other folk have published that I’ve discovered through web search – aka search queries (questions) that I’ve had to formulate to help me answer the problem I have identified/created…)

Tinkering With Scraperwiki – The Bottom Line, OpenCorporates Reconciliation and the Google Viz API

Having got to grips with adding a basic sortable table view to a Scraperwiki view using the Google Chart Tools (Exporting and Displaying Scraperwiki Datasets Using the Google Visualisation API), I thought I’d have a look at wiring in an interactive dashboard control.

You can see the result at BBC Bottom Line programme explorer:

The page loads in the contents of a source Scraperwiki database (so only good for smallish datasets in this version) and pops them into a table. The searchbox is bound to the Synopsis column and allows you to search for terms or phrases within the Synopsis cells, returning rows for which there is a hit.

Here’s the function that I used to set up the table and search control, bind them together and render them:

google.load('visualization', '1.1', {packages:['controls']});

google.setOnLoadCallback(drawTable);

function drawTable() {
  // Load the scraped data (the %(json)s slot is filled in by the Python view template)
  var json_data = new google.visualization.DataTable(%(json)s, 0.6);

  var json_table = new google.visualization.ChartWrapper({
    'chartType': 'Table',
    'containerId': 'table_div_json',
    'options': {allowHtml: true}
  });
  //i expected this limit on the view to work?
  //json_table.setColumns([0,1,2,3,4,5,6,7])

  // Linkify the programme ID column...
  var formatter = new google.visualization.PatternFormat('<a href="http://www.bbc.co.uk/programmes/{0}">{0}</a>');
  formatter.format(json_data, [1]); // Apply formatter and set the formatted value of the first column.

  // ...and linkify the company name column using the OpenCorporates URI column
  formatter = new google.visualization.PatternFormat('<a href="{1}">{0}</a>');
  formatter.format(json_data, [7, 8]);

  // A free-text search control bound to the Synopsis column
  var stringFilter = new google.visualization.ControlWrapper({
    'controlType': 'StringFilter',
    'containerId': 'control1',
    'options': {
      'filterColumnLabel': 'Synopsis',
      'matchType': 'any'
    }
  });

  // Bind the search control to the table and render both in the dashboard
  var dashboard = new google.visualization.Dashboard(document.getElementById('dashboard')).bind(stringFilter, json_table).draw(json_data);
}

The formatter is used to linkify the two URLs. However, I couldn’t get the displayed table to hide the final column (the OpenCorporates URI)? (Doing something wrong, somewhere…) You can find the full code for the Scraperwiki view here.

Now you may (or may not) be wondering where the OpenCorporates ID came from. The data used to populate the table is scraped from the JSON version of the BBC programme pages for the OU co-produced business programme The Bottom Line (Bottom Line scraper). (I’ve been pondering for some time whether there is enough content there to try to build something that might usefully support or help promote OUBS/OU business courses or link across to free OU business courses on OpenLearn…) Supplementary content items for each programme identify the name of each contributor and the company they represent in a conventional way. (Their role is also described in what looks to be a conventionally constructed text string, though I didn’t try to extract this explicitly – yet. (I’m guessing the Reuters OpenCalais API would also make light work of that?))

Having got access to the company name, I thought it might be interesting to try to get a corporate identifier back for each one using the OpenCorporates (Google Refine) Reconciliation API (Google Refine reconciliation service documentation).

Here’s a fragment from the scraper showing how to lookup a company name using the OpenCorporates reconciliation API and get the data back:

import urllib, simplejson

#Look up a company name via the OpenCorporates reconciliation API
#(the name is crudely sanitised by stripping non-ASCII characters)
ocrecURL = 'http://opencorporates.com/reconcile?query=' + urllib.quote_plus("".join(i for i in record['company'] if ord(i) < 128))
try:
    recData = simplejson.load(urllib.urlopen(ocrecURL))
except:
    recData = {'result': []}
print ocrecURL, [recData]
#If we get any results back, and the top hit's relevance score is high enough, go with it
if len(recData['result']) > 0:
    if recData['result'][0]['score'] >= 0.7:
        record['ocData'] = recData['result'][0]
        record['ocID'] = recData['result'][0]['uri']
        record['ocName'] = recData['result'][0]['name']

The ocrecURL is constructed from the company name, sanitised in a hacky fashion. If we get any results back, we check the (relevance) score of the first one. (The results seem to be ordered in descending score order. I didn’t check to see whether this was defined or by convention.) If it seems relevant, we go with it. From a quick skim of company reconciliations, I noticed at least one false positive – Reed – but on the whole it seemed to work fairly well. (If we look up more details about the company from OpenCorporates, and get back the company URL, for example, we might be able to compare the domain with the domain given in the link on the Bottom Line page. A match would suggest quite strongly that we have got the right company…)

As @stuartbrown suggested in a tweet, a possible next step is to link the name of each guest to a Linked Data identifier for them, for example, using DBpedia (although I wonder – is @opencorporates also minting IDs for company directors?). I also need to find some way of pulling out some proper, detailed subject tags for each episode that could be used to populate a drop down list filter control…

PS for more Google Dashboard controls, check out the Google interactive playground…

PPS see also: OpenLearn Glossary Search and OpenLearn Learning Outcomes Search

Scraperwiki Powered OpenLearn Searches – Learning Outcomes and Glossary Items

A quick follow up to Tinkering With Scraperwiki – The Bottom Line, OpenCorporates Reconciliation and the Google Viz API demonstrating how to reuse that pattern (a little more tinkering is required to fully generalise it, but that’ll probably have to wait until after the Easter wifi-free family tour… I also need to do a demo of a pure HTML/JS version of the approach).

In particular, a search over OpenLearn learning outcomes:

and a search over OpenLearn glossary items:

Both are powered by tables from my OpenLearn XML Processor scraperwiki.

The Learning Journey Starts Here: Youtube.edu and OpenLearn Resource Linkage

Mulling over the OU’s OULearn pages on Youtube a week or two ago, colleague Bernie Clark pointed out to me how the links from the OU clip descriptions could be rather hit or miss:

Via @lauradee, I see that the OU has a new offering on YouTube.com/edu that is far more supportive of links to related content, links that can represent the start of a learning journey through OU educational – and commentary – content on the OU website.

Here’s a way in to the first bit of OU content that seems to have appeared:

This links through to a playlist page with a couple of different sorts of opportunity for linking to resources collated at the “Course materials” or “Lecture materials” level:

(The language gives something away, I think, about the expectation of what sort of content is likely to be uploaded here…)

So here, for example, are links at the level of the course/playlist:

And here are links associated with each lecture, erm, clip:

In this first example, several types of content are being linked to, although from the link itself it’s not immediately obvious what sort of resource it points to. For example, some of the links lead through to course units on OpenLearn/Learning Zone:

Others link through to “articles” posted on the OpenLearn “news” site (I’m never really sure how to refer to that site, or the content posts that appear on it?)

The placing of content links into the Assignments and Others tabs seems a little arbitrary to me from this single example, but I suspect that when a few more lists have been posted, some sort of feeling will emerge about what sorts of resources should go where (i.e. what folk might expect by “Assignment” or “Other” resource links). If there’s enough traffic generated through these links, a bit of A/B testing might even be in order relating to the positioning of links within tabs and the behaviour of students once they click through (assuming you can track which link they clicked through, of course…)?

The transcript link is unambiguous though! And, in this case at least, resolves to a PDF hosted somewhere on the OU podcasts/media filestore:

(I’m not sure if caption files are also available?)

Anyway – it’ll be interesting to hear back about whether this enriched linking experience drives more traffic to the OpenLearn resources, as well as whether the positioning of links in the different tab areas has any effect on engagement with materials following a click…

And as far as the linkage itself goes, I’m wondering: how are the links to OpenLearn course units and articles generated/identified, and are those links captured in one of the data.open.ac.uk stores? Or is the process that manages what resource links get associated with lists and list items on Youtube/edu one that doesn’t leave (or readily support the automated creation of) public data traces?

PS How much (if any) of the linked resource goodness is grabbable via the Youtube API, I wonder? If anyone finds out before me, please post details in the comments below:-)

PEERing at Education…

I just had a “doh!” moment in the context of OERs – Open Educational Resources, typically so called because they are Resources produced by an Educator under an Open content license (which to all intents and purposes is a copyright waiver). One of the things that appeals to me about OERs is that there is no reason for them not to be publicly discoverable which makes them the ideal focus for PEER – Public Engagement with Educational Resources. Which is what the OU traditionally offered through 6am TV broadcasts of not-quite-lectures…

Or how about this one?

And which the OU is now doing through iTunesU and several Youtube Channels, such as OU Learn:


(Also check out some of the other OU playlists…or OU/BBC co-pros currently on iPlayer;-)

PS It also seems to me that users tend not to get too hung up about how things are licensed, particularly educational ones, because education is about public benefit and putting constraints on education is just plain stoopid. Discovery is nine tenths of the law, as it were. The important thing about having something licensed as an OER is that no-one can stop you from sharing it… (which, even if you’re the creator of a resource, you may not be able to do; academics, for example, often hand over the copyright of their teaching materials to their employer, and the copyright of their research output (similarly transferred to the employer as a condition of employment) to commercial publishers, who then sell the content back to their employers.)

MOOC Reflections

A trackback a week or two ago to my blog from this personal blog post: #SNAc week 1: what are networks and what use is it to study them? alerted me to a MOOC currently running on Coursera on social network analysis. The link was contextualised in the post as follows: The recommended readings look interesting, but it’s the curse of the netbook again – there’s no way I’m going to read a 20 page PDF on a screen. Some highlighted resources from Twitter and the forum look a bit more possible: … Some nice ‘how to’ posts: … (my linked to post was in the ‘howto’ section).

The whole MOOC hype thing at the moment seems to be dominated by references to things like Coursera, Udacity and edX (“xMOOCs”). Coursera in particular is a new sort of intermediary, a website that offers some sort of applied marketing platform to universities, allowing them to publish sample courses in a centralised, browsable location, and in a strange sense legitimising them. I suspect there is some element of Emperor’s New Clothes thinking going on in the universities who have opted in and those who may be considering it: “is this for real?”; “can we afford not to be a part of it?”

Whilst Coursera has an obvious possible business model – charge the universities for hosting their marketing material courses – Udacity’s model appears more pragmatic: provide courses with the option of formal assessment via Pearson VUE assessment centres, and then advertise your achievements to employers on the Udacity site; presumably, the potential employers and recruiters (which got me thinking about what role LinkedIn might possibly play in this space?) are seen as the initial revenue stream for Udacity. Note that Udacity’s “credit” awarding powers are informal – in the first instance, credibility is based on the reputation of the academics who put together the course; in contrast, for courses on Coursera, and the rival edX partnership (which also offers assessment through Pearson VUE assessment centres), credibility comes from the institution that is responsible for putting together the course. (It’s not hard to imagine a model where institutions might even badge courses that someone else has put together…)

Note that Coursera, Udacity and edX are all making an offering based on quite a traditional course model idea and are born out of particular subject disciplines. Contrast this in the first part with something like Khan Academy, which is providing learning opportunities at a finer level of granularity/much smaller “learning chunks” in the form of short video tutorials. Khan Academy also provides the opportunity for Q&A based discussion around each video resource.

Also by way of contrast are the “cMOOC” style offerings inspired by the likes of George Siemens, Stephen Downes, et al., where a looser curriculum based around a set of topics and initially suggested resources is used to bootstrap a set of loosely co-ordinated personal learning journeys: learners are encouraged to discover, share and create resources and feed them into the course network in a far more organic way than the didactic, rigidly structured approach taken by the xMOOC platforms. The cMOOC style also offers the possibility of breaking down subject disciplines through accepting shared resources contributed because they are relevant to the topic being explored, rather than because they are part of the canon for a particular discipline.

The course without boundaries approach of Jim Groom’s ds106, as recently aided and abetted by Alan Levine, also softens the edges of a traditionally offered course with its problem based syllabus and open assignment bank (participants are encouraged to submit their own assignment ideas) and turns learning into something of a lifestyle choice… (Disclaimer: regular readers will know that I count the cMOOC/ds106 “renegades” as key forces in developing my own thinking…;-)

Something worth considering about the evolution of open education from early open content/open educational resource (OER) repositories and courseware into the “Massive Open Online Course” thing is just what caused the recent upsurge in interest. Both MIT OpenCourseWare and the OU’s OpenLearn offerings provided “anytime start”, self-directed course units; but my recollection is that it was Thrun & Norvig’s first open course on AI (before Thrun launched Udacity) that captured the popular (i.e. media) imagination because of the huge number of students that enrolled. Rather than the ‘on-demand’ offering of OpenLearn, it seems that the broadcast model and linear course schedule, along with the cachet of the instructors, were what appealed to a large population of demonstrably self-directed learners (i.e. geeks and programmers, who spend their time learning how to weave machines from ideas).

I also wonder whether the engagement of universities with intermediary online course delivery platforms will legitimise online courses run by other organisations; for example, the Knight Center Massive Open Online Courses portal (a Moodle environment) is currently advertising its first MOOC on infographics and data visualisation:

Similar to other Knight Center online courses, this MOOC is divided into weekly modules. But unlike regular offerings, there will be no application or selection process. Anyone can sign up online and, once registered, participants will receive instructions on how to enroll in the course. Enrollees will have immediate access to the syllabus and introductory information.

The course will include video lectures, tutorials, readings, exercises and quizzes. Forums will be available for discussion topics related to each module. Because of the “massive” aspect of the course, participants will be encouraged to provide feedback on classmates’ exercises while the instructor will provide general responses based on chosen exercises from a student or group of students.

Cairo will focus on how to work with graphics to communicate and analyze data. Previous experience in information graphics and visualization is not needed to take this course. With the readings, video lectures and tutorials available, participants will acquire enough skills to start producing compelling, simple infographics almost immediately. Participants can expect to spend 4-6 hours per week on the course.

Although the course will be free, if participants need to receive a certificate, there will be a $20 administrative fee, paid online via credit card, for those who meet the certificate requirements. The certificate will be issued only to students who actively participated in the course and who complied with most of the course requirements, such as quizzes and exercises. The certificates will be sent via email as a PDF document. No formal course credit of any kind is associated with the certificate.

Another of the things I’ve been pondering is the role that “content” may or may not play in this open course thing. Certainly, where participants are encouraged to discover and share resources, or where instructors seek to construct courses around “found resources”, an approach espoused by the OU’s new postgraduate strategy, it seems to me that there is an opportunity to contribute to the wider open learning idea by producing resources that can be “found”. For resources to be available as found resources, we need the following:

  1. Somebody needs to have already created them…
  2. They need to be discoverable by whoever is doing the finding
  3. They need to be appropriately licensed (if we have to go through a painful rights clearance and rights payment model, the cost benefits of drawing on and freely reusing those resources are severely curtailed).

Whilst the running of a one shot MOOC may attract however many participants, the production of finer grained (and branded) resources that can be used within those courses means that a provider can repeatedly, and effortlessly, contribute to other people’s courses through course participants pulling the resources into those course contexts. (It also strikes me that educators in one institution could sign up for a course offered by another, and then drop in links to their own applied marketing learning materials.)

One thing I’ve realised from looking at Digital Worlds uncourse blog stats is that some of the posts attract consistent levels of traffic, possibly because they have been linked to from other course syllabuses. I also occasionally see flurries of downloads of tutorial files, which makes me wonder whether another course has linked to resources I originally produced. If we think of the web in its dynamic and static modes (static being the background links that are part of the long term fabric of the web; dynamic being the conversation and link sharing that goes on in social networks, as well as the publication of “alerts” about new fabric (for example, the publication of a new blog post into the static fabric of the web is announced through RSS feeds and social sharing as part of the dynamic conversation)), then the MOOCs appear to be trying to run in a dynamic, broadcast mode. Whereas what interests me is how we can contribute to the static structure of the web, and how we can make better use of it in a learning context?

PS a final thought – running scheduled MOOCs is like a primetime broadcast; anytime independent start is like on-demand video. Or how about this: MOOCs are like blockbuster books, published to great fanfare and selling millions of first day, pre-ordered copies. But there’s also long tail over time consumption of the same books… and maybe also books that sell steadily over time without great fanfare. Running a course once is all well and good; but it feels too ephemeral, and too linear rather than networked thinking to me?