OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for the ‘Open Education’ Category

Losing Experimental Edtech Value from IPython Notebooks Because of New Security Policies?

leave a comment »

Just like the way VLEs locked down what those who wanted to try to stuff out could do with educational websites, usually on the grounds of “security”, so a chunk of lightweight functionality with possible educational value that I was about to start to exploring inside IPython notebooks has been locked out by the new IPython notebook security policy:

Affected use cases
Some use cases that work in IPython 1.0 will become less convenient in 2.0 as a result of the security changes. We do our best to minimize these annoyance, but security is always at odds with convenience.

Javascript and CSS in Markdown cells
While never officially supported, it had become common practice to put hidden Javascript or CSS styling in Markdown cells, so that they would not be visible on the page. Since Markdown cells are now sanitized (by Google Caja), all Javascript (including click event handlers, etc.) and CSS will be stripped.

Here’s what I’ve been exploring – using a simple button:

ipynb button

to reveal an answer:

ipynb button reveal

It’s a 101 interaction style in “e-learning” (do we still call it that?!) and one that I was hoping to explore more given the interactive richness of the IPython notebook environment.

Here’s how I implemented it – a tiny bit of Javascript hidden in one of the markdown cells:

<script type="text/javascript">
   function showHide(id) {
       var e = document.getElementById(id);
       if(e.style.display == 'block')
          e.style.display = 'none';
       else
          e.style.display = 'block';
   }
</script>

and then a quick call from a button onclick event handler to reveal the answer block:

<input type="button" value="Answer" onclick="showHide('ans2')">

<div id="ans2" style="display:none">I can see several ways of generating common identifiers:

<ul><li>using the **gss** code from the area data, I could generate identifiers of the form `http://http://statistics.data.gov.uk/id/statistical-geography/GSS`</li>
<li>from the housing start data, I could split the *Reference Area* on space characters and then extract the GSS code from the first item in the split list</li>
<li>The *districtname* in the area data looks like it make have "issues" with spacing in area names. If we remove spaces and turn everything to lower case in the area data *districtname* and the *Reference Area* in the housing data, we *may* be able create matching keys. But it could be a risky strategy...</li>
</ul></div>

This won’t work anymore – and I don’t have the time to learn whether custom CSS can do this, and if so, how.

I don’t really want to have to go back to the approach I tried before I demoed the button triggered reveal example to myself…

ipynb another interaction

That is, putting answers into a python library and then using code to pull the text answer in…

ipynb color styling

Note also the use of colour in the cells – this is something else I wanted to try to explore, the use of styling to prompt interactions; in the case of IPython notebooks, I quite like the idea of students taking ownership of the notebooks and adding content to it, whether by adding commentary text to cells we have written in, adding their own comment cells (perhaps using a different style – so a different cell type?), amending code stubs we have written, adding in their own code, perhaps as code complements to comment prompts we have provided, etc etc.

ipynb starting to think about different interactions...

The quick hack, try and see option that immediately came to mind to support these sorts of interaction seems to have been locked out (or maybe not – rather than spending half an hour on a quick hack I’ll have to spend have an hour reading docs…). This is exactly the sort of thing that cuts down on our ability to mix ideas and solutions picked up from wherever, and just try them out quickly; and whilst I can see the rationale, it’s just another of those things to add to the when the web was more open pile. (I was going to spend half an hour blogging a post to let other members of the course team I’m on know how to add revealed answers to their notebooks, but as I’ve just spent 18 hours trying to build a VM box that supports python3 and the latest IPythion notebook, I’m a bit fed up at the thought of having to stick with the earlier version py’n’notebook VM I built because it’s easier for us to experiment with…)

I have to admit that some of the new notebook features look like they could be interesting from a teaching point of view in certain subject areas – the ability to publish interactive widgets where the controls talk to parameters accessed via the notebook code cells, but that wasn’t on my to do list for the next week…

What I was planning to do was explore what we’d need to do to get elements of the notebook behaving like elements in OU course materials, under the assumption that our online materials have designs that go hand in hand with good pedagogy. (This is a post in part about OU stuff, so necessarily it contains the p-word.)

ou teaching styling

Something else on the to do list was to explore how to tweak the branding of the notebook, for example to add in an OU logo or (for my other day per week), a School of Data logo. (I need to check the code openness status of IPython notebooks… How bad form would it be to remove the IPy logo for example? And where should a corporate log go? In the toolbar, or at the top of the content part of the notebook? If you just contribute content, I guess the latter; if you add notebook functionality, maybe the topbar is okay?)

There are a few examples of styling notebooks out there, but I wonder – will those recipes still work?

Ho hum – this post probably comes across as negative about IPython notebooks, but it shouldn’t because they’re a wonderful environment (for example, Doodling With IPython Notebooks for Education and Time to Drop Calculators in Favour of Notebook Programming?). I’m just a bit fed up that after a couple of days graft I don’t get to have half and hour’s fun messing around with look and feel. Instead, I need to hit the docs to find out what’s possible and what isn’t because the notebooks are no longer an open environment as they were… Bah..:-(

Written by Tony Hirst

April 11, 2014 at 6:10 pm

Posted in Open Education, OU2.0, Tinkering

Tagged with ,

MOOC Reflections

A trackback a week or two ago to my blog from this personal blog post: #SNAc week 1: what are networks and what use is it to study them? highlighted me to a MOOC currently running on Coursera on social network analysis. The link was contextualised in the post as follows: The recommended readings look interesting, but it’s the curse of the netbook again – there’s no way I’m going to read a 20 page PDF on a screen. Some highlighted resources from Twitter and the forum look a bit more possible: … Some nice ‘how to’ posts: … (my linked to post was in the ‘howto’ section).

The whole MOOC hype thing at the moment seems to be dominated by references to the things like Coursera, Udacity and edX (“xMOOCs”). Coursera in particularly is a new sort of intermediary, a website that offers some sort of applied marketing platform to universities, allowing them to publish sample courses in a centralised, browsable, location and in a strange sense legitimising them. I suspect there is some element of Emperor’s New Clothes thinking going on in the universities who have opted in and those who may be considering it: “is this for real?”; “can we afford not to be a part of it?”

Whilst Coursera has an obvious possible business model – charge the universities for hosting their marketing material courses – Udacity’s model appears more pragmatic: provide courses with the option of formal assessment via Pearson VUE assessment centres, and then advertise your achievements to employers on the Udacity site; presumably, the potential employers and recruiters (which got me thinking about what role LinkedIn might possibly play in this space?) are seen as the initial revenue stream for Udacity. Note that Udacity’s “credit” awarding powers are informal – in the first instance, credibility is based on the reputation of the academics who put together the course; in contrast, for courses on Coursera, and the rival edX partnership (which also offers assessment through Pearson VUE assessment centres), credibility comes from the institution that is responsible for putting together the course. (It’s not hard to imagine a model where institutions might even badge courses that someone else has put together…)

Note that Coursera, Udacity and edX are all making an offering based on quite a traditional course model idea and are born out of particular subject disciplines. Contrast this in the first part with something like Khan Academy, which is providing learning opportunities at a finer level of granularity/much smaller “learning chunks” in the form of short video tutorials. Khan Academy also provides the opportunity for Q&A based discussion around each video resource.

Also by way of contrast are the “cMOOC” style offerings inspired by the likes of George Siemens, Stephen Downes, et al., where a looser curriculum based around a set of topics and initially suggested resources is used to bootstrap a set of loosely co-ordinated personal learning journeys: learners are encouraged to discover, share and create resources and feed them into the course network in a far more organic way than the didactic, rigidly structured approach taken by the xMOOC platforms. The cMOOC style also offeres the possibility of breaking down subject disciplines through accepting shared resources contributed because they are relevant to the topic being explored, rather than because they are part of the canon for a particular discipline.

The course without boundaries approach of Jim Groom’s ds106, as recently aided and abetted by Alan Levine, also softens the edges of a traditionally offered course with its problem based syllabus and open assignment bank (particpants are encouraged to submit their own assignment ideas) and turns learning into something of a lifestyle choice… (Disclaimer: regular readers will know that I count the cMOOC/ds106 “renegades” as key forces in developing my own thinking…;-)

Something worth considering about the evolution of open education from early open content/open educational resource (OER) repositories and courseware into the “Massive Open Online Course” thing is just what caused the recent upsurge in interest? Both MIT opencourseware and the OU’s OpenLearn offerings provided “anytime start”, self-directed course units; but my recollection is that it was Thrun & Norvig’s first open course on AI (before Thrun launched Udacity), that captured the popular (i.e. media) imagination because of the huge number of students that enrolled. Rather than the ‘on-demand’ offering of OpenLearn, it seems that the broadcast model, and linear course schedule, along with the cachet of the instructors, were what appealed to a large population of demonstrably self-directed learners (i.e. geeks and programmers, who spend their time learning how to weave machines from ideas).

I also wonder whether the engagement of universities with intermediary online course delivery platforms will legitimise online courses run by other organisations; for example, the Knight Centre Massive Open Online Courses portal (a Moodle environment) is currently advertising it’s first MOOC on infographics and data visualisation:

Similar to other Knight Center online courses, this MOOC is divided into weekly modules. But unlike regular offerings, there will be no application or selection process. Anyone can sign up online and, once registered, participants will receive instructions on how to enroll in the course. Enrollees will have immediate access to the syllabus and introductory information.

The course will include video lectures, tutorials, readings, exercises and quizzes. Forums will be available for discussion topics related to each module. Because of the “massive” aspect of the course, participants will be encouraged to provide feedback on classmates’ exercises while the instructor will provide general responses based on chosen exercises from a student or group of students.

Cairo will focus on how to work with graphics to communicate and analyze data. Previous experience in information graphics and visualization is not needed to take this course. With the readings, video lectures and tutorials available, participants will acquire enough skills to start producing compelling, simple infographics almost immediately. Participants can expect to spend 4-6 hours per week on the course.

Although the course will be free, if participants need to receive a certificate, there will be a $20 administrative fee, paid online via credit card, for those who meet the certificate requirements. The certificate will be issued only to students who actively participated in the course and who complied with most of the course requirements, such as quizzes and exercises. The certificates will be sent via email as a PDF document. No formal course credit of any kind is associated with the certificate.

Another of the things that I’ve been pondering is the role that “content” may or not play a role in this open course thing. Certainly, where participants are encouraged to discover and share resources, or where instructors seek to construct courses around “found resources”, an approach espoused by the OU’s new postgraduate strategy, it seems to me that there is an opportunity to contribute to the wider open learning idea by producing resources that can be “found”. For resources to be available as found resources, we need the following:

  1. Somebody needs to have already created them…
  2. They need to be discoverable by whoever is doing the finding
  3. They need to be appropriately licensed (if we have to go through a painful rights clearnance and rights payment model, the cost benefits of drawing on and freely reusing those resources are severely curtailed).

Whilst the running of a one shot MOOC may attract however many participants, the production of finer grained (and branded) resources that can be used within those courses means that a provider can repeatedly, and effortlessly, contribute to other peoples courses through course participants pulling the resources into those coure contexts. (It also strikes me that educators in one institution could sign up for a course offered by another, and then drop in links to their own applied marketing learning materials.)

One thing I’ve realised from looking at Digital Worlds uncourse blog stats is that some of the posts attract consistent levels of traffic, possibly because they have been embedded to from other course syllabuses. I also occasionally see flurries of downloads of tutorial files, which makes me wonder whether another course has linked to resources I originally produced. If we think of the web in it’s dynamic and static modes (static being the background links that are part of the long term fabric of the web, dynamic as the conversation and link sharing that goes on in social networks, as well as the publication of “alerts” about new fabric (for example, the publication of a new blog post into the static fabric of the web is announced through RSS feeds and social sharing as part of the dynamic conversation)), then the MOOCs appear to be trying to run in a dynamic, broadcast mode. Whereas what interests me is how we can contribute to the static structure of the web, and how we can make better use of it in a learning context?

PS a final thought – running scheduled MOOCs is like a primetime broadcast; anytime independent start is like on-demand video. Or how about this: MOOCs are like blockbuster books, published to great fanfare and selling millions of first day, pre-ordered copies. But there’s also long tail over time consumption of the same books… and maybe also books that sell steadily over time without great fanfare. Running a course once is all well and good; but it feels too ephemeral, and too linear rather than networked thinking to me?

Written by Tony Hirst

October 9, 2012 at 9:04 am

Therapy Time: Networked Personal Learning and a Reflection on the Urban Peasant…

Way back when I was a postgrad, I used to spend a coffee fuelled morning reading in bed, and then get up to eat a cooked breakfast whilst watching the Urban Peasant:

My abiding memory, in part confirmed by several of the asides in the above clip (can you guess which?!), was that of “agile cooking” and flexible recipes. A chicken curry (pork’s fine too, or beef, even fish if you like; or potato if you want a vegetarian version) could be served with rice (or bread, or a baked potato); if you didn’t like curry, you could leave out the spices or curry powder, and just use a stock cube. If a recipe called for chopped vegetables, you could also grate them or slice them or dice them or…”it’s your decision”. Potato and peas could equally well be carrot or parsnip and beans. If you needed to add water to a recipe, you could add wine, or beer, or fruit juice or whatever instead; if you wanted to have scrambled egg on toast, you could also fry it, or poach it, or boil it. And the toast could be a crumpet or a muffin or just use “whatever you’ve got”.

The ethos was very much one of: start with an idea, and/or see what you’ve got, and then work with it – a real hacker ethic. It also encouraged you to try alternative ideas out, to be adaptive. And I’m pretty sure mistakes happened too – but that was fine…

When I play with data, I often have a goal in mind (albeit a loose one), used to provide a focus for exploring a data set I want to explore a little (typically using Schneiderman’s “Overview first, zoom and filter, then details-on-demand” approach), to see what potential it might hold, or to act a testbed for a tool or technique I want to try out. The problem then becomes one of coming up with some sort of recipe that works with the data and tools I have to hand, as well as the techniques and processes I’ve used before. Sometimes, a recipe I’m working on requires me to get another ingredient out of the fridge, or another utensil out of the cupboard. Sometimes I use a tea towel as an oven glove, or a fork as a knife. Sometimes I taste the food-in-process to know when it’s done, sometimes I go by the colour, texture, consistency, shape, smell or clouds of smoke that have started to appear.

Because I haven’t had any formal training in any of this “stuff”, using “approved” academic sources (I’ve recently been living by R-Bloggers (which is populated by quite a few academics) and Stack Overflow, for example), I suffer from a lack of confidence in talking about it in an academic way (see for example For My One Thousandth Blogpost: The Un-Academic), and a similar lack of confidence in feeling that I could ever charge anybody a fee for telling them what I (think I) know (leave aside for the moment that I effectively charge the OU my salary, benefits and on-costs… hmmm?!). I used to do the academic thing way back when as a postgrad and early postdoc, but fell out of the habit over the last few years because there seemed to me to be a huge amount of investment of time required for very little impact or consequence of what I was doing. Yes, it’s important for things be “right”, but I’m not sure my maths is up to generating formal proofs of new algorithms. I may be able to do the engineering or technologist thing of getting something working, -ish, good enough “for now”, research-style coding, but it’s always mindful of an engineering style trade-off: that it might not be “right” and is just something I figured out that seems to work, but that it’ll do because it lets me get something done… As Artur Bergman puts it using rather colourful language – “yes, correlation isn’t causation, but…”


(This clip was originally brought to mind by a recent commentary from Stephen Downes on The Internet Blowhard’s Favorite Phrase, and the original post it refers to.)

Also mixed up in the notion of “right” is seeing things as “right” if they are formally recognised or accepted as such, which is where assessment and peer review come in: you let other people you trust make an assessment about whatever it is you do/have done, publicly recognising your achievements which in turn allows you to make a justifiable claim to them. (I am reminded here of the definition of knowledge as justified true belief. That word “justified” is interesting, isn’t it…?)

As well as resisting getting in the whole grant bidding cycle for turnover generating, public money cycling projects that are set up to fail, I’ve also recently started to fall out of OU-style formal teaching roles… again, in part because of the long lead times involved with producing course materials and my preference for network based, rather than teamwork based, working style. (I so need to revisit formal models of teamwork and try to come up with a corresponding formulation for networks rather than teams…Or do a lit review to find one that’s already out there…!) I tend to write in 1 hour chunks based on 3-4 hours work, then post whatever it is I’ve done. One reason for doing this is becuase I figure most people read or do things in 5 to 15 minutes or one to two hour chunks and that in a network-centric, distributed online online educational setting small chunks are more likely to be discoverable and immediately useful (directly and immediately learnable from) chunks. There’s no shame in using a well crafted Wikipedia as a starting point for discovering more detailed – and academic – resources: at least you stand a good chance of finding the Wikipedia page! In the same way, I try to link out out to supporting resources from most of my posts so that readers (including myself as a revisitor to these pages in that set) have some additional context, supporting or conflicting material to help get more value from it. (Related: Why I (Edu)Blog.)

Thinking about my own personal lack of confidence, which in part arises from the way I have informally learned whatever it is that I have actually learned over the last few years and not had it formally validated by anybody else, my interest in espousing an informal style networked learning on others is an odd one… Because based on my own experience, it doesn’t give me the feeling that what I know is valid (justified..?), or even necessarily trustable by anybody other than me (because I know how it’s caveated because of what I have personally learned about it, rather than just being told about it), even if it is pragmatic and at least occasionally appears to be useful. (Hmm… I don’t think an OU editor would let me get away with a sentence like that in a piece of OU course material!) Maybe I need to start keeping a second, formalised reflective learning journal as the OU Skills for OU Study suggests to log what I learn, and provide some sort of indexable and searchable metadata around it? In fact, this approach might be a useful approach if I do another uncourse? (It also brings to mind the word equation: Learning material + metadata = learning object (it was something like that, wasn’t it?!))

To the extent that this blog is an act of informal, open teaching, I think it offers three main things: a) “knowledge transferring” discoverable resources on a variety of specialised topics; b) fragmentary records of created knowledge (I *think* I’ve managed to make up odd bits of new stuff over the last few years…); c) a model of some sort of online distributed network centric learning behaviour (see also the Digital Worlds Uncourse Blog Experiment in this respect).

I guess one of the things I do get to validate against is the real world. When I used to go into schools doing robotics activities*, kids would ask me if their robot or programme was “right”. In many cases, there wasn’t really a notion of “right”, it was more a case of:

  • were there things that were obviously wrong?
  • did the thing work as anticipated (or indeed, did any elements of it work at all?!;-)?
  • were there any bits that could be improved, adapted or done in another more elegant way?

So it is with some of my visualisation “experiments” – are they not wrong (is the data right, is there a sensible relationship between the data and the visual mappings)? do they “work” at all (eg in the sense of communicating a particular trend, or revealing a particular anomaly)? could they be improved? By running the robot program, or trying to read the story a data visualisation appears to be telling us, we can get a sense of how “right” it is; but there is often no single “right” for it to be. Which is where doubt can crop in… Because if something is “not right”, then maybe it’s “wrong”…?

In the world of distributed, networked learning, I think one thing we need to work on is developing an appropriate sense of validation and legitimisation of personal learning. Things like badges are weak extrinsic signs that some would claim have a role in this, but I wonder how networks and communities can be shaped and architected, or how their dynamics might work, so that learners develop not only a well-founded intrinsic confidence about what they have self-learned, but also a feeling that what they have self-learned is as legitimate as something they have been formally taught? (I know, I know: “I was at the University of Life, me”… As I am, now… which reminds me, I’ve a Coursera video and Feynman lecture on Youtube to watch, and a couple of code mentor answers to questions I’ve raised on various Stack Exchange sites to read through; and I probably should check to see if there are any questions recently posted to Stack Overflow that I may be able to answer and use to link out to other, more academic “open educational” resources…)

[Rereading this post, I think I am suffering from a lack of formality and the sense of justification that comes with it. Hmmm...]

* This is something I’ve recently been asked to do again for an MK local primary school in the new year; the contact queried how much I might charge and whilst in the past I would have said “no need”, for some reason this time I felt obliged to seek advice about from the Deanery about whether I should charge, and if so how much. This a huge personal cultural shift away from my traditional “of course/pro bono” attitude, and it felt wrong, somehow. To the extent that universities are public bodies, they should work with other public services in their local and extended communities. But of course, I get the sense we’re not really being encouraged to think of ourselves as public bodies very much any more, we’re commercial services… And that feeling affects the personal responsibility I feel when acting for and on behalf of the university. As it turns out, the Deanery seems keen that we participate freely in community events… But I note here that I felt (for the first time) as if I had to check first. So what’s in the air?

See also: Terran Lane’s On Leaving Academia and (via @boyledsweetie) Inspirational teaching: since when did entertainment not matter?

Written by Tony Hirst

October 4, 2012 at 1:32 pm

PEERing at Education…

I just had a “doh!” moment in the context of OERs – Open Educational Resources, typically so called because they are Resources produced by an Educator under an Open content license (which to all intents and purposes is a copyright waiver). One of the things that appeals to me about OERs is that there is no reason for them not to be publicly discoverable which makes them the ideal focus for PEER – Public Engagement with Educational Resources. Which is what the OU traditionally offered through 6am TV broadcasts of not-quite-lectures…

Or how about this one?

And which the OU is now doing through iTunesU and several Youtube Channels, such as OU Learn:


(Also check out some of the other OU playlists…or OU/BBC co-pros currently on iPlayer;-)

PS It also seems to me that users tend not to get too hung up about how things are licensed, particularly educational ones, because education is about public benefit and putting constraints on education is just plain stoopid. Discovery is nine tenths of law, as it were. The important thing about having something licensed as an OER is that no-one can stop you from sharing it… (which even if you’re the creator of a resource, you may not b able to do; academics, for example, often hand over the copyright of their teaching materials to their employer, and their employer’s copyright over their research output (similarly transferred as a condition of employment) to commercial publishers who then sell the content back to their employers.

Written by Tony Hirst

May 16, 2012 at 3:33 pm

Posted in Open Education, OU2.0

Tagged with

The Learning Journey Starts Here: Youtube.edu and OpenLearn Resource Linkage

Mulling over the OU’s OULearn pages on Youtube a week or two ago, colleague Bernie Clark pointed out to me how the links from the OU clip descriptions could be rather hit or miss:

Via @lauradee, I see that the OU has a new offering on YouTube.com/edu is far more supportive of links to related content, links that can represent the start of a learning journey through OU educational – and commentary – content on the OU website.

Here’s a way in to the first bit of OU content that seems to have appeared:

This links through to a playlist page with a couple of different sorts of opportunity for linking to resources collated at the “Course materials” or “Lecture materials” level:

(The language gives something away, I think, about the expectation of what sort of content is likely to be uploaded here…)

So here, for example, are links at the level of the course/playlist:

And here are links associated with each lecture, erm, clip:

In this first example, several types of content are being linked to, although from the link itself it’s not immediately obvious what sort of resource a link points to? For example, some of the links lead through to course units on OpenLearn/Learning Zone:

Others link through to “articles” posted on the OpenLearn “news” site (I’m not ever really sure how to refer to that site, or the content posts that appear on it?)

The placing of content links into the Assignments and Others tabs always seems a little arbitrary to me from this single example, but I suspect that when a few more lists have been posted some sort of feeling about what sorts of resources should go where (i.e. what folk might expect by “Assignment” or “Other” resource links). If there’s enough traffic generated through these links, a bit of A/B testing might even be in order relating to the positioning of links within tabs and the behaviour of students once they click through (assuming you can track which link they clicked through, of course…)?

The transcript link is unambiguous though! And, in this case at least), resolves to a PDF hosted somewhere on the OU podcasts/media filestore:

(I’m not sure if caption files are also available?)

Anyway – it’ll be interesting to hear back about whether this enriched linking experience drives more traffic to the OpenLearn resources, as well as whether the positioning of links in the different tab areas has any effect on engagement with materials following a click…

And as far as the linkage itself goes, I’m wondering: how are the links to OpenLearn course units and articles generated/identified, and are those links captured in one of the data.open.ac.uk stores? Or is the process that manages what resource links get associated with lists and list items on Youtube/edu one that doesn’t leave (or readily support the automated creation of) public data traces?

PS How much (if any( of the linked resource goodness is grabbable via the Youtube API, I wonder? If anyone finds out before me, please post details in the comments below:-)

Written by Tony Hirst

April 27, 2012 at 1:53 pm

Scraperwiki Powered OpenLearn Searches – Learning Outcomes and Glossary Items

A quick follow up to Tinkering With Scraperwiki – The Bottom Line, OpenCorporates Reconciliation and the Google Viz API demonstrating how to reuse that pattern (a little more tinkering is required to fully generalise it, but that’ll probably have to wait until after the Easter wifi-free family tour… I also need to do a demo of a pure HTML/JS version of the approach).

In particular, a search over OpenLearn learning outcomes:

and a search over OpenLearn glossary items:

Both are powered by tables from my OpenLearn XML Processor scraperwiki.

Written by Tony Hirst

April 5, 2012 at 12:02 pm

Tinkering With Scraperwiki – The Bottom Line, OpenCorporates Reconciliation and the Google Viz API

Having got to grips with adding a basic sortable table view to a Scraperwiki view using the Google Chart Tools (Exporting and Displaying Scraperwiki Datasets Using the Google Visualisation API), I thought I’d have a look at wiring in an interactive dashboard control.

You can see the result at BBC Bottom Line programme explorer:

The page loads in the contents of a source Scraperwiki database (so only good for smallish datasets in this version) and pops them into a table. The searchbox is bound to the Synopsis column and and allows you to search for terms or phrases within the Synopsis cells, returning rows for which there is a hit.

Here’s the function that I used to set up the table and search control, bind them together and render them:

    google.load('visualization', '1.1', {packages:['controls']});

    google.setOnLoadCallback(drawTable);

    function drawTable() {

      var json_data = new google.visualization.DataTable(%(json)s, 0.6);

    var json_table = new google.visualization.ChartWrapper({'chartType': 'Table','containerId':'table_div_json','options': {allowHtml: true}});
    //i expected this limit on the view to work?
    //json_table.setColumns([0,1,2,3,4,5,6,7])

    var formatter = new google.visualization.PatternFormat('<a href="http://www.bbc.co.uk/programmes/{0}">{0}</a>');
    formatter.format(json_data, [1]); // Apply formatter and set the formatted value of the first column.

    formatter = new google.visualization.PatternFormat('<a href="{1}">{0}</a>');
    formatter.format(json_data, [7,8]);

    var stringFilter = new google.visualization.ControlWrapper({
      'controlType': 'StringFilter',
      'containerId': 'control1',
      'options': {
        'filterColumnLabel': 'Synopsis',
        'matchType': 'any'
      }
    });

  var dashboard = new google.visualization.Dashboard(document.getElementById('dashboard')).bind(stringFilter, json_table).draw(json_data);

    }

The formatter is used to linkify the two URLs. However, I couldn’t get the table to hide the final column (the OpenCorporates URI) in the displayed table? (Doing something wrong, somewhere…) You can find the full code for the Scraperwiki view here.

Now you may (or may not) be wondering where the OpenCorporates ID came from. The data used to populate the table is scraped from the JSON version of the BBC programme pages for the OU co-produced business programme The Bottom Line (Bottom Line scraper). (I’ve been pondering for sometime whether there is enough content there to try to build something that might usefully support or help promote OUBS/OU business courses or link across to free OU business courses on OpenLearn…) Supplementary content items for each programme identify the name of each contributor and the company they represent in a conventional way. (Their role is also described in what looks to be a conventionally constructed text string, though I didn’t try to extract this explicitly – yet. (I’m guessing the Reuters OpenCalais API would also make light work of that?))

Having got access to the company name, I thought it might be interesting to try to get a corporate identifier back for each one using the OpenCorporates (Google Refine) Reconciliation API (Google Refine reconciliation service documentation).

Here’s a fragment from the scraper showing how to lookup a company name using the OpenCorporates reconciliation API and get the data back:

ocrecURL='http://opencorporates.com/reconcile?query='+urllib.quote_plus("".join(i for i in record['company'] if ord(i)<128))
    try:
        recData=simplejson.load(urllib.urlopen(ocrecURL))
    except:
        recData={'result':[]}
    print ocrecURL,[recData]
    if len(recData['result'])>0:
        if recData['result'][0]['score']>=0.7:
            record['ocData']=recData['result'][0]
            record['ocID']=recData['result'][0]['uri']
            record['ocName']=recData['result'][0]['name']

The ocrecURL is constructed from the company name, sanitised in a hack fashion. If we get any results back, we check the (relevance) score of the first one. (The results seem to be ordered in descending score order. I didn’t check to see whether this was defined or by convention.) If it seems relevant, we go with it. From a quick skim of company reconciliations, I noticed at least one false positive – Reed – but on the whole it seemed to work fairly well. (If we look up more details about the company from OpenCorporates, and get back the company URL, for example, we might be able to compare the domain with the domain given in the link on the Bottom Line page. A match would suggest quite strongly that we have got the right company…)

As @stuartbrown suggeted in a tweet, a possible next step is to link the name of each guest to a Linked Data identifier for them, for example, using DBPedia (although I wonder – is @opencorporates also minting IDs for company directors?). I also need to find some way of pulling out some proper, detailed subject tags for each episode that could be used to populate a drop down list filter control…

PS for more Google Dashboard controls, check out the Google interactive playground…

PPS see also: OpenLearn Glossary Search and OpenLearn LEarning Outcomes Search

Written by Tony Hirst

April 5, 2012 at 8:55 am

Infoskills 2.012 – Practical Exercises in Social Media Network Analysis #change11

As ever, it seems the longer I have to prepare something, the less likely I am to do it. I was supposed to be running a #change11 MOOC session this week – Infoskills 2.012 How to do a lot with a little – but having had it in the diary for a 6 months or so, I have, of course, done nothing to prepare for it… (I didn’t come up with the 2.012 – not sure who did?)

Anyway….over the weekend, I gave a presentation (Social Media Visualisation Hacks) that, typically, bewildered the audience with a blizzard of things that are possible when it comes to looking at social networks but that are still alien to most:

As ever, the presentation is not complete (i.e. the slides really need to be complemented by a commentary), but that’s something I hope to start working on improving – maybe starting this week…

The deck is a review – of sorts – of some of the various ways we can look at social networks and the activity that takes place within them. The slides are prompts, keys, search phrase suggestions that provide a starting point for finding out more. Many of the slides contain screenshots – and if you peer closely enough, you can often see the URL. For posts on my blog, searching with the word ouseful followed by key terms from the post title will often turn up the result on major search engines. Many of the slides identify a “hack” that is described in pseudo-tutorial form on the this blog, or on Martin Hawksey’s MASHe blog.

I put together a delicious stack of links relating to the presentation here: #drg12 – Visualising Social Networks (Tutorial Posts)

For a tutorial stack that focusses more on Yahoo Pipes (though who knows how long that is still for this world given the perilous nature of Yahoo at the moment), see: Twitter Pipes

My #change11 week was supposed to be about new info skills, with a practical emphasis. A couple of other presentations relating to how we might appropriate (a-pro-pre-eight) tools and applications can be found here: Appropriate IT – My ILI2011 Presentation and Just Do IT Yourself… MY UKSG Presentation.

If all you do in Google is 2.3 keyword searches, this deck – Making the Most of Google – (though possibly a little dated by now), may give you some new ideas.

For a more formal take on infoskills for the new age, (though I need to write a critique of this from my own left-field position), see the Cambridge University Library/Arcadia Project “New Curriculum for Information Literacy (ANCIL)” project via the Arcadia project.

If you want to do some formal reading in the visualisation space, check out 7 Classic Foundational Vis Papers You Might not Want to Publicly Confess you Don’t Know.

Via @cogdog/Alan Levine, I am reminded of Jon Udell’s Seven ways to think like the web. You do think like that, right?!;-)

Mulling over @downes’ half hour post on Education as Platform: The MOOC Experience and what we can do to make it better, I see the MOOC framework as providing an element of co-ordination, pacing and a legacy resource package. For my week, I was expected(?!) to put together some readings and exercises and maybe a webinar. But I haven’t prepared anything (I tried giving a talk at Dev8D earlier this year completely unprepared (though I did have an old presentation I could have used) and it felt to me like a car crash/a disaster, so I know I do need to prep things (even though it may not seem like it if you’ve heard me speak before!;-))

But maybe that’s okay, for one week of this MOOC? The OUSeful.info blog, where you’re maybe reading this, is an ongoing presentation, with a post typically every day or so. WHen I learn something related to the general themes of this blog, I post it here, often as a partial tutorial (partial in the sense that you often have to work through the tutorial for the words to make sense – they complement the things you should be seeing on screen – if you look – as you work through the tutorial; in a sense, the tutorial posts are often second screen complements to and drivers of an activity on another screen).

I’ve personally tried registering with MOOCs a couple of times, but never completed any of the activities. Some of the MOOC related readings or activities pass my way through blogs I follow, or tweeted links that pique my interest, and sometimes I try them out. I guess I’m creating my own unstructured uncourse* daily anyway… (*uncourses complement MOOCs, sort of… They’re courses created live in partial response to feedback, but also reflecting the “teacher”‘s learning journey through a topic. Here’s an example that led to a formal OU course: Digital Worlds uncourse blog experiment. The philosophy is based on the “teacher” modelling – and documenting – a learning journey. Uncourses fully expect the “teacher” not being totally knowledgeable about the subject area, but being happy to demonstrate how they go about making sense of a topic that may well be new to them).

So… this is my #change11 offering. It’s not part of the “formal” course, (how weird does that sound?!) unless it is… As the MOOC is now in week 29, if its principles have been taken on-board, you should be starting to figure out your own distributed co-ordination mechanisms by now. Because what else will you do when the course is over? Or will it be a course that never ends, yet ceases to have a central co-ordination point?

PS if you want to chat, this blog is open to comments; you can also find me on Twitter as @psychemedia

PPS seems like I’ve had at least one critical response (via trackbacks) towards my lacksadaisical “contribution” towards my “teaching” week on the #Change11 MOOC. True. Sorry. But not. I should have kept it simple, posted my motto – identify a problem, then hack a solution to it, every day – and left it at that… It’s how I learn about this stuff… (and any teaching I receive tends to be indirect – by virtue stuff other folk have published that I’ve discovered through web search, (aka search queries – questins – that I’ve had to formulate to help me answer the problem I have identified/created…).

Written by Tony Hirst

March 27, 2012 at 9:53 am

Posted in Open Education, Tutorial, Uncourse

Tagged with

Deconstructing OpenLearn Units – Glossary Items, Learning Outcomes and Image Search

It turns out that part of the grief I encountered here in trying to access OpenLearn XML content was easily resolved (check the comments: mechanise did the trick…), though I’ve still to try to sort out a workaround for accessing OpenLearn images (a problem described here)), but at least now I have another stepping stone: a database of some deconstructed OpenLearn content.

Using Scraperwiki to pull down and parse the OpenLearn XML files, I’ve created some database tables that contain the following elements scraped from across the OpenLearn units by this OpenLearn XML Processor:

  • glossary items;
  • learning objectives;
  • figure captions and descriptions.

You can download CSV data files corresponding to the tables, or the whole SQLite database. (Note that there is also an “errors” table that identifies units that threw an error when I tried to grab, or parse, the OpenLearn XML.)

Unfortunately, I haven’t had a chance yet to pop up a view over the data (I tried, briefly, but today was another of those days where something that’s probably very simple and obvious prevented me from getting the code I wanted to write working; if anyone has an example Scraperwiki view that chucks data into a sortable HTML table or a Simile Exhibit searchable table, please post a link below; or even better, add a view to the scraper:-)

So in the meantime, if ypu want to have a play, you need to make use of the Scraperwiki API wizard.

Here are some example queries:

  • a search for figure descriptions containing the word “communication” – select * from `figures` where desc like ‘%communication%’: try it
  • a search over learning outcomes that include the phrase how to followed at some point by the word dataselect * from `learningoutcomes` where lo like ‘%how to%data%’: try it
  • a search of glossary items for glossary terms that contain the word “period” or a definition that contains the word “ancient” – select * from `glossary` where definition like ‘%ancient%’ or term like ‘%period%’: try it
  • find figures with empty captions – select * from `figures` where caption==”: try it

I’ll try to add some more examples when I get a chance, as well as knocking up a more friendly search interface. Unless you want to try…?!;-)

Written by Tony Hirst

March 15, 2012 at 10:59 am

Do We Need an OpenLearn Content Liberation Front?

For me, one of the defining attributes of openness relates to accessibility of the machine kind: if I can’t write a script to handle the repetitive stuff for me, or can’t automate the embedding of image and/or video resources, then whatever the content is, it’s not open enough in a practical sense for me to do what I want with it.

So here’s an, erm, how can I put this politely, little niggle I have with OpenLearn XML. (For those of you not keeping up, one of the many OpenLearn sites is the OU’s open course materials site; the materials published on the site as course unit contentful HTML pages are also available as structured XML documents. (When I say “structured”, I mean that certain elements of the materials are marked up in a semantically meaningful way; lots of elements aren’t, but we have to start somewhere ;-))

The context is this: following on from my presentation on Making More of Structured Course Materials at the eSTeEM conference last week, I left a chat with Jonathan Fine with the intention of seeing what sorts of secondary product I could easily generate from the OpenLearn content. I’m in the middle of building a scraper and structured content extractor at the moment, grabbing things like learning outcomes, glossary items, references and images, but I almost immediately hit a couple of problems, first with actually locating the OU XML docs, and secondly locating the images…

Getting hold of a machine readable list of OpenLearn units is easy enough via the OpenLearn OPML feed (much easier to work with than the “all units” HTML index page). Units are organised by topic and are listed using the following format:

<outline type="rss" text="Unit content for Water use and the water cycle" htmlUrl="http://openlearn.open.ac.uk/course/view.php?name=S278_12" xmlUrl="http://openlearn.open.ac.uk/rss/file.php/stdfeed/4307/S278_12_rss.xml"/>

URLs of the form http://openlearn.open.ac.uk/course/view.php?name=S278_12 link to a ‘homepage” for each unit, which then links to the first page of actual content, content which is also available in XML form. The content page URLs have the form http://openlearn.open.ac.uk/mod/oucontent/view.php?id=398820&direct=1, where the ID is one-one uniquely mapped to the course name identifier. The XML version of the page can then be accessed by changing direct=1 in the URL to content=1. Only, we don’t know the mapping from course unit name to page id. The easiest way I’ve found of doing that is to load in the RSS feed for each unit and grab the first link URL, which points the first HTML content page view of the unit.

I’ve popped a scraper up on Scraperwiki to build the lookup for XML URLs for OpenLearn units – OpenLearn XML Processor:

import scraperwiki

from lxml import etree

#===
#via http://stackoverflow.com/questions/5757201/help-or-advice-me-get-started-with-lxml/5899005#5899005
def flatten(el):           
    result = [ (el.text or "") ]
    for sel in el:
        result.append(flatten(sel))
        result.append(sel.tail or "")
    return "".join(result)
#===

def getcontenturl(srcUrl):
    rss= etree.parse(srcUrl)
    rssroot=rss.getroot()
    try:
        contenturl= flatten(rssroot.find('./channel/item/link'))
    except:
        contenturl=''
    return contenturl

def getUnitLocations():
    #The OPML file lists all OpenLearn units by topic area
    srcUrl='http://openlearn.open.ac.uk/rss/file.php/stdfeed/1/full_opml.xml'
    tree = etree.parse(srcUrl)
    root = tree.getroot()
    topics=root.findall('.//body/outline')
    #Handle each topic area separately?
    for topic in topics:
        tt = topic.get('text')
        print tt
        for item in topic.findall('./outline'):
            it=item.get('text')
            if it.startswith('Unit content for'):
                it=it.replace('Unit content for','')
                url=item.get('htmlUrl')
                rssurl=item.get('xmlUrl')
                ccu=url.split('=')[1]
                cctmp=ccu.split('_')
                cc=cctmp[0]
                if len(cctmp)>1: ccpart=cctmp[1]
                else: ccpart=1
                slug=rssurl.replace('http://openlearn.open.ac.uk/rss/file.php/stdfeed/','')
                slug=slug.split('/')[0]
                contenturl=getcontenturl(rssurl)
                print tt,it,slug,ccu,cc,ccpart,url,contenturl
                scraperwiki.sqlite.save(unique_keys=['ccu'], table_name='unitsHome', data={'ccu':ccu, 'uname':it,'topic':tt,'slug':slug,'cc':cc,'ccpart':ccpart,'url':url,'rssurl':rssurl,'ccurl':contenturl})

getUnitLocations()

The next step in the plan (because I usually do have a plan; it’s hard to play effectively without some sort of direction in mind…) as far as images goes was to grab the figure elements out of the XML documents and generate an image gallery that allows you to search through OpenLearn images by title/caption and/or description, and preview them. Getting the caption and description from the XML is easy enough, but getting the image URLs is not

Here’s an example of a figure element from an OpenLearn XML document:

<Figure id="fig001">
<Image src="\\DCTM_FSS\content\Teaching and curriculum\Modules\Shared Resources\OpenLearn\S278_5\1.0\s278_5_f001hi.jpg" height="" webthumbnail="false" x_imagesrc="s278_5_f001hi.jpg" x_imagewidth="478" x_imageheight="522"/>
<Caption>Figure 1 The geothermal gradient beneath a continent, showing how temperature increases more rapidly with depth in the lithosphere than it does in the deep mantle.</Caption>
<Alternative>Figure 1</Alternative>
<Description>Figure 1</Description>
</Figure>

Looking at the HTML page for the corresponding unit on OpenLearn, we see it points to the image resource file at http://openlearn.open.ac.uk/file.php/4178/!via/oucontent/course/476/s278_5_f001hi.jpg:

So how can we generate that image URL from the resource link in the XML document? The filename is the same, but how can we generate what are presumably contextually relevant path elements: http://openlearn.open.ac.uk/file.php/4178/!via/oucontent/course/476/

If we look at the OpenLearn OPML file that lists all current OpenLearn units, we can find the first identifier in the path to the RSS file:

<outline type="rss" text="Unit content for Energy resources: Geothermal energy" htmlUrl="http://openlearn.open.ac.uk/course/view.php?name=S278_5" xmlUrl="http://openlearn.open.ac.uk/rss/file.php/stdfeed/4178/S278_5_rss.xml"/>

But I can’t seem to find a crib for the second identifier – 476 – anywhere? Which means I can’t mechanise the creation of links to actually OpenLearn image assets from the XML source. Also note that there are no credits, acknowledgements or license conditions associated with the image contained within the figure description. Which also makes it hard to reuse the image in a legal, rights recognising sense.

[Doh - I can surely just look at URL for an image in an OpenLearn unit RSS feed and pick the path up from there, can't I? Only I can't, because the image links in the RSS feeds are: a) relative links, without path information, and b) broken as a result...]

Reusing images on the basis of the OpenLearn XML “sourcecode” document is therefore: NOT OBVIOUSLY POSSIBLE.

What this suggests to me is that if you release “source code” documents, they may actually need some processing in terms of asset resolution that generates publicly resolvable locators to assets if they are encoded within the source code document as “private” assets/non-resolvable identifiers.

Where necessary, acknowledgements/credits are provided in the backmatter using elements of the form:

<Paragraph>Figure 7 Willes-Richards, J., et al. (1990) ; HDR Resource/Economics’ in Baria, R. (ed.) <i>Hot Dry Rock Geothermal Energy</i>, Copyright CSM Associates Limited</Paragraph>

Whilst OU-XML does support the ability to make a meaningful link to a resource within the XML document, using an element of the form:

<CrossRef idref="fig007">Figure 7</CrossRef>

(which presumably uses the Alternative label as the cross-referenced identifier, although not the figure element id (eg fig007) which is presumably unique within any particular XML document?), this identifier is not used to link the informally stated figure credit back to the uniquely identified figure element?

If the same image asset is used in several course units, there is presumably no way of telling from the element data (or even, necessarily, the credit data?) whether the images are in fact one and the same. That is, we can’t audit the OpenLearn materials in a text mechanised way to see whether or not particular images are reused across two or more OpenLearn units.

Just in passing, it’s maybe also worth noting that in the above case at least, a description for the image is missing. In actual OU course materials, the description element is used to capture a textual description of the image that explicates the image in the context of the surrounding text. This represents a partial fulfilment of accessibility requirements surrounding images and represents, even if not best, at least effective practice.

Where else might content need liberating within OpenLearn content? At the end of the course unit XML documents, in the “backmatter” element, there is often a list of references. References have the form:

<Reference>Sheldon, P. (2005) Earth’s Physical Resources: An Introduction (Book 1 of S278 Earth’s Physical Resources: Origin, Use and Environmental Impact), The Open University, Milton Keynes</Reference>

Hmmm… no structure there… so how easy would it be to reliably generate a link to an authoritative record for that item? (Note that other records occasionally use presentational markup such as italics (or emphasis) tags to presentationally style certain parts of some references (confusing presentation with semantics…).)

Finally, just a quick note on why I’m blogging this publicly rather than raising it, erm, quietly within the OU. My reasoning is similar to the reasoning we use when we tell students to not be afraid of asking questions, because it’s likely that others will also have the same question… I’m asking a question about the structure of an open educational resource, because I don’t quite understand it; by asking the question in public, it may be the case that others can use the same questioning strategy to review the way they present their materials, so when I find those, I don’t have to ask similar sorts of question again;-)

PS sort of related to this, see TechDis’ Terry McAndrew’s Accessible courses need and accessibilty-friendly schema standard.

PPS see also another take on ways of trying to reduce cognitive waste – Joss Winn’s latest bid in progress, which will examine how the OAuth 2.0 specification can be integrated into a single sign on environment alongside Microsoft’s Unified Access Gateway. If that’s an issue or matter of interest in your institution, why not fork the bid and work it up yourself, or maybe even fork it and contribute elements back?;-) (Hmm, if several institutions submitted what was essentially the same bid from multiple institutions, how would they cope during the marking process?!;-)

Written by Tony Hirst

March 13, 2012 at 1:10 pm

Follow

Get every new post delivered to your Inbox.

Join 729 other followers