OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for the ‘Open Education’ Category

Losing Experimental Edtech Value from IPython Notebooks Because of New Security Policies?

Just like the way VLEs locked down what those who wanted to try stuff out could do with educational websites, usually on the grounds of “security”, so a chunk of lightweight functionality with possible educational value that I was about to start exploring inside IPython notebooks has been locked out by the new IPython notebook security policy:

Affected use cases
Some use cases that work in IPython 1.0 will become less convenient in 2.0 as a result of the security changes. We do our best to minimize these annoyance, but security is always at odds with convenience.

Javascript and CSS in Markdown cells
While never officially supported, it had become common practice to put hidden Javascript or CSS styling in Markdown cells, so that they would not be visible on the page. Since Markdown cells are now sanitized (by Google Caja), all Javascript (including click event handlers, etc.) and CSS will be stripped.

Here’s what I’ve been exploring – using a simple button:

[Image: ipynb button]

to reveal an answer:

[Image: ipynb button reveal]

It’s a 101 interaction style in “e-learning” (do we still call it that?!) and one that I was hoping to explore more given the interactive richness of the IPython notebook environment.

Here’s how I implemented it – a tiny bit of Javascript hidden in one of the markdown cells:

<script type="text/javascript">
   // Toggle the visibility of the element with the given id.
   function showHide(id) {
       var e = document.getElementById(id);
       if (e.style.display === 'block') {
          e.style.display = 'none';
       } else {
          e.style.display = 'block';
       }
   }
</script>

and then a quick call from a button onclick event handler to reveal the answer block:

<input type="button" value="Answer" onclick="showHide('ans2')">

<div id="ans2" style="display:none">I can see several ways of generating common identifiers:

<ul><li>using the **gss** code from the area data, I could generate identifiers of the form `http://statistics.data.gov.uk/id/statistical-geography/GSS`</li>
<li>from the housing start data, I could split the *Reference Area* on space characters and then extract the GSS code from the first item in the split list</li>
<li>The *districtname* in the area data looks like it may have "issues" with spacing in area names. If we remove spaces and turn everything to lower case in the area data *districtname* and the *Reference Area* in the housing data, we *may* be able to create matching keys. But it could be a risky strategy...</li>
</ul></div>

This won’t work anymore – and I don’t have the time to learn whether custom CSS can do this, and if so, how.
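As an aside, the key-matching ideas listed in that answer text can be sketched in a few lines of Python. This is only an illustrative sketch: the sample values are made up, the function names (`gss_uri`, `gss_from_reference_area`, `name_key`) are hypothetical, and it assumes the *Reference Area* really does start with the GSS code followed by a space.

```python
BASE = "http://statistics.data.gov.uk/id/statistical-geography/"

def gss_uri(gss_code):
    """Build an identifier of the form BASE + GSS code."""
    return BASE + gss_code.strip()

def gss_from_reference_area(reference_area):
    """Assume the Reference Area looks like '<GSS code> <area name>'."""
    return reference_area.split()[0]

def name_key(name):
    """Risky fallback key: remove all whitespace and lower-case the name."""
    return "".join(name.split()).lower()

# Made-up example values, for illustration only
reference_area = "E07000008 Cambridge"
print(gss_uri(gss_from_reference_area(reference_area)))
print(name_key("Kings Lynn") == name_key("KingsLynn"))
```

The name-based key is the fragile one: it only helps where the two datasets differ in nothing but spacing and capitalisation.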

I don’t really want to have to go back to the approach I tried before I demoed the button triggered reveal example to myself…

[Image: ipynb another interaction]

That is, putting answers into a python library and then using code to pull the text answer in…
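A minimal sketch of that fallback, assuming a small hypothetical module (called `answers.py` here) that holds the answer text keyed by an identifier. The names `ANSWERS` and `get_answer` are illustrative, not necessarily the ones used in the notebook shown.

```python
# answers.py -- hypothetical helper module kept alongside the notebook

ANSWERS = {
    "ans2": (
        "I can see several ways of generating common identifiers, "
        "for example using the gss code from the area data."
    ),
}

def get_answer(key):
    """Return the stored answer text, or a fallback message for unknown keys."""
    return ANSWERS.get(key, "No answer recorded for '%s'." % key)

# In a notebook code cell, the student runs something like this to reveal the text:
print(get_answer("ans2"))
```

It works, but the reveal now costs the student a code cell and a function call rather than a single button click.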

[Image: ipynb color styling]

Note also the use of colour in the cells – this is something else I wanted to try to explore, the use of styling to prompt interactions; in the case of IPython notebooks, I quite like the idea of students taking ownership of the notebooks and adding content to them, whether by adding commentary text to cells we have written in, adding their own comment cells (perhaps using a different style – so a different cell type?), amending code stubs we have written, adding in their own code, perhaps as code complements to comment prompts we have provided, etc.

[Image: ipynb starting to think about different interactions...]

The quick hack, try-and-see option that immediately came to mind to support these sorts of interaction seems to have been locked out (or maybe not – rather than spending half an hour on a quick hack I’ll have to spend half an hour reading docs…). This is exactly the sort of thing that cuts down on our ability to mix ideas and solutions picked up from wherever, and just try them out quickly; and whilst I can see the rationale, it’s just another of those things to add to the “when the web was more open” pile. (I was going to spend half an hour blogging a post to let other members of the course team I’m on know how to add revealed answers to their notebooks, but as I’ve just spent 18 hours trying to build a VM box that supports python3 and the latest IPython notebook, I’m a bit fed up at the thought of having to stick with the earlier version py’n’notebook VM I built because it’s easier for us to experiment with…)

I have to admit that some of the new notebook features look like they could be interesting from a teaching point of view in certain subject areas (the ability to publish interactive widgets where the controls talk to parameters accessed via the notebook code cells, for example), but that wasn’t on my to-do list for the next week…

What I was planning to do was explore what we’d need to do to get elements of the notebook behaving like elements in OU course materials, under the assumption that our online materials have designs that go hand in hand with good pedagogy. (This is a post in part about OU stuff, so necessarily it contains the p-word.)

[Image: ou teaching styling]

Something else on the to-do list was to explore how to tweak the branding of the notebook, for example to add in an OU logo or (for my other day per week), a School of Data logo. (I need to check the code openness status of IPython notebooks… How bad form would it be to remove the IPy logo for example? And where should a corporate logo go? In the toolbar, or at the top of the content part of the notebook? If you just contribute content, I guess the latter; if you add notebook functionality, maybe the toolbar is okay?)

There are a few examples of styling notebooks out there, but I wonder – will those recipes still work?

Ho hum – this post probably comes across as negative about IPython notebooks, but it shouldn’t because they’re a wonderful environment (for example, Doodling With IPython Notebooks for Education and Time to Drop Calculators in Favour of Notebook Programming?). I’m just a bit fed up that after a couple of days’ graft I don’t get to have half an hour’s fun messing around with look and feel. Instead, I need to hit the docs to find out what’s possible and what isn’t because the notebooks are no longer an open environment as they were… Bah..:-(

Written by Tony Hirst

April 11, 2014 at 6:10 pm

Posted in Open Education, OU2.0, Tinkering


MOOC Reflections

A trackback a week or two ago to my blog from this personal blog post: #SNAc week 1: what are networks and what use is it to study them? alerted me to a MOOC currently running on Coursera on social network analysis. The link was contextualised in the post as follows: The recommended readings look interesting, but it’s the curse of the netbook again – there’s no way I’m going to read a 20 page PDF on a screen. Some highlighted resources from Twitter and the forum look a bit more possible: … Some nice ‘how to’ posts: … (my linked to post was in the ‘howto’ section).

The whole MOOC hype thing at the moment seems to be dominated by references to things like Coursera, Udacity and edX (“xMOOCs”). Coursera in particular is a new sort of intermediary, a website that offers some sort of applied marketing platform to universities, allowing them to publish sample courses in a centralised, browsable, location and in a strange sense legitimising them. I suspect there is some element of Emperor’s New Clothes thinking going on in the universities who have opted in and those who may be considering it: “is this for real?”; “can we afford not to be a part of it?”

Whilst Coursera has an obvious possible business model – charge the universities for hosting their marketing material courses – Udacity’s model appears more pragmatic: provide courses with the option of formal assessment via Pearson VUE assessment centres, and then advertise your achievements to employers on the Udacity site; presumably, the potential employers and recruiters (which got me thinking about what role LinkedIn might possibly play in this space?) are seen as the initial revenue stream for Udacity. Note that Udacity’s “credit” awarding powers are informal – in the first instance, credibility is based on the reputation of the academics who put together the course; in contrast, for courses on Coursera, and the rival edX partnership (which also offers assessment through Pearson VUE assessment centres), credibility comes from the institution that is responsible for putting together the course. (It’s not hard to imagine a model where institutions might even badge courses that someone else has put together…)

Note that Coursera, Udacity and edX are all making an offering based on quite a traditional course model idea and are born out of particular subject disciplines. Contrast this in the first part with something like Khan Academy, which is providing learning opportunities at a finer level of granularity/much smaller “learning chunks” in the form of short video tutorials. Khan Academy also provides the opportunity for Q&A based discussion around each video resource.

Also by way of contrast are the “cMOOC” style offerings inspired by the likes of George Siemens, Stephen Downes, et al., where a looser curriculum based around a set of topics and initially suggested resources is used to bootstrap a set of loosely co-ordinated personal learning journeys: learners are encouraged to discover, share and create resources and feed them into the course network in a far more organic way than the didactic, rigidly structured approach taken by the xMOOC platforms. The cMOOC style also offers the possibility of breaking down subject disciplines through accepting shared resources contributed because they are relevant to the topic being explored, rather than because they are part of the canon for a particular discipline.

The course without boundaries approach of Jim Groom’s ds106, as recently aided and abetted by Alan Levine, also softens the edges of a traditionally offered course with its problem based syllabus and open assignment bank (participants are encouraged to submit their own assignment ideas) and turns learning into something of a lifestyle choice… (Disclaimer: regular readers will know that I count the cMOOC/ds106 “renegades” as key forces in developing my own thinking…;-)

Something worth considering about the evolution of open education from early open content/open educational resource (OER) repositories and courseware into the “Massive Open Online Course” thing is just what caused the recent upsurge in interest? Both MIT opencourseware and the OU’s OpenLearn offerings provided “anytime start”, self-directed course units; but my recollection is that it was Thrun & Norvig’s first open course on AI (before Thrun launched Udacity), that captured the popular (i.e. media) imagination because of the huge number of students that enrolled. Rather than the ‘on-demand’ offering of OpenLearn, it seems that the broadcast model, and linear course schedule, along with the cachet of the instructors, were what appealed to a large population of demonstrably self-directed learners (i.e. geeks and programmers, who spend their time learning how to weave machines from ideas).

I also wonder whether the engagement of universities with intermediary online course delivery platforms will legitimise online courses run by other organisations; for example, the Knight Centre Massive Open Online Courses portal (a Moodle environment) is currently advertising its first MOOC on infographics and data visualisation:

Similar to other Knight Center online courses, this MOOC is divided into weekly modules. But unlike regular offerings, there will be no application or selection process. Anyone can sign up online and, once registered, participants will receive instructions on how to enroll in the course. Enrollees will have immediate access to the syllabus and introductory information.

The course will include video lectures, tutorials, readings, exercises and quizzes. Forums will be available for discussion topics related to each module. Because of the “massive” aspect of the course, participants will be encouraged to provide feedback on classmates’ exercises while the instructor will provide general responses based on chosen exercises from a student or group of students.

Cairo will focus on how to work with graphics to communicate and analyze data. Previous experience in information graphics and visualization is not needed to take this course. With the readings, video lectures and tutorials available, participants will acquire enough skills to start producing compelling, simple infographics almost immediately. Participants can expect to spend 4-6 hours per week on the course.

Although the course will be free, if participants need to receive a certificate, there will be a $20 administrative fee, paid online via credit card, for those who meet the certificate requirements. The certificate will be issued only to students who actively participated in the course and who complied with most of the course requirements, such as quizzes and exercises. The certificates will be sent via email as a PDF document. No formal course credit of any kind is associated with the certificate.

Another of the things that I’ve been pondering is the role that “content” may or may not play in this open course thing. Certainly, where participants are encouraged to discover and share resources, or where instructors seek to construct courses around “found resources”, an approach espoused by the OU’s new postgraduate strategy, it seems to me that there is an opportunity to contribute to the wider open learning idea by producing resources that can be “found”. For resources to be available as found resources, we need the following:

  1. Somebody needs to have already created them…
  2. They need to be discoverable by whoever is doing the finding
  3. They need to be appropriately licensed (if we have to go through a painful rights clearance and rights payment model, the cost benefits of drawing on and freely reusing those resources are severely curtailed).

Whilst the running of a one shot MOOC may attract however many participants, the production of finer grained (and branded) resources that can be used within those courses means that a provider can repeatedly, and effortlessly, contribute to other people’s courses through course participants pulling the resources into those course contexts. (It also strikes me that educators in one institution could sign up for a course offered by another, and then drop in links to their own applied marketing learning materials.)

One thing I’ve realised from looking at Digital Worlds uncourse blog stats is that some of the posts attract consistent levels of traffic, possibly because they have been linked to from other course syllabuses. I also occasionally see flurries of downloads of tutorial files, which makes me wonder whether another course has linked to resources I originally produced. If we think of the web in its dynamic and static modes (static being the background links that are part of the long term fabric of the web, dynamic as the conversation and link sharing that goes on in social networks, as well as the publication of “alerts” about new fabric (for example, the publication of a new blog post into the static fabric of the web is announced through RSS feeds and social sharing as part of the dynamic conversation)), then the MOOCs appear to be trying to run in a dynamic, broadcast mode. Whereas what interests me is how we can contribute to the static structure of the web, and how we can make better use of it in a learning context?

PS a final thought – running scheduled MOOCs is like a primetime broadcast; anytime independent start is like on-demand video. Or how about this: MOOCs are like blockbuster books, published to great fanfare and selling millions of first day, pre-ordered copies. But there’s also long tail over time consumption of the same books… and maybe also books that sell steadily over time without great fanfare. Running a course once is all well and good; but it feels too ephemeral, and too linear rather than networked thinking to me?

Written by Tony Hirst

October 9, 2012 at 9:04 am

Therapy Time: Networked Personal Learning and a Reflection on the Urban Peasant…

Way back when I was a postgrad, I used to spend a coffee fuelled morning reading in bed, and then get up to eat a cooked breakfast whilst watching the Urban Peasant:

My abiding memory, in part confirmed by several of the asides in the above clip (can you guess which?!), was that of “agile cooking” and flexible recipes. A chicken curry (pork’s fine too, or beef, even fish if you like; or potato if you want a vegetarian version) could be served with rice (or bread, or a baked potato); if you didn’t like curry, you could leave out the spices or curry powder, and just use a stock cube. If a recipe called for chopped vegetables, you could also grate them or slice them or dice them or…”it’s your decision”. Potato and peas could equally well be carrot or parsnip and beans. If you needed to add water to a recipe, you could add wine, or beer, or fruit juice or whatever instead; if you wanted to have scrambled egg on toast, you could also fry it, or poach it, or boil it. And the toast could be a crumpet or a muffin or just use “whatever you’ve got”.

The ethos was very much one of: start with an idea, and/or see what you’ve got, and then work with it – a real hacker ethic. It also encouraged you to try alternative ideas out, to be adaptive. And I’m pretty sure mistakes happened too – but that was fine…

When I play with data, I often have a goal in mind (albeit a loose one), used to provide a focus for exploring a data set I want to explore a little (typically using Shneiderman’s “Overview first, zoom and filter, then details-on-demand” approach), to see what potential it might hold, or to act as a testbed for a tool or technique I want to try out. The problem then becomes one of coming up with some sort of recipe that works with the data and tools I have to hand, as well as the techniques and processes I’ve used before. Sometimes, a recipe I’m working on requires me to get another ingredient out of the fridge, or another utensil out of the cupboard. Sometimes I use a tea towel as an oven glove, or a fork as a knife. Sometimes I taste the food-in-process to know when it’s done, sometimes I go by the colour, texture, consistency, shape, smell or clouds of smoke that have started to appear.

Because I haven’t had any formal training in any of this “stuff”, using “approved” academic sources (I’ve recently been living by R-Bloggers (which is populated by quite a few academics) and Stack Overflow, for example), I suffer from a lack of confidence in talking about it in an academic way (see for example For My One Thousandth Blogpost: The Un-Academic), and a similar lack of confidence in feeling that I could ever charge anybody a fee for telling them what I (think I) know (leave aside for the moment that I effectively charge the OU my salary, benefits and on-costs… hmmm?!). I used to do the academic thing way back when as a postgrad and early postdoc, but fell out of the habit over the last few years because there seemed to me to be a huge amount of investment of time required for very little impact or consequence of what I was doing. Yes, it’s important for things to be “right”, but I’m not sure my maths is up to generating formal proofs of new algorithms. I may be able to do the engineering or technologist thing of getting something working, -ish, good enough “for now”, research-style coding, but it’s always mindful of an engineering style trade-off: that it might not be “right” and is just something I figured out that seems to work, but that it’ll do because it lets me get something done… As Artur Bergman puts it using rather colourful language – “yes, correlation isn’t causation, but…”

(This clip was originally brought to mind by a recent commentary from Stephen Downes on The Internet Blowhard’s Favorite Phrase, and the original post it refers to.)

Also mixed up in the notion of “right” is seeing things as “right” if they are formally recognised or accepted as such, which is where assessment and peer review come in: you let other people you trust make an assessment about whatever it is you do/have done, publicly recognising your achievements which in turn allows you to make a justifiable claim to them. (I am reminded here of the definition of knowledge as justified true belief. That word “justified” is interesting, isn’t it…?)

As well as resisting getting in the whole grant bidding cycle for turnover generating, public money cycling projects that are set up to fail, I’ve also recently started to fall out of OU-style formal teaching roles… again, in part because of the long lead times involved with producing course materials and my preference for network based, rather than teamwork based, working style. (I so need to revisit formal models of teamwork and try to come up with a corresponding formulation for networks rather than teams…Or do a lit review to find one that’s already out there…!) I tend to write in 1 hour chunks based on 3-4 hours work, then post whatever it is I’ve done. One reason for doing this is because I figure most people read or do things in 5 to 15 minutes or one to two hour chunks and that in a network-centric, distributed online educational setting small chunks are more likely to be discoverable and immediately useful (directly and immediately learnable from) chunks. There’s no shame in using a well crafted Wikipedia page as a starting point for discovering more detailed – and academic – resources: at least you stand a good chance of finding the Wikipedia page! In the same way, I try to link out to supporting resources from most of my posts so that readers (including myself as a revisitor to these pages in that set) have some additional context, supporting or conflicting material to help get more value from it. (Related: Why I (Edu)Blog.)

Thinking about my own personal lack of confidence, which in part arises from the way I have informally learned whatever it is that I have actually learned over the last few years and not had it formally validated by anybody else, my interest in espousing an informal style of networked learning to others is an odd one… Because based on my own experience, it doesn’t give me the feeling that what I know is valid (justified..?), or even necessarily trustable by anybody other than me (because I know how it’s caveated because of what I have personally learned about it, rather than just being told about it), even if it is pragmatic and at least occasionally appears to be useful. (Hmm… I don’t think an OU editor would let me get away with a sentence like that in a piece of OU course material!) Maybe I need to start keeping a second, formalised reflective learning journal as the OU Skills for OU Study suggests to log what I learn, and provide some sort of indexable and searchable metadata around it? In fact, this approach might be a useful approach if I do another uncourse? (It also brings to mind the word equation: Learning material + metadata = learning object (it was something like that, wasn’t it?!))

To the extent that this blog is an act of informal, open teaching, I think it offers three main things: a) “knowledge transferring” discoverable resources on a variety of specialised topics; b) fragmentary records of created knowledge (I *think* I’ve managed to make up odd bits of new stuff over the last few years…); c) a model of some sort of online distributed network centric learning behaviour (see also the Digital Worlds Uncourse Blog Experiment in this respect).

I guess one of the things I do get to validate against is the real world. When I used to go into schools doing robotics activities*, kids would ask me if their robot or programme was “right”. In many cases, there wasn’t really a notion of “right”, it was more a case of:

  • were there things that were obviously wrong?
  • did the thing work as anticipated (or indeed, did any elements of it work at all?!;-)?
  • were there any bits that could be improved, adapted or done in another more elegant way?

So it is with some of my visualisation “experiments” – are they not wrong (is the data right, is there a sensible relationship between the data and the visual mappings)? do they “work” at all (eg in the sense of communicating a particular trend, or revealing a particular anomaly)? could they be improved? By running the robot program, or trying to read the story a data visualisation appears to be telling us, we can get a sense of how “right” it is; but there is often no single “right” for it to be. Which is where doubt can creep in… Because if something is “not right”, then maybe it’s “wrong”…?

In the world of distributed, networked learning, I think one thing we need to work on is developing an appropriate sense of validation and legitimisation of personal learning. Things like badges are weak extrinsic signs that some would claim have a role in this, but I wonder how networks and communities can be shaped and architected, or how their dynamics might work, so that learners develop not only a well-founded intrinsic confidence about what they have self-learned, but also a feeling that what they have self-learned is as legitimate as something they have been formally taught? (I know, I know: “I was at the University of Life, me”… As I am, now… which reminds me, I’ve a Coursera video and Feynman lecture on Youtube to watch, and a couple of code mentor answers to questions I’ve raised on various Stack Exchange sites to read through; and I probably should check to see if there are any questions recently posted to Stack Overflow that I may be able to answer and use to link out to other, more academic “open educational” resources…)

[Rereading this post, I think I am suffering from a lack of formality and the sense of justification that comes with it. Hmmm...]

* This is something I’ve recently been asked to do again for an MK local primary school in the new year; the contact queried how much I might charge and whilst in the past I would have said “no need”, for some reason this time I felt obliged to seek advice from the Deanery about whether I should charge, and if so how much. This is a huge personal cultural shift away from my traditional “of course/pro bono” attitude, and it felt wrong, somehow. To the extent that universities are public bodies, they should work with other public services in their local and extended communities. But of course, I get the sense we’re not really being encouraged to think of ourselves as public bodies very much any more, we’re commercial services… And that feeling affects the personal responsibility I feel when acting for and on behalf of the university. As it turns out, the Deanery seems keen that we participate freely in community events… But I note here that I felt (for the first time) as if I had to check first. So what’s in the air?

See also: Terran Lane’s On Leaving Academia and (via @boyledsweetie) Inspirational teaching: since when did entertainment not matter?

Written by Tony Hirst

October 4, 2012 at 1:32 pm

PEERing at Education…

I just had a “doh!” moment in the context of OERs – Open Educational Resources, typically so called because they are Resources produced by an Educator under an Open content license (which to all intents and purposes is a copyright waiver). One of the things that appeals to me about OERs is that there is no reason for them not to be publicly discoverable which makes them the ideal focus for PEER – Public Engagement with Educational Resources. Which is what the OU traditionally offered through 6am TV broadcasts of not-quite-lectures…

Or how about this one?

And which the OU is now doing through iTunesU and several Youtube Channels, such as OU Learn:


(Also check out some of the other OU playlists…or OU/BBC co-pros currently on iPlayer;-)

PS It also seems to me that users tend not to get too hung up about how things are licensed, particularly educational ones, because education is about public benefit and putting constraints on education is just plain stoopid. Discovery is nine tenths of law, as it were. The important thing about having something licensed as an OER is that no-one can stop you from sharing it… (which even if you’re the creator of a resource, you may not be able to do; academics, for example, often hand over the copyright of their teaching materials to their employer, and their employer’s copyright over their research output (similarly transferred as a condition of employment) to commercial publishers who then sell the content back to their employers).

Written by Tony Hirst

May 16, 2012 at 3:33 pm

Posted in Open Education, OU2.0


The Learning Journey Starts Here: Youtube.edu and OpenLearn Resource Linkage

Mulling over the OU’s OULearn pages on Youtube a week or two ago, colleague Bernie Clark pointed out to me how the links from the OU clip descriptions could be rather hit or miss:

Via @lauradee, I see that the OU’s new offering on YouTube.com/edu is far more supportive of links to related content, links that can represent the start of a learning journey through OU educational – and commentary – content on the OU website.

Here’s a way in to the first bit of OU content that seems to have appeared:

This links through to a playlist page with a couple of different sorts of opportunity for linking to resources collated at the “Course materials” or “Lecture materials” level:

(The language gives something away, I think, about the expectation of what sort of content is likely to be uploaded here…)

So here, for example, are links at the level of the course/playlist:

And here are links associated with each lecture, erm, clip:

In this first example, several types of content are being linked to, although from the link itself it’s not immediately obvious what sort of resource a link points to? For example, some of the links lead through to course units on OpenLearn/Learning Zone:

Others link through to “articles” posted on the OpenLearn “news” site (I’m not ever really sure how to refer to that site, or the content posts that appear on it?)

The placing of content links into the Assignments and Others tabs seems a little arbitrary to me from this single example, but I suspect that when a few more lists have been posted some sort of feeling will emerge about what sorts of resources should go where (i.e. what folk might expect by “Assignment” or “Other” resource links). If there’s enough traffic generated through these links, a bit of A/B testing might even be in order relating to the positioning of links within tabs and the behaviour of students once they click through (assuming you can track which link they clicked through, of course…)?

The transcript link is unambiguous though! And (in this case at least) it resolves to a PDF hosted somewhere on the OU podcasts/media filestore:

(I’m not sure if caption files are also available?)

Anyway – it’ll be interesting to hear back about whether this enriched linking experience drives more traffic to the OpenLearn resources, as well as whether the positioning of links in the different tab areas has any effect on engagement with materials following a click…

And as far as the linkage itself goes, I’m wondering: how are the links to OpenLearn course units and articles generated/identified, and are those links captured in one of the data.open.ac.uk stores? Or is the process that manages what resource links get associated with lists and list items on Youtube/edu one that doesn’t leave (or readily support the automated creation of) public data traces?

PS How much (if any) of the linked resource goodness is grabbable via the Youtube API, I wonder? If anyone finds out before me, please post details in the comments below:-)

Written by Tony Hirst

April 27, 2012 at 1:53 pm

Scraperwiki Powered OpenLearn Searches – Learning Outcomes and Glossary Items

A quick follow up to Tinkering With Scraperwiki – The Bottom Line, OpenCorporates Reconciliation and the Google Viz API demonstrating how to reuse that pattern (a little more tinkering is required to fully generalise it, but that’ll probably have to wait until after the Easter wifi-free family tour… I also need to do a demo of a pure HTML/JS version of the approach).

In particular, a search over OpenLearn learning outcomes:

and a search over OpenLearn glossary items:

Both are powered by tables from my OpenLearn XML Processor scraperwiki.

Written by Tony Hirst

April 5, 2012 at 12:02 pm

Tinkering With Scraperwiki – The Bottom Line, OpenCorporates Reconciliation and the Google Viz API

Having got to grips with adding a basic sortable table view to a Scraperwiki view using the Google Chart Tools (Exporting and Displaying Scraperwiki Datasets Using the Google Visualisation API), I thought I’d have a look at wiring in an interactive dashboard control.

You can see the result at BBC Bottom Line programme explorer:

The page loads in the contents of a source Scraperwiki database (so only good for smallish datasets in this version) and pops them into a table. The searchbox is bound to the Synopsis column and allows you to search for terms or phrases within the Synopsis cells, returning rows for which there is a hit.
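To make the matching behaviour concrete, here is a minimal Python sketch of what the searchbox does to the rows: keep any row whose Synopsis cell contains the search text anywhere within it. It is an illustration rather than the view's actual code (the view does this in the browser), it assumes matching is case-insensitive, and the `filter_rows` helper and sample rows are made up for the example.

```python
def filter_rows(rows, term, column='Synopsis'):
    """Keep rows whose `column` cell contains `term` anywhere (case-insensitive)."""
    needle = term.lower()
    return [row for row in rows if needle in str(row.get(column, '')).lower()]


# Hypothetical rows, for illustration only (not taken from the real dataset)
rows = [
    {'Title': 'Episode A', 'Synopsis': 'Retail and the high street'},
    {'Title': 'Episode B', 'Synopsis': 'Raising finance for start-ups'},
]

# A search term matches wherever it appears in the cell, whatever its case
matches = filter_rows(rows, 'FINANCE')
```

An empty search term leaves every row in place, which mirrors the table showing everything until something is typed into the searchbox.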

Here’s the function that I used to set up the table and search control, bind them together and render them:

    google.load('visualization', '1.1', {packages:['controls']});

    google.setOnLoadCallback(drawTable);

    function drawTable() {

      var json_data = new google.visualization.DataTable(%(json)s, 0.6);

      var json_table = new google.visualization.ChartWrapper({'chartType': 'Table','containerId':'table_div_json','options': {allowHtml: true}});
      //i expected this limit on the view to work?
      //json_table.setColumns([0,1,2,3,4,5,6,7])

      // Link the value in column 1 to its BBC programme page; the formatted value of column 1 is overwritten.
      var formatter = new google.visualization.PatternFormat('<a href="http://www.bbc.co.uk/programmes/{0}">{0}</a>');
      formatter.format(json_data, [1]);

      // Here {0} is column 7 and {1} is column 8; the resulting link is written to column 7.
      formatter = new google.visualization.PatternFormat('<a href="{1}">{0}</a>');
      formatter.format(json_data, [7,8]);

      var stringFilter = new google.visualization.ControlWrapper({
        'controlType': 'StringFilter',
        'containerId': 'control1',
        'options': {
          'filterColumnLabel': 'Synopsis',
          'matchType': 'any'
        }
      });

      var dashboard = new google.visualization.Dashboard(document.getElementById('dashboard')).bind(stringFilter, json_table).draw(json_data);

    }

The formatter is used to linkify the two URLs. However, I couldn’t get the table to hide the final column (the OpenCorporates URI) in the displayed table? (Doing something wrong, somewhere…) You can find the full code for the Scraperwiki view here.

Now you may (or may not) be wondering where the OpenCorporates ID came from. The data used to populate the table is scraped from the JSON version of the BBC programme pages for the OU co-produced business programme The Bottom Line (Bottom Line scraper). (I’ve been pondering for some time whether there is enough content there to try to build something that might usefully support or help promote OUBS/OU business courses or link across to free OU business courses on OpenLearn…) Supplementary content items for each programme identify the name of each contributor and the company they represent in a conventional way. (Their role is also described in what looks to be a conventionally constructed text string, though I didn’t try to extract this explicitly – yet. (I’m guessing the Reuters OpenCalais API would also make light work of that?))

Having got access to the company name, I thought it might be interesting to try to get a corporate identifier back for each one using the OpenCorporates (Google Refine) Reconciliation API (Google Refine reconciliation service documentation).

Here’s a fragment from the scraper showing how to look up a company name using the OpenCorporates reconciliation API and get the data back:

    import urllib, simplejson

    # Drop non-ASCII characters from the company name (a hack), then URL-encode it
    ocrecURL='http://opencorporates.com/reconcile?query='+urllib.quote_plus("".join(i for i in record['company'] if ord(i)<128))
    try:
        recData=simplejson.load(urllib.urlopen(ocrecURL))
    except:
        # Treat any failed lookup as returning no results
        recData={'result':[]}
    print ocrecURL,[recData]
    if len(recData['result'])>0:
        # Only accept the first result if its score is high enough
        if recData['result'][0]['score']>=0.7:
            record['ocData']=recData['result'][0]
            record['ocID']=recData['result'][0]['uri']
            record['ocName']=recData['result'][0]['name']

The ocrecURL is constructed from the company name, sanitised in a hack fashion. If we get any results back, we check the (relevance) score of the first one. (The results seem to be ordered in descending score order. I didn’t check to see whether this was defined or by convention.) If it seems relevant, we go with it. From a quick skim of company reconciliations, I noticed at least one false positive – Reed – but on the whole it seemed to work fairly well. (If we look up more details about the company from OpenCorporates, and get back the company URL, for example, we might be able to compare the domain with the domain given in the link on the Bottom Line page. A match would suggest quite strongly that we have got the right company…)
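The two loose ends in that paragraph (not relying on the result ordering, and checking a match by comparing domains) can be sketched in a few lines of Python 3. This is a rough illustration under stated assumptions, not code from the scraper: the `best_match`, `domain_of` and `same_domain` helpers are hypothetical names, the only result field relied on is the `score` value used in the fragment above, and the two URLs are simply passed in as strings, since where the company URL would come from in an OpenCorporates response hasn't been checked here.

```python
from urllib.parse import urlparse

SCORE_THRESHOLD = 0.7  # same cut-off as used in the scraper fragment


def best_match(results, threshold=SCORE_THRESHOLD):
    """Return the highest-scoring result at or above threshold, else None.

    Uses max() rather than results[0], so it does not depend on the
    service returning results in descending score order.
    """
    if not results:
        return None
    top = max(results, key=lambda r: r.get('score', 0))
    return top if top.get('score', 0) >= threshold else None


def domain_of(url):
    """Extract a lower-cased host name, without port or a leading 'www.'."""
    if '://' not in url:
        url = 'http://' + url  # let urlparse treat a bare host as a host
    host = urlparse(url).hostname or ''
    return host[4:] if host.startswith('www.') else host


def same_domain(company_url, programme_link_url):
    """True if both URLs are non-empty and share the same domain."""
    a, b = domain_of(company_url), domain_of(programme_link_url)
    return bool(a) and a == b
```

A reconciliation result that passes `best_match` and whose company URL also passes `same_domain` against the link on the Bottom Line page would be a stronger candidate than one accepted on score alone; a mismatch wouldn't prove a false positive (companies can have several domains), but it would flag the row for a manual check.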

As @stuartbrown suggested in a tweet, a possible next step is to link the name of each guest to a Linked Data identifier for them, for example, using DBPedia (although I wonder – is @opencorporates also minting IDs for company directors?). I also need to find some way of pulling out some proper, detailed subject tags for each episode that could be used to populate a drop down list filter control…

PS for more Google Dashboard controls, check out the Google interactive playground…

PPS see also: OpenLearn Glossary Search and OpenLearn Learning Outcomes Search

Written by Tony Hirst

April 5, 2012 at 8:55 am
