Archive for the ‘Presentation’ Category
I was due to be at #odw13 today, but circumstances beyond my control intruded…
The presentation I was down to give related to some of the things we could do with company data from OpenCorporates. Here’s a related thing that covers some of what I was intending to talk about…
(I’m experimenting with a new way of putting together presentations by actually writing notes for each slide. Please let me know via the comments whether you think this approach makes my slidedecks any easier to understand!)
Slides without commentary from a presentation I gave to undergrads on data journalism at the University of Lincoln yesterday…
My plan is to write some words around this deck (or maybe even to record, and perhaps transcribe, some sort of narration…), just not right now…
I’m happy to give variants of this presentation elsewhere, if you can cover costs…
Whilst preparing for my typically overloaded #online12 presentation, I thought I should make at least a passing attempt at contextualising it for the corporate attendees. The framing idea I opted for, but all too briefly reviewed, was whether open public data might be disruptive to the information industry, particularly purveyors of information services in vertical markets.
If you’ve ever read Clayton Christensen’s The Innovator’s Dilemma, you’ll be familiar with the idea behind disruptive innovations: incumbents allow start-ups with cheaper ways of tackling the less profitable, low-quality end of the market to take that part of the market; the start-ups improve their offerings, take market share, and the incumbents withdraw to the more profitable top end. Learn more about this on OpenLearn: Sustaining and disruptive innovation or listen again to the BBC In Business episode on The Innovator’s Dilemma, from which the following clip is taken.
In the information industry, the following question then arises: will the availability of free, open public data be adopted at the low, or non-consuming, end of the market, for example by micro- and small companies who haven’t necessarily been able to buy in to expensive information or data services, either on financial grounds or through lack of perceived benefits? Will the appearance of new aggregation services, often built around screenscrapers and/or public open data sources, start to provide useful and useable alternatives at the low end of the market, in part because of their (current) lack of comprehensiveness or quality? And if such services are used, will they then start to improve in quality, comprehensiveness and service offerings, and in so doing start a ratcheting climb to quality that will threaten the incumbents?
Here are a couple of quick examples, based around some doodles I tried out today using data from OpenCorporates and OpenlyLocal. The original sketch (demo1() in the code here) was a simple scraper on Scraperwiki that accepted a person’s name, looked them up via a director search using the new 0.2 version of the OpenCorporates API, pulled back the companies they were associated with, and then looked up the other directors associated with those companies. For example, searching around Nigel Richard Shadbolt, we get this:
One of the problems with the data I got back is that there are duplicate entries for company officers; as Chris Taggart explained, “[data for] UK officers [comes] from two Companies House sources — data dump and API”. Another problem is that officers’ records don’t necessarily have start/end dates associated with them, so it may be the case that directors’ terms of office don’t actually overlap within a particular company. In my own scraper, I don’t check to see whether an officer is marked as “director”, “secretary”, etc, nor do I check to see whether the company is still a going concern or whether it has been dissolved. Some of these issues could be addressed right now, some may need working on. But in general, the data quality – and the way I work with it – should only improve from this quick’n’dirty minimum viable hack. As it is, I now have a tool that at a push will give me a quick snapshot of some of the possible director relationships surrounding a named individual.
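For what it’s worth, here’s a minimal sketch of the sort of lookup demo1() performs – search OpenCorporates for officers with a given name, pull back the companies they’re associated with, then list the other officers recorded against those companies. The endpoint paths and JSON field names are my guesses at the shape of the v0.2 API rather than a copy of the actual scraper code, so treat it as illustrative only:

```python
# Sketch of the demo1() idea. Assumptions: the v0.2 officer search and company
# lookup endpoints, and the response field names used below, may not match the
# real API exactly.
import requests

OC = "http://api.opencorporates.com/v0.2"

def officer_search(name):
    # Officer search: returns officer records with similar names (assumed response shape).
    r = requests.get(OC + "/officers/search", params={"q": name})
    return [o["officer"] for o in r.json()["results"]["officers"]]

def company_officers(jurisdiction, number):
    # Company lookup: returns the officers recorded against a company (assumed response shape).
    r = requests.get("{}/companies/{}/{}".format(OC, jurisdiction, number))
    return r.json()["results"]["company"].get("officers", [])

def codirectors(name):
    # For each company the named person is an officer of, collect the other officers.
    edges = []
    for officer in officer_search(name):
        jurisdiction = officer.get("jurisdiction_code")
        number = officer.get("company_number")
        if not (jurisdiction and number):
            continue
        for other in company_officers(jurisdiction, number):
            other_name = other.get("officer", {}).get("name", "")
            if other_name and other_name.lower() != name.lower():
                edges.append((name, number, other_name))
    return edges

if __name__ == "__main__":
    for edge in codirectors("Nigel Richard Shadbolt"):
        print(edge)
```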
The second sketch (demo2() in the code here) grabbed a list of elected council members for the Isle of Wight Council from another of Chris’ properties, OpenlyLocal, extracted the councillors’ names, and then looked up directorships held by people with exactly the same name using a two stage exact string match search. Here’s the result:
As with many data results, this is probably most meaningful to people who know the councillors – and companies – involved. The results may also surprise people who know the parties involved if they start to look up the companies that aren’t immediately recognisable: surely X isn’t a director of Y? Here we have another problem – one of identity. The director look-up I use is based on an exact string match: the query to OpenCorporates returns directors with similar names, which I then filter to leave only directors with exactly the same name (I turn the strings to lower case so that case errors don’t cause a string mismatch). (I also filter the companies returned to be solely ones with a gb jurisdiction.) In doing the lookup, we therefore have the possibility of false positive matches (X is returned as a director, but it’s not the X we mean, even though they have exactly the same name); and false negative lookups (eg where we look up a made-up director John Alex Smith who is actually recorded in one or more filings as (the again made-up) John Alexander Smith).
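By way of illustration, the exact-match filtering step in demo2() amounts to something like the following (again, the field names are assumptions about the response shape rather than the actual API keys):

```python
def exact_match_directors(councillor_name, officer_results):
    # Keep only officers whose name exactly matches the councillor's name
    # (case-insensitively) and whose record sits in the gb jurisdiction.
    # Field names are assumptions, as in the earlier sketch.
    target = councillor_name.strip().lower()
    return [o for o in officer_results
            if o.get("name", "").strip().lower() == target
            and o.get("jurisdiction_code") == "gb"]
```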
That said, we do have a minimum viable research tool here that gives us a starting point for doing a very quick (though admittedly heavily caveated) search around companies that a councillor may be (or may have been – I’m not checking dates, remember) associated with.
We also have a tool around which we can start to develop a germ of an idea around conflict of interest detection.
The Isle of Wight Armchair Auditor, maintained by hyperlocal blog @onthewight (and based on an original idea by @adrianshort) hosts local spending information relating to payments made by the Isle of Wight Council. If we look at the payments made to a company, we see the spending is associated with a particular service area.
If you’re a graph thinker, as I am;-), the following might then suggest itself to you:
- From OpenlyLocal, we can get a list of councillors and the committees they are on;
- from OnTheWight’s Armchair Auditor, we can get a list of companies the council has spent money with;
- from OpenCorporates, we can get a list of the companies that councillors may be directors of;
- from OpenCorporates, we should be able to get identifiers for at least some of the companies that the council has spent money with;
- putting those together, we should be able to see whether or not a councillor may be a director of a company that the council is spending with and how much is being spent with them in which spending areas;
- we can possibly go further, if we can associate council committees with spending areas – are there councillors who are members of a committee that is responsible for a particular spending area who are also directors of companies that the council has spent money with in those spending areas?

Now there’s nothing wrong with people who have expertise in a particular area sitting on a related committee (it’s probably a Good Thing). And it may be that they got their experience by working as a director for a company in that area. Which, again, could be a Good Thing. But it raises a transparency question that a journalist might well be interested in asking. And in this case, with open data to hand, might technology be able to help out? For example, could we automatically generate a watch list to check whether or not councillors who are directors of companies that have received monies in particular spending areas (or more generally) have declared an interest, as would be appropriate? I think so… (caveated of course by the fact that there may be false positives and false negatives in the report… but it would be a low effort starting point). I’ve sketched the sort of join this would involve below.
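Here’s a rough sketch of what that watch-list join might look like. The data structures are entirely illustrative (and the example data is made up); in practice each of them would have to be assembled from the OpenlyLocal, OpenCorporates and Armchair Auditor sources described above:

```python
# Sketch of a conflict-of-interest "watch list" join over three illustrative tables.
from collections import defaultdict

def watch_list(directorships, spending, committees):
    # directorships: {councillor name: set of company names they appear to direct}
    # spending:      {company name: [(service area, amount), ...]}
    # committees:    {councillor name: set of committee names}
    report = defaultdict(list)
    for councillor, companies in directorships.items():
        for company in companies:
            for area, amount in spending.get(company, []):
                report[councillor].append({
                    "company": company,
                    "service_area": area,
                    "amount": amount,
                    "committees": sorted(committees.get(councillor, [])),
                })
    return dict(report)

# Made-up example data, purely to show the shape of the join:
print(watch_list(
    {"Cllr A N Other": {"Acme Services Ltd"}},
    {"Acme Services Ltd": [("Highways", 12000.0)]},
    {"Cllr A N Other": {"Highways Committee"}},
))
```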
Once you get into this graph based thinking, you can take it much further of course, for example looking to see whether councillors in one council are directors of companies that deal heavily with neighbouring councils… and so on… (Paranoid? Me? Nah… Just trying to show how graphs work and how easy it can be to start joining dots once you start to get hold of the data…;-)
Anyway – this is all getting off the point and too conspiracy based…! So back to the point, which was along the lines of this: here we have the fumblings of a tool for mixing and matching data from two aggregators of public information, OpenlyLocal and OpenCorporates, that might allow us to start running crude conflict of interest checks. (It’s easy enough to see how we can run the same idea using lists of MP names from the TheyWorkForYou API; or looking up directorships previously held by Ministers and the names of companies of lobbyists they meet (does WhosLobbying have an API of such things?). And so on…)
Now I imagine there are commercial services around that do this sort of thing properly and comprehensively, and for a fee. But it only took me a couple of hours, for free, to get started, and having started, the paths to improvement become self-evident… and some of them can be achieved quite quickly (it just takes a little (?!) bit of time…) So I wonder – could the information industry be at risk of disruption from open public data?
PS if you’re into conspiracies, Cambridge’s Centre for Research in the Arts, Social Sciences and Humanities (CRASSH) has a post-doc position open with Professor John Naughton on The impact of global networking on the nature, dissemination and impact of conspiracy theories. The position is complemented by several parallel fellowships, including ones on Rational Choice and Democratic Conspiracies and Ideals of Transparency and Suspicion of Democracy.
FWIW, a copy of the slides I used in my ILI2012 presentation earlier this week – Making the most of structured content: data products from OpenLearn XML:
I guess this counts as a dissemination activity for my related eSTEeM project on course related custom search engines, since the work(?!) sort of evolved out of that idea…
The thesis is this:
- Course Units on OpenLearn are available as XML docs – a URL pointing to the XML version of a unit can be derived from the Moodle URL for the HTML version of the course; (the same is true of “closed” OU course materials). The OU machine uses the XML docs as a feedstock for a publication process that generates HTML views, ebook views, etc, etc of a course.
- We can treat XML docs as if they were database records; sets of structured XML elements can be viewed as if they define database tables; the values taken by the structured elements are like database table entries. Which is to say, we can treat each XML doc as a mini-database, or we can trivially extract the data and pop it into a “proper”/“real” database.
- given a list of courses we can grab all the corresponding XML docs and build a big database of their contents; that is, a single database that contains records pulled from course XML docs.
- the sorts of things that we can pull out of a course include: links, images, glossary items, learning objectives, section and subsection headings (there’s a sketch of this sort of extraction after this list);
- if we mine the (sub)section structure of a course from the XML, we can easily provide an interactive treemap version of the sections and subsections in a course; by generating the Freemind mindmap document type, we can automatically create course-section mindmap files that students can view – and annotate – in Freemind. We can also generate bespoke mindmaps, for example based on sections across OpenLearn courses that contain a particular search term.
- By disaggregating individual course units into “typed” elements or faceted components, and then reaggregating items of a similar class or type across all course units, we can provide faceted search across, as well as a university-wide “meta” view over, different classes of content. For example:
- by aggregating learning objectives from across OpenLearn units, we can trivially create a search tool that provides a faceted search over just the learning objectives associated with each unit; the search returns learning outcomes associated with a search term and links to course units associated with those learning objectives; this might help in identifying reusable course elements based around reuse or extension of learning outcomes;
- by aggregating glossary items from across OpenLearn units, we can trivially create a meta glossary for the whole of OpenLearn (or similarly across all OU courses). That is, we could produce a monolithic OpenLearn, or even OU wide, glossary; or maybe it’s useful to be able to redefine the same glossary terms using different definitions, rather than reuse the same definition(s) consistently across different courses? As with learning objectives, we can also create a search tool that provides a faceted search over just the glossary items associated with each unit; the search returns glossary items associated with a search term and links to course units associated with those glossary items;
- by aggregating images from across OpenLearn units, we can trivially create a search tool that provides a faceted search over just the descriptions/captions of images associated with each unit; the search returns the images whose descriptions/captions are associated with the search term and links to course units associated with those images. This disaggregation provides a direct way of searching for images that have been published through OpenLearn. Rights information may also be available, allowing users to search for images that have been rights cleared, as well as openly licensed images.
- the original route in was the extraction of links from course units that could be used to seed custom search engines that search over resources referenced from a course. This could in principle also include books using Google book search.
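To give a flavour of the extraction step mentioned above, here’s a minimal sketch that pulls a few classes of element out of the XML version of a unit. The element names used (LearningOutcome, GlossaryItem, Term, Definition, Section, Title) are guesses at the OU XML vocabulary rather than the real tag names, so they would need checking against an actual document:

```python
# Sketch of treating an OU XML doc as a mini-database: fetch it and extract a few
# classes of element. Element names are assumed, not verified against real OU XML.
import requests
from xml.etree import ElementTree as ET

def parse_unit(xml_url):
    root = ET.fromstring(requests.get(xml_url).content)
    return {
        "outcomes": [el.text for el in root.iter("LearningOutcome") if el.text],
        "glossary": [(item.findtext("Term"), item.findtext("Definition"))
                     for item in root.iter("GlossaryItem")],
        "sections": [sec.findtext("Title") for sec in root.iter("Section")],
    }
```

Records like these, pulled from the XML doc for each course in a list, are all the “big database of their contents” mentioned above really amounts to.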
I also briefly described an approach for appropriating Google custom search engine promotions as the basis for a search engine mediated course, something I think could be used in a sMoocH (search mediated MOOC hack). But then MOOCs as popularised have f**k all to do with innovation, don’t they, other than in a marketing sense for people with very little imagination.
During questions, @briankelly asked if any of the reported dabblings/demos (and there are several working demos) were just OUseful experiments or whether they could in principle be adopted within the OU, or even more widely across HE. The answers are ‘yes’ and ‘yes’ but in reality ‘yes’ and ‘no’. I haven’t even been able to get round to writing up (or persuading someone else to write up) any of my dabblings as ‘proper’ research, let alone fight the interminable rounds of lobbying and stakeholder acquisition it takes to get anything adopted and rolled out as innovation. If any of the ideas were/are useful, they’re Googleable and folk are free to run with them… but because they had no big budget holding champion associated with their creation, and hence no stake (even defensively) in seeing some sort of use from them, they’re unlikely to register anywhere.
Last week I gave a presentation at the DCMS describing some hands-on tools for getting started with creating data powered visualisations (Visualisation Tools to Support Data Engagement), at the invitation of James Doeser from the Arts Council, in the context of the DCMS CASE (Culture and Sport Evidence) Programme, #CASEprog:
I’ve also posted a resource list as a delicious stack: CASEprog – Visualisation Tools (Resource List).
Whilst preparing the presentation, I had a dig through the DCLG sponsored Improving Visualisation for the Public Sector site, which provides pathways for identifying appropriate visualisation types based on data type, policy objectives/communication goals and anticipated audience level. It struck me that being able to pick an appropriate visualisation type is one thing, but being able to create it is another.
My presentation, for example, was based very much around tools that could provide a way in to actually creating visualisations, as well as shaping and representing data so that it can be plugged straight in to particular visualisation views.
So I’m wondering, is there maybe an opportunity here for a practical programme of work that builds on the DCLG Improving Visualisation toolkit by providing worked, and maybe templated, examples, with access to code and recipes wherever possible, for actually creating examples of exemplar visualisation types from actual open/public data sets that can be found on the web?
Could this even be the basis for a set of School of Data practical exercises, I wonder, to actually create some of these examples?
Earlier this week, I popped over to Lincoln to chat to @josswinn and @jmahoney127 about their ON Course course data project (I heartily recommend Jamie’s ON Course project blog), hopefully not setting them off down too many ratholes, erm, err, oops?!, as well as bewildering a cohort of online journalism students with a rapid fire presentation about data driven journalism…
I think I need to draw a map…
Just noticed that I didn’t post a copy of the second of my three presentations last week, Visual Conversations With Data, delivered to the eSTeEM Colloquium Pictures to Help People Think and Act on diagramming, et al., in education.
Something I meant to say, but didn’t, is that one of the problems with folks’ prior expectations or assumptions about data visualisations is that they reveal a single truth. I’m not sure that’s the case (I’ve started pondering the phrase “no truth, many truths” in this respect) which is another reason I see the role of visualisation as being a participant in a conversation where you explore questions and ideas and try to actively tease out different perspectives around a hypothesis based on the data.
For a related take on this idea of “many possible truths”, see Paul Bradshaw’s The Straw Man of Data Journalism’s “Scientific” Claim.
Here’s a delicious stack of related resources.
I’m wondering if I over enthused on UK Government engagement with open data and open standards?! Ho hum, that’s my public service duty for the day if so…;-)
A copy of the presentation I gave at the OU-eSTeEM conference (no event URL?) on generating custom course search engines and mining OU XML documents to generate course mindmaps (Making More of Structured Documents presentation; delicious stack/bookmark list of related resources):
Chatting to Jonathan Fine after the event, he gave me the phrase secondary products to describe things like course mindmaps that can be generated from XML source files of OU course materials. From what I can tell, there isn’t much if any work going on in the way of finding novel ways of exploiting the structure of OU structured course materials, other than using them simply as a way of generating different presentational views of the course materials as a whole (that is, HTML versions, maybe mobile friendly versions, PDF versions). (If that’s not the case, please feel free to put me right in the comments:-)
One thing Jonathan has been scouring the documents for is evidence of mathematical content across the courses; he also mentioned a couple of ideas relating to access audits over the content itself, such as extracting figure headings, or image captions. (This reminded me of the OpenLearn XML processor (and redux) I first played with 4 years ago (sigh… and nothing’s changed… sigh….), which stripped assets by type from the first generation of OU XML docs). So on my to do list is to have a deeper look at the structure of OU XML, have a peek at what sorts of things might meaningfully (and easily;-) be extracted, and figure out two or three secondary products that can be generated as a result. Note that these products might be products for different audiences, at different times of the course lifecycle: tools for use by course team or LTS during production (such as accessibility checks), products to support maintenance (there is already a link checker, but maybe there is more that can be done here?), products for students (such as the mindmap), products for alumni, products for OpenLearn views over the content, products to support “learning analytics”, and so on. (If you have any ideas of what forms the secondary products might take, or what structures/elements/entities you’d like to see mined from OU XML, please let me know via the comments. For an example of an OU XML doc, see here.)
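To give a flavour of what one such secondary product might look like, here’s a minimal sketch that writes a Freemind .mm file from a nested list of section and subsection titles. The titles are assumed to have been pulled out of the OU XML already; the .mm format itself is just nested <node TEXT="…"> elements:

```python
# Sketch of a "secondary product": generate a Freemind mindmap (.mm) file from
# section/subsection titles. The input structure is illustrative, not tied to real OU XML.
from xml.sax.saxutils import quoteattr

def freemind_map(course_title, sections):
    # sections: list of (section_title, [subsection_title, ...]) tuples
    lines = ['<map version="1.0.1">', '<node TEXT=%s>' % quoteattr(course_title)]
    for section, subsections in sections:
        lines.append('  <node TEXT=%s>' % quoteattr(section))
        for sub in subsections:
            lines.append('    <node TEXT=%s/>' % quoteattr(sub))
        lines.append('  </node>')
    lines += ['</node>', '</map>']
    return "\n".join(lines)

# Made-up example:
with open("course_map.mm", "w") as f:
    f.write(freemind_map("Example unit",
                         [("Section 1", ["1.1 First topic", "1.2 Second topic"]),
                          ("Section 2", [])]))
```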
I had the honour of being invited to talk at the JIBS User Group 20th Anniversary AGM yesterday, and as well as having a bit of a rant in the closing plenary about opening up and making internal reuse of data and making FOI requests about SCONUL data*, I also gave this sideways take on Ranganathan’s Five Laws of Library Science for the current age (The Frictionless Library).
Amongst other things, the presentation sketches a possible project (that I think could make for a good workshop day) revisiting each of the laws in network context using the various techniques of constitutional interpretation and (briefly) revisits at least one of the notions of the Invisible Library (see also The Invisible Library (ILI, 2009), another meaningless set of slides…;-)
* Note to self: read up about the current HESA HE Information Landscape Project (Redesigning the higher education data and information landscape). Also check out the “KB+” JISC project (programme?) that will “develo[p] a shared community service that will improve the quality, accuracy, coverage and availability of data for the management, selection, licensing, negotiation, review and access of electronic resources for UK HE” (via @benshowers) and the Talis Aspire Community Edition (aggregated reading lists across several HEIs).
PS I’m working out how to make the slides a little bit more useful as a post hoc/legacy resource by posting them with a bit of context and commentary… But it may take a bit of time…
PPS on the way home, I listened to this Long Now Foundation seminar by Brewster Kahle on Universal Access to All Knowledge, which got me wondering about the extent to which University libraries are depositing resources into the Internet Archive..? There’s a nice piece at the end that makes the point that IPR is such that in terms of the digital record, there’s likely to be a gap in the timeline of archived content right around the 20th century…
PPPS as far as library futures go, here’s a loosely related Roadmapping TEL activity on “Ideas that influence the future of technology enhanced learning” that is currently running on Ideascale.
There were also several discussions during the day relating to information skills needs for 21st century librarians. Some of the ANCIL reports from the Arcadia project on a new information literacy curriculum may be of interest to JIBS members in this regard, I think? (Arcadia Project Report)
I think there’s a real need for librarians to help folk make sense of the wealth of data out there, and this in part requires a good understanding of network structures and organisations, not just a concentration on hierarchical models.
Hear (sic) also, for example, OU Vice Chancellor Martin Bean on ‘sensemaking’ and the role of the library from his 2010 ALT-C Keynote:
I think it’s also time to start seeing people as information and knowledge resources, as well as just texts…