From Academic Privilege to Consultations as Peer Review

I’m not sure how to start this post, so I’ll open it with a crappy graph:

As the internet changes the way we access information – and knowledge – and as the availability of information influences how we can hold people to account, I find myself increasingly drawn to the ways in which academia/education, the press, and that ill-defined sector I’ve lumped together as public policy development/government can influence and have an impact on each other.

All three sectors engage in research and investigation to different degrees, on different timescales and with different budgets. A recent report by the US Federal Communications Commission (FCC) on the Information Needs of Communities included a chapter on The Media Food Chain and the Functions of Journalism, which references the “Tom Rosenstiel model” of journalism as a service providing the following eight functions: Authentication, Watch Dog, Witness, Forum Leader, Sense Making, Smart Aggregation, Empowerment, and Role Model. As I keep trying to make sense of what it is that academia actually does (in both its research and teaching/education modes), it seems to me that there is an overlap with many of these functions, although maybe with a different emphasis and supported by a different evidentiary tradition. (On my to-do list is the application of these lenses to the functions of academia… unless someone has already done such a mapping? It may also be interesting to consider a view of academia via Kovach and Rosenstiel’s Nine Principles of Journalism (as for example critiqued in the Nieman Reports special issue from 2001 on “The Elements of Journalism”).) (For reference, the FCC report points to Rosenstiel’s model via a Pew Research Center report on the future of public relations.)

Here are a few more pieces that I think are part of this jigsaw:

  • Alex Bailin QC and Edward Craven, writing in the Inforrm blog about Investigative journalism and the criminal law, a review of the DPP’s recent Interim guidelines for prosecutors on assessing the public interest in cases affecting the media. I don’t think there are similar guidelines for academic researchers, but maybe that’s because academic researchers are not in the business of holding the powers that be to account in the way the press does, don’t tend to engage in research in “the public interest” (is this true???), or use research methods that border on the illegal in the pursuit of matters “in the public interest”? (Would such acts get through the research ethics committees, I wonder? In what cases would/do “public interest” and “topic of academic research” intersect?)
  • Academia’s relationship with the Freedom of Information Act: as public bodies, universities are open to FOI requests, as is publicly funded academic research (e.g. see JISC’s Freedom of Information and research data: Questions and answers; see also this Universities UK submission to inform the post-legislative scrutiny of the Freedom of Information Act from February 2012, and RIN: Freedom of information: helping researchers meet the challenge). But FOI can also be used as a research tool in its own right (for example, Freedom of information – a tool for researchers). And if FOIs are used for research, does the act of asking in some sense hold the body to which the request was made to account for something?
  • Several years ago, there was a shift in science communication from the notion of “Public Understanding of Science” to “Public Engagement with Science”, in particular, the “upstreaming” of engagement (Demos: See-through science: Why public engagement needs to move upstream), summarised by the OU’s Richard Holliman as follows:

    What is upstream public engagement with science (and technology; PEST)?
    A concept developed in response to perceived limitations of the ‘public understanding of science’. Upstream PEST, which was developed in response to detailed research studies, has been used in a number of different ways in recent years. It is generally understood to be based on a more sophisticated model of science communication – one that acknowledges that communication does (and should) flow in more than one direction, i.e. not just from scientists to the public, but also from various publics to scientists, and from publics to publics, and from various other stakeholders to publics, scientists, and so on.

    Those who advocate an upstream PEST approach argue that useful and relevant expertise is distributed among different publics, stakeholders and scientific experts. Expertise is more distributed, nuanced and contingent, particularly in relation to frontier science where complexity, uncertainty and controversy abound. It follows that for (upstream) PEST approaches to be successful, a range of publics should be given early (in the case of emerging technoscience, this should be upstream), and then regular, routine opportunities to contribute to the governance of science.

    What’s important here, I think, is the notion that the governance of research effort – and, at one or two steps’ remove, the development of public research funding programmes – is influenced by conversations between academia and its publics.

  • Notwithstanding the current debate around the commercial business models based on the publication of academic research papers (e.g. David Willetts on Open, free access to academic research? and the Working Group on Expanding Access to Published Research Findings), it’s worth noting that “there’s more” when it comes to the sort of commons offered by publications that are recognised as academic publications. As noted above, academic research may tend to matters “academic” rather than “in the public interest”, although we might argue that it is in the public interest to promote claims supported by the higher standards (?) of truth that are expected in the academic tradition, and resulting from the scientific method, over conjecture, personal opinion and other less defensible routes to “knowledge”. In the Defamation Bill recently introduced to the House of Commons, I note that “Peer-reviewed statement[s] in scientific or academic journal etc” are given special treatment:

    6 Peer-reviewed statement in scientific or academic journal etc
    (1) The publication of a statement in a scientific or academic journal is privileged if the following conditions are met.
    (2) The first condition is that the statement relates to a scientific or academic matter.
    (3) The second condition is that before the statement was published in the journal an independent review of the statement’s scientific or academic merit was carried out by—
    (a) the editor of the journal, and
    (b) one or more persons with expertise in the scientific or academic matter concerned.
    (4) Where the publication of a statement in a scientific or academic journal is privileged by virtue of subsection (1), the publication in the same journal of any assessment of the statement’s scientific or academic merit is also privileged if —
    (a) the assessment was written by one or more of the persons who carried out the independent review of the statement; and
    (b) the assessment was written in the course of that review.
    (5) Where the publication of a statement or assessment is privileged by virtue of this section, the publication of a fair and accurate copy of, extract from or summary of the statement or assessment is also privileged.
    (6) A publication is not privileged by virtue of this section if it is shown to be made with malice.
    (7) Nothing in this section is to be construed—
    (a) as protecting the publication of matter the publication of which is prohibited by law;
    (b) as limiting any privilege subsisting apart from this section.
    (8) The reference in subsection (3)(a) to “the editor of the journal” is to be read, in the case of a journal with more than one editor, as a reference to the editor or editors who were responsible for deciding to publish the statement concerned.

    Note that, for the purposes of the Bill, ““statement” means words, pictures, visual images, gestures or any other method of signifying meaning”.

    As far as the academic/press/policy confusion goes, it is worth noting here that academic communications are given privileged treatment. I’m not sure what defines a communication as academic though?

  • As the previous item suggests, an important role played by academics engaged in research is their engagement with the process of peer review. To what extent does this – or should it – also cover peer review of policy, either formally, or via an open, opt-in, peer review process? The Government Code of Practice on Consultation starts off with the following criterion:

    When to consult
    Formal consultation should take place at a stage when there is scope to influence the policy outcome.
    1.1 Formal, written, public consultation will often be an important stage in the policymaking process. Consultation makes preliminary analysis available for public scrutiny and allows additional evidence to be sought from a range of interested parties so as to inform the development of the policy or its implementation.
    1.2 It is important that consultation takes place when the Government is ready to put sufficient information into the public domain to enable an effective and informed dialogue on the issues being consulted on. But equally, there is no point in consulting when everything is already settled. The consultation exercise should be scheduled as early as possible in the project plan as these factors allow.
    1.3 When the Government is making information available to stakeholders rather than seeking views or evidence to influence policy, e.g. communicating a policy decision or clarifying an issue, this should not be labelled as a consultation and is therefore not in the scope of this Code. Moreover, informal consultation of interested parties, outside the scope of this Code, is sometimes an option and there is separate guidance on this.
    1.4 It will often be necessary to engage in an informal dialogue with stakeholders prior to a formal consultation to obtain initial evidence and to gain an understanding of the issues that will need to be raised in the formal consultation. These informal dialogues are also outside the scope of this code.
    1.5 Over the course of the development of some policies, the Government may decide that more than one formal consultation exercise is appropriate. When further consultation is a more detailed look at specific elements of the policy, a decision will need to be taken regarding the scale of these additional consultative activities. In deciding how to carry out such re-consultation, the department will need to weigh up the level of interest expressed by consultees in the initial exercise and the burden that running several consultation exercises will place on consultees and any potential delay in implementing the policy. In most cases where additional exercises are appropriate, consultation on a more limited scale will be more appropriate. In these cases this Code need not be observed but may provide useful guidance.

    Note also criterion 5:

    The burden of consultation
    Keeping the burden of consultation to a minimum is essential if consultations are to be effective and if consultees’ buy-in to the process is to be obtained.
    5.1 When preparing a consultation exercise it is important to consider carefully how the burden of consultation can be minimised. While interested parties may welcome the opportunity to contribute their views or evidence, they will not welcome being asked the same questions time and time again. If the Government has previously obtained relevant information from the same audience, consideration should be given as to whether this information could be reused to inform the policymaking process, e.g. is the information still relevant and were all interested groups canvassed? Details of how any such information was gained should be clearly stated so that consultees can comment on the existing information or contribute further to this evidence-base.
    5.2 If some of the information that the Government is looking for is already in the public domain through market research, surveys, position papers, etc., it should be considered how this can be used to inform the consultation exercise and thereby reduce the burden of consultation.
    5.3 In the planning phase, policy teams should speak to their Consultation Coordinator and other policy teams with an interest in similar sectors in order to look for opportunities for joining up work so as to minimise the burden of consultations aimed at the same groups.
    5.4 Consultation exercises that allow consultees to answer questions directly online can help reduce the burden of consultation for those with the technology to participate. However, the bureaucracy involved in registering (e.g. to obtain a username and password) should be kept to a minimum.
    5.5 Formal consultation should not be entered into lightly. Departmental Consultation Coordinators and, most importantly, potential consultees will often be happy to advise about the need to carry out a formal consultation exercise and acceptable alternatives to a formal exercise.

    To a certain extent, we might consider this a model for supporting upstream engagement with policy development. (The reality, it seems to me, is more often than not that consultation exercises are actually pre-emptive PR strikes that allow arguments against an already-decided-upon policy to be defused before the formal announcement of the policy in the hours after (or even before…) the consultation closes, notwithstanding point 1.2 ;-)

    We might also see consultation exercises, and requests for comments on reports, as an opportunity for academia to engage in open peer review of policy developments, as well as a ‘pathway to impact’ (e.g. Research Councils UK – Pathways to Impact). Hmmm… the Pathways to Impact route map may be handy for putting together a slightly more detailed version of the crappy graph… ;-) As I’ve posted previously (e.g. News, Courses and Scrutiny), there may also be opportunities for using consultations as a hook, or frame, for educational materials in context, where the context is (evidence based) policy development and the consultation around it.

So – more pieces… Now I just need to start putting them together…

Draft Communications Data Bill, The Second Coming?

As many will already know, the 2012 Queen’s Speech included mention of a Draft Communications Data Bill (would JISC folk class this as being all about paradata, I wonder?! ;-) – here are the relevant briefing notes as published by the Home Office Press Office – Queen’s Speech Briefing Notes:

Draft Communications Data Bill
“My Government intends to bring forward measures to maintain the ability of the law enforcement and intelligence agencies to access vital communications data under strict safeguards to protect the public, subject to scrutiny of draft clauses.”

The purpose of the draft Bill is to:

The draft Bill would protect the public by ensuring that law enforcement agencies and others continue to have access to communications data so that they can bring offenders to justice.

What is communications data:
– Communications data is information about a communication, not the communication itself.
– Communications data is NOT the content of any communication – the text of an email, or conversation on a telephone.
– Communications data includes the time and duration of the communication, the telephone number or email address which has been contacted and sometimes the location of the originator of the communication.

The main benefits of the draft Bill would be:

– The ability of the police and intelligence agencies to continue to access communications data which is vital in supporting their work in protecting the public.
– An updated framework for the collection, retention and acquisition of communications data which enables a flexible response to technological change.

The main elements of the draft Bill are:

– Establishing an updated framework for the collection and retention of communications data by communication service providers (CSPs) to ensure communications data remains available to law enforcement and other authorised public authorities.
– Establishing an updated framework to facilitate the lawful, efficient and effective obtaining of communications data by authorised public authorities including law enforcement and intelligence agencies.
– Establishing strict safeguards including: a 12 month limit of the length of time for which communications data may be retained by CSPs and measures to protect the data from unauthorised access or disclosure. (It will continue to be the role of the Information Commissioner to keep under review the operation of the provisions relating to the security of retained communications data and their destruction at the end of the 12 month retention period).
– Providing for appropriate independent oversight including: extending the role of the Interception of Communications Commissioner to oversee the collection of communications data by communications service providers; providing a communications service provider with the ability to consult an independent Government / Industry body (the Technical Advisory Board) to consider the impact of obligations placed upon them; extending the role of the independent Investigatory Powers Tribunal (made up of senior judicial figures) to ensure that individuals have a proper avenue of complaint and independent investigation if they think the powers have been used unlawfully.
– Removing other statutory powers with weaker safeguards to acquire communications data.

Existing legislation in this area is:

Regulation of Investigatory Powers Act 2000
The Data Retention (EC Directive) Regulations 2009

It’s worth remembering that this is the second time in recent years that a draft communications data bill has been mooted. Here’s how it was described last time round, in the 2008/2009 draft legislative programme:

“A communications data bill would help ensure that crucial capabilities in the use of communications data for counter-terrorism and investigation of crime continue to be available. These powers would be subject to strict safeguards to ensure the right balance between privacy and protecting the public;”

The purpose of the Bill is to: allow communications data capabilities for the prevention and detection of crime and protection of national security to keep up with changing technology through providing for the collection and retention of such data, including data not required for the business purposes of communications service providers; and to ensure strict safeguards continue to strike the proper balance between privacy and protecting the public.

The main elements of the Bill are:
– Modify the procedures for acquiring communications data and allow this data to be retained
– Transpose EU Directive 2006/24/EC on the retention of communications data into UK law.

The main benefits of the Bill are:
– Communications data plays a key role in counter-terrorism investigations, the prevention and detection of crime and protecting the public. The Bill would bring the legislative framework on access to communications data up to date with changes taking place in the telecommunications industry and the move to using Internet Protocol (IP) core network
– Unless the legislation is updated to reflect these changes, the ability of authorities to carry out their counter-terror, crime prevention and public safety duties and to counter these threats will be undermined.

(See also some briefing notes from the time (January 2009).)

What strikes me immediately about the earlier statement is its use of anti-terrorism rhetoric to justify the introduction of the proposed bill, rhetoric which appears to have been dropped this time round.

It’s also worth noting that the 2008 proposals regarding EU Directive 2006/24/EC (retention of communications data) were passed into law via a Statutory Instrument, The Data Retention (EC Directive) Regulations 2009, regulations that it appears will be up for review/revision via the new draft bill. In those regulations:

[2b] – “communications data” means traffic data and location data and the related data necessary to identify the subscriber or user;

[2d] – “location data” means data processed in an electronic communications network indicating the geographical position of the terminal equipment of a user of a public electronic communications service, including data relating to: (i) the latitude, longitude or altitude of the terminal equipment, (ii) the direction of travel of the user, or (iii) the time the location information was recorded;
[2e] – “public communications provider” means: (i) a provider of a public electronic communications network, or (ii) a provider of a public electronic communications service; and “public electronic communications network” and “public electronic communications service” have the meaning given in section 151 of the Communications Act 2003(1); [from that act: “public electronic communications network” means an electronic communications network provided wholly or mainly for the purpose of making electronic communications services available to members of the public; “public electronic communications service” means any electronic communications service that is provided so as to be available for use by members of the public;]

[2g] – “traffic data” means data processed for the purpose of the conveyance of a communication on an electronic communications network or for the billing in respect of that communication and includes data relating to the routing, duration or time of a communication;
[2h] – “user ID” means a unique identifier allocated to persons when they subscribe to or register with an internet access service or internet communications service.

[3] These Regulations apply to communications data if, or to the extent that, the data are generated or processed in the United Kingdom by public communications providers in the process of supplying the communications services concerned.

As more and more online services start to look at what data they may be able to collect about their users, it’s maybe worth bearing in mind the extent to which they count as a “public electronic communications service”, and what proposed legislation they may therefore have to conform to.

As and when this draft bill is announced formally, I think it could provide a good opportunity for a wider discussion about the ethics of communications/paradata collection and use.

PS Although it’s unlikely to get very far, I notice that a Private Member’s Bill on Online Safety was introduced last week with the intention to Make provision about the promotion of online safety; to require internet service providers and mobile phone operators to provide a service that excludes pornographic images; and to require electronic device manufacturers to provide a means of filtering content where “electronic device” means a device that is capable of connecting to an internet access service and downloading content.

Viewing OpenLearn Mindmaps Using d3.js

In a comment on Generating OpenLearn Navigation Mindmaps Automagically, Pete Mitton hinted that the d3.js tree layout example might be worth looking at as a way of visualising hierarchical OpenLearn mindmaps/navigation layouts.

It just so happens that there is a networkx utility that can publish a tree structure represented as a networkx directed graph in the JSONic form that d3.js works with (networkx.readwrite.json_graph), so I had a little play with the code I used to generate Freemind mind maps from OpenLearn units and refactored it to generate a networkx graph, and from that a d3.js view:

(The above view is a direct copy of Mike Bostock’s example code, feeding from an automagically generated JSON representation of an OpenLearn unit.)
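
The networkx side of the plumbing is pretty minimal. Here’s an illustrative sketch (not the actual Scraperwiki view code – the node structure is invented) of how a directed tree built in networkx can be serialised, via networkx.readwrite.json_graph, into the sort of nested JSON that Mike Bostock’s d3.js tree layout examples consume:

# Toy stand-in for the parsed OpenLearn unit structure - the real code walks the OU XML
import json
import networkx as nx
from networkx.readwrite import json_graph

G = nx.DiGraph()
G.add_node("OER_1", name="OER_1")
for section in ["Introduction", "Learning outcomes", "1 Reusing open content"]:
    G.add_node(section, name=section)
    G.add_edge("OER_1", section)

# tree_data() walks the directed graph from the given root and returns a nested
# id/children dict that can be dumped straight out as JSON for d3.js to load
tree = json_graph.tree_data(G, root="OER_1")
print json.dumps(tree, indent=2)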

For demo purposes, I did a couple of views: a pure HTML/JSON view, and a Python one, that throws the JSON into an HTML template.

The d3.js JSON generating code can be found on Scraperwiki too: OpenLearn Tree JSON. When you run the view, it parses the OpenLearn XML and generates a JSON representation of the unit (pass the unit code via a ?unit=UNITCODE URL parameter, for example https://scraperwiki.com/views/openlearn_tree_json/?unit=OER_1).

The Python powered d3.js view also responds to the unit URL parameter, for example:
https://views.scraperwiki.com/run/d3_demo/?unit=OER_1

The d3.js view is definitely very pretty, although at times the layout is a little cluttered. I guess the next step is a functional one, though, which is to work out how to linkify some of the elements so the tree view can act as a navigational surface.

On “Engineering”…

I’ve been pondering what it is to be an engineer, lately, in the context of trying to work out what it is that I actually do and what sort of “contract” I feel I’m honouring (and with whom) by doing whatever it is that I spend my days doing…

According to Wikipedia, [t]he term engineering … deriv[es] from the word engineer, which itself dates back to 1325, when an engine’er (literally, one who operates an engine) originally referred to “a constructor of military engines.” … The word “engine” itself is of even older origin, ultimately deriving from the Latin ingenium (c. 1250).

Via Wiktionary, [e]ngine originally meant ‘ingenuity, cunning’ which eventually developed into meaning ‘the product of ingenuity, a plot or snare’ and ‘tool, weapon’. Engines as the products of cunning, then, and hence, naturally, war machines. And engineers as their operators, or constructors.

One of the formative books in my life (mid-teens, I think) was Richard Gregory’s Mind in Science, from which I took away the idea of tools as things that embodied and executed an idea. You see a way of doing something or how to do something, and then put that idea into an artefact – a tool – that does it. Code is a particularly expressive medium in this respect, AI (in the sense of Artificial Intelligence) one way of explicitly trying to give machines ideas, or embody mind in machine. (I have an AI background – my PhD in evolutionary computation was pursued in a cognitive science unit (HRCL, as was) at the OU; what led me to “AI”, I think, was a school of thought relating to the practice of how to use code to embody mind and natural process in machines, as well as how to use code that can act on, and be acted on by, the physical world.)

So part of what I (think I) do is build tools, executable expressions of ideas. I’m not so interested in how they are used. I’ve also started sketching maps a lot, lately, of social networks and other things that can be represented as graphs. These are tools too – macroscopes for peering at structural relationships within a system – and again, once produced, I’m not so interested in how they’re used. (What excites me is finding the process that allows the idea to be represented or executed.)

If we go back to the idea of “engineer”, and dig a little deeper by tracing the notion of ingenium, we find this take on it:

ingenium is the original and natural faculty of humans; it is the proper faculty with which we achieve certain knowledge. It is original because it is the first “facility” young people untouched by prejudices exemplify upon seeing similarities between disparate things. It is natural because it is to us what the power to create is to God. just as God easily begets a world of nature, so we ingeniously make discoveries in the sciences and artifacts in the arts. Ingenium is a productive and creative form of knowledge. It is poietic in the creation of the imagination; it is rhetorical in the creation of language, through which all sciences are formalized. Hence, it requires its own logic, a logic that combines both the art of finding or inventing arguments and that of judging them. Vico argues that topical art allows the mind to locate the object of knowledge and to see it in all its aspects and not through “the dark glass” of clear and distinct ideas. The logic of discovery and invention which Vico uses against Descartes’s analytics is the art of apprehending the true. With this Vico come full circle in his arguments against Descartes. [From the Introduction by L.M. Palmer to Vico on Ingenium, in Giambattista Vico: On the Most Ancient Wisdom of the Italians. Trans. L.M. Palmer. London: Cornell University Press, 1988. 31-34, 96-104. Originally published 1710.]

And for some reason, at first reading, that brings me peace…

…which I shall savour on a quick dog walk. I wonder if the woodpecker will be out in the woods today?

Structured Data for Course Web Pages and Customised Custom Search Engine Sorting

As visitors to any online shopping site will know, it’s often possible to sort search query results by price, or number of ‘review stars’, or filter items to show only books by a specified author, or publisher, for example. Via Phil Bradley, I see it’s now possible to introduce custom filtering and sorting elements into Google Custom Search Engine results.

(If you’re not familiar with Google’s Custom Search Engines (CSEs), they’re search engines that only search over, or prioritise results from, a limited set of web pages/web domains. Google CSEs power my Course Detective and UK University Libraries search engines. Hmm… I suspect Course Detective has rotted a bit by now… :-( )

What this means is that if web pages are appropriately marked up, they can be sorted, filtered or ranked accordingly when returned as a search result in a Google CSE. So for example, if course pages were marked up with academic level, start date, NSS satisfaction score, or price, they could be sorted along those lines.

So how do pages need to be marked up in order to benefit from this feature? There are several ways:

  • Simply add meta-tags to a web page. For example, <meta name="course.identifier" content="B203" />
  • Using Rich Snippets supporting markup (i.e. microdata/microformats/RDFa)
  • As PageMap data added to a sitemap, or webpage. PageMap data also allows for the definition of actions, such as “Download”, that can be emphasised as such within a custom search result. (Facebook is similarly going down the path of trying to encourage developers to use verb driven, action related semantics (Facebook Actions))

I wonder about the extent to which JISC’s current course data programme of activities could be used to encourage institutions to explore the publication of some of their course data in this way? For example, might it be possible to transform XCRI feeds such as the Open University XCRI feed, into PageMap annotated sitemaps?
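
As a straw man, here’s roughly the sort of transform I have in mind: take a couple of course records (imagine them pulled from an XCRI-CAP feed) and emit a sitemap with PageMap annotations that a CSE could then filter or sort on. The course page URLs and attribute names (course.identifier, course.level, course.start) are invented for illustration, and the PageMap element layout is based on my reading of the CSE docs rather than anything I’ve tested, so check it against the current documentation:

# Invented course records standing in for data parsed out of an XCRI-CAP feed
courses = [
    {"url": "http://www.example.ac.uk/courses/b203.htm",
     "identifier": "B203", "level": "2", "start": "2012-10-01"},
    {"url": "http://www.example.ac.uk/courses/t180.htm",
     "identifier": "T180", "level": "1", "start": "2012-10-01"},
]

def pagemap_sitemap(courses):
    # Emit a sitemap with a PageMap block per course page; attribute names are illustrative
    out = ['<?xml version="1.0" encoding="UTF-8"?>']
    out.append('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">')
    for c in courses:
        out.append('  <url>')
        out.append('    <loc>%s</loc>' % c["url"])
        out.append('    <PageMap xmlns="http://www.google.com/schemas/sitemap-pagemap/1.0">')
        out.append('      <DataObject type="course">')
        out.append('        <Attribute name="course.identifier" value="%s"/>' % c["identifier"])
        out.append('        <Attribute name="course.level" value="%s"/>' % c["level"])
        out.append('        <Attribute name="course.start" value="%s"/>' % c["start"])
        out.append('      </DataObject>')
        out.append('    </PageMap>')
        out.append('  </url>')
    out.append('</urlset>')
    return "\n".join(out)

print pagemap_sitemap(courses)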

Something like a tweaked Course Detective CSE could then act as a quick demonstrator of what benefits can be immediately realised? For example, from the Google CSE documentation on Filtering and sorting search results (I have to admit I haven’t played with any of this yet…), it seems that as well as filtering results by attribute, it’s also possible to use attributes to rank (or at least, bias) results.

Note to self: have a rummage around the XCRI data definitions/vocabularies resources… I also wonder if there is a mapping of XCRI elements onto simple attribute names that could be used to populate e.g. meta tag or PageMap name attributes?

Feeding on OU/BBC Co-Produced Content (Upcoming and Currently Available on iPlayer)

What feeds are available listing upcoming broadcasts of OU/BBC co-produced material or programmes currently available on iPlayer?

One of the things I’ve been pondering with respect to my OU/BBC programmes currently on iPlayer demo and OU/BBC co-pros upcoming on iPlayer (code) is how to start linking effectively across from programmes to Open University educational resources.

Chatting with KMi’s Mathieu d’Aquin a few days ago, he mentioned KMi were looking at ways of automating the creation of relevant semantic linkage that could be used to provide links between BBC programmes and OU content, and maybe feed into the BBC’s dynamic semantic publishing workflow.

In the context of OU and BBC programmes, one high level hook is the course code. Although I don’t think these feeds are widely promoted as a live service yet, I did see a preview(?) of an OU/BBC co-pro series feed that includes linkage options such as related course code (one only? Or does the schema allow for more than one linked course?) and OU nominated academic (one only? Or does the schema allow for more than one linked academic? More than one, apparently), as well as some subject terms and the sponsoring Faculty:

  <item>
    <title><![CDATA[OU on the BBC: Symphony]]></title>
    <link>http://www.open.edu/openlearn/whats-on/ou-on-the-bbc-history-the-symphony</link>
    <description><![CDATA[Explore the secrets of the symphony, the highest form of expression of Western classical music]]></description>
    <image title="The Berrill Building">http://www.open.edu/openlearn/files/ole/ole_images/general_images/ou_ats.jpg</image>
    <bbc_programme_page_code>b016vgw7</bbc_programme_page_code>
    <ou_faculty_reference>Music Department</ou_faculty_reference>
    <ou_course_code>A179</ou_course_code>
    <nominated_academic_oucu></nominated_academic_oucu>
    <transmissions>
        <transmission>
            <showdate>21:00:00 24/11/2011</showdate>
            <location><![CDATA[BBC Four]]></location>
            <weblink></weblink>
        </transmission>
        <transmission>
            <showdate>19:30:00 16/03/2012</showdate>
            <location><![CDATA[BBC Four]]></location>
            <weblink></weblink>
        </transmission>
        <transmission>
            <showdate>03:00:00 17/03/2012</showdate>
            <location><![CDATA[BBC Four]]></location>
            <weblink></weblink>
        </transmission>
        <transmission>
            <showdate>19:30:00 23/03/2012</showdate>
            <location><![CDATA[BBC Four]]></location>
            <weblink></weblink>
        </transmission>
        <transmission>
            <showdate>03:00:00 24/03/2012</showdate>
            <location><![CDATA[BBC Four]]></location>
            <weblink></weblink>
        </transmission>
    </transmissions>
     <comments>http://www.open.edu/openlearn/whats-on/ou-on-the-bbc-history-the-symphony#comments</comments>
 <category domain="http://www.open.edu/openlearn/whats-on">What's On</category>
 <category domain="http://www.open.edu/openlearn/tags/bbc-four">BBC Four</category>
 <category domain="http://www.open.edu/openlearn/tags/music">music</category>
 <category domain="http://www.open.edu/openlearn/tags/symphony">symphony</category>
 <pubDate>Tue, 18 Oct 2011 10:38:03 +0000</pubDate>
 <dc:creator>mc23488</dc:creator>
 <guid isPermaLink="false">147728 at http://www.open.edu/openlearn</guid>
  </item>

I’m not sure what the guid is? Nor do there seem to be slots for links to related OpenLearn resources other than the top link element? However, the course code does provide a way into course related educational resources via data.open.ac.uk, the nominated academic link may provide a route to associated research interests (for example, via ORO, the OU open research repository), the BBC programme code provides a route in to the BBC programme metadata, and the category tags may provide other linkage somewhere depending on what vocabulary gets used for specifying categories!

I guess I need to build myself a little demo to see what we can do with a feed of this sort…?! ;-)
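
By way of a placeholder, here’s the sort of thing I mean – sketched, not tested: parse the feed, and for each item derive some candidate linkage URLs from the BBC programme code and the OU course code. The feed URL below is made up (I’ve only seen a preview of the feed), and the data.open.ac.uk course URI pattern is a guess; the bbc.co.uk/programmes pattern is the standard one:

import urllib
from xml.etree import ElementTree

# Illustrative URL only - the preview feed I saw wasn't at a published address
FEED_URL = 'http://www.open.edu/openlearn/whats-on/feeds/ou-bbc-copros'

tree = ElementTree.parse(urllib.urlopen(FEED_URL))
for item in tree.findall('.//item'):
    course = item.findtext('ou_course_code')
    pcode = item.findtext('bbc_programme_page_code')
    links = []
    if pcode:
        # BBC programme pages hang off the programme code
        links.append('http://www.bbc.co.uk/programmes/' + pcode)
    if course:
        # guessing at the data.open.ac.uk course URI pattern here...
        links.append('http://data.open.ac.uk/course/' + course.lower())
    print item.findtext('title'), links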

I’m not sure if plans are similarly afoot to publish BBC programme metadata at the actual programme instance (“episode”) level? It’s good to see that the OpenLearn What’s On feed has been tidied up a little to include title elements, although it’s still tricky to work out what the feed is actually of?

For example, here’s the feed I saw a few days ago:

 <item>
    <title><![CDATA[OU on the BBC: Divine Women  - 9:00pm 25/04/2012 - BBC Two and BBC HD]]></title>
    <link>http://www.open.edu/openlearn/whats-on/ou-on-the-bbc-divine-women</link>
    <description><![CDATA[Historian Bettany Hughes reveals the hidden history of women in religion, from dominatrix goddesses to feisty political operators and warrior empresses&nbsp;]]></description>
    <location><![CDATA[BBC Two and BBC HD]]></location>
	<image title="The Berrill Building">http://www.open.edu/openlearn/files/ole/ole_images/general_images/ou_ats.jpg</image>
    <showdate>21:00:00 25/04/2012</showdate>
     <pubDate>Tue, 24 Apr 2012 11:19:10 +0000</pubDate>
 <dc:creator>sb26296</dc:creator>
 <guid isPermaLink="false">151446 at http://www.open.edu/openlearn</guid>
  </item>

It contains an upcoming show date for programmes that will be broadcast over the next week or so, and a link to a related page on OpenLearn for the episode, although no direct information about the BBC programme code for each item to be broadcast.

In the meantime, why not see what OU/BBC co-pros are currently available on iPlayer?

Or for bitesize videos, how about this extensive range of clips from OU/BBC co-pros?

Enjoy! :-)

Trademark Galleries on Scraperwiki, via OpenCorporates

What trademarks – familiar to us all – are registered with which companies? And how often are trademarked brands in larger outlets actually ‘exclusive offerings’, aka “own range” products in weak disguise? In Looking up Images Trademarked By Companies Using OpenCorporates and Google Refine, I demonstrated a recipe for using OpenCorporates.com as a way in to previewing at least some of the trademarks registered to a specified (UK registered?) company. Here’s a recipe using Scraperwiki to do a similar thing…

But first, do you recognise any of these trademarks…

… as belonging to Tesco?

The recipe goes something like this… For a given company name keyword, look up that company on OpenCorporates and get a list of company identifiers back:

import scraperwiki,simplejson,urllib

target='tesco'

rurl='http://opencorporates.com/reconcile/gb?query='+urllib.quote(target)
#note - the opencorporates api also offers a search:  companies/search
entities=simplejson.load(urllib.urlopen(rurl))
ocids=[]
for entity in entities['result']:
    # entity['id'] is of the form '/companies/JURISDICTION/NUMBER'; strip the leading path
    # (note: lstrip() strips *characters*, not a prefix, so use replace() instead)
    ocids.append(entity['id'].replace('/companies/','',1))

print ocids

For each of those companies, we’ll need to look up the company details on OpenCorporates. To benefit from an increased API limit, it makes sense to use an OpenCorporates API key to do this. The idea of API keys is typically that they are assigned to specific users, which is to say, you’re supposed to keep them secret. But how do we do this on Scraperwiki, an otherwise open environment? Here’s a trick I found on the Scraperwiki blog that allows you to keep things like API keys secret… Make the scraper a protected one, and then hide the keys in the scraper description on the scraper’s homepage using the following convention:

__BEGIN_QSENVVARS__
OC_KEY = XXXXXX
__END_QSENVVARS__

where XXXXXX is your API key value (don’t use quotes around the value – use the actual value).

Here’s how we pick up the key:

import os, cgi
try:
    qsenv = dict(cgi.parse_qsl(os.getenv("QUERY_STRING")))
    ockey=qsenv["OC_KEY"]
except:
    ockey=''

To get the trademark data, we need to pull the company data for each company ID of interest, look to see if there are any trademark records associated with it in the OpenCorporates database, and if so, pull those records:

def getOCcompanyData(ocid):
    ocurl='http://api.opencorporates.com/companies/'+ocid+'/data'+'?api_token='+ockey
    ocdata=simplejson.load(urllib.urlopen(ocurl))
    return ocdata

def getOCtmData(ocid,octmid):
    octmurl='http://api.opencorporates.com/data/'+str(octmid)+'?api_token='+ockey
    data=simplejson.load(urllib.urlopen(octmurl))
    octmdata=data['datum']['attributes']
    print 'tm data',octmdata
    categories=[]
    for category in octmdata['goods_and_services_classifications']:
        if 'en' in octmdata['goods_and_services_classifications'][category]:
            categories.append(octmdata['goods_and_services_classifications'][category]['en']+'('+category+')')
    tmdata={}
    tmdata['ocid']=ocid
    tmdata['ocname']=octmdata['holder_name']
    #tmdata['ocname']=ocnames[ocid]
    tmdata['categories']=" :: ".join(categories)
    tmdata['imgtype']=octmdata['mark_image_type']
    tmdata['marktext']=octmdata['mark_text']
    tmdata['repaddr']=octmdata['representative_address_lines']
    tmdata['repname']=octmdata['representative_name_lines']
    tmdata['regnum']=octmdata['international_registration_number']
    #if an image is trademarked, we can work out its URL on the WIPO site...
    if tmdata['imgtype']=='JPG' or tmdata['imgtype']=='GIF':
        tmdata['imgurl']='http://www.wipo.int/romarin/images/' + tmdata['regnum'][0:2] +'/' + tmdata['regnum'][2:4] + '/' + tmdata['regnum'] + '.'+ tmdata['imgtype'].lower()
    else: tmdata['imgurl']='' 
    print tmdata
    scraperwiki.sqlite.save(unique_keys=['regnum'], table_name='trademarks', data=tmdata)

    return octmdata

def grabOCTrademarks(ocid,ocdata):
    for tm in ocdata['data']:
        if tm['datum']['data_type']=='WipoTrademark':
            octmid=tm['datum']['id']
            octmdata=getOCtmData(ocid,octmid)

for ocid in ocids:
    ocdata=getOCcompanyData(ocid)
    print 'company data',ocdata
    octmdata=grabOCTrademarks(ocid,ocdata)

You can find the actual scraper here: Scraperwiki scraper: OpenCorporates Trademark demo

The code for the view (shown above) can be found here: Scraperwiki View: OpenCorporates Trademark demo (code) and here’s the actual view

Note that I think the OpenCorporates database only has very fragmentary coverage of the trademark records for each company. Even keeping up with daily updates relating to trademarks and trademarked images on WIPO looks like it could be quite a major undertaking, which is maybe why companies such as Thomson Reuters recently saw an opportunity to Create Most Comprehensive Collection of Searchable Trademark Data in the World.

That’s not to say I don’t see value in micro-attempts at making sense of the world around us such as this one, though. Trademarked brands are pervasive, but it’s often not obvious who actually owns them, and this can even mask a lack of competition with an appearance of competition (for example, when a company owns two different trademarked brands that a consumer thinks of as competing offerings from different providers, when in fact they are both owned by the same company; or when you mistake a company offshoot for a third party franchise, as I realised when I saw that BP owned the Wild Bean Cafe trademark…)

Generating OpenLearn Navigation Mindmaps Automagically

I’ve posted before about using mindmaps as a navigation surface for course materials, or as a way of bootstrapping the generation of user annotatable mindmaps around course topics or study weeks. The OU’s XML document format that underpins OU course materials, including the free course units that appear on OpenLearn, makes for easy automated generation of secondary publication products.

So here’s the next step in my exploration of this idea, a data sketch that generates a Freemind .mm format mindmap file for a range of OpenLearn offerings using metadata pulled into Scraperwiki. The file can be downloaded to your desktop (save it with a .mm suffix), and then opened – and annotated – within Freemind.

You can find the code here: OpenLearn mindmaps.
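
For anyone who hasn’t peeked inside a .mm file, there isn’t much to it: it’s just XML, a <map> element wrapping nested <node> elements whose TEXT attributes carry the labels. Here’s a toy illustration of generating one (this isn’t the Scraperwiki code itself, and the unit structure is made up):

from xml.sax.saxutils import quoteattr

# Invented stand-in for the scraped unit structure: {unit: {section: [outcomes/subsections]}}
unit = {"T180_5": {"Learning outcomes": ["outcome 1", "outcome 2"],
                   "1 Section heading": ["1.1 Subsection"]}}

def mm_node(text, children=None, indent=1):
    # Emit a Freemind <node> element, recursing over any (childtext, grandchildren) pairs
    pad = '  ' * indent
    if not children:
        return pad + '<node TEXT=%s/>' % quoteattr(text)
    out = [pad + '<node TEXT=%s>' % quoteattr(text)]
    for childtext, grandchildren in children:
        out.append(mm_node(childtext, grandchildren, indent + 1))
    out.append(pad + '</node>')
    return '\n'.join(out)

root, sections = unit.items()[0]
body = mm_node(root, [(s, [(o, None) for o in outcomes]) for s, outcomes in sections.items()])
print '<map version="0.9.0">\n%s\n</map>' % body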

By default, the mindmap will describe the learning outcomes associated with each course unit published on the Open University OpenLearn learning zone site.

By hacking the view URL, other mindmaps are possible. For example, we can make the following additions to the actual mindmap file URL (reached by opening the Scraperwiki view); a quick grab-and-save example follows the list:

  • ?unit=UNITCODE, where UNITCODE= something like T180_5 or K100_2 and you will get a view over section headings and learning outcomes that appear in the corresponding course unit.
  • ?unitset=UNITSET where UNITSET= something like T180 or K100 – ie the parent course code from which a specific unit was derived. This view will give a map showing headings and Learning Outcomes for all the units derived from a given UNITSET/course code.
  • ?keywordsearch=KEYWORD where KEYWORD= something like: physics This will identify all unit codes marked up with the keyword in the RSS version of the unit and generate a map showing headings and Learning Outcomes for all the units associated with the keyword. (This view is still a little buggy…)
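
So, for example, grabbing and saving one of the unit maps is just a case of hitting the view with the appropriate parameter and saving the response with a .mm suffix (the view URL below is indicative only – use whatever URL the Scraperwiki view page actually gives you):

import urllib

# Indicative URL only - take the actual URL from the Scraperwiki view page
view_url = 'https://views.scraperwiki.com/run/openlearn_mindmaps/?unit=T180_5'
mm = urllib.urlopen(view_url).read()
open('T180_5.mm', 'w').write(mm)  # open this file in Freemind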

In the first iteration, I haven’t added links to actual course units, so the mindmap doesn’t yet act as a clickable navigation surface, but that is on the timeline…

It’s also worth noting that there is a flash browser available for simple Freemind mindmaps, which means we could have an online, in-browser service that displays the mindmap as such. (I seem to have a few permissions problems with getting new files onto ouseful.open.ac.uk at the moment – Mac side, I think? – so I haven’t yet been able to demo this. I suspect that browser security policies will require the .mm file to be served from the same server as the flash component, which means a proxy will be required if the data file is pulled from the Scraperwiki view.)

What would be really nice, of course, would be an HTML5 route to rendering a JSONified version of the .mm XML format… (I’m not sure how straightforward it would be to port the original Freemind flash browser Actionscript source code?)

Twitter Volume Controls

With a steady stream of tweets coming out today containing local election results, @GuardianData (as @datastore was recently renamed) asked whether or not regular, stream swamping updates were in order:

A similar problem can occur when folk are livetweeting an event – for a short period, one or two users can dominate a stream with a steady outpouring of live tweets.

Whilst I’m happy to see the stream, I did wonder about how we could easily wrangle a volume control, so here are a handful of possible approaches:

  • Tweets starting @USER ... are only seen in the stream of people following both the sender of the tweet and @USER. So if @GuardianData set up another, non-tweeting, account, @GuardianDataBlitz, and sent election results to that account (“@GuardianDataBlitz Mayor referendum results summary: Bradford NO (55.13% on ), Manchester NO (53.24%), Coventry NO (63.58%), Nottingham NO (57.49%) #vote2012” for example), only @GuardianData followers following @GuardianDataBlitz would see the result. There are a couple of problems with this approach, of course: for one, @GuardianDataBlitz takes up too many characters (although that can be easily addressed), but more significantly it means that most followers of @GuardianData will miss out on the data stream. (They can’t necessarily be expected to know about the full fat feed switch.)
  • For Twitter users using a Twitter client that supports global filtering of tweets across all streams within a client, we may be able to set up a filter to exclude tweets of the form (@GuardianData AND #vote2012). This is a high maintenance approach, though, and will lead to the global filter getting cluttered over time, or at least requiring maintenance.
  • The third approach – again targeted at folk who can set up global filters – is for @GuardianData to include a volume control in their tweets, e.g. Mayor referendum results summary: Bradford NO (55.13% on ), Manchester NO (53.24%), Coventry NO (63.58%), Nottingham NO (57.49%) #vote2012 #blitz. Now users can set a volume control by filtering out tweets tagged #blitz. To remind people that they have a volume filter in place, @GuardianData could occasionally post blitz items with #inblitz to show folk who have the filter turned on what they’re missing? Downsides to this approach are that it pollutes the tweets with more metadata and maybe confuses folk about which hashtag is actually being used.
  • A more generic approach might be to use a loudness indicator or channel that can be filtered against, so for example channel 11: ^11 or ^loud (reusing the ^ convention that is used to identify individuals tweeting on a team account)? Reminders to folk who may have a volume filter set could take the form ^on11 or ^onloud on some of the tweets? Semantic channels might also be defined: ^ER (Election Results), ^LT (Live Tweets) etc, again with occasional reminders to folk who’ve set filters (^onLT, etc, or “We’re tweeting local election results on the LT ^channel today”). Again, this is a bit of a hack that’s only likely to appeal to “advanced” users and does require them to take some action; I guess it depends whether the extra clutter is worth it? (A sketch of what such a client-side filter might look like follows below.)
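
To make the ^channel idea a bit more concrete, here’s a rough sketch of the sort of client-side filter I have in mind. The tweets and the muted channel list are made-up examples, and a real client would of course apply something like this to its incoming stream rather than to a list of strings:

import re

muted = set(['loud', 'LT'])   # channels this user has turned down

def channels(tweet):
    # pull out any ^channel markers (eg ^loud, ^LT, ^11) from the tweet text
    return set(re.findall(r'\^(\w+)', tweet))

def visible(tweet):
    return not (channels(tweet) & muted)

tweets = ["Mayor referendum result: Bradford NO (55.13%) #vote2012 ^loud",
          "We're tweeting results on the loud ^channel today ^onloud",
          "Full data tables to follow later today"]
for t in tweets:
    if visible(t):
        print t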

So – any other volume control approaches I’ve missed?

PS by the by, here’s a search query (just for @daveyp;-) that I’ve been using to try to track results as folk tweet them:

-RT (#atthecount OR #vote2012 OR #le2012) AND (gain OR held OR los OR hold) AND (con OR lib OR lab OR ukip)

I did wonder about trying to parse out ward names to try and automate the detection of possible results as they appeared in the stream, but opted to go to bed instead! It’s something I could imagine trying to work up on Datasift, though…
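
For what it’s worth, a first, crude pass at flagging likely result tweets from that search stream might be nothing more than a regular expression over party and outcome keywords – something along these lines (the example tweets are invented):

import re

# Crude pattern: a party abbreviation followed within a few characters by an outcome word
RESULT_PATTERN = re.compile(
    r'\b(con|lab|lib ?dem|ukip|green|ind)\b.{0,40}\b(gain|hold|held|win|wins)\b',
    re.IGNORECASE)

def looks_like_result(tweet):
    return bool(RESULT_PATTERN.search(tweet))

print looks_like_result("Newport East: LAB HOLD, majority up #vote2012")      # True
print looks_like_result("Waiting for the returning officer... #atthecount")   # False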

Local and Sector Specific Data Verticals

Although a wealth of public data sets are being made available by government departments and local bodies, they can often be hard to track down. Data.gov.uk maintains an index of a wide variety of publicly released datasets, and more can be found via data released under FOI requests, either made through WhatDoTheyKnow or via a web search of government websites for FOI disclosure logs. But now it seems that interest may be picking up in making data available in more palatable ways…

Take for example datagenerator, “an online service designed to help individuals and businesses in the creative sector to access the latest industry research and analysis” operated by Creative & Cultural Skills, the sector skills council for the UK’s creative and cultural industries:

This tool allows you to search through – and select – data from a variety of sources and generate a range of tabulated data reports, or visual charts, based on the datasets you have selected. It’ll be interesting to see whether or not this promotes uptake/use of the data made available via the service. That is, maybe the first step towards uptake of data at scale (rather than by developers for app development, for example) is the provision of tools that allow the creation of reports and dashboards?

If the datagenerator approach is successful, I wonder if it would help with uptake of data and research made available via the DCMS CASE programme?

Or how about OpenDataCommunities, which is trying to make Linked Data published via DCLG a little more palatable?

There’s still a little way to go before this becomes widely used though, I suspect?

But it’s a good start, and it just needs some way of allowing folk to share useful queries, and maybe hide them under a description of what sorts of result they (are supposed to?!) return.

Data powered services

As the recent National Audit Office report on Implementing Transparency reminds us, the UK Government’s transparency agenda is driving the release of public data not only for the purposes of increased accountability and improving services, but also with the intention of unlocking or realising value associated with datasets generated or held by public bodies. In this respect, it is interesting to see how data sets are also being used to power services at a local level, improving service provision for citizens at the same time.

In Putting Public Open Data to Work…?, I reviewed several data related services built on top of data released at a local level that might also provide the basis for a destination site at a national level, based on an aggregation of the locally produced data. Two problems immediately come to mind in this respect. Firstly, identifying where (or indeed, if) the local data can be found; secondly, normalising the data. Even if 10 councils nominally publish the same sort of dataset, it’s likely that the data will be formatted or published in 11 or more different ways!

(Note that for local government data, one way of tracking down data sets is to use the Local Government Service IDs to identify web pages relating to the service of interest: for example, Aggregated Local Government Verticals Based on LocalGov Service IDs.)

Here’s a (new to me) example of one such service in the transport domain – Leeds Travel Info

Another traffic related site shows how it may be possible to build a sustainable, national service on top of aggregated public data, offering benefits back to local councils as well as to members of the public: operated by Elgin, roadworks.org aggregates roadworks related data from across the UK and makes it available via a user facing site as well as an API.

The various Elgin articles provide an interesting starting point, I think, for anyone who’s considering building a service that may benefit local government service provision and citizens alike based around open data.

ELGIN is the trading name of Roadworks Information Limited, a new company established in 2011 to take over the stewardship of roadworks.org (formerly http://www.elgin.gov.uk).
ELGIN has been established specifically to realize the Government’s vision of opening up local transport data by December 2012 (see Prime Minister’s statement 7th July and the Chancellor’s Autumn statement November 2011.)

ELGIN is dedicated to preserving the free-to-view national roadworks portal and extending its range of Open Data services throughout the Intelligent Transport and software communities.

roadworks.org is a free-to-view web based information service which publishes current and planned roadworks fulfilling the requirements of members of the public wanting quick information on roadworks and providing a data rich environment for traffic management professionals and utility companies.

[ELGIN supports the roadworks.org local roadworks portal by the providing services to local authority and utility clients and through subscriptions from participating local authorities. Though a private company, ELGIN manages roadworks.org and adheres to public sector standards of governance and a commitment to free and open data.]

Our policies and development strategy are strongly influenced by our Subscribers and governed by a governance regime appropriate to our role serving the public sector.

We are committed to helping achieve better coordination of roadworks, and in working together with all stakeholders to realise the vision of open, accessible, timely and accurate roadworks information.

[ELGIN redistributes public sector information under the Open Government Licence and provides its added value aggregation and application services to industry via its easy to use API (Application Programme Interface).]

Another application I quite like is YourTaximeter. This service scrapes and interprets local regulations in a contextually meaningful way, in this case locally set Hackney Carriage (taxi) fares.

If you know of any other local data or local legislation powered apps that are out there, please feel free to add a link in the comments, and I’ll maybe do a round up of anything that turns up;-)