OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for March 2011

A Python XML Handling Gotcha – Namespaces

with one comment

Just a quick note to self from a matter arising at the #IIP11 hackday earlier today – parsing XML from the PatientOpinion API using the xml.etree.ElementTree library in Python. The issue, as Dan Hagon (aka @axiomsofchoice) discovered, and as I couldn't help out with given my appalling lack of Python skills, was that the namespace used in the XML results file needed handling explicitly. (I wonder if this is also why Yahoo Pipes choked on the XML?)

Anyway, here's an example of the XML returned from the PatientOpinion API:

<Opinions xmlns="http://www.patientopinion.org.uk/api/rest/v1" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <Opinion>
    <Author>*****</Author>
    <Body>My mother who is ...
      ...
    </Body>
    <PostingID>27290</PostingID>
    <dtSubmitted>2010-01-05T12:57:19.667</dtSubmitted>
    <syndicOriginalID/>
    <syndicSourceID>po</syndicSourceID>
    <HealthServices>
      <HealthService>
        <NACS>RAJ01_430</NACS>
        <Name>Geriatric medicine</Name>
        <OrganisationNACS>RAJ</OrganisationNACS>
        <Postcode>SS0 0RY</Postcode>
        <SiteNACS>RAJ01</SiteNACS>
        <Town/>
        <Type>service</Type>
      </HealthService>
    </HealthServices>
    <Period>Today</Period>
    <PostingAs>a relative</PostingAs>
    <Responses/>
    <Tags>
      <Tag>
        <TagGroup>Condition</TagGroup>
        <TagName>confused</TagName>
      </Tag>
    </Tags>
    <Title>My darling dad</Title>
    <Type>Story</Type>
  </Opinion>
</Opinions>

And here’s a snippet for how to handle it, as gleaned from the ever helpful Stack Overflow…

import urllib.request
import xml.etree.ElementTree as ET

# Fetch the search results (API key elided)
url = 'http://www.patientopinion.org.uk/api/rest.svc/v1/postings/search?tag=dirty&take=20&apikey=******'
with urllib.request.urlopen(url) as f:
    doc = ET.parse(f).getroot()

# The default xmlns declared on <Opinions> applies to every unprefixed child,
# so each step of the path must be written in {namespace-uri}tag form.
# http://stackoverflow.com/questions/1319385/need-help-using-xpath-in-elementtree
namespace = "{http://www.patientopinion.org.uk/api/rest/v1}"
t = doc.find("{0}Opinion/{0}HealthServices/{0}HealthService/{0}Postcode".format(namespace))
if t is not None:  # find() returns None when nothing matches
    print(t.text)

It also seems as if the full path from the root is required?

PS Are there any Python libraries out there that would have been able to handle the namespaced XML automagically…?
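On whether the full path from the root is required: a minimal sketch, run against a trimmed copy of the sample response above, suggests not. ElementTree's find() accepts a prefix-to-URI mapping through its namespaces argument, and a path starting with './/' searches all descendants; Python 3.8 and later also accept a '{*}' wildcard that matches a tag in any namespace. The 'po' prefix is an arbitrary name chosen for this example, not something defined by the API.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the sample PatientOpinion response shown above
SAMPLE = """<Opinions xmlns="http://www.patientopinion.org.uk/api/rest/v1"
          xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <Opinion>
    <HealthServices>
      <HealthService>
        <Postcode>SS0 0RY</Postcode>
      </HealthService>
    </HealthServices>
    <Title>My darling dad</Title>
  </Opinion>
</Opinions>"""

doc = ET.fromstring(SAMPLE)

# 1) Map a prefix of our own choosing ("po") to the namespace URI.
#    './/' searches every descendant, so no full path from the root is needed.
ns = {"po": "http://www.patientopinion.org.uk/api/rest/v1"}
postcode_prefixed = doc.find(".//po:Postcode", ns).text

# 2) Python 3.8+ only: '{*}' matches the tag whatever its namespace.
postcode_wildcard = doc.find(".//{*}Postcode").text

print(postcode_prefixed, postcode_wildcard)
```

The prefix mapping keeps the namespace explicit, which matters if a document mixes namespaces; the wildcard is the closest thing to "automagic" handling but will happily match same-named tags from any namespace.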

Written by Tony Hirst

March 28, 2011 at 5:02 pm

Posted in Anything you want


TSO OpenUP Competition – Opening Up UCAS Data

with 3 comments

Here's the presentation I gave to the judging panel at the TSO OpenUp competition final yesterday. As ever, it doesn't make sense without me talking (doh!), though I did add some notes into the PowerPoint deck: Opening up UCAS Course Code Data

I had hoped Slideshare would be able to use the notes as a transcript, but it doesn't seem to do that, and I can't see how to cut and paste the notes in by hand :-(

A quick summary:

The “Big Idea” behind my entry to the TSO competition was a simple one – make UCAS course data (course code, title and institution) available as data. By opening up the data we make it possible for third parties to construct services and applications based around a complete data skeleton of all the courses offered for undergraduate entry through clearing in a particular year across UK higher education.
The data acts as scaffolding that can be used to develop consumer facing applications across HE (e.g. improved course choice applications) as well as support internal “vertical” activities within HEIs that may also be transferable across HEIs.
Primary value is generated from taking the course code scaffolding and annotating it with related data. Access to this dataset may be sold on in a B2B context via data platform services. Consumer facing applications with their own revenue streams may also be built on top of the data platform.
This idea makes available data that could potentially disrupt the current discovery model for course choice and selection in UK Higher Education (though, in its current form, not university application or enrollment).

Here are the notes I doodled to myself in preparation for the pitch. Now the idea has been picked up, it will need tightening up and may change significantly! ;-) Which is to say – in this form, it is just my original personal opinion on the idea, and all ‘facts’ need checking…

  1. I thought the competition was as much about opening up the data as anything… So the original idea was simply that it would be really handy to have machine readable access to course code and course name information for UK HE courses from UCAS – which is presumably the closest thing we have to a national catalogue of higher education courses.

    But when selected to pitch the idea, it became clear that an application or two were also required, or at least some good business reasons for opening up this data…

    So here we go…

  2. UCAS is the clearing house for applying to university in the UK. It maintains a comprehensive directory of HE courses available in the UK.

    Postgraduate students and Open University students do not go through UCAS. Other direct entry routes to higher education courses may also be available.

    According to UCAS, in 2010, there were 697,351 applicants with 487,329 acceptances, compared with 639,860 applications and 481,854 acceptances in 2009. [ Slightly different figures in end of cycle report 2009/10? ]

    For convenience, hold in mind the thought that course codes could be to course marketing what postcodes are to geo-related applications… They provide a natural identifier that other things can be associated with.

    Associated with each degree course is a course code. UCAS course codes are also associated with JACS codes – Joint Academic Coding System identifiers – that relate to particular topics of study. “The UCAS course codes have no meaning other than ‘this course is offered by this institution for this application cycle’.” [link]

    “UCAS course code is 4 character reference which can be any combination of letters and numbers.

    Each course is also assigned up to three JACS (Joint Academic Coding System) codes in order to classify the course for *J purposes. The JACS system was introduced for 2002 entry, and replaced UCAS Standard Classification of Academic Subjects (SCAS). Each JACS code consists of a single letter followed by 3 numbers. JACS is divided into subject areas, with a related initial letter for each. JACS codes are allocated to courses for the *J return.

    The JACS system is used by the Higher Education Statistics Agency (HESA), and is the result of a joint UCAS-HESA subject code harmonization project.

    JACS is also used by UK institutions to identify the subject matter of programmes and modules. These institutions include the Department for Innovation, Universities and Skills (DIUS), the Home Office and the Higher Education Funding Council for England (HEFCE).”

    Keywords: up to 10 keywords are allocated to each course from a restricted list of just over 4,500 valid keywords.
    “Main keyword: This is generally a broad subject category, usually expressed as a single word, for example ‘Business’.
    Suggested keyword (SUG): Where a search on a main keyword identifies more than 200 courses, the Course Search user is prompted to select from a set of secondary keywords or phrases. These are the more specific ‘Suggested keywords’ attached to the courses identified. For example, ‘Business Administration’ is one of a range of ‘Suggested keywords’ which could be attached to a Business course (there are more than 60 others to choose from). A course in Business Administration would typically have this as the ‘Suggested keyword’, with ‘Business’ as the main keyword.
    However, if a course only has a ‘Suggested keyword’ and not a related ‘Main keyword’, the course will not be displayed in any search under the ‘Main keyword’ alone.

    Single subject: Main keywords can be ticked as ‘Single subject’. This means that the course will be displayed by a keyword search on the subject, when the user chooses the ‘single subject’ option below. You may have a maximum of two keywords indicated as single subjects per course.”

    “Between January and March 2010, approximately 600,000 unique IP addresses access the UCAS course code search function. During the same time period, almost 5 million unique IP addresses accessed the UCAS subject search function.” [link]

    “New courses from 2012 will be given UCAS codes that should not be used for subject classification purposes. However, all courses will still be assigned up to three individual JACS3 codes based on the subject content of the course.

    An analysis of unique IP address activity on the UCAS Course Search has shown that very few searches are conducted using the course code, compared to the subject search function. UCAS Courses Data Team will be working to improve the subject search and course keywords over the coming year to enable potential applicants to accurately find suitable courses.” [link]

    Course code identifiers have an important role to play within university administrations, for example in marshalling resources around a course, although they are not used by students. (On the other hand, students may have a familiarity with module codes.) Course codes identify courses that are the subject of quality assessment by the QAA. To a certain extent, a complete catalogue of course codes allows third parties to organise offerings based around UK higher education degrees in a comprehensive way and link in to the UCAS application procedure.

  3. If released as open data, and particularly as Linked Open Data, the course data can be used to support:
    - the release of horizontal data across the UK HE sector by HEIs, such as course catalogue information;
    - vertical scaffolding within an institution for elaboration by module codes, which in turn may be associated with module descriptions, reading lists, educational resources, etc.
    - the development across HE of services supporting student choice – for example “compare the uni” type services
  4. At the moment the data is siloed inside UCAS behind a search engine with unfriendly session based URLs and a poor results UI. Whilst it is possible to scrape or crowd-source course code information, such ad hoc collection mechanisms run the danger of being far from complete, which means that bias may be introduced into the collection as a side effect of the collection method.
  5. Making the data available via an API or Linked Data store makes it easier for third parties to build course related services of whatever flavour – course comparison sites, badging services, resource recommendation services. The availability of the data also makes it easier for developers within an institution to develop services around course codes that might be directly transferable to, or scalable across, other institutions.
  6. What happens if the API becomes writeable? An appropriately designed data store, and corresponding ingest routes, might encourage HEIs to start releasing the course data themselves in a more structured way.

    XCRI is JISC’s preferred way of doing this, and I think there has been some lobbying of HEFCE from various JISC projects, but I’m not sure how successful it’s been?

  7. Ultimately, we might be able to aggregate data from locally maintained data stores. Course marketing becomes a feature of the Linked Data cloud.

    Also consider the context of the data burden on HEIs, e.g. reporting to Professional, Statutory and Regulatory Bodies (PSRBs).

    Also consider reconciliation with HESA institution and campus identifiers, as well as with the JISCMU API and the Guardian Datablog Rosetta Stone spreadsheet.

    By hosting course code data, and using it as scaffolding within a Linked Data cloud around HE courses, a valuable platform service can be made available to HEIs as well as commercial operations designed to support student choice when it comes to selecting an appropriate course and university.

  8. Several recent JISC projects have started to explore the release of course related activity data on the one hand, and Linked Data approaches to enterprise wide data management on the other. What is currently lacking is a national data-centric view over all HEI course offerings. UCAS has that data.

    Opening up the data facilitates rapid innovation projects within HEIs, and makes it possible for innovators within an HEI to make progress on projects that span across course offerings even if they don’t have easy access to that data from their own institution.

  9. Consumer services are also a possibility. As HEIs become more businesslike, treating students as customers, and paying customers at that, we might expect to see the appearance of university course comparison sites.

    CompareTheUni has had a holding page up for months – but will it ever launch? Uni&Books crowdsources module codes and associated reading links. Talis Aspire is a commercial reading list system that associates resources with module codes.

  10. Last year, I pulled together a few separate published datasets, threw them into Google Fusion Tables, then plotted the results. The idea was that you could chart research ratings against student satisfaction, or dropout rates against academic pay. [link]

    The Guardian Datablog picked up the post, and I still get traffic from there on a daily basis… [link]

  11. The JISC MOSAIC Library data challenge saw Huddersfield University open up book loans data associated with course codes – so you could map between courses and books, and vice versa (“People who studied this course borrowed this book”, “this book was referred to by students on this course”)

    One demonstrator I built used a bookmarklet to annotate UCAS course pages with a link to a resource page showing what books had been borrowed by students on that course at Huddersfield University. [link]

  12. Enter something like CompareTheUni, but data driven, providing aggregated views over data from universities and courses.
  13. To set the scene, the site needs to be designed with a user in mind. I see a 16-17 year old, slouching on the sofa, TV on with the most partial of attention being paid to it, laptop or tablet to hand and the main focus of attention. Facebook chat and a browser are grabbing attention on screen, with occasional distractions from the TV and mobile phone.
  14. The key is course data – this provides a natural set of identifiers that span the full range of clearing based HE course offerings in the UK and allows third parties to build services on this basis.

    The course codes also provide hooks against which it may be possible to deploy mappings across skills frameworks, e.g. SFIA in the IT world. The course codes will also have associated JACS subject code mappings and UCAS search terms, which in turn may provide weak links into other domains, such as the world of books using vocabularies such as the Library of Congress Subject Headings and Dewey classification codes.

  15. Further down the line, if we can start to associate module codes with course codes, we can start to develop services to support current students, or informal learners, by hooking in educational resources at the module level.
  16. Marketing can go several ways. For the data platform, evangelism into the HE developer community may spark innovation from within HEIs, most likely under the auspices of JISC projects. Platform data services may also be marketed to third party developers and innovators/entrepreneurs.

    Services built on top of the data platform will need to be marketed to the target audience through appropriate channels. Specialist marketers such as Campus Group may be appropriate partners here.

  17. The idea pitched is disruptive in that, at first, one of the major competitors is UCAS itself. However, if UCAS retains its unique role in university application and clearing, then UCAS will still play an essential, and heavily trafficked, role in undergraduate student applications to university. Course discovery and selection will, however, move away from the UCAS site towards services that better meet the needs of potential applicants. One might then imagine UCAS becoming a B2B service that acts as an intermediary between student choice websites and universities, or even entertain a scenario in which UCAS is disintermediated and some other clearing mechanism is instituted between universities and potential-student facing course choice portals.
  18. According to UCAS, between January and March 2010 “almost 5 million unique IP addresses accessed the UCAS subject search function” [link]. In each of the last couple of years, the annual numbers have been of the order of 500,000 students admitted per year from roughly 600,000 applicants. If we reach 10% of applicants and generate £5 per applicant, that's £300k pa. £10 from 20% of intake is £1M pa. £50 each from 40% of intake is £10M pa. I haven't managed to find out what the acquisition cost of a successful applicant is, or the advertising budget allocated to an undergraduate recruitment marketing campaign, but there are 200 or so HE institutions (going by the number of allocated HESA institution codes).

    For a platform business – e.g. a business model based around selling queries on linked/aggregated/mapped datasets: if you imagine a query returning results with several attributes, each result is a row and each attribute is a column. If you allow free access to x thousand query cells returned a day, and then charge for cells above that limit, you:
    - encourage wider innovation around your platform by letting people run narrow queries or broad queries; a licence could also cover folk using the data in their own datastores, augmented with their own triples;
    - generate revenue that scales on a metered basis according to usage;
    - can offer additional analytics that get your tracking script into third party web pages, helping train your learning classifiers, which makes the platform more valuable.

    For a consumer facing application – e.g. a course choice site for potential applicants, which is the easiest to imagine:
    - Short term – advertising (e.g. course/uni ads), affiliate fees on book sales for first year books? A second-hand book market, e.g. via Facebook marketplace?
    - Medium term – affiliate fee for prospectus application/fulfilment
    - Long term – affiliate fee for course registration

  19. At the end of the day, if the data describing all the HE courses available in the UK is available as data, folk will be able to start to build interesting things around it…
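The identifier and code formats quoted in note 2 can be captured in a small record type, which is roughly what the “data skeleton” of code, title and institution would look like as data. This is a minimal sketch, not a UCAS schema: the `Course` class, its field names and the example values are hypothetical, and the patterns assume codes are written in upper case. Following the quoted UCAS line that a course code only means “this course is offered by this institution for this application cycle”, the sketch keys each record on institution, code and cycle together.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Formats as described in the UCAS text quoted in note 2 (upper case assumed):
# course code: 4 characters, any combination of letters and numbers;
# JACS code: a single letter followed by 3 digits, up to three per course.
UCAS_COURSE_CODE = re.compile(r"[A-Z0-9]{4}")
JACS_CODE = re.compile(r"[A-Z][0-9]{3}")

@dataclass
class Course:
    """One hypothetical row of the course data skeleton."""
    institution: str
    course_code: str
    title: str
    cycle: int                                   # application cycle (entry year)
    jacs_codes: List[str] = field(default_factory=list)

    def key(self) -> Tuple[str, str, int]:
        """A code only identifies a course within one institution and one cycle."""
        return (self.institution, self.course_code, self.cycle)

    def problems(self) -> List[str]:
        """List any ways the record breaks the quoted format rules."""
        issues = []
        if not UCAS_COURSE_CODE.fullmatch(self.course_code):
            issues.append(f"bad course code {self.course_code!r}")
        if len(self.jacs_codes) > 3:
            issues.append("more than three JACS codes")
        issues += [f"bad JACS code {c!r}" for c in self.jacs_codes
                   if not JACS_CODE.fullmatch(c)]
        return issues

# Hypothetical examples
good = Course("Example University", "G400", "Computer Science", 2011, ["G400"])
bad = Course("Example University", "G40", "Computer Science", 2011, ["GG400"])
print(good.key(), good.problems())
print(bad.problems())
```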
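The back-of-envelope figures in note 18, and the metered “free cells, then pay” platform model, can be checked with a few lines of Python. A sketch only: the applicant and intake counts are the rounded figures quoted in the note, while the daily free allowance and the per-cell price are made-up placeholders standing in for the unspecified “x thousand” cells.

```python
from dataclasses import dataclass

# --- Consumer side: revenue scenarios from note 18 ---
APPLICANTS = 600_000   # approximate annual applicants, as quoted in the note
INTAKE = 500_000       # approximate annual intake, as quoted in the note

def annual_revenue(population: int, share_pct: int, per_head: int) -> int:
    """People reached (share_pct % of population) x revenue per person, in £."""
    return population * share_pct // 100 * per_head

scenarios = {
    "10% of applicants at £5": annual_revenue(APPLICANTS, 10, 5),
    "20% of intake at £10": annual_revenue(INTAKE, 20, 10),
    "40% of intake at £50": annual_revenue(INTAKE, 40, 50),
}
for label, total in scenarios.items():
    print(f"{label}: £{total:,}")

# --- Platform side: metered pricing with hypothetical numbers ---
@dataclass
class MeterPlan:
    free_cells_per_day: int       # placeholder for the "x thousand" free cells
    pence_per_1000_cells: int     # placeholder charge above the allowance

def cells(rows: int, attributes: int) -> int:
    """Each result is a row and each attribute a column: cells = rows x columns."""
    return rows * attributes

def daily_charge_pence(cells_today: int, plan: MeterPlan) -> int:
    """Bill only the cells returned today beyond the free daily allowance."""
    billable = max(0, cells_today - plan.free_cells_per_day)
    return billable * plan.pence_per_1000_cells // 1000

plan = MeterPlan(free_cells_per_day=10_000, pence_per_1000_cells=50)
today = [cells(50, 3), cells(2_000, 8)]   # one narrow and one broad query
print(daily_charge_pence(sum(today), plan))
```

Metering on cells rather than on queries means a narrow query costs almost nothing while a broad one pays in proportion to what it pulls out, which is the behaviour the note is after.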

Written by Tony Hirst

March 24, 2011 at 5:48 pm

TSO Open Up Competition – Result:-)

with 2 comments

Yesterday, I had the great fortune to win the 2011 TSO OpenUp Competition (@TSOopenup). My idea to open up UCAS course code data was pitched alongside Gail Knight's Great British Public Toilet Map idea [presentation] (this so needs doing…), Harry Harrold's New Premises app (I can see commercial estate agents going for this one…), and Benjamin Wood's My Neighbourhood App idea (which would be a great complement to OpenlyLocal, methinks…?)

TSO OpenUp trophy :-)

I'll post the slides I pitched [here] once I've tidied and checked the embedded notes, and hopefully a quick write-up of the idea by the end of the day…

Written by Tony Hirst

March 24, 2011 at 11:23 am

Posted in Anything you want


eSTEeM Project: Custom Course Search Engines

with 4 comments

Preamble
If the desire for OU courses to make increased use of third party materials and open educational resources is realised, we are likely to see a shift in the pedagogy to one that is more resource based. This project seeks to explore the extent to which custom search engines tuned to particular courses may be used to support the discovery of appropriate resources published on the public web (as indexed by Google) for any given course.

Many courses now include links to third party resources that have been published on the public web. Discovering appropriate resources in terms of relevance and quality can be a time consuming affair. The Google Custom Search Engine service allows users to define custom search engines (CSEs) that search over a limited set of domains or web pages, rather than the whole web.

(Topic based links can be discovered in a wide variety of places. For example, it is possible to create custom search engines based around the homepages of people added to a Twitter list, or the nominated blogs in annual award listings.)

The ranking of particular resources may also be boosted in the definition of the CSE via a custom ranking configuration. For example, open educational resources published in support of the course may be boosted in the search result rankings.

Alternatively, CSEs may be used to exclude results from particular domains, or return resources from the whole web with the ranking of results from specified pages or domains boosted as required. By opening up results to the whole of the web, if recent, relevant resources from an unspecified domain are identified in response to a particular search query, they stand a chance of being presented to the user in the results listing.

Synonyms for common terms may also be explicitly declared and refinement labels used to offer facet based search limits. This might be used to limit results to resources identified as particularly relevant for a particular unit, or block within a course, for example, or to particular topic areas spread across a course.

“Promoted” results may also be used to emphasise particular results in response to particular queries. A good example here might be to display promoted results relating to resources explicitly referenced in an exercise, assignment or activity.

If any of the indexed pages are marked up with structured data, it may be possible to expose this data using a rich snippet/enhanced search listing. Whilst there are few examples to date, enhanced listings that display document types or media types might be appropriate.

Examples of Google CSEs in action can be found here:

- Digital Worlds Custom Search Engine (created by hand; as used in T151).

- faceted “HE CSE” metasearch engine over UK Higher Education Library websites, UK Parliamentary pages, OERs, video protocols for science experiments. This example demonstrates how the search engine may be embedded in a web page.

The Project
The project proposes the automated generation of custom search engines on a per course basis based on the resources linked to from any given course.

The deliverables will be:

1) an automated way of generating Google CSE definition files through link scraping of Structured Authoring/XML versions of online course materials. If necessary, additional scraping of non-SA, VLE published resources may be required.

2) a resource template page and/or widget in the VLE providing access to the customised course search engine
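Deliverable 1 could start from something as simple as the following sketch, which pulls every linked host out of a course's XML and writes one site pattern per host into an annotations-style file. It rests on several assumptions: the course markup below is a hypothetical stand-in for Structured Authoring XML (any element carrying an `href` attribute is treated as a link), and the `Annotations`/`Annotation`/`Label` layout and the `_cse_` label name follow the Google CSE annotations format as I understand it, so they should be checked against the current documentation before use.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical fragment of course material; real Structured Authoring markup will differ.
COURSE_XML = """<Session>
  <Paragraph>See <a href="http://www.open.ac.uk/openlearn/some-unit">OpenLearn</a>
  and <a href="https://en.wikipedia.org/wiki/Game_design">Game design</a>.</Paragraph>
  <Paragraph>Also <a href="http://www.open.ac.uk/library/">the library</a>.</Paragraph>
</Session>"""

def linked_hosts(xml_text):
    """Collect the host of every element that carries an href attribute."""
    root = ET.fromstring(xml_text)
    hosts = set()
    for el in root.iter():
        href = el.get("href")
        if href:
            host = urlparse(href).netloc
            if host:
                hosts.add(host)
    return sorted(hosts)

def annotations_xml(hosts, label):
    """Build an annotations-style file attaching one CSE label to each host pattern.
    Element and attribute names are assumed from the CSE annotations format."""
    root = ET.Element("Annotations")
    for host in hosts:
        ann = ET.SubElement(root, "Annotation", about=f"{host}/*")
        ET.SubElement(ann, "Label", name=label)
    return ET.tostring(root, encoding="unicode")

hosts = linked_hosts(COURSE_XML)
print(hosts)
print(annotations_xml(hosts, "_cse_hypothetical_course_id"))
```

Whitelisting whole hosts is the broadest choice; emitting the exact page URLs instead would keep the engine closer to what the course actually cites, at the cost of missing related pages on the same sites.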

Success will be based on the extent to which:

1) students on pilot courses use the search engine;
2) students on courses using the search engine report that they found it useful, as measured by a survey.

Search engine metrics will also form part of the reporting chain. If appropriate, we will also explore the extent to which search engine analytics can be used to enhance the performance of the search engine (for example, by tuning custom ranking configurations), as well as offering “recent searches” information to students.

The placement of the search box for the CSE will be an important factor and any evaluation should take this into account, e.g. through A/B testing on course web pages.

Another variable relating to the extent to which a CSE is used by students is whether the CSE performs a whole web search with declared resources prioritised, or whether it just searches over declared resources. Again, an A/B test may be appropriate.

For activities that include a resource discovery component, it would be interesting to explore what effect embedding the search engine in the activity description page might have.

If course team members on any OU courses presenting over the next 9 months are interested in trying out a course based custom search engine, please get in touch. If academics on courses outside the OU would like to discuss the creation and use of course search engines for use on their own courses, I’d love to hear from you too:-)

eSTEeM is a joint initiative between the Open University's Faculty of Science and Faculty of Maths, Computing and Technology to develop new approaches to teaching and learning both within existing and new programmes.

Written by Tony Hirst

March 21, 2011 at 4:17 pm

Posted in Project, Search


Vicarious Learning and the Practitioner Educator

with 3 comments

Over the last couple of months, I've had several folk enquiring whether I could develop some interactive visualisations for them. Whilst I'm usually happy to have a go at hacking something quick and dirty together, I don't consider myself enough of a developer to be able to put together a production system. (I don't really consider myself to be a developer at all…) Instead, I see myself performing more of a scout, or observatory, role, maintaining a reasonable current awareness of what tools are out there and how they might be combined in novel ways to support the development of rapid prototypes. Such prototypes can provide a basic functional and operational specification of a system that someone could then implement properly if it ever took off…

I do think I need to start bringing some money into the OU, though, so here's an idle lunchtime thought out loud: maybe I should take on some of this consultancy work and, wherever possible, get agreement that I can blog about whatever I do as part of an uncourse. This would allow me to learn more about the topic, do a better job, and teach on that experience as I do so: an “open working, personal learning journey”. This fits in with my view of teachers-as-co-learners, and is maybe a radical version of student-as-producer, with the “teacher” taking on the role of student, albeit as an auto-didactic student with (hopefully) pretty well-developed learning skills and a side-role in maintaining a learning journal that others can use vicariously as open educational resources.

Payment for the work would help cover some of the costs of producing the open materials, and a discounted charge for services would recognise the open working/transparent nature of the project. The project would be documented as much in terms of “how we learned how to create/develop this application” as “how it works”.

Related: @jimgroom's #ds106 seems very much in this vein… Why shouldn't the instructor be learning in public, and passing that learning process on by living it, rather than spouting stuff they learned and internalised years ago?

Written by Tony Hirst

March 21, 2011 at 12:52 pm

Posted in Infoskills, Thinkses

Open Data Processes – Taps, Query Paths/Audit Trails and Round Tripping

with 4 comments

A few quick thoughts on open data processes and how we might start to put some of all this open public data to work, maybe via transparent data processes, not least in the institutions that are publishing it all…

Data Taps

The idea behind a data tap is simple – just tap off a view of the data as one institution provides it to another.


Tapping data is part of the motivation behind using FOI requests to identify standard reporting forms that may be used as part of a white box (open and transparent) data exchange process.
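As a rough illustration of the tap idea, the sketch below delivers a report along its formal route and, in the same step, passes a filtered copy (a “view”) to a public store. Everything here is hypothetical: the function names, the two in-memory lists standing in for the central body and an open data store, the field names, and the placeholder values, which are not from any real return.

```python
from typing import Callable, Dict, List

Report = Dict[str, object]
Sink = Callable[[Report], None]

def make_tap(sink: Sink, fields: List[str]) -> Sink:
    """Return a tap that forwards only the named fields: a 'view' of each report."""
    def tap(report: Report) -> None:
        sink({k: report[k] for k in fields if k in report})
    return tap

def send_report(report: Report, recipient: Sink, taps: List[Sink]) -> None:
    """Deliver the report along the formal route, then pass it to every tap."""
    recipient(report)
    for tap in taps:
        tap(report)

# Hypothetical destinations standing in for the central body and a public store
central_body_inbox: List[Report] = []
open_data_store: List[Report] = []

# Placeholder values, not a real return
report: Report = {"institution": "Example University", "period": "2008/09",
                  "loans": 100, "contact_email": "librarian@example.ac.uk"}

public_tap = make_tap(open_data_store.append, ["institution", "period", "loans"])
send_report(report, central_body_inbox.append, [public_tap])
print(open_data_store)
```

Because the tap sits on the reporting route itself, the public view is produced as a side effect of a report that has to be made anyway, rather than as a separate publication task.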

Query Paths/Audit Trails

Query paths describe a process in which it is possible to see how a particular data view or set of summary data was obtained from one or more data sources.

For an example use case, see So Where Do the Numbers in Government Reports Come From?.
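One way to picture a query path is data that carries its own audit trail: every transformation returns new data plus a line saying what was done. The sketch below assumes a hypothetical `TracedData` wrapper, a made-up source file name, and placeholder figures; it illustrates the idea rather than any particular tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class TracedData:
    """Rows plus an audit trail describing how they were obtained."""
    rows: List[dict]
    trail: List[str] = field(default_factory=list)

    def step(self, description: str,
             fn: Callable[[List[dict]], List[dict]]) -> "TracedData":
        """Apply fn to the rows and record what was done, returning a new object."""
        return TracedData(fn(self.rows), self.trail + [description])

def load(source_name: str, rows: List[dict]) -> TracedData:
    return TracedData(list(rows), [f"loaded {len(rows)} rows from {source_name}"])

# Hypothetical source data with placeholder values
source = [
    {"council": "A", "year": 2009, "spend": 10},
    {"council": "A", "year": 2010, "spend": 12},
    {"council": "B", "year": 2010, "spend": 7},
]

summary = (
    load("example_spending.csv", source)
    .step("kept rows where year == 2010",
          lambda rs: [r for r in rs if r["year"] == 2010])
    .step("summed spend across councils",
          lambda rs: [{"year": 2010, "total_spend": sum(r["spend"] for r in rs)}])
)

print(summary.rows)
for line in summary.trail:
    print("-", line)
```

The summary figure and the trail travel together, so a reader of the report can see which source was used and which filters and aggregations produced the number.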

Round Trips

Round tripping refers to the ability to regenerate a data source from a data report, for example by taking data out of an HTML table and popping it into a spreadsheet or database.
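The HTML-table-to-spreadsheet round trip can be sketched with nothing but the Python standard library: parse the table cells row by row, then write them out as CSV that a spreadsheet can open. The `TableParser` class and the sample table are hypothetical; a real report page would need more care over nested tables, `colspan` and the like.

```python
import csv
import io
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row, from HTML tables."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:       # ignore text outside cells
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Hypothetical HTML report table
html = """<table>
<tr><th>Institution</th><th>Loans</th></tr>
<tr><td>Example University</td><td>100</td></tr>
</table>"""

parser = TableParser()
parser.feed(html)

buf = io.StringIO()
csv.writer(buf).writerows(parser.rows)   # CSV text ready for a spreadsheet
print(buf.getvalue())
```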

If common data fields are used across datasets, it may be possible to populate fields in one data “source” automatically from another.

Round tripping means that we can reuse data, once entered, to populate other reporting forms.
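Reusing data across reporting forms via common fields might look something like the sketch below: empty fields in one form are filled from another record wherever the field names match, provided both records share the same identifier. The field names, the `institution_code` key and the values are all hypothetical placeholders, not taken from any real reporting form.

```python
from typing import Dict, Optional

Record = Dict[str, Optional[object]]

def prefill(form: Record, source: Record, key: str = "institution_code") -> Record:
    """Fill empty (None) fields in `form` from `source` wherever the field names
    match, provided both records share the same key value."""
    if form.get(key) != source.get(key):
        raise ValueError("records do not refer to the same institution")
    return {name: (source[name] if value is None and name in source else value)
            for name, value in form.items()}

# Hypothetical data: a return already made to one body, and a blank form for another
already_reported = {"institution_code": "X123", "institution_name": "Example University",
                    "student_fte": 1000}
blank_form = {"institution_code": "X123", "institution_name": None,
              "student_fte": None, "library_loans": None}

filled = prefill(blank_form, already_reported)
print(filled)
```

Fields with no counterpart (here `library_loans`) stay empty, so the form still shows what genuinely has to be collected afresh.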

[See also: Open Data Handbook]

Written by Tony Hirst

March 18, 2011 at 5:38 pm

Posted in Data, Policy


Academic Library Usage Data as Reported to SCONUL, via FOI, And a Thought About Whitebox Data Reporting

with one comment

Something I’ve been meaning to do for ages, but only just got round to starting to do, is to send up trial balloon FOI requests around the data that one public organisation might release to other organisations as part of a formal or templated reporting procedure.

So here’s the first one – an FOI request to the University of Bath Library for a copy of the data they returned to SCONUL for the period 2008/2009 made via MySociety’s WhatDoTheyKnow service – and here’s the response, along with a copy of the return.

(In general, I wonder if it would be more useful to ask for a copy of the document, if possible in the format in which it was submitted: for example, as a Microsoft Word document, if that was the document type submitted.)

The information reported to SCONUL is not available from SCONUL for free, although aggregated data from across the UK HE sector is available via a paid for report. A copy of the current questionnaire used to collect the data is available.

It seems to me that requests of this sort set a precedent for the release of data produced as part of a formal or standardised reporting process, a precedent that can be used to encourage (oblige?) other institutions in the same sector to make the information available in the same way?

So here’s what I have in mind: a site that collects and collates information about standard reports that are used to transport information between public sector organisations (including copies of the forms used to collate that data), including but not limited to the information/data that public institutions are obliged to return to government or overseeing agencies.

For example, this DCLG list of the minimum data central Government needs from local authorities is a good start – is there an equivalent for universities (come on, BIS…;-)? [Ah, maybe this is a place to start, at least as far as HESA goes: HEFCE Report 2008: Making your data work for you - Data quality and efficiency in higher education. I imagine there is also a considerable data burden arising from REF reporting?]

As and when reports are demonstrated to be FOIable, their contents also become candidates for open data release. One aim here is to start making data chains visible to the organisations that are producing the data (internal transparency) so that the organisation can become more aware of its own data resources and how they might be used elsewhere within the organisation. (Transparency within the organisation may also lead to a reduction in duplication of effort creating or collating the same data at several different locations within the same organisation?)

The claim I guess I’m making towards this approach to opening up data may be summarised as follows: data that is produced as part of formal reporting and that is FOIable should be made public as a matter of course. As a consequence, there should be little extra effort required to open up the data. Indeed, it may be possible to submit the reports via an open and transparent whitebox reporting process.

[See also: Putting Public Open Data to Work…?]

PS for what it’s worth, I think the SCONUL data application provides another example of a situation where it might be useful to have a WhatDoTheyKnow service that allows you to make the same (bulk) request to every institution in a particular sector (such as universities, or local councils). I can see there may need to be controls around such a service to prevent abuse, but

PPS I wonder, do MySociety license WhatDoTheyKnow to any public institutions to help them manage their FOI process?

PPPS Here’s a related comment I posted to the Public Data Corporation engagement exercise:

Question 5 – What methods of access to datasets would most benefit you or your organisation?

One particular class of data that interests me is data that is:

1) reported by a local organisation to a central body;
2) using a standardised, templated reporting format;
3) FOIable from the local organisation and/or from the central body.

For example, in Higher Education, this might include data on library usage as reported to SCONUL, or marketing information about courses submitted to UCAS.

It can often be hard to find out how to phrase an FOI request to obtain this data as submitted, unless you know the type of reporting form used to submit it.

What I would like to see is the Public Data Corporation acting in part as a Public Data Exchange Directory, showing how different classes of public organisation make standard (public data containing) reports to other public organisations, detailing the standard report formats, with names/identifiers for those forms if appropriate, and describing which sections of the report are FOIable. This could also link in to the list of local council data burdens, for example ( http://www.communities.gov.uk/… ) and/or the code of practice for local authority transparency ( http://www.communities.gov.uk/… ).
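To make the directory idea a little more concrete, here is a minimal Python sketch of what one directory entry might record: the form's identifier, who reports to whom, and which sections are FOIable. The class, field and function names are all hypothetical, not part of any existing service, and the example entries are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ReportForm:
    """One entry in a hypothetical Public Data Exchange Directory."""
    form_id: str        # name/identifier of the standard reporting form
    reporter_type: str  # class of organisation making the report
    recipient: str      # body the report is made to
    # section name -> True if that section is FOIable
    sections: dict = field(default_factory=dict)

    def foiable_sections(self):
        """Sections that could be requested, as submitted, under FOI."""
        return [name for name, foiable in self.sections.items() if foiable]


def forms_submitted_by(directory, reporter_type):
    """All standard reports made by a given class of organisation."""
    return [form for form in directory if form.reporter_type == reporter_type]


if __name__ == "__main__":
    # Illustrative entries only; identifiers and section names are made up.
    directory = [
        ReportForm("LIB-RETURN", "university", "SCONUL",
                   {"section 1": True, "section 2": False}),
        ReportForm("COURSE-INFO", "university", "UCAS", {"section 1": True}),
        ReportForm("SPEND-RETURN", "local council", "DCLG", {"section 1": True}),
    ]
    for form in forms_submitted_by(directory, "university"):
        print(form.form_id, "->", form.recipient, form.foiable_sections())
```

A lookup such as `forms_submitted_by(directory, "university")` is the sort of query that would help someone phrase an FOI request around a specific, named form.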

The next step would be to introduce a pubsub (publish-subscribe) model in the reporting chain for reporting documents* that are wholly FOIable. This could happen in several ways:

A) /open report publication/ – the publishing organisation could post their report to their opendata reporting store, and the consuming organisation (the one to which the report was being made) would subscribe to that store, collecting the data from there as it was published; third parties could also subscribe to the local publishing store and be alerted to reports as they are published. If co-publication to the central organisation and the public is not appropriate, the report could be withheld from public/press consumption for a specified period of days, or published to the press but not the public under embargo.

B) /open deposit/ – the publishing organisation publishes the report/data to an open deposit box owned by the central organisation which is receiving the report. After a specified period of time, the report is made public (ie published) via that central deposit box.

C) /data corp in the middle/ – a centralised architecture in which local organisations submit public reports to a Public Data Exchange, which then passes them on to the central body to which reports are made, and publishes them to the public, maybe after a fixed period of time.
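Model A is essentially a publish-subscribe arrangement with an optional embargo. A rough Python sketch, assuming a single in-memory store and treating subscribers as simple callbacks (all names are illustrative), might look like the following. Models B and C would move the same kind of store to the central body or to a Public Data Exchange respectively.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Report:
    title: str
    published: datetime
    embargo: timedelta = timedelta(0)  # delay before the public may see it

    def public_from(self):
        return self.published + self.embargo


class OpenReportStore:
    """Sketch of model A: a publishing organisation's open reporting store."""

    def __init__(self):
        self.reports = []
        self.subscribers = []  # (callback, is_recipient) pairs

    def subscribe(self, callback, is_recipient=False):
        """Register interest; is_recipient marks the body the report is made to."""
        self.subscribers.append((callback, is_recipient))

    def publish(self, report):
        self.reports.append(report)
        for callback, is_recipient in self.subscribers:
            # The recipient is always notified at once; third parties are only
            # pushed the report immediately when there is no embargo.
            if is_recipient or report.embargo == timedelta(0):
                callback(report)

    def public_reports(self, now):
        """Reports whose embargo (if any) has expired by `now`."""
        return [r for r in self.reports if now >= r.public_from()]
```

In this sketch third parties are not pushed embargoed reports; they pick them up via `public_reports()` once the embargo lapses. A real system would more likely expose a feed that subscribers poll.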

The intention of all three approaches described above is to provide an open window onto the reporting chain. At the current time, open public data tends to be data that is published via a separate branch “to the public”. In contrast, the above approach suggests that public data publication acts as a view onto all or part of the data as it goes about its daily business being published from one organisation to another. That is, public data publication becomes a “tap” onto a dataflow/workflow process.

If one of the desires for data exploitation is to help introduce efficiencies as well as reuse in data related activities, third parties need to be able to work with data as it is currently used.

A final issue relates to the way data is published. The JISC Resource Discovery Taskforce is currently consulting [ http://rdtfmetadata.jiscpress.... ] about metadata standards for describing resources in the Museums, Libraries and Archives field, and work is also ongoing with respect to efficient and complete ways of publishing scientific data. To the extent that generic models or guidance are possible with respect to the representation of arbitrary data sets, it may be worth liaising with those working groups on generic guidelines for effective data publishing conventions. [Disclaimer: I am on the RDTF Technical Advisory Group]

* when talking about reports, I include the following sense: where a report is made, it is likely to include summary reports and maybe complete datasets. Ideally, data contained in reports should also be made available as “raw data” in an open data format, for example compliant with two or more stars in the W3C Linked Data 5 star open Linked Data publishing scheme [ http://www.w3.org/DesignIssues... ]. In addition, where summary reports appear, referencing views over raw data sets, the queries/database queries that generate the summary report view from the raw data should also be published, thus providing transparency over how the raw data generates summary statistics, for example, in the final report.
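As an illustration of publishing the query alongside the raw data, here is a small Python sketch using an in-memory SQLite database. The `loans` table and its rows are made up for the example; the point is that `SUMMARY_QUERY` is itself a publishable artefact, so anyone holding the raw data can regenerate the summary table.

```python
import sqlite3

# Hypothetical raw data: one row per library loan, by department.
RAW_ROWS = [
    ("Maths", "2010-10-01"),
    ("Maths", "2010-10-02"),
    ("History", "2010-10-02"),
]

# The query that turns the raw data into the summary table in the report.
# Publishing this text alongside the raw data lets anyone re-derive the summary.
SUMMARY_QUERY = """
SELECT department, COUNT(*) AS loans
FROM loans
GROUP BY department
ORDER BY department
"""


def build_summary(rows, query=SUMMARY_QUERY):
    """Load raw rows into a throwaway database and run the published query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE loans (department TEXT, loan_date TEXT)")
        conn.executemany("INSERT INTO loans VALUES (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(build_summary(RAW_ROWS))  # [('History', 1), ('Maths', 2)]
```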

Written by Tony Hirst

March 18, 2011 at 10:29 am

Thinkses Around Open Course Accreditation

with 2 comments

What do P2PU, the University of Mary Washington (UMW), and a joint venture between the National Research Council of Canada (Institute for Information Technology, Learning and collaborative Technologies Group, PLE Project), The Technology Enhanced Knowledge Research Institute at Athabasca University and the University of Prince Edward Island have in common? The answer is that they either have, or are about to, run open online courses, at undergraduate level, for free, on the web.

In the case of P2PU and the Canadian joint venture, the courses were run without credit. At UMW, the DS106 Digital Storytelling course ran for the first time in 2010 as a for credit course for registered UMW students, albeit largely in public. In 2011, it has run as a course with loose boundaries, open to all whilst at the same time providing a recognised course offering within UMW itself. In each case, the course duration was of the order of 10 weeks.

With HE in the UK going through a phase of soul-searching around the question of “where’s the money going to come from”, it could be argued that we need to start doing some work around business model innovation. So here’s one of my starters for ten… (I have floated this internally, and no-one’s picked up on it, so I feel as if I’m not giving anything away by posting it here…)

The idea is simple: a recognised award offering body offers a module or course container that will allow participants in online courses to receive recognised academic credit points based in part on their participation in an open, online course, and in part on their reflections about what they learned on the course.

What follows are initial (probably naive) thoughts on how it might work…

The module is inspired in part by the International Baccalaureate’s CAS (Creativity, Action, Service) component as well as HE level course modules developed to recognise work based or prior experiential learning; it provides a means by which paid for assessment may be decoupled from course delivery. To try and address concerns, the proposal in the first instance is that the container be used to award credit for students who have freely participated in one of a recognised number of open educational units, for example from the OU’s OpenLearn website or one or more courses offered by P2PU (subject to agreement).

OpenLearn Courses: participation in these courses is based on individual engagement with the course material, informally supported by one or more forums or social spaces open to all. This model allows us to explore the extent to which purely independent learning within a controlled open courseware context provides an appropriate context for accredited independent study.

One or more OU Uncourses/Learning Journey Courses (or open, online courses run by academics in other institutions): a significant part of the original course material drafted for the Relevant Knowledge short course T151 Digital Worlds was authored over a 15-20 week period on a public blog hosted on wordpress.com. The materials posted combined elements of personal learning diary as the OU author explored the subject area, as well as learning devices borrowed from the OU’s tutorial-in-print style of writing (in-line exercises, self-reflection questions, and worked through tutorials, for example). By running one or more new “learning journey” courses, such as in areas where material is being drafted for fully fledged future OU courses, where material is timely (for example, in response to a BBC series or short term skills gap (such as the opening up of data in central and local government)), or where there exists considerable vendor produced third party training material albeit in a poorly structured form as far as course design goes (for example, Google tutorials around Google Apps, or Google Analytics, or the Yahoo User Interface libraries), we can: i) pilot the open course container model; ii) create useful open resources “for the common good”; iii) draft course materials for possible formal (paid for) OU course offerings.

P2PU Courses: P2PU runs 10 week courses for small cohorts starting throughout the year. Learners engage with each other as well as the course resources and course instructors. Recognising participation in this sort of course allows us to explore the extent to which an open accreditation module can be used to recognise participation in semi-formal courses. Recognising participation with P2PU courses also provides an opportunity for the OU to develop ties to the Mozilla Foundation, who support P2PU and are keen to see it develop a range of semi-professional courses based around the open web and open software development.

How the Container Works

The container awards credit based on the fulfilment of several criteria:

- demonstration of engagement with, or participation in, a recognised open, online course; this requirement means we know that learners were at least exposed to a certain amount of content we recognise;

- a reflective assessment component; this may take the form of a reflective essay, or piece of project work arising from the course and a critical review of that work.

- optionally, results from quizzes provided during the course. These not only demonstrate engagement with the course, but also provide some means of demonstrating a particular level of attainment in particular topic areas through computer marked assessment.

In the first instance, accreditation is offered for independent study based on participation with one of a limited number of pre-identified open online courses. In this way, we could artificially limit the range of subject areas and course models engaged with by the initial batches of learners to a known set of approved courses. This approach allows us to mitigate the risks involved in proving the model, and allows the course model to develop in a carefully controlled way.

The OpenLearn Context (2011I-2011L)

To a certain extent, the idea is based on a particular vision of how we might go about assessing participation in open online courses run outside the OU. However, I think it might also be used to provide a way in to formal study for students wishing to take formal OU awards based on prior engagement with OpenLearn materials.

By accrediting engagement with two OpenLearn based units derived from current Technology short course/Relevant Knowledge programme courses, we can compare achievement levels across formal and informal presentations of the material. For example, if material from Relevant Knowledge short courses in their final presentation is released to OpenLearn immediately prior to that final presentation, we can engage learners around course material that is concurrently being offered in a supported fashion as an officially recognised OU course through the VLE, and informally via OpenLearn. As such, we can explore the extent to which an open course container might: i) extend the life of a course; ii) provide alternative pathways to credit and assessment models for students interested in a particular topic area but not necessarily interested in “named credit” for a course.

The Uncourse/Learning Journey Context

As institutions such as the OU continue to innovate in the areas of informal and semi-formal education through OpenLearn and emerging practice in Digital Scholarship, the uncourse/learning journey, originally inspired, in part, by the notion of “misguided tours”, provides a framework for digital scholars to record their learning journey through a new subject area as a learning pathway that others might follow. By employing writing devices that are well proven in the delivery of “tutorial-in-print” style learning materials, the learning diary becomes a piece of instructional material in its own right. Through openly recording the learning journey, and ideally engaging with other learners interested in the topic area, the author should also remain free to negotiate the future direction of the learning journey (hence its declaration as an ‘uncourse’) and so discover a curriculum that fairly reflects the learning needs of its participants.

The P2PU Context

If, as seems likely, ad hoc open online courses continue to emerge as a consequence of: a) the increasing availability of high quality content that can be put to use as a learning resource, even if not originally designed as one; b) the growth in online social networks and an apparent desire and willingness for learners to come together and participate in semi-structured learning directed activity, there will be a growing market for recognising participation in such activities and acknowledging it in some way. Through recognising participation in P2PU courses in certain areas, it may be possible for HEIs to develop closer ties with the Mozilla Foundation and engage with open courses in areas complementary to formal offerings (e.g. in the OU’s case, the Web Certificate, Open Source Tools and Linux courses). Such engagement provides opportunities for using P2PU courses as a marketing channel similar to the way in which OpenLearn units may be used, as well as providing a continuing education context for alumni in areas where an institution may not provide courses. P2PU may also provide a slightly more structured context than is offered by the uncourse/learning journey model for the developmental testing of formal course materials as they are being developed for fully fledged distance online courses.

What’s in it for folk offering online courses?
An obvious argument against the above approach is that folk running courses may get upset that someone else is offering (for a fee) accreditation around their course materials. (I always thought non-commercial could be a Bad Thing ;-) However, a couple of benefits come to mind.

Firstly, the institution offering the accreditation may pay to advertise on the site offering the course. (Yes, I know this might seem as if it’s a way for an institution to essentially outsource its course production and delivery, and in a way it is… But if open courses take off, and if they offer educational benefit, and if there’s value in proving to someone else you have taken an open course, and if HEIs don’t start offering certification around open courses, then someone else will. Such as an organisation like Pearson…)

Secondly, by accepting that participation in a course can be used as partial fulfillment of requirements for the receipt of formal academic credit, it reflects back some of the authority of the award offering body on the course, showing that the course has something of educational value to offer.

Isn’t the Audience Limited?
Open educational courses aren’t for everyone; they require some element of motivation on the part of the learner, and they are often best followed in a social way. At times they may lack structure, and instead focus on resource investigation activities, which can be hard for learners who prefer very heavily structured courses with linear narratives and a “teacher” leading from the front. But if you want to develop skills and a model of learning that helps you exploit the power of the web, then open courses may help you on your way…

Conclusion
Err, that’s it… ;-)

Related: Massive Open Online Courses – All You Need to Know…

Written by Tony Hirst

March 11, 2011 at 2:45 pm

Another step on the Road to a Distributed data.ac.uk – Southampton University Linked Open Data


Earlier this week, Chris Gutteridge and Dave Challis pushed data.southampton.ac.uk, Southampton University’s Linked Open Data store (Southampton U Data blog), containing for starters at least the following:

  • place data
  • a (non-authoritative) dataset describing the university’s organisational units
  • Academic programme data; this dataset identifies courses according to UCAS course code and JACS code, as well as remodelled Unistats course data for some of the courses.

From what I can tell, Chris has been running round Southampton grabbing data from wheresoever he can get it, so it’ll be interesting to see how the datasets grow out over the coming months;-)

Here’s how I think part of the data looks at the moment?

graph soton {
"Programmes(2010-2011session)"--"OpenDataCatalog";
"JACSCodes"--"OpenDataCatalog";
"JACSCodes"--"StudentStatistics";
"StudentStatistics"--"OpenDataCatalog";
"Programmes(2010-2011session)"--"JACSCodes";
"BuildingsandPlaces"--"OpenDataCatalog";
"PublicPhonebook"--"OpenDataCatalog";
"Organisation"--"OpenDataCatalog";
"PublicPhonebook"--"Organisation";
}

A full list of datasets can be found here.

I wonder if it would be useful if each institution publishing Linked Open Data published an authoritative, local-to-them version of the Linked Open Data Cloud Diagram showing the local datasets and the third party datasets that are directly linked to? As well as the diagram, a data representation of the diagram (e.g. a Graphviz .dot file) would be handy…
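Publishing the data representation need not be onerous if the dataset links are kept as data in the first place. A minimal Python sketch, using the dataset names from the soton graph above (the `to_dot` function name is illustrative), renders the links as a Graphviz undirected graph; the output can then be fed to `dot -Tpng` to produce the picture.

```python
# Dataset links taken from the soton graph above (undirected edges).
LINKS = [
    ("Programmes(2010-2011session)", "OpenDataCatalog"),
    ("JACSCodes", "OpenDataCatalog"),
    ("JACSCodes", "StudentStatistics"),
    ("StudentStatistics", "OpenDataCatalog"),
    ("Programmes(2010-2011session)", "JACSCodes"),
    ("BuildingsandPlaces", "OpenDataCatalog"),
    ("PublicPhonebook", "OpenDataCatalog"),
    ("Organisation", "OpenDataCatalog"),
    ("PublicPhonebook", "Organisation"),
]


def to_dot(name, links):
    """Render (a, b) pairs as a Graphviz undirected graph in .dot syntax."""
    lines = [f"graph {name} {{"]
    for a, b in links:
        lines.append(f'  "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # e.g. python soton_graph.py > soton.dot && dot -Tpng soton.dot -o soton.png
    print(to_dot("soton", LINKS))
```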

As a quick way in to writing your own queries on the Southampton open data SPARQL endpoint, previews of the queries used to generate results pages in the data store are also provided:

So for example:
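For running such queries from a script rather than the browser, the sketch below assumes the endpoint speaks the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter) and can return SPARQL JSON results. The endpoint address is an assumption, as is the example query; check the data store's own pages for the real endpoint and vocabulary.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint address; check the data store's documentation.
ENDPOINT = "http://sparql.data.southampton.ac.uk/"


def sparql_url(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request URL for `query`."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint, query):
    """Send the query and parse the SPARQL JSON results (needs network access)."""
    req = urllib.request.Request(
        sparql_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def bindings(results, var):
    """Pull out the values bound to one variable in SPARQL JSON results."""
    return [row[var]["value"]
            for row in results["results"]["bindings"] if var in row]


if __name__ == "__main__":
    q = ("SELECT ?s ?label WHERE { "
         "?s <http://www.w3.org/2000/01/rdf-schema#label> ?label } LIMIT 10")
    print(bindings(run_query(ENDPOINT, q), "label"))
```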

A couple of quick observations about the data:

The organisation data looks quite flat at the moment, but I wonder if more structure will become available over time, allowing an organogram for the university to be generated directly from this data? Whenever I see an organisational chart (such as the Soton Corporate Services organisation chart), I can’t help feeling it should be generated from an underlying data description rather than simply presented as a flat image, with the underlying data published alongside the chart, or used to progressively enhance the display of the chart. Given the general crapness of institutional search engines, surely we should be able to find a way of using organisational structure and committee workflows to help surface relevant content to folk at a particular location in the organisation/workflow?
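Generating an organogram from data mostly means turning (unit, parent unit) pairs into a tree. A minimal Python sketch, assuming the organisation data can be reduced to such pairs and contains no cycles (the function names are illustrative), renders an indented text chart; the same tree could just as easily be emitted as Graphviz or HTML.

```python
from collections import defaultdict


def build_tree(units):
    """units: iterable of (unit, parent) pairs; parent is None for a root."""
    children = defaultdict(list)
    roots = []
    for unit, parent in units:
        if parent is None:
            roots.append(unit)
        else:
            children[parent].append(unit)
    return roots, children


def render(units):
    """Return an indented text organogram, children sorted alphabetically."""
    roots, children = build_tree(units)
    lines = []

    def walk(unit, depth):
        lines.append("  " * depth + unit)
        for child in sorted(children[unit]):
            walk(child, depth + 1)  # assumes the data contains no cycles

    for root in sorted(roots):
        walk(root, 0)
    return "\n".join(lines)
```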

The academic/course data is quite thin at the moment, but provides a really important piece of scaffolding for linking to ever richer course related content, as well as linking out through services like UCAS. [UPDATE: by the by, @scottbw just created and shared an RDF XCRI vocabulary for course descriptions, for use with this MLO RDFS (I have no idea what any of that means, either;-).]

In the same way that getting access to postcode data and its various associations was foundational for the development of many location based services in the UK, so access to course code data for building course level applications is key.
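Once courses carry a shared code, joining them to other data becomes a simple keyed lookup, much as postcodes act as a join key for location data. Here is a hedged Python sketch with hypothetical record shapes: course dicts with a `code` field, and statistics held in a dict keyed by the same code.

```python
def join_on_code(courses, stats):
    """Left-join course records to statistics rows sharing a course code.

    courses: list of dicts, each with a "code" key.
    stats: dict mapping a course code to a dict of statistics.
    Courses without statistics keep an empty stats dict rather than
    being dropped, so gaps in coverage stay visible.
    """
    return [{**course, "stats": stats.get(course["code"], {})}
            for course in courses]


if __name__ == "__main__":
    # Hypothetical codes and values, for illustration only.
    courses = [{"code": "AAA1", "title": "Course A"},
               {"code": "BBB2", "title": "Course B"}]
    stats = {"AAA1": {"satisfaction": 80}}
    for row in join_on_code(courses, stats):
        print(row["code"], row["title"], row["stats"])
```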

I thought it was particularly interesting to see a link from courses to data obtained and remodelled from the Unistats service:

As well as the Southampton open data store, the OU is also running a 5 star linked open data service at data.open.ac.uk for OU Linked data, which is currently exposing module information and data around OU podcasts (OU Linked data on OUseful.info). I think location data is also in the store, though not publicly available yet???

One thing that excites me about the opening up of data across sites is the extent to which institutions will start to open up different datasets to other HEIs, and hopefully drive the wider roll out of data as a result as everybody sees what everyone else is opening up… The other thing that excites me is being able to join datasets;-)

So, which university will be next?

Written by Tony Hirst

March 10, 2011 at 12:27 pm

Posted in Data


Cobbling Together a Searchable Twitter Friends/Followers Contact List in Google Spreadsheets

with 2 comments

Have you ever found yourself in the situation where you want to send someone a Twitter message but you can’t remember their Twitter username, although you do know their real name? Or where you can’t remember their Twitter username or their real name, but you do remember who they work for, or some other biographical fact about them that might appear in their Twitter biography? If that sounds familiar, here’s a trick that may help…

… a searchable Twitter friends and followers contact list in Google Spreadsheets.

It’s based on Martin Hawksey’s rather wonderful Export Twitter Followers and Friends using a Google Spreadsheet (I have to admit – Martin has left me way behind now when it comes to tinkering with Google Apps Script…!) To get started, you’ll need a Google docs account, and then have to indulge in a quick secret handshake between Google docs and Twitter, but Martin’s instruction sheet is a joy to follow:-) Follow the *** Google Spreadsheet to Export Twitter Friends and Followers *** link on Martin’s page, then come back here once you’ve archived your Twitter friends and/or followers…

..done that? Here’s how to make the contact list searchable… I thought it should have been trivial, but it turned out to be quite involved!

The first thing I did was create a drop down list to let the user select Friends or Followers as the target of the search. (Martin’s application loads friends and followers into different sheets.)

The next step was to generate a query. To search for a particular term on a specified sheet we can use a QUERY formula that takes the following form:

=query(Friends!B:E,"select B,C,D,E where D contains 'JISC'")

Friends! specifies the sheet we want to search over; B:E says we want to pull columns B, C, D and E from the Friends sheet into the current sheet; the select statement will display results over four columns (B, C, D and E) from Friends for rows where the entry in column D contains the search term JISC.

To pull in the search term from cell D1 we can use a query of the form:

=query(Friends!B:E,concatenate("select B,C,D,E where D contains '",D1,"'"))

The concatenate formula constructs the search query string. Make sure you use the right sort of quotes when constructing the string – each piece of text passed to concatenate is wrapped in straight double quotes, while the search term inside the query itself is wrapped in single quotes.

To search over two columns, (for example, the real name and the description columns of the twitter friends/follower data) we can use a query of the form:

=query(Followers!B:E,concatenate("select B,C,D,E where C contains '",D1,"' or D contains '",D1,"'"))

Again – watch out for the quotes – the result we want from the concatenation is something like:

=query(Followers!B:E,"select B,C,D,E where C contains 'Jisc' or D contains 'Jisc'")

so we have to explicitly code in the single quote in the concatenation formula.
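The string-building step can be mirrored in Python to check what the concatenation ought to produce. This is a sketch; the function and parameter names are illustrative. One caveat worth knowing: a search term containing an apostrophe, such as O'Brien, would end the single-quoted literal early and would probably break the query, in the spreadsheet just as here.

```python
def build_query(term, columns=("C", "D"), select="B,C,D,E"):
    """Build the query-language string that the concatenate() formula produces.

    The search term is wrapped in single quotes inside the query. In the
    spreadsheet formula those single quotes sit inside the double-quoted
    pieces of text handed to concatenate().
    """
    clauses = [f"{col} contains '{term}'" for col in columns]
    return f"select {select} where " + " or ".join(clauses)


if __name__ == "__main__":
    print(build_query("Jisc"))
    # select B,C,D,E where C contains 'Jisc' or D contains 'Jisc'
```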

Unfortunately, the query formula is case sensitive, which can cause the search to fail because we haven’t taken (mis)use of case into account in our search term. This means we need to go defensive in the query formulation – in the following example, I force everything to upper case – search corpus as well as search terms:

=query(Followers!B:E,concatenate("select B,C,D,E where upper(C) contains upper('",D1,"') or upper(D) contains upper('",D1,"')"))
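For comparison, here is the same case-insensitive test written in Python. The row layout (columns B to E held as a tuple, so C and D are positions 1 and 2) is an assumption, and `str.casefold()` is used as a slightly more thorough alternative to upper-casing both sides.

```python
def matches(row, term, columns=(1, 2)):
    """Case-insensitive 'contains' test over selected fields of a row.

    Mirrors: upper(C) contains upper(term) or upper(D) contains upper(term),
    where row is a (B, C, D, E) tuple, so C and D are at positions 1 and 2.
    """
    needle = term.casefold()
    return any(needle in str(row[i]).casefold() for i in columns)


if __name__ == "__main__":
    row = ("user1", "Ann Example", "Works on jisc projects", 10)  # made-up row
    print(matches(row, "JISC"))  # True: case is ignored on both sides
```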

The final step is to define the sheet we want to search – Friends! or Followers! – depending on the setting of cell B1 in our search sheet. I had idly thought I could use a concatenate formula to create this, but concatenate returns a string and we need to define a range. In the end, the workaround I adopted was an if statement that chooses a query with an appropriate range set explicitly/hardwired within the formula, depending on whether we are searching Friends or Followers. Here’s the complete formula, which I put into cell E1.

=if(B1="Friends",query(Friends!B:E,concatenate("select B,C,D,E where upper(C) contains upper('",D1,"') or upper(D) contains upper('",D1,"')")),query(Followers!B:E,concatenate("select B,C,D,E where upper(C) contains upper('",D1,"') or upper(D) contains upper('",D1,"')")))
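Outside the spreadsheet, choosing the sheet by name is straightforward when the data is held in a dictionary rather than in hardwired ranges. A minimal Python sketch, assuming each sheet has been read into a list of (B, C, D, E) tuples (the `search` function and that row shape are illustrative assumptions):

```python
def search(sheets, sheet_name, term):
    """Search the Friends or Followers rows, mirroring the if(B1=...) formula.

    sheets maps a sheet name to a list of (B, C, D, E) rows. Looking the rows
    up by name replaces the two hardwired query() calls with one lookup.
    """
    if sheet_name not in sheets:
        raise KeyError(f"unknown sheet: {sheet_name!r}")
    needle = term.upper()
    return [row for row in sheets[sheet_name]
            if needle in str(row[1]).upper() or needle in str(row[2]).upper()]
```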

I now have a query sheet defined that allows me to search over my friends or followers, as required, according to their real name or a search term that appears in their biography description.

Written by Tony Hirst

March 9, 2011 at 7:41 pm
