Archive for the ‘Presentation’ Category
Just Back From #DevXS
Just back home from #devXS, the first DevCSI student developer event held at the University of Lincoln, in which a shed load (literally!) of student developers gave up their weekend for a 24 hour code bash (and 2 minute Remembrance Sunday silence) on projects of their own design. Well done to all the teams for their hacks and apps – I’m guessing a list of prize winners will appear on the DevXS blog, but you can find a full list on the wiki…
It was really encouraging to see several teams hacking out apps and services around course code data – it’s just a shame that UCAS Terms and Conditions make it so hard for folk to find an open way in to getting hold of a national catalogue of course codes. In the same way that restrictions on UK postcode data held back grass roots development for way too long until recently, access to course code data – which UCAS could help out with – is really holding back the development of grass roots apps around course choice and selection…if crappy license conditions are respected of course… (is there an “in the public interest” defence that could be mounted against respecting such terms and conditions?!)
Here’s the overall winning app, from St Andrews’ Another Team: UUG: the Unofficial University Guide
Many congrats and thanks to the local organisers Alex Bilbie, Nick Jackson, Joss Winn, Jamie Mahoney and any others I may have omitted (apols…) as well as UKOLN’s DevCSI co-ordinator Mahendra Mahey. Great stuff, chaps:-)
PS FWIW, here are my slides from the presentation I gave at the event, as well as a hack I did along the way…
Appropriate IT – My ILI2011 Presentation
Here’s a copy of the slides from my ILI2011 presentation on Appropriate IT:
One thing I wanted to explore was, if discovery happens elsewhere, and the role of the librarian is no longer focussed on discovery related issues, where can library folk help out? Here’s where I think we need to start placing some attention: sensemaking, and knowing what’s possible (aka helping redistribute the future that is already around us;-) Allied with this is the idea that we need to make more out of using appropriate IT for particular tasks, as well as appropriating IT where we can to make our lives easier.
In part, sensemaking is turning the wealth of relevant data out there into something meaningful for the question or issue at hand, or the choice we have to make. My own dabblings with social network analysis are approaches I’m working on that help me make sense of interest networks and social positioning within those networks so I can get a feel for how those communities are structured and who the major actors are within them.
As far as knowing what’s possible, I think we have a real issue with “folk IT” knowledge. Most of us have a reasonable grasp of folk physics and folk psychology. That is, we have a reasonable common-sense model of how the world works at the human scale (let go of an apple, it falls to the floor), and we can generally read other people from their behaviour; but how well developed is “folk IT” knowledge? Given that most people find alien the idea that you can search within a page in a wide variety of electronic documents using ctrl-F as a keyboard shortcut to a “search within page/document” feature, I think our folk understanding of IT is limited to the principle of “if you switch it off and on again it should start working again”.
Folk IT is also tied up with computational thinking, but at a practical, “human scale”. So here are a few ideas I think the librarians need to start pushing:
- the idea of a graph; it’s what the web’s based around, after all, and it also helps us understand social networks. If you think of your website as a graph, with edges representing links that connect nodes/pages together, and realise that your on-site homepage is whatever page someone lands on from a search engine or third party link, you soon start to realise that maybe your website is not as usefully structured as you thought…
- some sort of common sense understanding of the role that URLs/URIs play in the browser, along with the idea that URIs are readable and hackable and also may say something about the way a website, or the resources it makes available, is organised;
- the notion of “View Source”, that allows you to copy and crib the work of others when constructing your own applications, along with the very idea that you might be able to build web pages yourself out of free standing components.
- the idea of document types and applications that can work all sorts of magic given documents of that type; the knowledge that an MP3 file works well with an audio player or audio editor, for example, or that a PNG or JPG encodes an image, along with more esoteric formats such as KML (paste a URL to a KML file into the search box of a Google Maps search and see what happens, for example…). Knowledge of the filetype/document type gives you some sort of power over it, and helps you realise what sorts of thing you can do with it… (except for things like PDF, for example, which is to all intents and purposes a “can’t do anything with it” filetype;-)
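To make the point above about readable, hackable URIs a little more concrete, here is a minimal Python sketch that pulls a URL apart into the pieces a person might edit by hand. The URL itself is made up for illustration (it is not a real service), and `describe_url` is just a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url):
    """Split a URL into the parts a person might read or 'hack' by hand."""
    parts = urlsplit(url)
    return {
        "host": parts.netloc,
        # Path segments often mirror how the site organises its resources
        "path_segments": [seg for seg in parts.path.split("/") if seg],
        # Query arguments are frequently safe to tweak (e.g. change the subject)
        "query": parse_qs(parts.query),
    }


# Hypothetical library catalogue search URL, invented for this example
example = describe_url(
    "http://library.example.ac.uk/catalogue/search?subject=physics&year=2011"
)
```

Changing `subject=physics` to another subject, or trimming path segments to see what lives one level up, is the sort of URL hacking meant here.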
I also think an understanding of pattern based string matching and what regular expressions allow you to do would go a long way towards helping folk who ever have to manipulate text or text-based data files, at least in terms of letting them know that there are often better ways of cleaning up a text file automagically rather than having to repeat the same operation over and over again on each separate row in a file containing several thousand lines… They don’t need to know how to write the regular expression from the off, just that the sorts of operation regular expressions support are possible, and that someone will probably be able to show you how to do it…
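As a flavour of the sort of thing that is possible, here is a minimal Python sketch that cleans up every row of a text file in one pass. The sample rows and the choice of clean-up (squashing runs of whitespace and rewriting day/month/year dates as year-month-day) are assumptions made up for the example, not taken from any real dataset.

```python
import re

# One or more whitespace characters in a row
WHITESPACE_RUN = re.compile(r"\s+")
# Dates written as day/month/year, e.g. 3/11/2011
UK_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def clean_line(line):
    """Collapse stray whitespace and rewrite d/m/yyyy dates as yyyy-mm-dd."""
    line = WHITESPACE_RUN.sub(" ", line).strip()
    return UK_DATE.sub(
        lambda m: f"{m.group(3)}-{int(m.group(2)):02d}-{int(m.group(1)):02d}",
        line,
    )


# Invented sample rows; a real file might have thousands of these
rows = ["  Smith,   J.  ,  3/11/2011 ", "Jones, A.,12/1/2011"]
cleaned = [clean_line(row) for row in rows]
```

The same two patterns work unchanged whether the file has two rows or twenty thousand, which is really the point.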
Slides from OU Rise Library Analytics Workshop: Rambling about Visualisation
For what it’s worth, slides from my presentation yesterday… As ever, they’re largely pointless without commentary…
… and even with the commentary, it was all a bit more garbled than usual (I forgot to breathe, had no real idea in my own mind what I wanted to say, etc etc…)
On reflection, here’s what I took from thinking back about what I should have tried to say:
- my assumption is that folk who are interested in asking data related questions should feel as if they can actually work with the data itself (direct data manipulation); I appreciate this is already way off the mark for some people who want someone else to work the data and then just read reports about it – but then that means you can’t ask or discover your own questions about the data, just read answers (maybe) to questions that someone else has asked, presented in a way they decided;
- you need to feel confident in working with data files – or at least, you need to be prepared to have a go at working with data files! (Bear in mind that many of the blog posts I write are write ups – of a sort – of how to do something I didn’t know how to do a couple of hours before… The web usually has answers to most of the questions that I come up against – and if I can’t find the answers, I can often request them via things like Twitter or Stack Overflow…) This can range from using command line tools to using applications that let you take data in using one format and get it out as another;
- different tools do different things; if you can get a dataset into a tool in the right way, it may be able to do magical things very very easily indeed…
- three tools that can do a lot without you having to know a lot (though you may have to follow a tutorial or two to pick up the method/recipe….or at least recognise a picture you like and a dataset whose shape you can replicate using your own data, and then the ability to see which bits you need to cut and paste into the command line…):
-=- Gephi: great for plotting networks and graphs. It can also be appropriated to draw line charts (if you can work out how to ‘join the dots’ in the data file by turning the line into a set of points connected by edges) or scatter plots (just load in nodes – no edges connecting them – and lay it out using Gephi’s geolayout tool, which also lets you plot “rectilinear” plots based on x and y axis values); (I haven’t worked out a reliable way of working with CSV in Gephi – yet…); it’s amazing what you can describe as a graph when you put your mind to it…
-=- gnuplot: command line tool for plotting scatter plots and line graphs (eg from time series) using data stored in a simple text file (e.g. TSV or CSV)
-=- R (and ggplot if you’re feeling adventurous and want “pretty”, nicely designed graphs out); another command line tool (I find R-Studio helps) that again loads in data from a CSV file; R can generate statistical graphs very easily from the command line (it does the stats calculations for you given the raw data).
- Visual analytics/graphical data analysis is a process – you tease out questions and answers through directly manipulating the data and engaging with it in a visual way;
- when you see a visualisation you like, look at it closely: what do you see? Spending five mins or so looking at a Gestalt psychology/visual perception tutorial will give you all sorts of tricks and tips for how to construct visualisations so that structure your eye can detect will jump out at you;
- I think I may have confused folk talking about “dimensions”: what I meant was, how many columns could you represent in a given visualisation at the same time, if each data point corresponds to a single row in a data set. So for example, if you have an x-y plot (2 dimensions), with different symbols (1 dimension) available for plotting the points, as well as different colours (1 dimension) and different possible size (1 dimension) for each symbol, along with a label (1 dimension) for each point, and maybe control over the size (1 dimension), colour (1 dimension) and even font (1 dimension) applied to the label, you might find you can actually plot quite a few columns/dimensions for each data point on your chart… Whether or not you can actually decipher it is another matter of course! My Gephi charts generally have 2 explicit dimensions (node size and colour), as well as making use of two spatial dimensions (x, y) to lay out points that are in some sense “close” to each other in network space. It’s worth remembering though, that if you’re using a tool to engage in a conversation with a dataset as you try to get it to tell its story to you, it may not matter that the visualisation looks a mess to anyone else (a bit like an involved conversation may not make sense if someone else suddenly tries to join it). (Presentation graphics, on the other hand, are usually designed to communicate something that the data is trying to say to another person in a very explicit way.)
- working with data is a tactile thing… you have to be prepared to get your hands dirty…
My Presentation at OU Statistics Conference – Visualisation Tools for the Rest of Us
Slides from my presentation to the OU Visualisation and Presentation in Statistics earlier today… will update this post with notes and links as and when I get round to it! In the meantime, you’ll have to use Google…(though other search engines are available). (Slides via Slideshare)
Just Do IT Yourself… MY UKSG Presentation
As ever, the slides really need a commentary, but just in case they are useful – my slides from UKSG11:
(It’s short of links, but if you Google appropriate application names it should get you there…;-)
PS Following the presentation, I’ve had a couple of queries about Yahoo Pipes… Here’s a starter guide: Pipes Book – Imaginings. (See also: OUseful.info blog posts on Yahoo Pipes.)
And if you thought you knew what Google could do for you…. What can Google do for you?
TSO OpenUP Competition – Opening Up UCAS Data
Here’s the presentation I gave to the judging panel at the TSO OpenUp competition final yesterday. As ever, it doesn’t make sense with[out] (doh!) me talking, though I did add some notes in to the Powerpoint deck: Opening up UCAS Course Code Data
(I had hoped Slideshare would be able to use the notes as a transcript, but it doesn’t seem to do that, and I can’t see how to cut and paste the notes in by hand?:-(
A quick summary:
The “Big Idea” behind my entry to the TSO competition was a simple one – make UCAS course data (course code, title and institution) available as data. By opening up the data we make it possible for third parties to construct services and applications based around a complete data skeleton of all the courses offered for undergraduate entry through clearing in a particular year across UK higher education.
The data acts as scaffolding that can be used to develop consumer facing applications across HE (e.g. improved course choice applications) as well as support internal “vertical” activities within HEIs that may also be transferable across HEIs.
Primary value is generated from taking the course code scaffolding and annotating it with related data. Access to this dataset may be sold on in a B2B context via data platform services. Consumer facing applications with their own revenue streams may also be built on top of the data platform.
This idea makes data available that can potentially disrupt the current discovery model for course choice and selection (but, in its current form, not university application or enrollment) in Higher Education in the UK.
Here are the notes I doodled to myself in preparation for the pitch. Now the idea has been picked up, it will need tightening up and may change significantly! ;-) Which is to say – in this form, it is just my original personal opinion on the idea, and all ‘facts’ need checking…
But when selected to pitch the idea, it became clear that an application or two were also required, or at least some good business reasons for opening up this data…
So here we go…
Postgraduate students and Open University students do not go through UCAS. Other direct entry routes to higher education courses may also be available.
According to UCAS, in 2010, there were 697,351 applicants with 487,329 acceptances, compared with 639,860 applications and 481,854 acceptances in 2009. [ Slightly different figures in end of cycle report 2009/10? ]
For convenience, hold in mind the thought that course codes could be to course marketing, what postcodes are for geo related applications… They provide a natural identifier that other things can be associated with.
Associated with each degree course is a course code. UCAS course codes are also associated with JACS codes – Joint Academic Coding System identifiers – that relate to particular topics of study. “The UCAS course codes have no meaning other than “this course is offered by this institution for this application cycle”.” [link]
“UCAS course code is 4 character reference which can be any combination of letters and numbers.
Each course is also assigned up to three JACS (Joint Academic Coding System) codes in order to classify the course for *J purposes. The JACS system was introduced for 2002 entry, and replaced UCAS Standard Classification of Academic Subjects (SCAS). Each JACS code consists of a single letter followed by 3 numbers. JACS is divided into subject areas, with a related initial letter for each. JACS codes are allocated to courses for the *J return.
The JACS system is used by the Higher Education Statistics Agency (HESA), and is the result of a joint UCAS-HESA subject code harmonization project.
JACS is also used by UK institutions to identify the subject matter of programmes and modules. These institutions include the Department for Innovation, Universities and Skills (DIUS), the Home Office and the Higher Education Funding Council for England (HEFCE).”
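The two code formats described in the quote above (a 4 character alphanumeric UCAS course code, and a JACS code made of a single letter followed by 3 numbers) are simple enough to check with a pattern. Here is a minimal Python sketch; the function names are hypothetical, the example codes are purely illustrative, and accepting lower case letters is my own assumption rather than anything UCAS specifies.

```python
import re

# "4 character reference which can be any combination of letters and numbers"
UCAS_COURSE_CODE = re.compile(r"[A-Za-z0-9]{4}")
# "a single letter followed by 3 numbers"
JACS_CODE = re.compile(r"[A-Za-z][0-9]{3}")


def is_ucas_course_code(code):
    """True if the string has the shape of a UCAS course code."""
    return UCAS_COURSE_CODE.fullmatch(code) is not None


def jacs_subject_letter(code):
    """Return the initial letter (the subject area) of a JACS-shaped code,
    or None if the string does not have the JACS shape."""
    if JACS_CODE.fullmatch(code) is None:
        return None
    return code[0].upper()
```

Note that a check like this only says a string has the right shape; it says nothing about whether the code is actually in use, which is exactly why a complete catalogue would be so useful.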
—
Keywords: up to 10 keywords per course are allocated to each course from a restricted list of just over 4,500 valid keywords.
“Main keyword: This is generally a broad subject category, usually expressed as a single word, for example ‘Business’.
Suggested keyword (SUG): Where a search on a main keyword identifies more than 200 courses, the Course Search user is prompted to select from a set of secondary keywords or phrases. These are the more specific ‘Suggested keywords’ attached to the courses identified. For example, ‘Business Administration’ is one of a range of ‘Suggested keywords’ which could be attached to a Business course (there are more than 60 others to choose from). A course in Business Administration would typically have this as the ‘Suggested keyword’, with ‘Business’ as the main keyword.
However, if a course only has a ‘Suggested keyword’ and not a related ‘Main keyword’, the course will not be displayed in any search under the ‘Main keyword’ alone.
Single subject: Main keywords can be ticked as ‘Single subject’. This means that the course will be displayed by a keyword search on the subject, when the user chooses the ‘single subject’ option below. You may have a maximum of two keywords indicated as single subjects per course.”
“Between January and March 2010, approximately 600,000 unique IP addresses access the UCAS course code search function. During the same time period, almost 5 million unique IP addresses accessed the UCAS subject search function.” [link]
—
“New courses from 2012 will be given UCAS codes that should not be used for subject classification purposes. However, all courses will still be assigned up to three individual JACS3 codes based on the subject content of the course.
An analysis of unique IP address activity on the UCAS Course Search has shown that very few searches are conducted using the course code, compared to the subject search function. UCAS Courses Data Team will be working to improve the subject search and course keywords over the coming year to enable potential applicants to accurately find suitable courses.” [link]
—
Course code identifiers have an important role to play within university administrations, for example in marshalling resources around a course, although they are not used by students. (On the other hand, students may have a familiarity with module codes.) Course codes identify courses that are the subject of quality assessment by the QAA. To a certain extent, a complete catalogue of course codes allows third parties to organise offerings based around UK higher education degrees in a comprehensive way and link in to the UCAS application procedure.
- the release of horizontal data across the UK HE sector by HEIs, such as course catalogue information;
- vertical scaffolding within an institution for elaboration by module codes, which in turn may be associated with module descriptions, reading lists, educational resources, etc.
- the development across HE of services supporting student choice – for example “compare the uni” type services
XCRI is JISC’s preferred way of doing this, and I think there has been some lobbying of HEFCE from various JISC projects, but I’m not sure how successful it’s been?
Also context of data burden on HEIs, reporting to Professional, Statutory and Regulatory Bodies – PSURBS.
Reconciliation with HESA Institution and campus identifiers, as well as the JISCMU API and Guardian Datablog Rosetta Stone spreadsheet
By hosting course code data, and using it as scaffolding within a Linked Data cloud around HE courses, a valuable platform service can be made available to HEIs as well as commercial operations designed to support student choice when it comes to selecting an appropriate course and university.
Opening up the data facilitates rapid innovation projects within HEIs, and makes it possible for innovators within an HEI to make progress on projects that span across course offerings even if they don’t have easy access to that data from their own institution.
CompareTheUni has had a holding page up for months – but will it ever launch? Uni&Books crowd sources module codes and associated reading links. Talis Aspire is a commercial reading list system that associates resources with module codes.
Guardian datablog picked up the post, and I still get traffic from there on a daily basis… [link ]
One demonstrator I built used a bookmarklet to annotate UCAS course pages with a link to a resource page showing what books had been borrowed by students on that course at Huddersfield University. [Link ]
The course codes also provide hooks against which it may be possible to deploy mappings across skills frameworks, e.g. SFIA in IT world. The course codes will also have associated JACS subject code mappings and UCAS search terms, which in turn may provide weak links into other domains, such as the world of books using vocabularies such as the Library of Congress Subject headings and Dewey classification codes.
Services built on top of the data platform will need to be marketed to the target audience using appropriate channels. Specialist marketers such as Campus Group may be appropriate partners here.
For a platform business – e.g. a business model based around selling queries on linked/aggregated/mapped datasets. If you imagine a query returning results with several attributes, each result is a row and each attribute is a column. If you allow free access to x thousand query cells returned a day, and then charge for cells above that limit, you:
- encourage wider innovation around your platform; let people run narrow queries or broad queries. License on use of data for folk to use on their own datastores/augmented with their own triples;
- generate revenue that scales on a metered basis according to usage;
- offer additional analytics that get your tracking script in third party web pages, helping train your learning classifiers, which makes the platform more valuable.
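The metering idea above (count cells as rows times columns, give away a daily allowance, charge for anything over it) can be sketched in a few lines of Python. The allowance of 5,000 cells and the price of 0.01 per cell are invented numbers for illustration only, and the function names are hypothetical.

```python
from decimal import Decimal


def cells_returned(rows, columns):
    """Each result is a row and each attribute a column, so cells = rows x columns."""
    return rows * columns


def daily_charge(cells_used, free_cells, price_per_cell):
    """Charge only for cells above the free daily allowance."""
    billable_cells = max(0, cells_used - free_cells)
    return billable_cells * price_per_cell


# One narrow query (10 rows x 3 attributes) and one broad one (1000 rows x 7 attributes)
queries = [(10, 3), (1000, 7)]
cells_used = sum(cells_returned(rows, columns) for rows, columns in queries)

# Assumed allowance and price, purely for illustration
charge = daily_charge(cells_used, free_cells=5000, price_per_cell=Decimal("0.01"))
```

Counting cells rather than queries means a narrow query and a broad query are billed in proportion to what they actually pull out of the platform.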
For a consumer facing application – eg a course choice site for potential applicants is the easiest to imagine:
- Short term – model would be advertising (e.g. course/uni ads), affiliate fees on booksales for first year books? Second hand books market eg via Facebook marketplace?
- Medium term – affiliate fee for prospectus application/fulfilment
- Long term – affiliate fee for course registration
Google Apps as a Mashup Environment – Slides from #guug11
FWIW, here are the slides from my presentation on “Mashing Up Google Apps” at the excellent Google Apps UK User Group (#guug11), as hosted by Martin Hamilton at Loughborough University yesterday.
The “mashup environment” diagram was generated using a desktop version of Graphviz, but it can also be generated using the Google Chart Tools Graphviz chart, as in the example below:
Here’s the “source code” for that image:
digraph googApps {
GoogleSpreadsheet [shape=Msquare]
GoogleCalendar [shape=Msquare]
GoogleMail [shape=Msquare]
GoogleDocs [shape=Msquare]
CSV [shape=diamond]
JSON [shape=diamond]
HTML [shape=diamond]
XML [shape=diamond]
GoogleAppsScript [shape=diamond]
"[GoogleVizDataAPI]" [shape=diamond]
"<GoogleForm>" [shape=doubleoctagon]
"<GoogleGadgets>" [shape=doubleoctagon]
"<GoogleVizDataCharts>" [shape=doubleoctagon]
"<GoogleMaps>" [shape=doubleoctagon]
CSV->URL
HTML->URL
XML->URL
event->GoogleAppsScript
GoogleAppsScript->"<GoogleMaps>"
GoogleAppsScript->GoogleMail
GoogleAppsScript->GoogleCalendar
GoogleAppsScript->GoogleSpreadsheet
GoogleSpreadsheet->GoogleAppsScript
GoogleAppsScript->GoogleDocs
GoogleSpreadsheet->JSON
email->GoogleMail
GoogleMail->email
GoogleDocs->GoogleAppsScript
GoogleCalendar->GoogleAppsScript
"<GoogleForm>"->event
event->GoogleSpreadsheet
time->event
"<GoogleForm>"->GoogleSpreadsheet
URL->GoogleSpreadsheet
GoogleSpreadsheet->"[GoogleVizDataAPI]"
"[GoogleVizDataAPI]"->"<GoogleVizDataCharts>"
GoogleSpreadsheet->"<GoogleGadgets>"
}
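If you would rather not type the DOT by hand, the same sort of description can be generated from a list of nodes and edges. Here is a minimal Python sketch under my own assumptions: `to_dot` and `quote_id` are hypothetical helper names, and the quoting rule is simplified (it does not handle DOT keywords such as `node` or `graph`, which would also need quoting).

```python
import re


def quote_id(name):
    """Write simple names bare; quote anything containing other characters."""
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        return name
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(graph_name, shapes, edges):
    """Build DOT source for a directed graph from node shapes and (source, target) pairs."""
    lines = [f"digraph {graph_name} {{"]
    for node, shape in shapes.items():
        lines.append(f"{quote_id(node)} [shape={shape}]")
    for source, target in edges:
        lines.append(f"{quote_id(source)}->{quote_id(target)}")
    lines.append("}")
    return "\n".join(lines)


# A two node fragment of the graph above
dot_source = to_dot(
    "googApps",
    {"GoogleSpreadsheet": "Msquare", "<GoogleForm>": "doubleoctagon"},
    [("<GoogleForm>", "GoogleSpreadsheet")],
)
```

The resulting text can be pasted into whichever Graphviz renderer you have to hand.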
And finally, here’s a snapshot of the hashtag community around the event as of mid-morning yesterday:
Node colour is related to the total number of followers, and node size is betweenness centrality.
Scholarly Communication in the Networked Age
Last week, I was fortunate enough to receive an invitation to attend the Texts and Literacy in the Digital Age: Assessing the future of scholarly communication at the Dutch National Library in Den Haag (a trip that ended up turning into a weekend break in Amsterdam when my flight was cancelled…)
The presentation can be found here and embedded below, if your feed reader supports it:
One thing I have tried to do is annotate each slide with a short piece of discursive text relating to the slide. I need to find a way of linearising slide shows prepared this way to see if I can find a way of generating blog posts from them, which is a task for next year…
The presentation draws heavily on Martin Belam’s news:rewired presentation from 2009 (The tyranny of chronology), as I try to tease out some of the structural issues that face the presentation of news media in an online networked age, and contrast (or complement) them with issues faced by scholarly publishing.
One of the things I hope to mull over more next year, and maybe communicate in a more principled way rather than via occasional blog posts and tweets, are the ways in which news media and academia can work together to put the news into some sort of deeper context, and maybe even into a learning (resource) context…
My Slides from the Data Driven Journalism Round Table (ddj)
Yesterday, I was fortunate enough to attend a Data Driven Journalism round table (sort of!) event organised by the European Journalism Centre.
Here are the slides I used in my talk, such as they are… I really do need to annotate them with links, but in the meantime, if you want to track any of the examples down the best way is to probably just search this blog ;-)
(Readers might also be interested in my slides from News:Rewired (although they make even less sense without notes!))
Although most of the slides will be familiar to longtime readers of this blog, there is one new thing in there: the first sketch of a network diagram showing how some of my favourite online apps can work together based on the online file formats they either publish or consume (the idea being once you can get a file into the network somewhere, you can route it to other places/apps in the network…)
The graph showing how a handful of web apps connect together was generated using Graphviz, with the graph defined as follows:
digraph {
GoogleSpreadsheet -> CSV;
GoogleSpreadsheet -> "<GoogleGadgets>";
GoogleSpreadsheet -> "[GoogleVizDataAPI]";
"[GoogleVizDataAPI]"->JSON;
CSV -> GoogleSpreadsheet;
YahooPipes -> CSV;
YahooPipes -> JSON;
CSV -> YahooPipes;
JSON -> YahooPipes;
XML -> YahooPipes;
"[YQL]" -> JSON;
"[YQL]" -> XML;
CSV->"[YQL]";
XML->"[YQL]";
CSV->"<ManyEyesWikified>";
YahooPipes -> KML;
KML->"<GoogleEarth>";
KML->"<GoogleMaps>";
"<GoogleMaps>"->KML;
RDFTripleStore->"[SPARQL]";
"[SPARQL]"->RDF;
"[SPARQL]"->XML;
"[SPARQL]"->CSV;
"[SPARQL]"->JSON;
JSON-> "<JQueryCharts_etc>";
}
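The “routing” idea behind this graph (get a file into the network somewhere, then pass it from app to app) amounts to finding a path through a directed graph. Here is a minimal Python sketch using a breadth first search over a handful of the edges listed above; `find_route` is a hypothetical helper, and the edge list is only a subset chosen for the example.

```python
from collections import deque

# A subset of the edges from the graph above, as (source, target) pairs
EDGES = [
    ("GoogleSpreadsheet", "CSV"),
    ("CSV", "YahooPipes"),
    ("CSV", "[YQL]"),
    ("[YQL]", "JSON"),
    ("YahooPipes", "KML"),
    ("YahooPipes", "JSON"),
    ("KML", "<GoogleEarth>"),
    ("JSON", "<JQueryCharts_etc>"),
]


def find_route(edges, start, goal):
    """Breadth first search: return the shortest list of hops from start to goal, or None."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in adjacency.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


route = find_route(EDGES, "GoogleSpreadsheet", "<GoogleEarth>")
```

Here `route` walks from the spreadsheet out via CSV, Yahoo Pipes and KML to Google Earth, whereas asking for a route from KML back to CSV returns `None`, because none of these edges lead that way.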
I had intended to build this up “live” in a text editor using the GraphViz Mac client to display the growing network, but in the end I just showed a static image.
At the time, I’d also forgotten that there is an experimental Graphviz chart generator made available via the Google Charts API, so here’s the graph generated via a URL using the Google Charts API:
Here’s the chart playground view:
PS if the topics covered by the #ddj event appealed to you, you might also be interested in the P2PU Open Journalism on the Open Web course, the “syllabus” of which is being arranged at the moment (and which includes at least one week on data journalism) and which will run over 6 weeks, err, sometime; and the Web of Data Data Journalism Meetup in Berlin on September 1st.
From Uncourse to Short Course – CALRG Presentation
Slides from a presentation earlier today:
Doug blogged it here: CALRG Conf: Learning in Public


