OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for October 2010

OUseless.info

with 13 comments


Written by Tony Hirst

October 11, 2010 at 3:30 pm

Posted in Anything you want

Legacy Systems Management – Time For a Cobol Course?!;-)

with 4 comments

Listening to Jeff Papows discussing “Glitch: The Impact of Faulty Software” on Technometria via my IT Conversations podcast feed this morning (book available here), I started wondering whether there was an opportunity for a Masters level course on managing legacy systems, which might include a unit or two on programming in Cobol?

Whilst not the most fashionable of languages, Cobol still keeps many financial and enterprise systems running, and the old hand maintainers are now starting to retire, if they haven’t already.

Taking the thought a little further, it also occurred to me that many legacy systems may well have been designed according to old-fashioned and no longer popular architectural styles, styles that are not necessarily taught to today’s students… which means if you’re faced with managing a legacy system, you may not have the right model of it in mind. So what’s the way round this? Maybe picking up old course materials, contemporary to the time that the systems were put in place, and using them to provide an almost archaeological overview of the styles and approaches used to engineer the legacy systems that still need managing.

Related to managing legacy systems is the idea of digital preservation strategies, a forward looking complement to legacy systems which seeks to identify processes, procedures and representations that ensure today’s digital creations don’t become an unreadable legacy tomorrow.

I knew the OU already ran a course on Learning from Information System failures, but it seems there’s also one on Information systems legacy and evolution, though I’m not sure it teaches any Cobol?!;-) So the OU’s already on the case… I wonder how popular the course is…?

In fact, I’m not sure if the OU ever did a Cobol course…? Hmmm… is there an “old OU courses catalogue” anywhere? I know the OU Library acts as archive to OU courses, so maybe I can track one down through the Library catalogue…? The answer is “yes, (sort of… no..)”:

OU cobol course

Here it is – :

OU Cobol course http://voyager.open.ac.uk/vwebv/holdingsInfo?searchId=223&recCount=10&recPointer=4&bibId=22133

…though I don’t think it’s available in a digital/scanned form…

Hmm… ;-)

PS I wonder… for quirky advanced users, would a “course materials archive” filter be a useful addition to the library catalogue search…? What would happen if we ran search terms coming in to the OU website – and on the OU website – over archived course descriptions, to see if any old courses are actually starting to pick up long tail interest…?

Written by Tony Hirst

October 7, 2010 at 10:15 am

Posted in OU2.0

Orange Visual Visualisation Tool

with 6 comments

A few days ago, I came across a drag’n’drop, wire it together visualisation and data analysis tool called Orange.

Here’s a quick run through of some of the basics (at least, a run through of the first few things I tried to do with the tool…)

First off, we need some data. Orange likes TSV (tab separated values) rather than CSV, so I grabbed some TSV from one of the Guardian Datastore spreadsheets on Google Docs (use “Save as Text” to get the tab separated value format…)

TSV from google docs
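If the data you have to hand is only available as CSV, something like the following sketch could be used to turn it into tab separated values first. It uses only the Python standard library; the function name `csv_to_tsv` and the sample rows are made up for illustration, and it makes no assumptions about any extra header conventions Orange may expect.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert CSV text to tab separated text, one output row per input row.

    Hypothetical helper for illustration; not part of Orange.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    # Write the same cells back out, but separated by tabs rather than commas.
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data, including a quoted cell that contains a comma.
sample_csv = 'region,spend\n"London, Greater",120\nWales,45\n'
tsv_text = csv_to_tsv(sample_csv)
print(tsv_text)
```

Using the `csv` module rather than a plain string replace means that commas inside quoted cells are left alone.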

Orange is a canvas based visual programming environment, in which functional blocks are added to the canvas and certain parameters set within the block. Here’s how we get some data into Orange from a TSV file:

Orange viz tool - import data

The File icon is giving me a warning (no dependent variable) but I’m not sure why…? I’m sure Orange has managed to detect labels and quantities correctly from other files I’ve tried?

Anyway… we can inspect the data by looking at it in a data table widget – just wire one in:

Orange viz tool - data table

The table is sortable by column, and the Report button can be used to save a version of the table. Looking at the data table, we see it has identified columns with missing entries. We can clean these from our data set using the Preprocessing widget:

Orange - data cleaning
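For comparison, here is a rough sketch of the same sort of cleaning step done by hand in plain Python: drop any row that has a missing entry. This is not how the Preprocessing widget works internally (I don’t know what it does under the hood); the `MISSING` markers, the function name `drop_incomplete_rows` and the sample data are all assumptions made for the example.

```python
import csv
import io

# Assumed markers for a missing entry; adjust to suit the data in hand.
MISSING = {"", "?", "NA"}


def drop_incomplete_rows(tsv_text):
    """Return (header, rows), keeping only rows with no missing entries."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    kept = [
        row
        for row in reader
        # A row is complete if it has a cell for every column
        # and none of the cells is a missing-value marker.
        if len(row) == len(header)
        and all(cell.strip() not in MISSING for cell in row)
    ]
    return header, kept


# Made-up sample: the "bob" and "cat" rows each have one missing value.
sample = "name\tcomments\tlikes\nann\t12\t3\nbob\t\t7\ncat\t5\t?\ndan\t9\t1\n"
header, rows = drop_incomplete_rows(sample)
print(header, rows)
```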

If we now wire the output of the Preprocessing widget into the Scatterplot widget, we can generate a variety of scatterplots:

Orange scatterplot

If you want to save a copy of the chart, it’s easy enough to do so. (I can’t get colour palettes to work on my Mac, so I’m stuck with greyscale displays. Also, the blob sizing doesn’t seem very responsive…)

Orange - save a scatterplot

The Report tool allows us to create a report from various bits of the dataflow, including adding information from several widgets to either separate report pages or the same report page.

Orange - report generator

Saving a Report saves all the report pages to a navigable set of HTML pages that resemble the Orange Report viewer.

Here are a couple of other things we can do with the data, this time using a data set that isn’t throwing the “dependent variable missing” error, in particular the distribution of comments in a small Friendfeed network…

So for example, here’s how the number of comments made by members of the network is distributed:

Orange - distribution of values

Alternatively, we may look at the distribution in a more “statistical” way:

Orange - simple distributions

(Remember, we can generate these reports interactively, and then add them to a growing report.)

The survey plot gives us a macroscopic bird’s-eye view over the whole of the data set:

Orange - survey plot

Okay, that’s enough for starters – hopefully you get the idea: wire stuff together and generate visual reports… So why not go and download Orange now?!;-)

There is a whole range of clustering tools, too, which look like they could be interesting…

And I think the platform is extensible, which means there’s a way of adding your own widgets (written in Python, maybe..?)

Written by Tony Hirst

October 6, 2010 at 2:45 pm

Posted in Visualisation


It’s All About Flow…

with 5 comments

One of the compelling features of Yahoo Pipes for me is the way the user interface encourages you to think of programming in terms of pipelines and feeds, in which a bundle of stuff (RSS feed, CSV data, or whatever) is processed in a sequence of steps (the pipeline), with each step being applied to each item in the feed.
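To make the pipeline idea a little more concrete, here is a minimal sketch using Python generators, where each step takes a stream of items and passes a (possibly changed) stream on to the next step. It is not Yahoo Pipes code, nor the sort of code pipe2py generates; the step names (`fetch_items`, `filter_items`, `rename_field`) and the sample feed are invented for the example.

```python
def fetch_items(feed):
    """Source step: yield a copy of each item in the feed."""
    for item in feed:
        yield dict(item)


def filter_items(items, keyword):
    """Pass on only the items whose title mentions the keyword."""
    for item in items:
        if keyword.lower() in item["title"].lower():
            yield item


def rename_field(items, old, new):
    """Rename one field on each item as it flows past."""
    for item in items:
        item = dict(item)
        item[new] = item.pop(old)
        yield item


# A made-up feed standing in for an RSS feed or rows of CSV data.
feed = [
    {"title": "Orange visual programming", "link": "http://example.com/1"},
    {"title": "Cobol courses", "link": "http://example.com/2"},
    {"title": "Visualising lab notebooks", "link": "http://example.com/3"},
]

# Wire the steps together: fetch -> filter -> rename.
pipeline = rename_field(filter_items(fetch_items(feed), "visual"), "link", "url")
result = list(pipeline)
print(result)
```

Nothing actually runs until the final `list(...)` pulls items through, one at a time, which is roughly how I picture items flowing along the wires in a pipe.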

A few days ago I blogged about pipe2py, a toolkit from Greg Gaughan that lets you “compile” a simple Yahoo pipe into a Python code equivalent programme (Yahoo Pipes Code Generator (Python)). Given that, in general, I don’t believe the “build it and they will come” mantra, I spent half an hour or so this morning looking round the web for people who had posted queries about how to generate code equivalents of Yahoo Pipes, so that I could point them to pipe2py.

In doing so, I came across a couple of other visual pipeline environments that are maybe worth looking at in a little more detail.

PyF is a “[flow based] open source Python programming framework and platform dedicated to large data processing, mining, transforming, reporting and more.”

PyF - flow based python programming

On the other hand, Orange claims to offer “[o]pen source data visualization and analysis for novice and experts. Data mining through visual programming or Python scripting. Components for machine learning. Extensions for bioinformatics and text mining. Packed with features for data analytics.”

Here’s one of their promo shots:

Orange - piped visual data analysis

I haven’t had a chance to play with either of these environments – and probably won’t for a little time yet – so whilst I feel like I’m cheating by posting about them in such a cursory way without having even a simple demo to show, they’re maybe of interest to anyone who stumbles across this blog by way of pipe2py… [Update: my Orange Visualisation tool review.]

PS as well as PyF, see also: Pypes [via @dartdog]

Written by Tony Hirst

October 4, 2010 at 10:12 am

Graph Structure of an Open Science Notebook – “Linked Science” FTW…

with 13 comments

Early days on this, but what, if anything, can we learn from looking at the link structure of an open science lab-book, based on the use of hyperlinks between pages in the lab-book?

A couple of days ago, I started informally bouncing ideas around with @cameronneylon about quick wins/low hanging fruit visualisations around his open science notebook (a full description of our conversation – and indeed the whole history of this ad hoc “mini-project” – can be found on Cameron’s blog: A little bit of federated Open Notebook Science). So here are a couple of Gephi takes on the lab-book (original data/scripts can be found from the github links in Cameron’s post.)

The lab notebook identifies different types of post, which can be used to colour the graph:

Lab notebook - colour modules by section type

The network graph also shows the presence of highly linked “procedure” type nodes relating to a particular experimental procedure. If we apply the ego filter to the graph we can get a close look at which posts are connected to a procedure:

Applying a gephi ego filter to a set of linked posts from a lab notebook
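As a rough idea of what an ego filter is doing, here is a sketch in plain Python that pulls out a depth 1 ego network from a list of links: the chosen node, its immediate neighbours, and any links between them. It treats links as undirected and makes no attempt to reproduce the options of Gephi’s own filter; the function names and the post/procedure names are made up.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each node to the set of nodes it is linked to (ignoring direction)."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def ego_network(edges, ego):
    """Return the edges among the ego node and its immediate neighbours."""
    adjacency = build_adjacency(edges)
    members = {ego} | adjacency[ego]
    return [(s, t) for s, t in edges if s in members and t in members]


# Made-up links between lab-book posts and procedure pages.
edges = [
    ("post_a", "procedure_1"),
    ("post_b", "procedure_1"),
    ("post_b", "post_c"),
    ("post_c", "procedure_2"),
    ("post_a", "post_b"),
]

ego_edges = ego_network(edges, "procedure_1")
print(ego_edges)
```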

If we run the modularity statistic, we can automatically partition the posts into groupings of posts that are linked together – here they are grouped by modularity class:

Different modularity classes

We can expand different class nodes to see the posts associated with them:

Modularity partitions on lab book, partially expanded

Here’s one close up:

Modularity class

If we apply the ego network, we see the modularity cluster does seem to have acted in a meaningful way:

Module identified - ego network applied

Notice though that we lose sight of the internal link structure within that modularity class that was evident in the previous image.

Was that connected node important in some way?

Close up of internal structure of connected node in modularity class

With his intimate knowledge of the experiments recorded in the lab book, Cameron also observed that Gephi has (largely) successfully clustered the correct posts together [according to protein classification] and thus separated the purifications from each other based only on connectivity. This suggests that even if posts aren’t explicitly tagged by a particular experiment, the link analysis may be useful in finding posts that are related to a particular experiment; in cases where a post is included in one group and links out to another, it may indicate some sort of relationship between the separate clusters, such as a shared reagent.

So why might the visualisation of the whole notebook be a useful thing to do? My take on it is that the visualisation acts as a macroscope.

As Jonathan Schull put it in the Macroscope Manifesto:

Most natural patterns are not easily perceived, for they do not happen to produce lasting stimuli to which our nervous systems are attuned. But everything we know about biology, epidemiology, social networks, computational algorithms and data structures, tells us that branching patterns are “out there”, waiting to be mapped, illuminated, seen-anew. In the last few decades new data sources, new data-analytic tools, and new tracking techniques have become available to scientists and school children. It is now possible to envision a “macroscope” that present these invisible but ubiquitous patterns to human perceptual systems so that they would engage our innate ability to perceive millions of leaves as scores of trees…and a forest

For me, Gephi can act as a macroscope in the way it reveals structure from across the whole of Cameron’s open science lab book in a single image, and allows us to interrogate the lab book from a variety of perspectives in an interactive way.

The approach is amenable to displaying structures aggregated from across multiple blogs, as long as they link to each other. It may also serve to identify related processes, as for example when modularity clusters are connected by one or more links.

And what might this suggest as a baby next step for Open Notebook Science? Well I can’t help thinking that maybe open Lab Notebooks should also be publishing their link graph, with URI referenced external links as well as internal links included… then we can create some big graphs across notebooks and start to see what might fall out…
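By way of a sketch of what publishing a link graph might involve, the following uses the Python standard library to pull the links out of a set of notebook pages and label each one as internal or external. It is not the script used for the graphs above (those are linked from Cameron’s post); the `LinkCollector` class, the `link_graph` function, and the example.org pages are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_graph(pages, notebook_host):
    """Return (source URL, target URL, kind) edges for a dict of url -> html."""
    edges = []
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links against the page they appear on.
            target = urljoin(page_url, href)
            same_host = urlparse(target).netloc == notebook_host
            edges.append((page_url, target, "internal" if same_host else "external"))
    return edges


# Made-up notebook pages, keyed by URL.
pages = {
    "http://notebook.example.org/post/1": (
        '<p>See <a href="/post/2">the procedure</a> and '
        '<a href="http://other.example.com/reagent">a reagent</a>.</p>'
    ),
    "http://notebook.example.org/post/2": '<p>Used in <a href="/post/1">post 1</a>.</p>',
}

edges = link_graph(pages, "notebook.example.org")
for edge in edges:
    print(edge)
```

An edge list like this could then be saved out in whatever format a tool such as Gephi will import.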

Linked Science FTW ;-)

PS One thing that may or may not be missing from the above – links to a video demonstrating each procedure, if appropriate, on a visual protocols site. Just by the by, here’s a Google custom search engine I created some time ago that implements a Science Experimental Protocols Videos meta-search engine. (It doesn’t turn up anything for /Purification of sortase/ though ;-( )

Written by Tony Hirst

October 3, 2010 at 10:42 pm

UK Open Data Guidance Resources

with 2 comments

This is a live post where I will try to collect together advice relating to the release and use of open public data in the UK, as much for my own reference as anything… (I guess this really should be a wiki page somewhere…?)

Written by Tony Hirst

October 1, 2010 at 9:00 pm

Posted in Data

