Graph Structure of an Open Science Notebook – “Linked Science” FTW…

Early days on this, but what, if anything, can we learn from the link structure of an open science lab-book, based on the hyperlinks between its pages?

A couple of days ago, I started informally bouncing ideas around with @cameronneylon about quick wins/low hanging fruit visualisations around his open science notebook (a full description of our conversation – and indeed the whole history of this ad hoc “mini-project” – can be found on Cameron’s blog: A little bit of federated Open Notebook Science). So here are a couple of Gephi takes on the lab-book (the original data/scripts can be found via the GitHub links in Cameron’s post).

The lab notebook identifies different types of post, which can be used to colour the graph:

Lab notebook - colour modules by section type

The network graph also shows the presence of highly linked “procedure” type nodes relating to a particular experimental procedure. If we apply the ego filter to the graph we can get a close look at which posts are connected to a procedure:

Applying a Gephi ego filter to a set of linked posts from a lab notebook
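For anyone who wants to try the same idea outside Gephi, here is a minimal sketch of what an ego filter does: keep the chosen node, everything within a set number of hops of it, and the links among those nodes. The post names are made up for illustration, and I am assuming links are treated as undirected; this is not how Gephi implements the filter internally.

```python
from collections import deque


def ego_network(edges, centre, depth=1):
    """Return the nodes and edges within `depth` hops of `centre`.

    Links are treated as undirected here (an assumption), so a post counts
    as a neighbour whether it links to the centre or is linked from it.
    """
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)

    # Breadth-first search, stopping once the requested depth is reached.
    hops = {centre: 0}
    queue = deque([centre])
    while queue:
        node = queue.popleft()
        if hops[node] == depth:
            continue
        for nxt in neighbours.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)

    kept = set(hops)
    kept_edges = [(s, d) for s, d in edges if s in kept and d in kept]
    return kept, kept_edges


# Hypothetical post names, not taken from the real notebook.
links = [
    ("post-a", "procedure-1"),
    ("post-b", "procedure-1"),
    ("post-b", "post-c"),
    ("post-d", "post-e"),
]
nodes, sub_edges = ego_network(links, "procedure-1")
print(sorted(nodes))  # ['post-a', 'post-b', 'procedure-1']
```

Raising `depth` to 2 would also pull in posts that are linked to the procedure's immediate neighbours.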

If we run the modularity statistic, we can automatically partition the posts into groups that are more densely linked to each other than to the rest of the notebook – here they are grouped by modularity class:

Different modularity classes
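Gephi searches for a partition that scores well on modularity; the sketch below does not do that search, it only computes the modularity score Q for a partition you hand it, which may help make clear what is being optimised. I am assuming an undirected, unweighted graph with no self-loops or repeated links, and the node names and cluster labels are invented.

```python
def modularity(edges, cluster_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over clusters of (links inside the cluster / m)
        minus (total degree of the cluster / 2m) squared,
    where m is the number of links in the whole graph.
    """
    m = len(edges)
    if m == 0:
        return 0.0

    degree_total = {}  # cluster -> sum of degrees of its nodes
    internal = {}      # cluster -> number of links with both ends inside
    for a, b in edges:
        ca, cb = cluster_of[a], cluster_of[b]
        degree_total[ca] = degree_total.get(ca, 0) + 1
        degree_total[cb] = degree_total.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1

    return sum(
        internal.get(c, 0) / m - (degree_total[c] / (2 * m)) ** 2
        for c in degree_total
    )


# Two triangles joined by a single link (hypothetical data).
links = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("a3", "b1"),
]
by_triangle = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1, "b3": 1}
all_together = {node: 0 for node in by_triangle}

print(modularity(links, by_triangle))   # about 0.357 (5/14)
print(modularity(links, all_together))  # 0.0
```

Splitting by triangle scores higher than lumping everything together, which is the sense in which a modularity class is a "group of posts that are linked together".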

We can expand different class nodes to see the posts associated with them:

Modularity partitions on lab book, partially expanded

Here’s one close up:

Modularity class

If we apply the ego network filter, we see that the modularity clustering does seem to have grouped the posts in a meaningful way:

Module identified - ego network applied

Notice though that we lose sight of the internal link structure within that modularity class that was evident in the previous image.

Was that connected node important in some way?

Close up of internal structure of connected node in modularity class

With his intimate knowledge of the experiments recorded in the lab book, Cameron also observed that Gephi has (largely) successfully clustered the correct posts together [according to protein classification] and thus separated the purifications from each other based only on connectivity. This suggests that even if posts aren’t explicitly tagged by a particular experiment, the link analysis may be useful in finding posts that are related to a particular experiment; in cases where a post is included in one group and links out to another, it may indicate some sort of relationship between the separate clusters, such as a shared reagent.

So why might the visualisation of the whole notebook be a useful thing to do? My take on it is that the visualisation acts as a macroscope.

As Jonathan Schull put it in the Macroscope Manifesto:

Most natural patterns are not easily perceived, for they do not happen to produce lasting stimuli to which our nervous systems are attuned. But everything we know about biology, epidemiology, social networks, computational algorithms and data structures, tells us that branching patterns are “out there”, waiting to be mapped, illuminated, seen-anew. In the last few decades new data sources, new data-analytic tools, and new tracking techniques have become available to scientists and school children. It is now possible to envision a “macroscope” that present these invisible but ubiquitous patterns to human perceptual systems so that they would engage our innate ability to perceive millions of leaves as scores of trees…and a forest

For me, Gephi can act as a macroscope in the way it reveals structure from across the whole of Cameron’s open science lab book in a single image, and allows us to interrogate the lab book from a variety of perspectives in an interactive way.

The approach is amenable to displaying structures aggregated from across multiple blogs, as long as they link to each other. It may also serve to identify related processes, as for example when modularity clusters are connected by one or more links.
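As a rough sketch of how links between clusters might be picked out once each post has a modularity class, the snippet below counts the links whose two ends fall in different clusters. The post names and class labels are hypothetical, and the class assignment is assumed to have come from elsewhere (for example, an export from Gephi).

```python
from collections import Counter


def links_between_clusters(edges, cluster_of):
    """Count links whose two ends sit in different clusters."""
    counts = Counter()
    for src, dst in edges:
        a, b = cluster_of[src], cluster_of[dst]
        if a != b:
            # Sort the pair so A->B and B->A are counted together.
            counts[tuple(sorted((a, b)))] += 1
    return counts


# Hypothetical posts: p* in class 0, q* in class 1.
links = [("p1", "p2"), ("p2", "p3"), ("p3", "q1"), ("q1", "q2"), ("q2", "p1")]
cluster_of = {"p1": 0, "p2": 0, "p3": 0, "q1": 1, "q2": 1}

bridges = links_between_clusters(links, cluster_of)
print(bridges)  # Counter({(0, 1): 2})
```

Any pair of clusters that turns up here is a candidate for a closer look: the connecting posts might point at a shared reagent or a related process.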

And what might this suggest as a baby next step for Open Notebook Science? Well, I can’t help thinking that maybe open Lab Notebooks should also be publishing their link graph, with URI-referenced external links as well as internal links included… then we can create some big graphs across notebooks and start to see what might fall out…
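To make that a bit more concrete, here is one way a notebook might generate a publishable link graph from its own pages, using only the Python standard library. The URLs, the `link_graph` function and the JSON layout (`source`, `target`, `type`) are all my own invention for the sake of the example, not an existing format or anything Cameron's notebook actually emits.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_graph(pages, notebook_host):
    """Build an edge list from a dict mapping page URL -> HTML source."""
    edges = []
    for url, html in pages.items():
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            target = urljoin(url, href)  # resolve relative links to full URIs
            same_host = urlparse(target).netloc == notebook_host
            edges.append({
                "source": url,
                "target": target,
                "type": "internal" if same_host else "external",
            })
    return edges


# Hypothetical notebook page; example.org/example.com are placeholders.
pages = {
    "http://notebook.example.org/post/1": (
        '<p>See <a href="/post/2">the procedure</a> and '
        '<a href="http://other.example.com/reagent">a reagent page</a>.</p>'
    ),
}
edges = link_graph(pages, "notebook.example.org")
print(json.dumps(edges, indent=2))
```

Because every target is a full URI, edge lists published by different notebooks could simply be concatenated to build the bigger cross-notebook graph.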

Linked Science FTW ;-)

PS One thing that may or may not be missing from the above – links to a video demonstrating each procedure, if appropriate, on a visual protocols site. Just by the by, here’s a Google custom search engine I created some time ago that implements a Science Experimental Protocols Videos meta-search engine. It doesn’t turn up anything for /Purification of sortase/ though ;-(