# Information Literacy, Graphs, Hierarchies and the Structure of Information Networks

Over dinner at Côte in Cambridge last week, during the Arcadia Project review event, I doodled a couple of data structures, one on either side of a scrap of paper, and asked my co-Arcadians what sort of thing the drawing might represent, or what the structures they described might be called in general terms.

The sketches were broadly along the lines of the following, though without the circular nodes and labels displayed, just a set of connecting lines:

and:

So if I asked you the same question (what would you call these two different things?), how would you answer?

To my mind, the different organisational structures these represent, and how we can exploit and manipulate them, raise a whole host of issues in the reimagining of information literacy and the teaching of information skills. These range from an understanding of the structure of information spaces, through the representation and analysis of those structures, to ways in which we can navigate and discover things in those spaces, as well as how we can visualise and otherwise make sense of them.

So how would I describe the two different things shown above? The first image represents a hierarchy and is often referred to as a tree. Many library classification schemes, and many organisational management structures, are based around that sort of information structure.

The second image is a depiction of a more general network structure. Whenever I talk about graphs on the OUseful.info blog (in fact, pretty much whenever I talk about a graph anywhere), that’s the sort of thing I’m talking about. This mess of connections is the way the web is structured. (The tree structure is also a graph, but subject to particular constraints; can you work out what some of those constraints might be?)
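For anyone who wants to check their answer, here is one way of writing a couple of those constraints down. It's a minimal sketch rather than a definitive test: it treats the graph as undirected, and the node labels and edge lists are made up for illustration, not taken from the sketches above. The function name `is_tree` is my own invention too.

```python
from collections import defaultdict


def is_tree(nodes, edges):
    """Return True if the undirected graph (nodes, edges) is a tree."""
    nodes = list(nodes)
    if not nodes:
        return False
    # Constraint 1: a tree on n nodes has exactly n - 1 edges.
    if len(edges) != len(nodes) - 1:
        return False
    # Constraint 2: the graph is connected, i.e. every node can be
    # reached from any starting node by following edges.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for nxt in neighbours[current]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    # Connected with n - 1 edges also means there are no loops.
    return seen == set(nodes)


# Hypothetical example data (assumed, not the original sketches).
nodes = ["A", "B", "C", "D", "E", "F", "G"]
tree_edges = [("A", "B"), ("A", "C"), ("B", "D"),
              ("B", "E"), ("C", "F"), ("C", "G")]
# Add a couple of cross-links and the hierarchy turns into a network.
network_edges = tree_edges + [("D", "F"), ("E", "G")]

print(is_tree(nodes, tree_edges))
print(is_tree(nodes, network_edges))
```

The two checks together (connected, and exactly one fewer edge than nodes) also rule out cycles, which is the property that makes a tree feel like a hierarchy: there is only ever one route between any two nodes.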

Note: it’s maybe worth reiterating at this point that when I talk about graphs, I mean the messy network thing, not line charts like this:

One of the terms I got to describe one of the graphs was “a matrix”. Matrices are in fact a very powerful way of describing the structure of a graph – if you fancy a treasure hunt, the terms adjacency matrix and incidence matrix should give you a head start…
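As a head start on that treasure hunt, here is a small sketch of both matrices for a tiny undirected graph. The four nodes and four edges are invented for illustration, and the function names are my own. In the adjacency matrix, rows and columns are both nodes, with a 1 wherever two nodes are connected. In the incidence matrix, rows are nodes and columns are edges, with a 1 wherever a node sits at one end of an edge.

```python
def adjacency_matrix(nodes, edges):
    """Square node-by-node matrix: 1 where two nodes share an edge."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        # Undirected graph, so the matrix is symmetric.
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return matrix


def incidence_matrix(nodes, edges):
    """Node-by-edge matrix: 1 where a node is an endpoint of an edge."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(edges) for _ in nodes]
    for column, (a, b) in enumerate(edges):
        matrix[index[a]][column] = 1
        matrix[index[b]][column] = 1
    return matrix


# Hypothetical example graph (assumed for illustration only).
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

adjacency = adjacency_matrix(nodes, edges)
incidence = incidence_matrix(nodes, edges)

for node, row in zip(nodes, adjacency):
    print(node, row)
```

Summing a row of the adjacency matrix gives the number of connections a node has, which is one reason the matrix view is handy for analysis as well as description.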

I’m not sure what the problem is, but I think there is a problem that arises from not appreciating how powerful graph structures are as a way of making sense of the world. And I’m not really sure what I wanted to say in this post… except maybe go on a little fishing expedition to see how widespread the lack of familiarity with the notion of a graph as something like this:

really is…? So, if I asked you to draw a graph: a) what would you draw? b) would you even remotely consider drawing something like the image directly above? If you answered “no” to (b), does it “say” anything to you at all?! Would you ever draw a diagram that had that flavour when explaining something (what?!) to someone else? (And the same question for the hierarchy…?)

PS a nice thing about graphs is you don’t have to draw them by hand – all you have to do is describe what connects to what, and then you can let a machine draw it for you. So for example:

– here is the “source code” for the tree
– here is the “source code” for the messy network graph
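To give a flavour of what “describing what connects to what” can look like, here is a minimal sketch that turns a list of connections into a text description a drawing tool can lay out. I'm assuming Graphviz's DOT language as the output format purely for illustration; it may not be the format the linked source code uses. The edge list and the helper name `to_dot` are made up.

```python
def to_dot(edges, name="G"):
    """Build an undirected Graphviz DOT description from (a, b) pairs."""
    lines = [f"graph {name} {{"]
    for a, b in edges:
        # "--" is the DOT operator for an undirected edge.
        lines.append(f"    {a} -- {b};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical connections (assumed, not the original graph).
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

dot_source = to_dot(edges)
print(dot_source)
```

Saving the printed text to a file and handing it to a layout tool is then all it takes to get a picture; nowhere do you have to say where each node should go on the page.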

PPS when folk hear other folk wittering on about “the social graph”, what do they think it is? If asked to draw an indicative sketch of “the social graph”, what would they draw?!

1. sportstweet (@kpfssport)

Not a statistician, nor do I play one on the telly, but I tend to assume some sort of numerical values can be put to a graph, even the tree, there’s an obvious hierarchy there that can be given numbers. With the messy network graph the only thing it reminds me of is gene expression/protein effect network diagrams but they at least have arrows in. I can’t tell what it’s supposed to be telling me about, for example, the relationship between F and G. Do they only interact through D? Is H necessary for their interaction? That kind of thing.

• Tony Hirst

@kpfssport the question wasn’t supposed to be ‘what does this graph represent’ or even ‘what does it say’, it was more abstract than that: ‘what sort of thing might this (be used to) represent’, for example?

2. stuartbrown (@stuartbrown)

hmm, maybe a tree is more generally used to define the structure that will be added to data, and a graph more generally used to represent the relationships between data that already exists (i.e. the former is data as we want it to be and the latter is data as it really is?)

• Wilbert Kraan

“tree is data as we want it to be and the graph is data as it really is” is great; it captures why I love graphs so much.

Well, that, and several experiences of trying to fit the world into trees.

I think Tony does have a point about trees being most intuitive for more people, though.

3. Tony Hirst

@stuart – now there’s an interesting thought – it’s observations like that I think we need to start collecting to try and scope out what the common perceptions are atm, and how they maybe represent stumbling blocks in communicating why things like the web, Linked Data etc etc are so powerful? Hierarchical classifications themselves are really powerful in another way of course – I’m always amazed by the way a 5 digit classmark can lead me directly to one specific volume out of tens or hundreds of thousands in a well stocked library…

@wilbert Agreed – I think trees are a really natural approach that underlies a lot of folk understanding relating to classification schemes? I think one of my New Year Resolutions might have to be to start doing some proper work around ideas that fit under the banner of “folk IT and computing” [ cf. http://blog.ouseful.info/2011/10/31/appropriate-it-my-ili2011-presentation/ ]

4. Chris Rusbridge

In my early working days we used to use the term “networks” for the things you call graphs. As in PERT network in particular. I think the graph moniker is a bit unfortunate!

5. Derek Jones

tbh I would draw the traditional x-y axes ‘proper’ graph (as per @chris – when I was a boy etc.).

I tend to draw networks a lot when describing things in design (spatial relationships, workflows, team shapes, decision-making) – even when those things are not really networks (or maybe they are and I just don’t know it…).

Interestingly (and piggybacking @stuart’s comments), the networks I might start out with are usually nice and hierarchical and well ordered. As the discussion grows, the network becomes much more complex and suddenly it starts to look like no2 – once that happens it can stop making sense. It’s as if the ‘shape’ is too complex to completely understand in any meaningful way. Sometimes, if you’re lucky, it might retain a bit of hierarchy or a central node might appear. Mostly, it just gets messy.

This is (arguably) reflected in the code for the two graphs – no1 looks ‘easy’ to read (for a given value of easy) and no2 looks much harder to read. Dare I mention entropy?

The other possibly interesting thing is maybe scale. With 2 nodes it is quite hard to tell whether the network is hierarchical or not. With thousands of nodes (like any of your twitter maps, for example) you can suddenly see other scales of network on ‘top’ of the original and start picking out meta-hierarchies. It’s the bit in between that is hard to sort out – too big to see a simple structure and too small to see a meta-structure.

So how do we manage (conceptualise?) complex networks using natural language? Like family or social networks? Do we always look for or superimpose a hierarchy? I remember doing an urban study asking people to map their town/city and getting a lot of very hierarchical maps back on that (landmarks, roundabout, shops, etc) – so is it a hardwired thing to want to overlay or generate hierarchical shapes generally? But we must still (somehow) embed the complex nodes/links in those same maps and still come up with some order. Is it a PoV hierarchy of a complex network? i.e. the data might be no2 but I make myself see it as no1 (does that even make sense?)

6. Pingback: OER Visualisation Project: Beginnings of linking data from PROD to Google Spreadsheet and early fruit [day 8] #ukoer – MASHe
7. mhawksey

> PS a nice thing about graphs is you don’t have to draw them by hand

This may be an aside, but it’s rare for me to draw a graph; instead, more often than not, I have some data that the computer draws for me. And when I say draws for me, more often than not it will suggest a graph type based on the data. The danger with this rapid remix is that we produce graphs that look nice but don’t communicate the information in the most effective way.

This is one of my favourite quotes from Ben Fry which sums this problem up nicely:

“Graphs can be a powerful way to represent relationships between data, but they are also a very abstract concept, which means that they run the danger of meaning something only to the creator of the graph. Often, simply showing the structure of the data says very little about what it actually means, even though it’s a perfectly accurate means of representing the data. Everything looks like a graph, but almost nothing should ever be drawn as one.” Ben Fry – ‘Visualizing Data’

• Tony Hirst

@Martin Is this where I get defensive?! OUseful.info isn’t intended to offer finished goods, just glimpses into processes, and maybe some of the tricks I’ve had to wrangle to make them work. Which I think is congruent with Fry’s approach;-) For me, many visualisations are transient things, interactive/dynamic tools for use as part of the sensemaking process. I think a lot of visualisations get posted that may be meaningful to the creator during, or at the end of, a discovery process, but are actually meaningless to anyone else (the context is all missing).
In context of networks/hairball visualisations, which a lot of folk have trouble with because the edges just turn to noise, for me they’re often best viewed as maps, with trust placed in the creator to use layout algorithms that are in some sense capable of projecting a meaningful (or at least useful!;-) geometry onto the data points and then rendering them appropriately. This is why I try to use clustering algorithms that are compatible with layout algorithms when I post my network viz.
The wider point of my post was that the hugely powerful idea/concept that is the very notion of graph as network is not widespread. And if folk active in the development of information skills/literacy programmes aren’t familiar with this notion, they won’t ever really grok why links are such powerful things…

• mhawksey

If you have to get defensive, then I would have to get defensive because most of my work and how it is presented is modelled on how you do it ;)

My interpretation of Fry’s quote underlines what you are saying. As creators we gain insight to the data but often the consumers lack the literacies to understand what is being presented. Perhaps an addendum to Fry’s quote should be ‘… but almost nothing should ever be drawn as one as the majority of people won’t understand it’.

This underlying issue is illustrated by a recent comment on my own blog