Preliminary Thoughts on Visualising the OpenEd09 Twitter Network
The wealth of activity on the #opened09 Twitterstream keeps stealing my attention, and has got me thinking about how we might start to analyse the structure of the network around the hashtag, in part so we can understand information flow through that part of the open education network better.
So what’s come to mind? Here are some early sketches that I’m using as a foil to see what sort of questions they bring to my mind about the structure of the network. Note that this is a ‘first principles’/grass roots/bare bones approach – I’m not going to use any formal social network analysis techniques because I don’t know anything about them (yet….;-) But by starting with some doodles, the questions that arise may lead me to appreciating why the formal SNA tools and approaches are so useful in practical, easy to understand, human terms, and thus provide me with the motivation to learn how to use them…
First up, is there any value in understanding the structure of the network of people who are twittering with the #opened09 hashtag? Partly because of rate limiting on the Twitter API, last night I grabbed a sample of the most recent #opened09 hashtagged tweets, and then filtered them down to the most active twitterers over that period (people using the hashtag more than three times in the sample, I think it turned out to be). Then I pulled down their follower lists from the Twitter API, and constructed a graph of who followed who in that set. Here’s the Graphviz plot of that graph:
So what questions does this bring to mind? Well, off the top of my head:
- who is the most connected person in this graph? Does that tell us anything useful? Would we expect the event organiser to be the most connected?
- is there anyone in the network who isn’t very connected to other people? Why? Are they a different ‘sort’ of user? Are they new to the network?
- does the connectedness of people within the graph change over the course of the event? (I think that the Twitter API returns a list of followers in reverse chronological order, so we could approximate the growth by comparing the above graph with one that ignored the most recent 50(?!) followers of each person.)
- are all Twitterers equal? Should we treat users who only ever use the hashtag as part of an RT differently when constructing this graph?
- is there any value in representing the number of followers each person has within the above graph? Or the number of people they follow? Or some function of the two? What about the number of times they used the hashtag in the sample period (or the number of times they RT the hashtag, or a function of the two) – should that be reflected too?
What’s the POINT of asking these questions? How about this – as individuals, can we identify members of the community who we don’t know? (IE is this sort of graph a good basis as a friend recommender? Would a big poster of this sort of graph be a good thing to post in the event coffee area? What would people look for in it if we did?)
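For anyone wanting to play along, here is a minimal sketch of how a "who follows who" graph like the one above might be put together and interrogated. It assumes the follower lists have already been pulled down into a plain dictionary; the function names (`build_follow_graph`, `degree_counts`, `to_dot`) and the sample accounts are made up for illustration and are not the code I actually used.

```python
from collections import defaultdict


def build_follow_graph(followers):
    """Return directed (follower, followed) edges between active twitterers.

    `followers` maps each active twitterer to the set of accounts that
    follow them. Followers outside the active set are ignored.
    """
    active = set(followers)
    edges = set()
    for followed, their_followers in followers.items():
        for follower in their_followers:
            if follower in active and follower != followed:
                edges.add((follower, followed))
    return edges


def degree_counts(edges):
    """Count, per person, how many in the set follow them and how many they follow."""
    in_deg = defaultdict(int)   # followed by this many people in the set
    out_deg = defaultdict(int)  # follows this many people in the set
    for follower, followed in edges:
        out_deg[follower] += 1
        in_deg[followed] += 1
    return in_deg, out_deg


def to_dot(nodes, edges, name="opened09"):
    """Serialise the graph as Graphviz DOT text, keeping isolated nodes visible."""
    lines = [f"digraph {name} {{"]
    for node in sorted(nodes):
        lines.append(f'  "{node}";')
    for follower, followed in sorted(edges):
        lines.append(f'  "{follower}" -> "{followed}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample data: active twitterer -> accounts following them.
followers = {
    "alice": {"bob", "carol", "zed"},  # zed is not in the active set
    "bob": {"alice"},
    "carol": {"alice", "bob"},
    "dave": set(),                     # active, but unconnected
}

edges = build_follow_graph(followers)
in_deg, out_deg = degree_counts(edges)
dot = to_dot(followers, edges)

# Rank by followers within the set, then by how many in the set they follow.
ranking = sorted(followers, key=lambda p: (in_deg[p], out_deg[p]), reverse=True)
isolated = [p for p in followers if in_deg[p] == 0 and out_deg[p] == 0]
```

The `ranking` list is one crude answer to the "most connected person" question, and `isolated` picks out people who aren't connected to anyone else in the set. Saving `dot` to a file and running it through Graphviz's `dot` command should give a plot of the same general kind as the one above.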
Okay, next up: there’s always talk of things like Twitter being used to amplify an event to ‘virtual participants’. How big might that audience be? And who comprises it? Are there people not at the event who effectively amplify it further?
How about a plot of the simple reach of the twitterers, as a treemap?
Hmm… I’m not sure about that… Just because you have a big following doesn’t mean it’s a big relevant or interested following? (The number of people RTing the hashtag in your audience might?)
Instead, how about this sort of graphic to help frame some questions:
This one shows, for the most active opened09 hashtag twitterers, the people who follow more than 12 (maybe?; or more than 11?) of them. The named individuals are heavy opened09 twitterers, the numbers are the Twitter IDs of the people who are seeing the event amplified to them. (Note that these people may also have tweeted the hashtag, only not so heavily. Maybe I need a stop list that removes people from this ‘amplification graph’ who have used the hashtag? That way, we can identify ‘leaves’ on the #opened09 tree – that is, people who received some number of #opened09 tweets but who never used the hashtag?)
So what questions does it bring to mind:
- are there people receiving large numbers of opened09 tweets who are unknown to the community?
- do the opened twitterers fall into cliques or reasonably well clustered groups around sets of followers who aren’t tweeting? (Would a cluster analysis be an interesting thing to do here?)
- if we lower the sampling threshold that specifies the minimum number of heavy twitterers that a ‘listener’ is following, how does the size of the listening audience grow? Is this interesting? Does the number of people that a listener follows influence how likely they are to see opened09 tweets? (eg if I follow 20 opened09 heavy twitterers, and only 50 people in all, my traffic may be dominated by opened09 folk; if I follow 500, or 1000, or 2000, then that traffic is likely to be diluted?)
And the POINT? Can we get a feeling for the audience the event is being amplified to? Are there members of that audience who seem to be members of the community but aren’t really known to the community? Can we find the lurkers and pull them in with a personal invite (and is this even ethical?)
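The threshold and stop list ideas above can be sketched in a few lines. This is only a rough illustration under some assumptions: the follower lists are already held in a dictionary keyed by heavy twitterer, listeners are identified by numeric Twitter ID, and the threshold is "at least N" rather than "more than N". The names `listeners`, `audience_growth` and `hashtag_share` are invented for the example.

```python
from collections import Counter


def listeners(followers, threshold, stop_list=frozenset()):
    """Return {account: number of heavy twitterers followed}.

    Keeps only accounts following at least `threshold` heavy twitterers
    and drops anyone named in `stop_list` (e.g. people who have used the
    hashtag themselves).
    """
    counts = Counter()
    for their_followers in followers.values():
        counts.update(set(their_followers))
    return {
        account: n
        for account, n in counts.items()
        if n >= threshold and account not in stop_list
    }


def audience_growth(followers, thresholds, stop_list=frozenset()):
    """Size of the listening audience at each threshold."""
    return {t: len(listeners(followers, t, stop_list)) for t in thresholds}


def hashtag_share(n_heavy_followed, n_following_total):
    """Crude dilution measure: fraction of followed accounts that are heavy twitterers."""
    if n_following_total == 0:
        return 0.0
    return n_heavy_followed / n_following_total


# Hypothetical sample data: heavy twitterer -> IDs of accounts following them.
followers = {
    "heavy1": {101, 102, 103},
    "heavy2": {101, 102},
    "heavy3": {101, 104},
}

audience = listeners(followers, threshold=2)
leaves = listeners(followers, threshold=2, stop_list={102})  # 102 has used the hashtag
growth = audience_growth(followers, thresholds=[3, 2, 1])
```

Passing the heavy twitterers themselves in `stop_list` as well would keep them out of the audience. As for dilution, `hashtag_share(20, 50)` comes out at 0.4 while `hashtag_share(20, 2000)` is 0.01, which is one simple way of putting a number on the "follow 50 versus follow 2000" question.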
Just by the by, looking at RT networks could also be interesting – that is, looking at patterns of RTing across the network. Maybe a graph showing people who RTd hashtagged tweets, as well as the path back to the original tweet? (This brings to mind some of @mediaczar’s work looking at Twitter in a PR context – which is exactly what event amplification is, right?)
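As a starting point for that sort of RT graph, here is a small sketch that assumes retweets are marked in the tweet text using the manual "RT @user" convention, with nested retweets written as "RT @a: RT @b: ...". The pattern and the function names `rt_chain` and `rt_edges` are my own guesses at a workable approach, not an established recipe, and the sample tweets are made up.

```python
import re

# Assumes the manual retweet convention: "RT @username" somewhere in the text.
RT_PATTERN = re.compile(r"\bRT\s+@(\w+)")


def rt_chain(text):
    """Return the usernames credited in a tweet, nearest retweeted user first."""
    return RT_PATTERN.findall(text)


def rt_edges(tweets):
    """Build (retweeter, retweeted) edges tracing each RT path back towards the original.

    `tweets` is an iterable of (sender, text) pairs.
    """
    edges = []
    for sender, text in tweets:
        path = [sender] + rt_chain(text)
        # Link each person in the path to the person they retweeted.
        edges.extend(zip(path, path[1:]))
    return edges


# Hypothetical sample tweets.
tweets = [
    ("carol", "RT @bob: RT @alice: great session #opened09"),
    ("dave", "Enjoying the keynote #opened09"),
    ("erin", "Worth a read, RT @alice: slides are up #opened09"),
]

edges = rt_edges(tweets)
```

Each edge points from the person doing the RTing to the person they credited, so following the edges from any node leads back towards whoever posted the original tweet.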
So having got some questions in mind (if you have more, please add them as comments below), I’ve got some sort of rationale for having a look at some formal graph theory and social network analysis stuff. This looks like it could be a good place to start: M.E.J. Newman – The mathematics of networks [PDF].
[UPDATE: I guess the heuristic I have in mind with respect to the charts and SNA is this: are there features from the visualisation that jump out at me that the SNA tools can also pick out?]