PLENK2010 – Twitter Clusters

While exploring the structure of my own Twitter friends network (see recent previous posts), I used the Gephi modularity statistic to partition (or cluster) the network according to the strength of connections between its members. It struck me that I could take a similar approach to exploring the structure of the relations between the members of a Twitter list. So I grabbed the members of the PLENK2010 list (which I had created automatically by mining the Twapperkeeper archive of posts tagged with PLENK2010 and adding frequent hashtaggers to the list), grabbed all their friends lists, and had a poke around the friends connections between the list members.
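
As a rough illustration of that last step, here is a minimal sketch of reducing a set of friends lists to just the connections between list members. It is not the code used for this post: the `member_edges` function, the `friends_of` structure and the sample names are all made up for the example, and fetching the friends lists from Twitter is left out entirely.

```python
def member_edges(friends_of):
    """Return directed edges (a, b) meaning list member a follows list member b.

    friends_of maps each list member to the set of accounts they follow.
    Friends who are not themselves list members are ignored.
    """
    members = set(friends_of)
    edges = set()
    for user, friends in friends_of.items():
        for friend in friends:
            if friend in members and friend != user:
                edges.add((user, friend))
    return edges


# Hypothetical sample data: "zed" and "yan" are not list members.
friends_of = {
    "alice": {"bob", "carol", "zed"},
    "bob": {"alice"},
    "carol": {"bob", "yan"},
}

for follower, followed in sorted(member_edges(friends_of)):
    print(follower, "->", followed)
```

The resulting edge list is the sort of thing that could then be exported in a format Gephi can read.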

The Gephi modularity tool identified three medium-sized clusters, one large cluster, and several smaller ones. Looking at the three medium-sized clusters, let’s see who’s in each cluster, where they’re from (from their Twitter location info) and what their interests are (from their Twitter bio field).
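
To give a feel for what a modularity-based partition is scoring, here is a small sketch that computes the standard modularity value Q for a partition that is already given. It does not search for a good partition, which is the hard part the Gephi tool handles, and it treats the graph as undirected and unweighted. The `modularity` function and the toy graph are assumptions for illustration only.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges is a list of unique (a, b) pairs; community_of maps node -> label.
    Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)      # edges with both ends in the community
    degree_sum = defaultdict(int)    # total degree of the community's nodes
    for a, b in edges:
        degree_sum[community_of[a]] += 1
        degree_sum[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridging edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

print(modularity(edges, two_groups))
print(modularity(edges, one_group))
```

Splitting the toy graph into its two triangles scores higher than lumping everything into one group, which is the intuition behind clustering by modularity.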

Here’s the first cluster:

Plenk2010 - twitter cluster

PLENK2010 - location cluster

First geo interest cluster

Here’s the second cluster:

PLENK2010 twitter cluster

Another PLENK2010 location cluster

UK cluster interests

And here’s the third:

PLENK2010 twitter cluster

A third PLENK2010 geo cluster

PLENK2010 German cluster

Not surprisingly, it seems as if geography still plays a role in defining networks…

There was also a large cluster identified in the original pass:

PLENK2010 twitter cluster

Here’s what they’re interested in:

PLENK2010 interests

And here’s where they’re from:

PLENK2010 - big cluster locale

Here’s what happens if we partition that large cluster by running the modularity tool over just the members of this cluster again:

PLENK2010 twitter community - tunnelling in

Do they make any sort of sense…?

So is this:

a) interesting?
b) useful?

If it’s useful – why? What can we do with this information?

Author: Tony Hirst


12 thoughts on “PLENK2010 – Twitter Clusters”

  1. these are really cool, Tony.

    i’m curious. what would you say these end up representing? how do you talk about what strength of connection is?

    i’m asking b/c the top one, from PLENK, could be really helpful for visualizing the project George & Dave & i are working on on MOOCs, if you aren’t opposed to that. and – full disclosure – i’m also trying to wrap that project work into a quantitative research course i’m doing as a Ph.D requirement this fall…but i’m still just beginning to get my head around what these tell.

    thanks for the food for thought.

    1. All the images shown above are clusters identified from the PLENK2010 twitterers.

      The clustering Gephi uses is described in the paper linked to from (I was just about to go away and read it now).

      On my to do list is use clustering algorithms I can poke and play with, which I’ll probably steal from

      As to making use of the above – feel free. I can let you have copies of the code (though it’s really messy atm), though you’d ideally need a whitelisted Twitter API key to play with it. Or I can just grab a dump from the friends list every so often.

      Are you also looking at linkage structures between people’s blogs? Eg I started exploring the internal link structure of this blog here:

  2. I made some notes at a recent seminar given at the OU by Caroline Haythornthwaite on social network analysis.
    Her list of what network analysis reveals included:
    • Density
    • Cliques
    • Network stats
    • Network brokers
    • Isolates
    • Isolated cliques
    • Structural holes
    • Resource flow
    • Social structure
    So maybe one of the uses for these network diagrams is to identify structural holes and structural weaknesses?
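
A couple of the simpler items on that list can be computed directly from an edge list. The following is a minimal sketch for density and isolates only, treating ties as undirected; the function names and the sample nodes are made up for the example.

```python
def density(nodes, edges):
    """Fraction of possible undirected ties that are actually present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2 * len(edges) / (n * (n - 1))


def isolates(nodes, edges):
    """Nodes that do not appear in any edge."""
    connected = {node for edge in edges for node in edge}
    return sorted(set(nodes) - connected)


# Hypothetical four-person network with one unconnected member.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c")]

print(density(nodes, edges))
print(isolates(nodes, edges))
```

Measures such as brokers and structural holes need more than this, since they depend on who sits between otherwise unconnected groups.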

  3. thanks, Tony & R3beccaF…i know Dave (Cormier, my partner) was happy to see the PLENK2010 representation too, so much appreciated. maybe we’ll sort out what it would be helpful for us to know for the project we’re working on and then take you up on the code offer?


  4. It tells me that we are succeeding with some of our design objectives.

    The intent of the course design is to distribute and diversify participation. We tell people to find their own unique perspective on the material and the commentary. In other writing I have described the desired outcome as a structure resembling ‘a community of communities’.

    The clustering represents the realization of this concept. There is less of a centre, and more of a distributed structure. Thus, the course communications are breaking into more easily manageable clusters. Each cluster has its own particular focus of interest. Nonetheless, these clusters revolve around a common overall theme, the course itself.

  5. Hi Tony,
    Very interesting SNA & graphs. I am interested in how this delineates the distribution and diversity of participation & interaction, and how it aligns with the course design as mentioned by Stephen. How should the clusters be interpreted in terms of the quality & strength of the ties? That may require an analysis of the tweet cluster using some rubrics.


    1. Something that might be revealing is some sort of conversation analysis, looking at strength of ties in terms of the extent to which users send each other messages?
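
One simple way to approximate that idea is to count how often users mention each other. The sketch below assumes tweets are available as (sender, text) pairs, which is an assumption about the data rather than a description of any archive format, and the `mention_counts` and `tie_strength` names are made up for the example.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def mention_counts(tweets):
    """Count directed mentions: (sender, mentioned_user) -> number of tweets."""
    counts = Counter()
    for sender, text in tweets:
        sender = sender.lower()
        for target in MENTION.findall(text):
            target = target.lower()
            if target != sender:
                counts[(sender, target)] += 1
    return counts


def tie_strength(counts):
    """Collapse directed counts into undirected tie strengths."""
    ties = Counter()
    for (a, b), n in counts.items():
        ties[tuple(sorted((a, b)))] += n
    return ties


# Hypothetical sample tweets.
tweets = [
    ("alice", "@bob nice post #PLENK2010"),
    ("bob", "@alice thanks! cc @carol"),
    ("alice", "@bob agreed"),
]

for pair, strength in sorted(tie_strength(mention_counts(tweets)).items()):
    print(pair, strength)
```

The undirected totals could be used as edge weights, while the directed counts show whether a conversation is one-sided.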

  6. These are great!
    Useful to me, and with a nod to Stephen’s post on the course’s various purposes, would be to see:

    1. Removal of moderators/administrators (george, rita, stephen) from the graph as those relationships between members and them are to be expected and aren’t as useful. Then we can see how the other relationships are growing.

    1a. Run set each week so over the term of the course we can see how initial relationships of members (e.g. a couple of folks join up who already know each other) change, strengthen, weaken over the course.

    2. Line strength/width could represent how many posts back to each other (via Reply to’s) in the Moodle discussions environment. Better would be to also capture those points of contact via participants’ blogs and postings to them, retweets and replies in Twitter, etc, but I think that data set cumulatively would be practically impossible to get to in the near-term. Moodle traffic of post and response is a smaller universe to chew through.

    3. Size of name could be determined by factors such as (all philosophically debatable):

    3a. If you start and host discussions in Moodle, that is valued most highly. Number of posts within that discussion could count to strengthen the size of your name.

    3b. Also impacting name size could be the number of replies to others’ posts that you contribute, but their weighting would not have as much value as if you start and host discussions.

    3c. If you post an individual blog via Moodle (which in our case is often a link to an externally-hosted blog) then that adds to your name size. You could get more granular here, measuring comments and responses to blog posts as a factor, but in our case I would discount that granularity as for Moodle blogs, folks aren’t really commenting in them and you don’t have an ability to measure the comments and responses on the externally-hosted blogs.

    3d. If you tweet an original thought/comment, your name size can grow; if you simply retweet that is a smaller factor (or perhaps none at all – unless you wanted to weight it based on number of followers that twitter account has as a measure of further influence and reach – again I think access to that data set is beyond the ability to get for this experiment).
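
The suggestions above could be sketched roughly as follows: drop the moderators from the edge list, then compute a name size as a weighted sum of activity counts. Every weight here is a made-up placeholder chosen only to respect the ordering described (starting discussions counts most, retweets least), and the activity field names are hypothetical rather than anything Moodle or Twitter provides.

```python
# Hypothetical weights: only their relative order reflects the comment above.
WEIGHTS = {
    "discussions_started": 5.0,
    "posts_in_own_discussions": 1.0,
    "replies_to_others": 2.0,
    "blog_posts": 3.0,
    "original_tweets": 1.0,
    "retweets": 0.25,
}


def drop_nodes(edges, excluded):
    """Remove every edge that touches an excluded node (e.g. a moderator)."""
    return [(a, b) for a, b in edges if a not in excluded and b not in excluded]


def name_size(activity):
    """Weighted sum of a participant's activity counts; missing counts are 0."""
    return sum(weight * activity.get(kind, 0) for kind, weight in WEIGHTS.items())


# Hypothetical sample data.
edges = [("george", "alice"), ("alice", "bob"), ("rita", "bob"), ("bob", "carol")]
moderators = {"george", "rita", "stephen"}
activity = {
    "discussions_started": 2,
    "posts_in_own_discussions": 4,
    "replies_to_others": 3,
    "blog_posts": 1,
    "original_tweets": 6,
    "retweets": 4,
}

print(drop_nodes(edges, moderators))
print(name_size(activity))
```

Re-running the same calculation on weekly snapshots would give the week-by-week view suggested in 1a.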

    Theoretically, this could go toward a grading system (!) depending on the purpose of the overall course as determined by the administrators. Those who made the greatest number of valuable connections over the course timeframe and had the highest quality of influence, as determined by the factors above, could be ranked highest.

    Or at the very least, when you overlay geographic location (actually there are a ton of cool data points you could overlay, but to keep it simple) you would see by name size the most influential members of the course in terms of both quality postings and in terms of making relationships with others that weren’t there before. Actually I think it would be interesting to study those two groups separately, as my hunch is that of real interest to me will be those that excel in relationship-building (line numbers and strength of lines to and fro) but not necessarily in quality posting (size of name), which is the more traditional academic judgment of a student’s work.

    Just my thoughts…(I actually did something similar a few years back in setting up reputation ranking algorithms for a group of several million Canadians and their working with and postings about stock market data – but this was in business, not academia so it was proprietary and not published anywhere.)
