Last week, Andrew Stott, Director of Digital Engagement in the Cabinet Office, announced his retirement date over Twitter:
At the time of writing, @dirdigeng follows slightly over two thousand folk on Twitter, so I thought I’d have a quick look at who the “players” are…
The network described is constructed as follows:
– nodes represent the people followed by @dirdigeng on Twitter;
– a directed edge from A to B means that A is following B.
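The construction above can be sketched in a few lines of Python. This is a minimal sketch rather than the code actually used for the post: the `friends_of` dictionary, the account names in it, and the helper name `build_friends_network` are all made up for illustration, and in practice the friend lists would have to be fetched from Twitter first.

```python
def build_friends_network(friends_of, target):
    """Return {A: set of B} where A follows B and both are followed by target.

    friends_of maps an account name to the list of accounts it follows
    (a hypothetical, pre-fetched data structure).
    """
    nodes = set(friends_of.get(target, ()))
    edges = {a: set() for a in nodes}
    for a in nodes:
        for b in friends_of.get(a, ()):
            # Keep only links that stay inside the target's friends list.
            if b in nodes and b != a:
                edges[a].add(b)
    return edges


# Made-up example data, not real follower relationships.
friends_of = {
    "dirdigeng": ["alice", "bob", "carol"],
    "alice": ["bob", "carol", "zed"],  # zed is not followed by the target
    "bob": ["carol"],
    "carol": ["alice"],
}

network = build_friends_network(friends_of, "dirdigeng")
```

Note that the target account itself is not a node, and that alice's link to zed is dropped because zed sits outside the friends list.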
In the first view (randomly laid out, using Gephi), we plot node size as linearly proportional to the number of @dirdigeng’s friends who are following each of the other friends (that is, the in-degree of each node), and colour proportional to their total number of followers (including people not friended by @dirdigeng).
The colour mapping is non-linear – @Number10gov, @guardiantech and @mashable have significantly more followers than the other nodes – and is set via the spline control:
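For readers who want to reproduce the two mappings outside Gephi, here is a rough sketch. The in-degree count and the linear size scale follow the description above; the logarithmic colour scale is only a stand-in for the hand-tuned Gephi spline, and the function names, size range and example numbers are all assumptions.

```python
import math


def in_degrees(edges):
    """Count, for each node, how many other nodes in the network follow it."""
    counts = {node: 0 for node in edges}
    for followed in edges.values():
        for b in followed:
            counts[b] = counts.get(b, 0) + 1
    return counts


def linear_size(value, max_value, min_size=5.0, max_size=50.0):
    """Scale a value linearly onto a node size range (range is arbitrary)."""
    if max_value == 0:
        return min_size
    return min_size + (max_size - min_size) * value / max_value


def log_colour(followers, max_followers):
    """Map a follower count onto 0..1 on a log scale.

    A stand-in for the Gephi spline: it stops a handful of very large
    accounts from squashing everyone else into the bottom of the range.
    """
    if max_followers <= 0:
        return 0.0
    return math.log1p(followers) / math.log1p(max_followers)


# Made-up network: A -> set of accounts A follows.
edges = {"alice": {"bob", "carol"}, "bob": {"carol"}, "carol": {"alice"}}
degrees = in_degrees(edges)
sizes = {n: linear_size(d, max(degrees.values())) for n, d in degrees.items()}
```

With a linear colour scale, an account with 1,000 followers would sit at 0.001 of the range next to one with 1,000,000 followers; on the log scale it lands at roughly the midpoint.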
If we run the betweenness centrality statistic, and size nodes accordingly, we can see how the various parts of the network may be connected. (“Betweenness centrality is a measure based on the number of shortest paths between any two nodes that pass through a particular node. Nodes around the edge of the network would typically have a low betweenness centrality. A high betweenness centrality might suggest that the individual is connecting various different parts of the network together.”)
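To make the definition concrete, the sketch below computes unnormalised betweenness centrality for an unweighted directed graph using Brandes' algorithm (one breadth-first search per node, then a backwards pass that accumulates path fractions). It is an illustration of the measure, not a claim about how Gephi implements it, and the tiny example graph is made up.

```python
from collections import deque


def betweenness(edges):
    """Unnormalised betweenness for a directed, unweighted graph.

    edges maps each node to the set of nodes it points at.
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    score = {v: 0.0 for v in nodes}

    for s in nodes:
        order = []                      # nodes in order of discovery
        preds = {v: [] for v in nodes}  # predecessors on shortest paths
        sigma = {v: 0 for v in nodes}   # number of shortest paths from s
        dist = {v: -1 for v in nodes}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in edges.get(v, ()):
                if dist[w] < 0:         # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Walk back from the furthest nodes, sharing out path dependencies.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Made-up example: 'hub' is the only route from a to b and from a to c.
example = {"a": {"hub"}, "hub": {"b", "c"}, "b": set(), "c": set()}
centrality = betweenness(example)
```

Here `hub` scores 2.0 (it lies on the shortest paths a→b and a→c) and the three edge-of-network nodes score 0.0.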
We can also run the modularity class statistic to try to partition the friends into small networks with a high degree of internal connections. Here’s what we get (click through on the image to see it in more detail):
Modularity groups help us understand the structure of the network in a bit more detail. I’ve started to think they might also be used to automatically generate a seeding set of people who form a highly interconnected community with an interest in a particular topic and from a particular stance.
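As a rough guide to what the modularity statistic is rewarding, the sketch below scores a given partition: for each group it takes the fraction of edges that fall inside the group and subtracts the fraction expected if edges were placed at random with the same degrees. Two simplifying assumptions: it only scores a partition you hand it (Gephi's statistic also searches for a good one), and it treats follow links as undirected. The example graph and group labels are made up.

```python
def modularity(undirected_edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    undirected_edges: list of (a, b) pairs, each edge listed once.
    community_of: dict mapping each node to its group label.
    """
    m = len(undirected_edges)
    if m == 0:
        return 0.0
    internal = {}    # edges with both ends in the same group
    degree_sum = {}  # total degree of the nodes in each group
    for a, b in undirected_edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Made-up example: two triangles joined by a single bridging edge.
edge_list = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(edge_list, two_groups)
q_lumped = modularity(edge_list, one_group)
```

Splitting the graph into its two triangles scores 5/14 (about 0.357), whereas lumping everything into one group scores 0, which is why the partitioning step prefers tightly interlinked groups.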
As well as looking at the structure of the network, we can also create a search engine over the home pages declared in the Twitter bios of @dirdigeng’s friends. My thinking here is that this might provide a useful constrained search engine over sites engaged in social media and with an interest in “Digital Britain”.
The simplest custom search engine just takes the URLs from the Twitter bios of folk followed by @dirdigeng and adds them to a “Digital Britain” Google Custom Search Engine. However, one attractive feature of Google CSEs is that you can also tweak the rankings by weighting results from different domains differently to give a “weighted” custom search engine.
As a quick experiment, I produced one weighted search engine where I set the score for each domain to be the normalised number of followers amongst @dirdigeng’s friends community. (That is, the domain score equalled the indegree of a node in the @dirdigeng friends network, divided by the total number of people in that network.)
As you can see from the above, the results differ… Whether there is any improvement in the ranking of results is another thing. (There is also the question of how best to score, or boost, rankings based on network statistics, and the extent to which rankings should be determined by friends network factors…)
It also strikes me that the modularity groups might be used to inform the setup of a CSE. For example, separate modularity groups/classes may be used to define refinement labels, allowing users to just search pages from members of a particular modularity class, or boost the results from those people.
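The scoring rule is simple enough to sketch. The code below is an illustration under stated assumptions rather than the script used for the experiment: the `homepage_of` mapping, the example accounts and URLs are made up, and taking the highest score when two people share a domain is a design choice the post does not specify. Turning the scores into a CSE configuration is left out.

```python
from urllib.parse import urlparse


def domain_scores(edges, homepage_of):
    """Score each home page domain by its owner's normalised in-degree.

    edges: A -> set of accounts A follows, restricted to the friends network.
    homepage_of: account -> URL declared in the Twitter bio (hypothetical).
    """
    total = len(edges)
    indegree = {node: 0 for node in edges}
    for followed in edges.values():
        for b in followed:
            if b in indegree:
                indegree[b] += 1

    scores = {}
    for node, url in homepage_of.items():
        domain = urlparse(url).netloc.lower()
        if not domain or node not in indegree:
            continue
        score = indegree[node] / total
        # Assumption: if several people share a domain, keep the best score.
        scores[domain] = max(score, scores.get(domain, 0.0))
    return scores


# Made-up network and home pages.
edges = {"alice": {"bob", "carol"}, "bob": {"carol"}, "carol": {"alice"}}
homepage_of = {
    "alice": "http://alice.example.org/blog",
    "carol": "http://carol.example.com/",
}
scores = domain_scores(edges, homepage_of)
```

In this toy network carol is followed by two of the three members, so her domain scores 2/3, while alice's scores 1/3.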
And finally, I wonder whether we can mine the tweets of @dirdigeng’s friends, as well as those of @dirdigeng, to provide raw material for additional advice for searchers?
Quite surprised that there are no comments on this post as yet, even though it is full of lovely details on graph manipulation and, most importantly, the rendering of them!
I’m not sure how well subjective measures, like importance, interest and so on can be rendered by graph manipulation – not because your measure of interest is better or worse than any other – but because almost by definition, everyone will have a different idea of what they expect for a graph of ‘interestingness’.
Analysis of the data lets you find out which accounts *link* others, the betweenness factor, but you are missing data that would be crucial to see if a certain account *linked* two other accounts. You may know, follow or have unfollowed an account, but it may take a RT or comment on an interesting message by a peer you trust, to make you take notice of that account. Not quite as simple as A RTs B, C reads A’s RT, starts to follow B :) Sentiment analysis perhaps?
The approach I’ve been taking so far is to look at just the internal connectedness of folk who are either followed by a particular person (as in this case), or who are on a particular list or using a particular hashtag (e.g. see https://blog.ouseful.info/2010/09/23/plenk2010-twitter-clusters/ for a look at folk using the PLENK2010 hashtag). Working in a ‘reactive planning’ mode (i.e. I have no real plan around what I’m doing;-), a couple of easy next steps have been the creation of search engines around those communities, the creation of Twitter lists based on hashtag users, and the partitioning of communities using the modularity measure to see whether or not different coherent groupings exist within a partition.
If you look at the PLENK2010 post, you’ll see I played around with the creation of word clouds for each partition based on the Twitter bios of folk in each partition, as well as their location. At the moment, I am accessing the modularity algorithm through Gephi, which requires a manual step, but what I intend to do is rewrite my code using networkx so I can then run the modularity measure (using this extension probably: http://perso.crans.org/aynaud/communities/index.html ) via the command line. The PLENK2010 word clouds were generated using tag cloud, so what I also want to do is implement simple word feature detection and create my own weighted lists from bios/location fields automatically. A further step would be to roughly geocode the location field, and output a KML layer showing the geographical distribution of the community members.
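The "simple word feature detection" step might look something like the sketch below, which weights each word by the share of bios in a partition that mention it. Everything here is an assumption for illustration: the stopword list is a token effort, the bios are invented, and the function name `bio_terms` is hypothetical.

```python
import re
from collections import Counter

# A deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to", "with"}


def bio_terms(bios_by_class, top_n=3):
    """For each partition, list the most widely shared bio words.

    bios_by_class: partition label -> list of bio strings.
    Returns partition label -> [(word, share of bios mentioning it), ...].
    """
    result = {}
    for label, bios in bios_by_class.items():
        if not bios:
            result[label] = []
            continue
        counts = Counter()
        for bio in bios:
            # Count each word once per bio, so one wordy bio cannot dominate.
            words = set(re.findall(r"[a-z0-9#@']+", bio.lower()))
            counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
        result[label] = [(w, n / len(bios)) for w, n in counts.most_common(top_n)]
    return result


# Invented bios for two hypothetical partitions.
bios_by_class = {
    0: [
        "Librarian at the OU",
        "Academic librarian and blogger",
        "Librarian, open access",
    ],
    1: ["Civil servant", "Government web team, civil service"],
}
terms = bio_terms(bios_by_class)
```

The weighted lists could then feed a word cloud, or simply be read off as a rough label for each partition.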
As I see it, I’m just trying to pick low hanging fruit, though as I climb the apple tree, some of it may end up being not so close to the ground, though always only ever a branch away!;-)
Tony, I don’t really quite understand the analysis you’re doing. I was prompted to write this comment following a graph you built based on use of the #rswebsci hashtag. I commented on twitter that I appeared in the graph (as cardcc), even though I wasn’t there. I have no problem with appearing, it’s just that it didn’t seem quite right; my participation was simply through re-tweets or replies, and seemed to be of quite a different order to the participation by the original tweeters.
I started this with: “@psychemedia I showed up in the small print of your #rswebsci tweet visualisation but wasn’t there. Better to ignore RTs?”
You wrote: “@cardcc point of hashtag communities I’m constructing is not to do with who’s at event, it’s to do with who’s interested in topic via tag”
My response: “@psychemedia surely there’s significant difference between sources and spreaders? My RTs represent “interest” << original tweeter interest!"
You responded:
"@cardcc that said, i do sometimes set a switch to only incl folk who have used the tag more than a specified number of times…"
"@cardcc re retweeters… i guess it depends on how you want to define the community. I'm using it in part for custom search engine creation"
I thought the #rswebsci traffic was interesting, but there were a _lot_ of RTs, which sometimes made it hard to get a feel for what was being reported. You're right, they do express interest in the topic, in spreading the word to their own followers. I've just got a feeling that there are two distinct groups here: those who only or mostly send original tweets, and those who only or mostly RT or reply.
Since I'm not sure what your aims were, I don't know how important that distinction might be. But since you said this was exploratory, follow your nose stuff, I thought it's a point worth making.
@chris – thanks for the comment, and the recap of our Twitter conversation. There are a couple of things I am trying to do in (at the moment) a rather informal way.
One is discovering people who may be interested in (or ideally, knowledgeable about) a particular topic based on their interest in a particular online event, as disclosed through their engagement with an event hashtag (I loosely refer to these as “hashtag communities”). An extension of this is creating a legacy from the event, e.g. in the form of a custom search engine over websites associated with the members of those communities.
What I am not so interested in is capturing who was actually at an event, though this is maybe something worth considering. Just because someone is RTing doesn’t mean they aren’t interested in, knowledgeable about, or even present at that event. For example, I’ve been at several events where there are folk present who tend to RT rather than originate backchannel commentary because they see themselves as amplifying the event to a wider (not physically present) audience.
On the matter of analysing the structure of the hashtag communities, I’m still just exploring. From looking at the PLENK2010 hashtag community [ https://blog.ouseful.info/2010/09/23/plenk2010-twitter-clusters/ ], it was possible to identify several subgroups based on friends relationships that appeared to make sense in terms of different interests or locales of the members of those subgroups. I’m not sure how useful this is, though it may give some insight into the particular interests or contexts of individuals based on which subgroup they are most tightly interconnected to (e.g. can we identify commercial interests in academic publishing [ https://blog.ouseful.info/2010/09/10/so-what-do-simple-hashtag-community-visualisations-tell-us/ ], or different departmental groupings within folk who all work for the same institution [ for example, librarian spotting in the OU: https://blog.ouseful.info/2010/09/23/digging-deeper-into-the-structure-of-my-twitter-friends-network-librarian-spotting/ ]?)
At the end of the day, I’m interested in who’s interested in the event, and who participated in the communication of ideas relating to the event. Identification with a hashtag is a feature that is very easy to get a handle on, as is membership of a particular curated twitter list, where the curator has grouped people together for some reason (and network analysis/biographical detail word clouds may reveal what is in common? Cf the word clouds I generated around the PLENK2010 subgroups.)
What I haven’t started doing (yet) is looking at the structure of networks outside the community. E.g. the graphs plotted above look at friends links between friends of @dirdigeng. One possible next step is to look at the indegree of all the friends of all the friends of @dirdigeng to see if there are folk whom @dirdigeng is not following that are maybe worth following. (Twitter is getting a lot of praise for its friend recommendations at the moment, though I’m not sure how it’s identifying them. One thing I intend to look at down the line is how hashtag and list communities might provide signals for topic based friend recommendations, as well as for event discovery (e.g. hashtags on a twitter list: https://blog.ouseful.info/2010/09/08/deriving-a-persistent-edtech-context-from-the-altc2010-twitter-backchannel/ ).)
As to understanding the “analysis” I’m doing – I have no idea either…;-) I’m just looking for quick wins relating to knowledge discovery and event legacies that have a life beyond the event, no matter who was there… What’s important for me is the interest signal they flag through association with a particular hashtag, for example.
That said, with respect to the “different order [of] participation”, another thing on my to do list is to look at what ranking factors we can include in custom search engines derived from hashtag communities, and the extent to which individuals originate, reply to or retweet hashtagged comments may have a role to play in that.
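The friends-of-friends step could be sketched along these lines: count how many of the target's friends follow each account the target does not already follow, and list the most-followed candidates first. This is a guess at one way of doing it, with made-up data and a hypothetical function name; it says nothing about how Twitter's own recommendations work.

```python
from collections import Counter


def suggest_follows(friends_of, target, top_n=5):
    """Rank accounts the target does not follow by how many friends follow them.

    friends_of maps an account to the list of accounts it follows
    (a hypothetical, pre-fetched data structure).
    """
    friends = set(friends_of.get(target, ()))
    counts = Counter()
    for friend in friends:
        for candidate in set(friends_of.get(friend, ())):
            # Skip the target and anyone the target already follows.
            if candidate != target and candidate not in friends:
                counts[candidate] += 1
    return counts.most_common(top_n)


# Made-up example data.
friends_of = {
    "dirdigeng": ["alice", "bob", "carol"],
    "alice": ["bob", "zed", "yan"],
    "bob": ["zed", "dirdigeng"],
    "carol": ["zed", "yan"],
}
suggestions = suggest_follows(friends_of, "dirdigeng")
# -> [("zed", 3), ("yan", 2)]
```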
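One way such a ranking factor might be prototyped is to profile each person by how they used the hashtag, then combine the counts with weights. The sketch below is purely illustrative: the classification rules (old-style "RT @user" prefix, leading @ for replies), the weights, and the example tweets and user names are all assumptions, not something that has been tried against real data.

```python
import re
from collections import Counter, defaultdict

RT_PATTERN = re.compile(r"^\s*RT @\w+", re.IGNORECASE)

# Hypothetical weights: originating counts for more than amplifying.
WEIGHTS = {"original": 1.0, "reply": 0.5, "retweet": 0.25}


def classify(text):
    """Crudely label a tweet as a retweet, a reply or an original."""
    if RT_PATTERN.match(text):
        return "retweet"
    if text.lstrip().startswith("@"):
        return "reply"
    return "original"


def participation(tweets):
    """Count each user's tweets by kind; tweets is a list of (user, text)."""
    profile = defaultdict(Counter)
    for user, text in tweets:
        profile[user][classify(text)] += 1
    return {user: dict(counts) for user, counts in profile.items()}


def participation_score(counts):
    """Combine a user's counts into a single weighted score."""
    return sum(WEIGHTS[kind] * n for kind, n in counts.items())


# Invented tweets from invented users.
tweets = [
    ("user_a", "Slides from this morning's talk are up #exampletag"),
    ("user_b", "RT @user_a: Slides from this morning's talk are up #exampletag"),
    ("user_b", "@user_a thanks, very useful #exampletag"),
]
profiles = participation(tweets)
ranked = {user: participation_score(c) for user, c in profiles.items()}
```

The same profiles would also let you set a switch to separate people who mostly originate from people who mostly amplify, should that distinction turn out to matter.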
Thanks Tony. That all makes sense.
One additional idea that occurred to me just now as I watch a TimBL tweet RT’d for the umpteenth time, is that the density of RTs says something about the strength of that particular meme or concept. OTOH maybe it’s just things that hit the funny bone, like the “bitsam and netsam” comment!
Yes… so maybe that’s a ranking factor too – every RT is a weak vote, essentially, in favour of the person being RT’d… (though it may be as much a vote in favour of the judgement of the person being RT’d – eg when the original tweet was reporting what someone else said – as it is in what the person being RTd says themselves?)