Do Retweeters Lack Commitment to a Hashtag?

I seem to be going down more ratholes than usual at the moment, in this case relating to activity round Twitter hashtags. Here’s a quick bit of reflection around a chart from Visualising Activity Around a Twitter Hashtag or Search Term Using R that shows activity around a hashtag that was minted for an event that took place before the sample period.

The y-axis is organised according to the time of first use (within the sample period) of the tag by a particular user. The x-axis is time. The dots represent tweets containing the hashtag, coloured blue by default, red if they are an old-style RT (i.e. they begin RT @username:).
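As a rough sketch of how the points in a chart like this could be derived (in Python rather than the R used in the original post, and with made-up sample tweets and a hypothetical `chart_points` helper), each user gets a row number according to when they were first seen, and each tweet is coloured by whether it starts with "RT @":

```python
from datetime import datetime

# Hypothetical sample data: (username, timestamp, tweet text).
tweets = [
    ("alice", datetime(2012, 2, 1, 9, 0), "Slides now up #tag"),
    ("bob", datetime(2012, 2, 1, 9, 5), "RT @alice: Slides now up #tag"),
    ("alice", datetime(2012, 2, 1, 9, 30), "More notes #tag"),
    ("carol", datetime(2012, 2, 1, 10, 0), "@alice thanks! #tag"),
]


def chart_points(tweets):
    """Return (x, y, colour) points; y ranks users by first use of the tag."""
    ordered = sorted(tweets, key=lambda t: t[1])
    first_seen = {}
    for user, _when, _text in ordered:
        # A user's row is fixed the first time we see them in the sample.
        first_seen.setdefault(user, len(first_seen))
    points = []
    for user, when, text in ordered:
        colour = "red" if text.startswith("RT @") else "blue"
        points.append((when, first_seen[user], colour))
    return points


points = chart_points(tweets)
```

The resulting list can be handed to whatever plotting tool is to hand; only the row ordering and the colouring rule come from the description above.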

So what sorts of thing might we look for in this chart, and what are the problems with it? Several things jump out at me:

  • For many of the users, their first tweet (in this sample period at least) is an RT; that is, they are brought into the hashtag community through issuing an RT;
  • Many of the users whose first use is via an RT don’t use the hashtag again within the sample period. Is this typical? Does this signal represent amplification of the tag without any real sense of engagement with it?
  • A noticeable proportion of folk whose first use is not an RT go on to post further non-RT tweets. Does this represent an ongoing commitment to the tag? Note that this chart does not show whether tweets are replies or “open” tweets. Replies (that is, tweets beginning @username) are likely to represent conversational threads within a tag context rather than “general” tag usage, so it would be worth using an additional colour to identify reply-based conversational tweets as such.
  • “New style” retweets are displayed as retweets by colouring… I need to check whether or not new-style RT information is available that I could use to colour such tweets appropriately. (or alternatively, I’d have to do some sort of string matching to see whether or not a tweet was the same as a previously seen tweet, which is a bit of a pain:-(
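The questions in the list above could be explored with something like the following minimal sketch. It assumes tweets arrive as (user, text) pairs in time order; the function names and sample tweets are hypothetical, and only old-style RTs and replies are detected, by prefix:

```python
def classify(text):
    """Label a tweet as an old-style RT, a reply, or an open tweet."""
    if text.startswith("RT @"):
        return "rt"
    if text.startswith("@"):
        return "reply"
    return "open"


def first_use_summary(tweets):
    """tweets: (user, text) pairs in time order.

    Returns {user: (kind of first tweet, number of tweets in the sample)}.
    """
    summary = {}
    for user, text in tweets:
        if user in summary:
            kind, count = summary[user]
            summary[user] = (kind, count + 1)
        else:
            summary[user] = (classify(text), 1)
    return summary


# Hypothetical sample, already sorted by time.
sample = [
    ("alice", "Slides now up #tag"),
    ("bob", "RT @alice: Slides now up #tag"),
    ("alice", "More notes #tag"),
    ("carol", "@alice thanks! #tag"),
    ("dave", "RT @alice: More notes #tag"),
    ("dave", "Reading the notes now #tag"),
]

summary = first_use_summary(sample)
# Users who arrived via an RT and never used the tag again in the sample.
one_off_rters = sorted(u for u, (kind, n) in summary.items() if kind == "rt" and n == 1)
```

Comparing the size of `one_off_rters` with the number of users whose first tweet was an RT gives a crude measure of how often an RT is the only use of the tag.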

(Note that when I started mapping hashtag communities, I used to generate tag user names based on a filtered list of tweets that excluded RTs. This meant that folk who only used the tag as part of an RT, and did not originate tweets that contained the tag, either in general or as part of a conversation, would not be counted as members of the hashtag community. More recently, I have added filters that include RTs but exclude users who used the tag only once, for example, thus retaining serial RTers, but not single-use users.)
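The two filtering approaches described in that note might look something like this. It is a sketch only: the function names, the `min_uses` parameter and the sample tweets are all invented for illustration.

```python
from collections import Counter


def originators(tweets):
    """Earlier approach: users with at least one tweet that is not an old-style RT."""
    return {user for user, text in tweets if not text.startswith("RT @")}


def repeat_users(tweets, min_uses=2):
    """Later approach: count RTs too, but drop users with fewer than min_uses tweets."""
    counts = Counter(user for user, _text in tweets)
    return {user for user, n in counts.items() if n >= min_uses}


# Hypothetical sample of (user, text) pairs.
sample = [
    ("alice", "Great session #tag"),
    ("bob", "RT @alice: Great session #tag"),
    ("bob", "RT @erin: Slides here #tag"),
    ("carol", "RT @alice: Great session #tag"),
]

community_v1 = originators(sample)
community_v2 = repeat_users(sample)
```

With this sample the first filter keeps only alice, while the second keeps the serial RTer bob and drops the single-use users.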

So what else might this chart tell us? Looking at vertical slices, it seems that new entrants to the tag community come in waves, maybe as part of rapid-fire RT bursts. This chart doesn’t tell us for sure that this is happening, but it does highlight areas of the timeline that might be worth investigating more closely if we are interested in what happened at those times when there does appear to be a spike in activity. (Are there any modifications we could make to this chart to make it more informative in this respect? The time resolution is very poor, for example, so being able to zoom in on a particular time might be handy. Or are there other charts that might provide a different lens that can help us see what was happening at those times?)

And as a final point – this stuff may be all very interesting, but is it useful? And if so, how? I also wonder how generalisable it is to other sorts of communication analysis. For example, I think we could use similar graphical techniques to explore engagement with an active comment thread on a blog, or Google+, or additions to an online forum thread. (For forums with multiple threads, we maybe need to rethink how this sort of chart would work, or how it might be coloured/what symbols we might use, to distinguish between starting a new thread and adding to a pre-existing one, for example. I’m sure the literature is filled with dozens of examples of how we might visualise forum activity, so if you know of any good references/links…?! ;-) #lazyacademic)

Friends of the Community: Who’s Effectively Following a Hashtag

Picking up on @briankelly’s Thoughts on ILI 2010, where he reports on a few gross level stats about #ili2010 hashtag activity grabbed from Summarizr, here are a few things I observed from looking at some of the hashtag community network stats…

To start with, I looked at the “inner hashtag community” where I grab the list of hashtaggers and their friends who have also used the hashtag and make links between them to give this sort of graph, as used before in many posts:

ILI2010 hashtaggers

(Directed graph from person to friend (i.e. to person they follow); node size proportional to in-degree, heat to out-degree.)
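For what it’s worth, here’s a minimal sketch of how that inner graph might be put together, assuming we already have a set of hashtaggers and, for each one, the set of accounts they follow. The data and function names are hypothetical, and Gephi calculates the degree measures itself once the edges are loaded:

```python
from collections import Counter


def inner_community_edges(friends, hashtaggers):
    """friends: {user: set of accounts that user follows}.

    Returns directed (person, friend) edges where both ends used the hashtag.
    """
    return sorted(
        (user, friend)
        for user in hashtaggers
        for friend in friends.get(user, set())
        if friend in hashtaggers and friend != user
    )


def degrees(edges):
    """Return (in_degree, out_degree) counters for a list of directed edges."""
    out_degree = Counter(src for src, _dst in edges)
    in_degree = Counter(dst for _src, dst in edges)
    return in_degree, out_degree


# Hypothetical data: "x" follows nobody in the tag community and did not use the tag.
hashtaggers = {"a", "b", "c"}
friends = {"a": {"b", "c", "x"}, "b": {"c"}, "c": {"a"}}

edges = inner_community_edges(friends, hashtaggers)
in_degree, out_degree = degrees(edges)
```

In-degree here counts followers inside the community (node size in the graph above) and out-degree counts friends inside the community (node heat).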

After running a few network statistics generated using Gephi, and exporting the data from the Gephi Data Table view, I uploaded the statistics data to IBM’s ManyEyes site here. This allows us to view the distribution of the hashtaggers based on various statistical and network measures using a range of other visualisation techniques, such as histograms (view interactive histogram chart for ILI2010 hashtaggers, interactive scatterplot)

So for example, here’s the distribution of hashtaggers by total number of followers (that is, including followers outside the hashtag community) as a histogram:

ILI2010 hashtaggers - total numbers of followers

If we look at the betweenness measure, which was calculated over the friends connections between the hashtaggers, we can see who’s best suited to getting a message broadcast across the community through direct and friend-of-a-friend links:

ILI2010 hashtaggers - inner friends betweenness

If we look at the in-degree (the number of people in the hashtag community who have friended, i.e. are following, an individual) divided by the total number of friends of that individual, we can identify people who are being followed by more people in the community than they have as friends:

ILI2010 hashtaggers - in-degree divided by total friends
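The ratio behind that chart is easy enough to sketch. The numbers below are invented, the function name is hypothetical, and users with no friends are simply skipped to avoid dividing by zero:

```python
def indegree_to_friends_ratio(in_degree, total_friends):
    """Return {user: in-community in-degree / total number of friends}."""
    ratios = {}
    for user, indeg in in_degree.items():
        friends = total_friends.get(user, 0)
        if friends > 0:  # skip accounts that follow nobody
            ratios[user] = indeg / friends
    return ratios


# Hypothetical values, not taken from the ILI2010 data.
in_degree = {"a": 40, "b": 5, "c": 12}
total_friends = {"a": 20, "b": 500, "c": 0}

ratios = indegree_to_friends_ratio(in_degree, total_friends)
# A ratio above 1 means more community followers than total friends.
standouts = sorted(user for user, r in ratios.items() if r > 1)
```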

If we look at the in-degree divided by a user’s total number of followers, we can see the extent to which a person’s twitter feed is dominated by updates from folk who have used the ILI2010 hashtag:

ILI2010 hashtaggers - extent to which stream is dominated by hashtaggers

In the above case, we see one person who appears to only follow members of the ILI2010 hashtag community. (I’m guessing that if folk come to twitter through a conference, this might be a signature of that?) Before you get too excited though, a little more digging suggests that that person only follows 1 person;-)

The interactive scatterplot allows us to view 3 dimensions of data – in the following case, I’m looking for well connected (good betweenness centrality), well respected (high in-degree) folk in the hashtag community who also have a large reach in terms of their total number of followers:

ILI2010 hashtaggers - scatterplot

In terms of audience development, we can also create a network based on the complete follower lists of the ILI2010 hashtaggers. Creating such a graph generates a network with 71,627 nodes, of which 236 were hashtaggers – meaning that in principle 71,391 people outside the hashtag community might have seen an ILI2010 hashtagged tweet…

Using a directed graph from hashtaggers to their followers, if we filter the graph to only show individuals with an in-degree of at least 60, say, we can see those people who are following at least 60 people who have used the hashtag:

ILI2010 hashtagger followers

In the way I have constructed this graph, the nodes showing Twitter usernames are in the hashtag community, the numerical IDs are individuals who didn’t use the ILI2010 hashtag but who do follow at least 60 people who did, and therefore presumably saw quite a lot of tweets about the event.
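A sketch of that filtering step, assuming the follower lists have been collected as a dictionary keyed by hashtagger (the data, names and the threshold of 2 used in the example are all made up; the graph above used 60):

```python
from collections import Counter


def friends_of_community(followers, threshold):
    """followers: {hashtagger: set of follower IDs}.

    Returns {follower: number of hashtaggers followed} for non-hashtaggers
    who follow at least `threshold` hashtaggers.
    """
    hashtaggers = set(followers)
    in_degree = Counter()
    for follower_ids in followers.values():
        # Each follower list adds one incoming link per follower.
        in_degree.update(follower_ids)
    return {
        who: n
        for who, n in in_degree.items()
        if n >= threshold and who not in hashtaggers
    }


# Hypothetical follower lists; numeric strings stand in for unresolved Twitter IDs.
followers = {
    "a": {"b", "101", "102"},
    "b": {"a", "101"},
    "c": {"a", "101", "102"},
}

audience = friends_of_community(followers, threshold=2)
```

Here "a" follows two hashtaggers but is dropped because it used the tag itself, which mirrors the split between usernames and numerical IDs in the graph.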

Looking up the twitter IDs of the “friends of the hashtag community”, we see the following people did not use the hashtag over the sample period, but do follow lots of people who did: @ijclark, @aekins, @metalibrarian, @schammond, @Jo_Bo_Anderson, @research_inform, @tomroper, @facetpublishing, @DavidGurteen

Of course, to know the extent to which hashtagger activity dominates the twitterstream of these “friends of the hashtag community”, we’d need to normalise this against their total number of friends. For example, if I follow 20k people, of which 60 were hashtaggers, I’d probably miss most of the hashtagged tweets; whereas if I follow 100 people, of which 60 are hashtaggers, the density of tweets received from hashtaggers could be expected to be quite high.
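The arithmetic for those two cases is trivial, but spelling it out makes the contrast plain. The helper name is hypothetical, and the share only approximates the proportion of the stream if everyone tweets at a similar rate, which is an assumption:

```python
def hashtagger_share(hashtag_friends, total_friends):
    """Proportion of the accounts someone follows that used the hashtag."""
    return hashtag_friends / total_friends


# The two cases from the text: 60 hashtaggers among 20,000 friends, or among 100.
heavy_follower = hashtagger_share(60, 20_000)  # 0.3% of friends
light_follower = hashtagger_share(60, 100)     # 60% of friends
```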

Okay – enough for now… although if you can think of anything else that might be interesting to know about the wider community around the hashtaggers, please post it in a comment below:-)

Structural Differences in Hashtag Communities: Highly Interconnected or Not?

In several recent posts, I’ve shown a variety of network diagrams based on who’s following whom in various twitter hashtag community networks. In this post, I thought I’d show a couple more, demonstrating the power of the visual approach for getting a quick feel for the structure of a particular community.

First up, here’s the inner friends graph for the #cam23 hashtag, which is used predominantly by a handful of Cambridge University librarians who opted in to their local 23 things programme:

A highly interconnected hashtag community network - cam23

(Node size is proportional to the number of incoming friends links; colour is proportional to the number of outgoing links.)

So what do we see? Pretty much everyone in this network is following a large number of other folk in the network, and is being followed by a large number. The network is highly interconnected. Messages don’t necessarily need tagging in order to get distributed across the network, because most folk are connected to most other folk.

(It’s never that simple of course. The likelihood of someone seeing a message from a particular person in their network is a function of, amongst other things, the number of people they follow, the frequency at which those people post, and so on.)

Now let’s look at a hashtag around a different sort of event – the Isle of Wight #Bestival. Here’s a sample from that hashtag community:

Bestival hashtag community - not so much a twitter community

In this case, we see lots of small blue dots, disconnected from other folk in the network. A couple of nodes are well connected, such as @ventnorblog, the Isle of Wight’s hyperlocal news site. Generally, if we wanted to broadcast a message to the #bestival hashtag community, the only way we could hope to would be by tagging a message appropriately and hoping they had a search running on that tag.

If we run the Gephi connected components tool, we can group nodes that are connected to each other. In the image below, the large blue circle is the “collapsed” network centred around ventnorblog and redfunnel (the table on the left shows that the majority of twitterers sampled fall into this group). Another, smaller network, shown in exploded form, has also been identified:
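Outside Gephi, the same sort of grouping can be sketched in a few lines. This version ignores link direction (i.e. it finds weakly connected components, which is my assumption about the grouping shown here), and the node names and links are invented:

```python
def connected_components(nodes, edges):
    """Group nodes joined by any chain of links, ignoring direction.

    Every node named in `edges` must also appear in `nodes`.
    Returns a list of sets, largest component first.
    """
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # depth-first walk from the start node
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical graph: one hub-centred group, one pair, one isolated node.
nodes = ["hub", "u1", "u2", "u3", "u4", "u5"]
edges = [("u1", "hub"), ("u2", "hub"), ("u3", "u4")]

components = connected_components(nodes, edges)
```

Isolated nodes come out as components of size one, which is what all those disconnected small blue dots would be.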

bestival hashtag community - connected components

Now let’s go back to the cam23 community, and consider the sociability of everyone in the community. In the following image, node size is proportional to the total number of friends, and colour is proportional to the total number of followers:

Cam23 sociability - node size is tot no. of friends, colour tot followers

So red nodes have a large number of followers/wide broadcast reach outside the hashtag community, and large nodes show that the node has a wide hinterland, and receives messages from a large number of folk outside the hashtag community.

If we now plot “1-total_friends” as the node size and in-degree/incoming links as the colour (incoming links are links from followers), we can get an indication of the extent to which the tweets an individual sees are dominated by tweets from the hashtag community (size) – that is, large size means the person’s network is dominated by folk in the hashtag network – and the extent to which a person’s tweets reach out into the hashtag network (colour; more red means that person’s tweets are seen by more of the hashtag community).

cam23 - node size is 1- total friends; colour is in degree

Small red nodes mean that a person has wide reach into the hashtag community, but that they follow a lot of other people, so hashtagged tweets may be drowned out. Large red nodes show that a person’s friend network is dominated by the hashtag community and that they are widely followed within it.

(Note that originally there were a couple of nodes that looked like “rogue” nodes, so I sized all nodes with zero incoming links to zero size; alternatively I could have filtered the graph to only show nodes with at least one incoming link.)

PS In addition to explicitly opt-in communities, such as hashtag networks, it strikes me that we could also start considering the structure of incidental/passive inclusion topic networks by searching for folk who are using particular key terms, rather than searching over hashtags?

Mulling Over an Idea for Hashtag Community Maturity Profiles

A couple of weeks ago, I started cobbling together some clunky scripts to collate network data files from lists of people twittering with a particular hashtag (First Glimpses of the OUConf10 Hashtag Community). I’ve got a Twapperkeeper key now, so the next step is to pull archived hashtagged tweets from there to generate my hashtaggers list, and then use that data as the basis for pulling in friends and followers links for particular individuals from the Twitter API.

One thing I’d like to start pulling together is a set of tools for providing network and backchannel analysis around hashtag communities. Andy Powell has already published a site that summarises hashtag activity in the form of Summarizr using a Twapperkeeper archive:


So what else might we look for?

Mulling over my own Personal Twitter Networks in Hashtag Communities, the metrics I report include:

– Number of hashtaggers [Ngalaxy]
– Hashtaggers as followers (‘hashtag followers’) [Gfollowers]
– Hashtaggers as friends (‘hashtag friends’) [Gfriends]
– Hashtagger followers not friended (‘serfs’) [Gserfs]
– Hashtagger friends not following (‘slebs’) [Gslebs]
– Hashtaggers not friends or followers (‘the hashtag void’) [Gvoid]
– Reach into hashtag community [Greach=Gfollowers/Ngalaxy]
– Reception of hashtag community: the proportion of the hashtag community that are followed by (i.e. are friends of) the named individual [Greception=Gfriends/Ngalaxy]
– Hashtag void (normalised) [Normvoid=Gvoid/Ngalaxy]
– Total personal followers: the total number of followers of the named individual [Nfollowers]
– Total personal friends: the total number of friends of the named individual [Nfriends]
– Hashtag community dominance of personal reach: the extent to which the hashtag community dominates the set of people who follow the named individual, [Domreach=Gfollowers/Nfollowers]
– Hashtag community dominance of personal reception: the extent to which the set of the named individual’s friends is dominated by members of the hashtag community, [Domreception=Gfriends/Nfriends]
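Given sets of usernames, the metrics listed above fall out of a few set operations. This is a sketch under some assumptions: the hashtagger set excludes the named individual, none of the three sets is empty (otherwise the ratios would divide by zero), and the function name and sample data are invented:

```python
def personal_metrics(hashtaggers, followers, friends):
    """Compute the listed metrics for one named individual.

    hashtaggers: set of people who used the tag (assumed to exclude the individual)
    followers:   set of all the individual's followers
    friends:     set of everyone the individual follows
    """
    n_galaxy = len(hashtaggers)
    g_followers = hashtaggers & followers
    g_friends = hashtaggers & friends
    g_serfs = g_followers - g_friends           # follow me, not followed back
    g_slebs = g_friends - g_followers           # I follow them, they don't follow me
    g_void = hashtaggers - followers - friends  # no link in either direction
    return {
        "Ngalaxy": n_galaxy,
        "Gfollowers": len(g_followers),
        "Gfriends": len(g_friends),
        "Gserfs": len(g_serfs),
        "Gslebs": len(g_slebs),
        "Gvoid": len(g_void),
        "Greach": len(g_followers) / n_galaxy,
        "Greception": len(g_friends) / n_galaxy,
        "Normvoid": len(g_void) / n_galaxy,
        "Nfollowers": len(followers),
        "Nfriends": len(friends),
        "Domreach": len(g_followers) / len(followers),
        "Domreception": len(g_friends) / len(friends),
    }


# Hypothetical data: x, y and z are accounts outside the hashtag community.
metrics = personal_metrics(
    hashtaggers={"a", "b", "c", "d"},
    followers={"a", "b", "x", "y"},
    friends={"b", "c", "z"},
)
```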

Anyway, it strikes me that calculating those measures as means (and standard deviations) across all the members of the network, along with more traditional social network analysis network centrality or clustering measures, might help identify different signatures relating to the maturity of different hashtag communities (for example, the extent to which they are just forming, or the extent to which they have largely saturated in terms of members knowing each other).
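Rolling per-person measures up into a community-level profile might be sketched as follows. The function name and figures are made up, and I’ve used the sample standard deviation, though whether sample or population is the better choice here is open to debate:

```python
from statistics import mean, stdev


def community_profile(per_member):
    """per_member: {user: {metric name: value}}.

    Returns {metric name: (mean, sample standard deviation)} across members.
    """
    metric_names = sorted({name for m in per_member.values() for name in m})
    profile = {}
    for name in metric_names:
        values = [m[name] for m in per_member.values() if name in m]
        # stdev needs at least two values; report 0.0 for a single member.
        spread = stdev(values) if len(values) > 1 else 0.0
        profile[name] = (mean(values), spread)
    return profile


# Hypothetical per-member values for two of the metrics.
per_member = {
    "a": {"Greach": 0.2, "Greception": 0.4},
    "b": {"Greach": 0.4, "Greception": 0.4},
    "c": {"Greach": 0.6, "Greception": 0.4},
}

profile = community_profile(per_member)
```

A community where everyone knows everyone would presumably show high means and low spread on reach and reception, whereas a newly forming one might show low means and a wide spread; that is speculation to be tested against real data.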

These metrics might also change over the course of an event being discussed via a particular hashtag.