OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Estimated Follower Accession Charts for Twitter

Just over a year or so ago, Mat Morrison/@mediaczar introduced me to a visualisation he’d been working on (How should Page Admins deal with Flame Wars?) that I started to refer to as an accession chart (Visualising Activity Around a Twitter Hashtag or Search Term Using R). The idea is that we provide each entrant into a conversation or group with an accession number: the first person has accession number 1, the second person accession number 2 and so on. The accession number is plotted in rank order on the vertical y-axis, with ranked/time ordered “events” along the horizontal x-axis: utterances in a conversation for example, or posts to a forum.

A couple of months ago, I wondered whether this approach might also be used to estimate when folk started following an individual on Twitter. My reasoning went something like this:

One of the things I think is true of the Twitter API call for the followers of an account is that it returns lists of followers in reverse accession order. So the person who followed an account most recently will be at the top of the list (the first to be returned) and the person who followed first will be at the end of the list. Unfortunately, we don’t know when followers joined, so it’s hard to spot bursty growth in the number of followers of an account. However, it struck me that we may be able to get a bound on this by looking at the dates at which followers joined Twitter, along with their ‘accession order’ as followers of an account. If we get the list of followers and reverse it, and assume that this gives an ordered list of followers (with the follower that started following the longest time ago first), we can then work through this list and keep track of the oldest ‘created_at’ date seen so far. This gives us an upper bound (most recent date) for when followers that far through the list started following. (You can’t start following until you join twitter…)

So for example, if followers A, B, C, D in that accession order (ie started following target in that order) have user account creation dates 31/12/09, 1/1/09, 15/6/12, 5/5/10 then:
– A started following no earlier than 31/12/09 (because that’s when they joined Twitter and it’s the most recent creation date we’ve seen so far)
– B started following no earlier than 31/12/09 (because they started following after B)
– C started following no earlier than 15/6/12 (because that’s when they joined Twitter and it’s the most recent creation date we’ve seen so far)
– D started following no earlier than 15/6/12 (because they started following after C, which gave use the most recent creation date seen so far)

That’s probably confused you enough, so here’s a chart – accession number is along the bottom (i.e. the x-axis), joining date (in days ago) is on the y-axis:

recencyVacc

NOTE: this diverges from the accession graph described above, where accession number goes on the y-axis and rank ordered event along the x-axis.

What the chart shows is an estimate (the red line) of how many days ago a follower with a particular accession number started to follow a particular Twitter account.

As described in Sketches Around Twitter Followers, we see a clear break at 1500 days ago when Twitter started to get popular. This approach also suggests a technique for creating “follower probes” that we can use to date a follower record: if you know which day a particular user followed a target account, you can use that follower to put a datestamp into the follower record (assuming the Twitter API returned followers in reverse accession order).

Here’s an example of the code I used based on Twitter follower data grabbed for @ChrisPincher (whose follower profile appeared to be out of sorts from the analysis sketched in Visualising Activity Around a Twitter Hashtag or Search Term Using R). I’ve corrected the x/y axis ordering so follower accession number is now the vertical, y-component.

require(ggplot2)

processUserData = function(data) {
    data$tz = as.POSIXct(data$created_at)
    data$days = as.integer(difftime(Sys.time(), data$tz, units = "days"))
    data = data[rev(rownames(data)), ]
    data$acc = 1:length(data$days)
    data$recency = cummin(data$days)

    data
}

mp_cp <- read.csv("~/code/MPs/ChrisPincher_fo_0__2013-02-16-01-29-28.csv", row.names = NULL)

ggplot(processUserData(mp_cp)) +  geom_point(aes(x = -days, y = acc), size = 0.4) + geom_point(aes(x = -recency, y = acc), col = "red", size = 1)+xlim(-2000,0)

Here’s @ChrisPincher’s chart:

cp_demo

The black dots reveal how many days ago a particular follower joined Twitter. The red line is the estimate of when a particular follower started following the account, estimated based on the most recently created account seen to date amongst the previously acceded followers.

We see steady growth in follower numbers to start with, and then the account appears to have been spam followed? (Can you spot when?!;-) The clumping of creation dates of accounts during the attack also suggests they were created programmatically.

[In the “next” in this series of posts [What Happened Then? Using Approximated Twitter Follower Accession to Identify Political Events], I’ll show how spikes in follower acquisition on a particular day can often be used to “detect” historical news events.]

PS after coming up with this recipe, I did a little bit of “scholarly research” and I learned that a similar approach for estimating Twitter follower acquisition times had already been described at least once, at the opening of this paper: We Know Who You Followed Last Summer: Inferring Social Link Creation Times In Twitter – “We estimate the edge creation time for any follower of a celebrity by positing that it is equal to the greatest lower bound that can be deduced from the edge orderings and follower creation times for that celebrity”.

Written by Tony Hirst

April 5, 2013 at 10:31 am

Posted in Rstats

Tagged with

5 Responses

Subscribe to comments with RSS.

  1. [...] try out a simple bit of feature detection around approximated follower acquisition charts (e.g. Estimated Follower Accession Charts for Twitter) to see if I could detect dates around which there were spikes in follower [...]

  2. [...] the technique described in Estimated Follower Accession Charts for Twitter, we can estimate a follower acquisition growth curve for the @imascientist Twitter [...]

  3. [...] recap on the approach used for constructing follower acquisition charts (as described in Estimated Follower Accession Charts for Twitter, and which also works (in principle!) for plotting when Twitter users started following folk): you [...]

  4. […] my colleagues Darryl Woodford, Troy Sadkowsky and I came across on Tony Hirst’s fabulous blog OUseful.info a few months ago – and as Tony points out, there’s also a Microsoft Research paper which […]

  5. […] the fact that Twitter helpfully lists followers in the order of what we have come to call their ‘accession’: from Abbott’s first followers all the way through to his most […]


Comments are closed.

Follow

Get every new post delivered to your Inbox.

Join 787 other followers

%d bloggers like this: