A Few More Thoughts on the Forensic Analysis of Twitter Friend and Follower Timelines in a MOOCalytics Context

Immediately after posting Evaluating Event Impact Through Social Media Follower Histories, With Possible Relevance to cMOOC Learning Analytics, I took the dog out for a walk to ponder the practicalities of constructing follower (or friend) acquisition charts for accounts with only a low number of followers, or friends, as might be the case for folk taking a MOOC or who have attended a particular event. One aim I had in mind was to probe the extent to which a MOOC may help develop social ties between folk taking it: do MOOC participants know each other prior to taking the MOOC, or do they come to develop social links after taking it? Another aim was simply to see whether changes in the velocity or makeup of follower acquisition curves could tell us if particular events led either to growth in follower numbers or to community development between followers.

To recap on the approach used for constructing follower acquisition charts (as described in Estimated Follower Accession Charts for Twitter, and which also works (in principle!) for plotting when Twitter users started following folk):

  • you can’t start following someone on Twitter until you join Twitter;
  • follower lists on Twitter are reverse chronological statements of the order in which folk started following the corresponding account;
  • starting with the first follower of an account (the bottom end of the follower list), we can estimate when each follower started following the account from the most recent account creation date seen so far amongst that follower and the people who started following before them.
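The estimation rule in the last bullet is just a running maximum over creation dates. Here is a minimal Python sketch, assuming we have already fetched the followers' account creation dates and reversed Twitter's follower list so the oldest follower comes first; the function name and the example dates are made up for illustration. Note that each estimate is really an earliest-possible follow time, not an exact one.

```python
from datetime import datetime
from itertools import accumulate

def estimate_follow_times(creation_dates):
    """Estimate follow times for an account's followers (hypothetical helper).

    creation_dates: account creation dates of the followers, ordered
    oldest follower first (i.e. the Twitter follower list reversed).
    Returns one estimate per follower, in the same order: the latest
    creation date seen so far, up to and including that follower.
    """
    # accumulate(..., max) yields the running maximum of the sequence
    return list(accumulate(creation_dates, max))

# Illustrative (made-up) creation dates, oldest follower first
dates = [datetime(2010, 1, 1), datetime(2011, 6, 1),
         datetime(2009, 3, 1), datetime(2012, 2, 1)]
estimates = estimate_follow_times(dates)
# The third follower's estimate is pulled up to 2011-06-01 by the second follower
```

With only a handful of followers, few of them will have created their account just before following, so most estimates lag well behind the true follow times; that is the problem the rest of this post tries to work around.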

A methodological problem arises when we have a low number of followers, because we don’t necessarily have enough newly created (follower) accounts that start following the target account soon after they were created to give us a solid basis for estimating when folk started following the target account. (If someone creates a new account and then immediately uses it to follow a target account, we get a good sample in time relating to when that follower started following the target account… If lots of people follow an account, there’s more of a chance that some of them will start following it soon after creating their own account.)

There may also be methodological problems with trying to run an analysis over a short period of time (too much noise/lack of temporal definition in the follower acquisition curve over a limited time range).

So with low follower numbers, where can we get our timestamps from?

In the context of a MOOC, let’s suppose that there is a central MOOC account with lots of followers, and those followers don’t have many friends or followers (certainly not enough for us to be able to generate smooth – and reliable – acquisition curves).

If the MOOC account has lots of followers, let’s suppose we can generate a reasonable follower acquisition curve from them.

This means that for each follower, fo_i, we can associate with them a time when they started following the MOOC account, fo_i_t. Let’s write that as fo(MOOC, fo_i)=fo_i_t, where fo(MOOC, fo_i) reads “the estimated time when MOOC is followed by fo_i”.

(I’m making this up as I’m going along…;)

If we look at the friends of fo_i (that is, the people fo_i follows), we know that fo_i started following the MOOC account at time fo_i_t. So let’s write that as fr(fo_i, MOOC)=fo_i_t, where fr(fo_i, MOOC) reads “the estimated time when fo_i friends MOOC”.

Since public friend/follower relationships are symmetrical on Twitter (if A friends B, then B is at that instant followed by A), we can also write fr(fo_i, MOOC) = fo(MOOC, fo_i), which is to say that the time when fo_i friends MOOC is the same time as when MOOC is followed by fo_i.
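To make the notation concrete, here is a minimal Python sketch in which each directed follow edge is stored just once, keyed by (follower, followed). The names follow_times, record_follow, fo and fr are hypothetical, not part of any Twitter library, and the date is illustrative. Because fr and fo both read the same entry, the symmetry above holds by construction.

```python
from datetime import datetime

# Hypothetical store: one entry per directed "follows" edge
# (follower, followed) -> estimated time the edge was created
follow_times = {}

def record_follow(follower, followed, when):
    follow_times[(follower, followed)] = when

def fo(account, follower):
    """Estimated time when `account` is followed by `follower` (None if unknown)."""
    return follow_times.get((follower, account))

def fr(account, friend):
    """Estimated time when `account` friends `friend` (None if unknown)."""
    return follow_times.get((account, friend))

# Illustrative estimate taken from the MOOC account's follower acquisition curve
record_follow("fo_i", "MOOC", datetime(2013, 5, 1))
```

After this, fr("fo_i", "MOOC") and fo("MOOC", "fo_i") return the same estimate, whereas fo("fo_i", "MOOC") stays unknown until we have evidence that MOOC follows fo_i.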

Got that?!;-) (I’m still making this up as I’m going along…!)

We now have a sample in time for calibrating at least a single point in the friend acquisition chart for fo_i. If fo_i follows other “celebrity” accounts for which we can generate reasonably sound follower acquisition charts, we should be able to add other timestamp estimates into the friend acquisition timeline.

If fo_i follows three accounts A, B, C in that order, with fr(fo_i,A)=t1 and fr(fo_i,C)=t2, where t1 < t2, we know that fr(fo_i,B) lies somewhere between t1 and t2. Let’s call that [t1,t2], reading it as [not earlier than t1, not later than t2]. Which is to say, fr(fo_i,B)=[t1,t2], or “fo_i makes friends with B not before t1 and not after t2”, or more simply “fo_i makes friends with B somewhen between t1 and t2”.
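The same bracketing works along a whole friend list. Here is a minimal sketch, assuming the friend list has been reversed to oldest-first, that timestamps are plain numbers (e.g. epoch seconds or day counts), and that we have point estimates for some friends from “celebrity” acquisition curves. The function name is hypothetical. Each friend gets the latest known time at or before it as a lower bound and the earliest known time at or after it as an upper bound, with -inf/inf where no anchor exists.

```python
import math

def bracket_friend_times(friends, known):
    """friends: oldest-first list of accounts; known: {account: timestamp}.

    Returns {account: (lo, hi)} bounds on when each friend was friended.
    """
    n = len(friends)
    lower, upper = [-math.inf] * n, [math.inf] * n
    running = -math.inf
    for k, f in enumerate(friends):          # forward pass: latest known time so far
        if f in known:
            running = max(running, known[f])
        lower[k] = running
    running = math.inf
    for k in range(n - 1, -1, -1):           # backward pass: earliest known time after
        if friends[k] in known:
            running = min(running, known[friends[k]])
        upper[k] = running
    return {f: (lo, hi) for f, lo, hi in zip(friends, lower, upper)}

# A and C have point estimates (t1=10, t2=20); B is only bracketed
bounds = bracket_friend_times(["A", "B", "C"], {"A": 10, "C": 20})
```

Here bounds["B"] comes out as (10, 20), i.e. fr(fo_i,B)=[t1,t2].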

Let’s now look at fo_j, who has only a few followers, one of whom is fo_i. Suppose that fo_j is actually account B. We know that fo_i follows fo_j, so there is a fo(fo_j,fo_i), and furthermore that fo(fo_j,fo_i)=fr(fo_i,fo_j). Since we know that fr(fo_i,B)=[t1,t2], and B=fo_j, we know that fr(fo_i,fo_j)=[t1,t2]. (Just swap the symbols in and out of the equations…) But what we now also have is a timestamp estimate into the followers list for fo_j, that is: fo(fo_j,fo_i)=[t1,t2].

If MOOC has lots of friends, as well as lots of followers, and MOOC has a policy of following back followers immediately, we can use it to generate timestamp probes into the friend timelines of its followers, via fo(MOOC,X)=fr(X,MOOC), and into the follower timelines of its friends, via fr(MOOC,Y)=fo(Y,MOOC). (We should be able to use other accounts with large friend or follower counts and reasonably well defined acquisition curves to generate additional samples?)

We can possibly also start to play off the time intervals from friend and follower curves against each other to try and reduce the uncertainty within them (that is, the range of them).

For example, if we have fr(fo_i,B)=[t1,t2] and fo(B,fo_i)=[t3,t4], then if t3 > t1, we can tighten up fr(fo_i,B)=[t3,t2]. Similarly, if t2 < t4, we can tighten up fo(B,fo_i)=[t3,t2]. Which I think in general is:

if fr(A,B)=[t1,t2] and fo(B,A)=[t3,t4], we can tighten up to fr(A,B) = fo(B,A) = [ greater_of(t1,t3), lesser_of(t2,t4) ]

Erm, maybe? (I should probably read through that again to check the logic!) Things also get a little more complex when we only have time range estimates for most of the friends or followers, rather than good single point timestamp estimates for when they were friended or started to follow…;-) I’ll leave it as an exercise for the reader to figure out how to write that down and solve it!;-)
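For what it’s worth, the tightening rule does seem to hold: fr(A,B) and fo(B,A) describe the same moment, so the true time must lie in both ranges at once, i.e. in their intersection. Here is a minimal Python sketch, assuming timestamps are plain numbers and ranges are (lo, hi) tuples. The function name is hypothetical. The overlap check is my addition: if the two ranges don’t overlap, at least one of the estimates must be wrong.

```python
def tighten(fr_interval, fo_interval):
    """Combine fr(A,B)=[t1,t2] and fo(B,A)=[t3,t4] into one tighter range."""
    t1, t2 = fr_interval
    t3, t4 = fo_interval
    lo, hi = max(t1, t3), min(t2, t4)   # [greater_of(t1,t3), lesser_of(t2,t4)]
    if lo > hi:
        # Both ranges describe the same event, so they must overlap
        raise ValueError("inconsistent estimates: ranges do not overlap")
    return (lo, hi)

# fr(fo_i,B)=[10,30] from fo_i's friend list, fo(B,fo_i)=[15,40] from B's follower list
combined = tighten((10, 30), (15, 40))   # -> (15, 30)
```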

If this thought experiment does work out, then several rules of thumb jump out if we want to maximise our chances of generating reasonably accurate friend and follower acquisition curves:

– set up your MOOC Twitter account close to the time you want to start using it so its creation date is as late as possible;
– encourage folk to follow the MOOC account, and follow back, to improve the chances of getting reasonable resolution in the follower acquisition curve for the MOOC account. These connections also provide time-estimated probes into follower acquisition curves of friends and friend acquisition curves of followers;
– consider creating new “fake” timestamp Twitter accounts that can immediately on creation follow and be friended by the MOOC account to place temporal markers into the acquisition curves;
– if followers follow other celebrity accounts (or are followed (back) by them), we should be able to generate timestamp samples by analysing the celebrity account acquisition curves.

I think I need to go and walk the dog again.

PS a couple more trivial fixed points: for a target account, the earliest time at which they were first followed or when they first friended another account is the creation date of the target account; the latest possible time they acquired their most recent friend or follower is the time at which the data was collected.
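One possible way into the earlier “exercise for the reader” combines these fixed points with the ordering of the list. Every friend (or follower) was acquired between the target account’s creation date and the data collection time, and the list order means no entry can be earlier than any entry before it. Here is a minimal sketch, assuming an oldest-first list of (lo, hi) ranges with numeric timestamps (-inf/inf where nothing is known). The function name is hypothetical, and this is a sketch, not a worked-out method.

```python
import math

def propagate_order(intervals, created=-math.inf, collected=math.inf):
    """Tighten oldest-first (lo, hi) estimates for one friend or follower list."""
    # Fixed points: nothing before account creation, nothing after data collection
    los = [max(lo, created) for lo, _ in intervals]
    his = [min(hi, collected) for _, hi in intervals]
    for k in range(1, len(los)):             # can't be earlier than anything before it
        los[k] = max(los[k], los[k - 1])
    for k in range(len(his) - 2, -1, -1):    # can't be later than anything after it
        his[k] = min(his[k], his[k + 1])
    return list(zip(los, his))

# Two entries with partial estimates, two with none; account created at 5, data collected at 100
ranges = propagate_order(
    [(-math.inf, math.inf), (20, math.inf), (-math.inf, 50), (-math.inf, math.inf)],
    created=5, collected=100)
# -> [(5, 50), (20, 50), (20, 50), (20, 100)]
```

Alternating this with the pairwise tightening of matching friend and follower ranges until nothing changes might be one way to squeeze the ranges further, though I haven’t checked how well that converges on real data.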

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

12 thoughts on “A Few More Thoughts on the Forensic Analysis of Twitter Friend and Follower Timelines in a MOOCalytics Context”

  1. Still scratching my head a bit too now, Tony:-) think this could work but there is a little nagging voice at the back of my mind asking how it would actually help with the engagement side of things. I’d still want to drill down a bit more to see how clusters were forming and how long they were active for. So we could see when people started following a MOOC, if they started following/ were followed by others following that MOOC account too and if/when they started following each other etc – if you see what I mean. I’m wondering about some kind of mash-up with Martin’s hash tag explorer . . .

    1. Sheila – I see Twitter as two graphs – a dynamic conversation graph (tweets to nowhere, around tags, between people) and a static friend/follower graph. The dynamic tweets have timestamps – we can see when a tweet was sent and we can search for tweets/conversations within a particular time. I think that’s the sort of activity you’re most interested in?

      I’m more interested in the static graph, (which may also relate to when folk get to *see* something, rather than talk about something).

      The previous post was in part a pondering on: can I work out whether an @imascientist event that took place on a particular date was responsible for: a) acquiring followers for @imascientist; b) promoting connections between followers of @Imascientist. This is in part related to “evidencing impact” where impact is taken as the creation of social ties. (So maybe rather than impact it’s a measure of some sort of conversion, and impact comes later…)

      This naturally(?) led me to wonder: can I use friend/follower acquisition curves from a MOOC account to identify whether particular MOOC events led to the creation of social ties on Twitter. As I realised at the time, and as you commented, finding when someone started following someone else using the technique I was using doesn’t work at all well when you don’t have many followers, because you don’t have enough samples to estimate follower times very well and you don’t know the time between a follower creating their account and following your account.

      What I tried to do in this post was see whether I could use evidence from *other* accounts to put timestamp markers into friend/follower acquisition curves for accounts with few friends or followers. The reason? So I could see if A started to follow B around the time both A and B did the MOOC, or whether A started to follow C around the time the MOOC recommended C as part of a suggested reading exercise. It’s probably not very useful, and it’s probably not even very interesting, but then, I feel the same way about most so-called learning analytics;-) At best, they’re interesting academic exercises…!

      1. Thanks Tony – that makes more sense to me now, and I can see there being a use for it. Just now I’m trying to see if these analytics things can actually work/have any impact from a learner’s point of view (most of the time they seem to be for teachers/course teams).


      2. Just a thought, couldn’t you just use the emails Twitter sends your MOOC account about new followers? :-)

  2. So you might be in luck. As part of the new Twitter API you get some enhanced metadata back which includes friend/follower counts for the time the tweet was collected http://mashe.hawksey.info/2013/03/twitter-throws-a-bone-increased-hits-and-metadata-in-twitter-search-api-1-1/. As part of TAGSv5.0 the default is to capture this information as part of the archive. So here is some sample data for you:

    #moocmooc (2013) – this one is playing up a bit
    #ocTEL (ongoing)



    1. @martin I’m guessing the friend/follower count is user data current at the time the API is called, not friend/follower count data for the user at the time the tweet was created?

      1. That’s my guess. Unfortunately I’ve got no record of whether the archive was being collected every hour or every day

        1. @Martin ah – ping…light bulb goes on… got you… you have records of user counts at different sample times… hmmmm…and we can get estimates for when those sample times were from the tweets…? There is a lower bound/earliest time for when the user data was collected given by the timestamp of the tweet it’s associated with…

          1. Just remembered I had a #cfhe12 archive still going (added it to the folder). Script was triggered to run every 15 minutes when I opened it (if set like this for the course it should be an accurate count).

            1. @martin issue that now comes to mind is churn… if folk are blocked, deleted as spam, or unfollow, they presumably fall out of the follow count? That is to say, if I grab all your followers and generate a follower acquisition curve for you and find that 50 days ago you appeared to have X followers, then I found an archived tweet of yours captured 50 days ago, it might report you as having Y followers. My estimates of followers P, Q, R days ago are based on the followers you have now… Still, could be interesting to log follower count on a daily basis, then compare these counts with numbers from the recreated curve?

