“The Follower Factory” – Way Back When vs Now

A news story doing the rounds at the moment in the part of the Twitterverse I see is a post from the New York Times, The Follower Factory, which describes how certain high-profile figures feel the need to inflate their social media follower counts by purchasing followers.

Such behaviour isn’t new, but the story is a good one – a useful one – to re-tell every so often. And the execution is a nice bit of scrollytelling.

A few years ago, I posted a recipe for generating charts of estimated follower acquisition over time (Estimated Follower Accession Charts for Twitter) and applied it to the followers of UK MPs at the time, cross-referencing spikes in follower acquisition against news items in the days before, and on the day of, any hike in numbers or in the rate of acquisition (What Happened Then? Using Approximated Twitter Follower Accession to Identify Political Events).

My thinking at the time was that bursts in follower acquisition, or changes in rate of follower acquisition, might correlate with news events around the person being followed. (It turned out that other researchers had followed a similar idea previously: We know who you followed last summer: inferring social link creation times in twitter.)
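The idea behind an accession estimate can be sketched briefly. The sketch below assumes (this is not stated above) that the follower list comes back in reverse order of following, newest first. It also relies on the fact that an account cannot follow anyone before it was created. Taking a running maximum of account creation dates, from the oldest follower forward, then gives a lower bound on when each follow happened. The function name and the sample dates are made up for illustration.

```python
from datetime import date
from itertools import accumulate


def estimate_follow_dates(created_newest_first):
    """Return (rank, earliest_possible_follow_date) pairs, oldest follower first.

    created_newest_first: account creation dates of the followers, in the
    order the follower list was returned (ASSUMED newest follower first).
    """
    # Walk the list from the oldest follower to the newest.
    oldest_first = list(reversed(created_newest_first))
    # Follower k cannot have followed before any earlier follower's account
    # existed, so the running maximum of creation dates is a lower bound.
    bounds = accumulate(oldest_first, max)
    return list(enumerate(bounds, start=1))


# Hypothetical creation dates, newest follower first.
sample = [
    date(2018, 1, 20),
    date(2012, 5, 1),
    date(2017, 6, 3),
    date(2010, 3, 9),
    date(2015, 2, 2),
]

for rank, bound in estimate_follow_dates(sample):
    print(rank, bound.isoformat())
```

Plotting rank against the estimated date gives the accession curve. A steep stretch suggests a burst of new followers, although the dates are only lower bounds, so a run of long-established accounts following in quick succession will not show up as a spike.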

Whilst it is trivial for Twitter to generate such reports – they own the data – it is harder for independent researchers. When I charted the follower accession curves for UK MPs, I had access to a whitelisted Twitter API account that meant I could download large amounts of data quite quickly. The (generic) rate limiting constraints on my current Twitter account mean that grabbing the data to generate such charts in a reasonable amount of time nowadays would be all but impossible.

There are several ways round this: one is to purchase the data using a service such as Export Tweet; one is to abuse the system and create my own army of (“fake”) Twitter accounts in order to make use of their API limits to mine the Twitter API using a distributed set of accounts under my control; a third is to “borrow” the rate limits of other, legitimate users.
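To put rough numbers on the rate limit problem, and on why spreading requests across several accounts helps, here is a back-of-the-envelope sketch. The default limits (5,000 follower IDs per call at 15 calls per 15 minute window, 100 profiles per lookup call at 900 calls per window) are assumptions based on the standard limits as I understand them and should be checked against the current documentation. The function names and the 650 × 20,000 workload are purely hypothetical.

```python
import math


def windows_needed(total_calls, calls_per_window, n_accounts=1):
    """Rate-limit windows needed if calls are spread evenly over n_accounts."""
    return math.ceil(total_calls / (calls_per_window * n_accounts))


def estimate_hours(follower_counts, n_accounts=1,
                   ids_per_call=5000, id_calls_per_window=15,
                   profiles_per_call=100, profile_calls_per_window=900,
                   window_minutes=15):
    """Rough hours needed to fetch follower IDs and profiles for many targets.

    All limit values are ASSUMED defaults, not authoritative figures.
    """
    # Each target needs at least one paged call to list its follower IDs.
    id_calls = sum(max(1, math.ceil(n / ids_per_call)) for n in follower_counts)
    # Profiles (for creation dates) are looked up in batches; overlap between
    # audiences is ignored, so this overestimates the number of calls.
    profile_calls = math.ceil(sum(follower_counts) / profiles_per_call)
    # The two endpoints are treated as independently limited, so the slower
    # of the two determines the elapsed time.
    windows = max(
        windows_needed(id_calls, id_calls_per_window, n_accounts),
        windows_needed(profile_calls, profile_calls_per_window, n_accounts),
    )
    return windows * window_minutes / 60


# Hypothetical workload: 650 target accounts with 20,000 followers each.
workload = [20_000] * 650
print(estimate_hours(workload))                  # one set of credentials
print(estimate_hours(workload, n_accounts=10))   # ten sets of credentials
```

Under these assumptions the elapsed time falls roughly in proportion to the number of accounts whose limits are being used, which is what makes the second and third options attractive to anyone harvesting at scale.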

For example, many sites now offer “Twitter analysis” services if you sign in with your Twitter account and grant the service provider permission to use your account to access various API calls through your account. I imagine that one of the ways such services cover the costs of their free offerings is to make use of API calls generated from user accounts to harvest data to build a big database that more valuable services can be provided off the back of.

In this case, whilst the service is free, the aim is not specifically to collect data about the user so they can be sold to advertisers as part of a specific audience or market segment, but instead to make use of the user’s access to API services so that the service provider can co-opt the user’s account to harvest data from Twitter in a distributed way. That is, the service provider gains the necessary benefits from each user to cover the costs of servicing the needs of that user by gaining access to Twitter data more generally, using the user’s account as a proxy. The mass of data can then be mined and analysed to create market segments, or exploited otherwise.

This approach to exploiting users by means of exploiting their access to a particular resource is also being demonstrated elsewhere. For example, a Wired post from October last year on “cryptojacking” – Your Browser Could Be Mining Cryptocurrency For A Stranger – describes how users’ computers can be exploited via their browsers in the form of JavaScript code embedded in web pages or adverts that “steals” (i.e. appropriates, or co-opts) CPU cycles (and the electricity used to power them) to do work on behalf of the exploiter, such as mining bitcoin. A more recent story elsewhere – The Coming Battle for Your Web Browser [via @euroinfosec] – describes how the same sort of exploitation can be used to co-opt your browser to take part in distributed denial of service attacks (the provision of which is another thing that can be bought and sold, and that can hence be used to generate revenue for the provider of that service). Defensive countermeasures are available.

PS I notice @simonw has been grabbing his Twitter follower data into a datasette. Thinks: maybe I should tidy up my ESP code for generating follower networks and use SQLite, rather than MongoDB, for that? I think I had an ESP to-do list to work on too..?

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
