Who’s Tweeting Our Hashtag?

Last night, I put together a quick video showing how to make a Yahoo Pipe that will find who’s been tweeting with a particular hashtag, using the opened09 hashtag as the example:

Here’s the pipe, deconstructed:

The idea behind this pipe in part came from a Dave Winer utility that will create an OPML feed linking to the Twitter feeds from the people that a named Twitter user is following (linked to from rssCloud news); my immediate reaction to that was: would an OPML feed linking to the feeds of everyone who had used a particular hashtag be useful? and this pipe is the result.

(Note to self: I really, really, really need to put a Pipes2OPML script together…)

Anyway, I thought it might make sense to generalise this pipe, and hopefully make it a little more useful, as well as showing a Pipes trick or two in some sort of appropriate context. So here are some problems with the above pipe that I’ll then show you how to fix…

– the pipe is hardwired to search for tweets containing #opened09; how can we generalise it to work with other hashtags/search strings?
– limited number of search results: the pipe only has access to the one hundred most recent opened09 hashtagged tweets; how can we get access to more?
– for a pipe that shows who has been using a particular hashtag, it might also be useful to see how many times they have used that hashtag?
– the pipe displays the hashtag twitterers in the order in which they first tweeted (a history based ordering), which is useful in one sense, but not in another. How else might we order the tweets? Recency of posting? Most active user of the hashtag?

So, first up, how do we generalise the pipe to work with other search terms? Simple, just add a user input for the search term and construct a URI using that term; then wire the URI in to the Fetch Feed block:
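
For anyone who would rather see the same idea outside Pipes, here is a minimal Python sketch of building the search URI from a user-supplied term. The base URL and the q parameter name are placeholders I have assumed for illustration (they are not taken from the pipe), so swap in whatever search feed you are actually calling.

```python
from urllib.parse import urlencode

# Assumed placeholder for the search feed endpoint; not taken from the pipe.
SEARCH_BASE = "http://search.example.com/search.atom"


def build_search_uri(term, base=SEARCH_BASE):
    """Return a feed URI for the given search term.

    urlencode escapes the term, so '#opened09' becomes '%23opened09'
    instead of being read as a URL fragment.
    """
    return base + "?" + urlencode({"q": term})


print(build_search_uri("#opened09"))
```

The one thing worth noticing is the escaping: a raw # in a URL starts a fragment, so the hashtag has to be encoded before it is wired in to anything that fetches the feed.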

A form element appears on the pipes front page, and the search term is also passed via URIs that call the pipe:

Secondly, how do we increase the pool of hashtagged tweets? Simple: Twitter uses paged search results so we can pull in results from those other pages too…

So we can construct URIs for the second, third, fourth etc pages of search results too, and pull feeds in from all those pages:
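
As a rough sketch of the paging idea in Python: build one URI per results page and fetch each of them. The base URL and the q, rpp and page parameter names are assumptions on my part, used here only to show the shape of the thing.

```python
from urllib.parse import urlencode

# Assumed placeholder endpoint and parameter names; check them against
# the search feed you are really using.
SEARCH_BASE = "http://search.example.com/search.atom"


def build_page_uris(term, pages=5, per_page=100, base=SEARCH_BASE):
    """Return one search URI per results page, starting at page 1."""
    return [
        base + "?" + urlencode({"q": term, "rpp": per_page, "page": n})
        for n in range(1, pages + 1)
    ]


for uri in build_page_uris("#opened09"):
    print(uri)
```

With five pages of one hundred results each, that would give a pool of up to 500 tweets rather than 100.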

The third item on our list was displaying how many times each user in the list had used the hashtag (based on the number of times they appeared in the search results sample, of course).

As @ostephens originally pointed out to me (at MashLib09), the Unique block actually counts the number of occurrences of the unique filter term:

We can display this in the title element using a very powerful construction… In the regular expression, rewrite the title string so that it contains the value of the repeatcount variable. How? Like this – access the variable using the construction: ${VARIABLE_NAME}

So in the current example, we can get the value of the count using ${y:repeatcount}:
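
If it helps to see what the Unique block plus the title rewrite are doing, here is a minimal Python sketch of the same effect. The item shape (dicts with author and title keys) is an assumption for illustration, as is keeping the first item seen for each author.

```python
from collections import Counter


def unique_with_counts(tweets):
    """Keep one item per author and record how often that author appears.

    Each tweet is assumed to be a dict with 'author' and 'title' keys.
    """
    counts = Counter(t["author"] for t in tweets)
    seen = set()
    unique = []
    for t in tweets:
        if t["author"] in seen:
            continue
        seen.add(t["author"])
        item = dict(t)
        item["repeatcount"] = counts[t["author"]]
        # Rough equivalent of rewriting the title with ${y:repeatcount}
        item["title"] = "{} ({})".format(t["author"], item["repeatcount"])
        unique.append(item)
    return unique
```

The count only reflects the sample of search results that was pulled in, not every tweet the person has ever made with the hashtag.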

The final item on our improvements list was to try out some other orderings of the displayed tweets. At the moment, they are ordered according to the date on which each user first used the hashtag term (that is, within the sample of tweets we have pulled back from the Twitter search).

To change the order of names that are output from the pipe, we can use the Sort block. So for example, sort the users in terms of activity:

To sort the users in reverse chronological order (that is, according to who used the hashtag most recently), sort on the published attribute rather than on y:repeatcount.
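
The two orderings look like this as a Python sketch. The sample items are made up purely for illustration, and I am assuming the published values are ISO-style timestamps in a single timezone, so that sorting them as strings also sorts them by time.

```python
# Made-up sample items, only to show the two orderings.
users = [
    {"author": "alice", "repeatcount": 3, "published": "2009-08-12T10:00:00"},
    {"author": "bob", "repeatcount": 7, "published": "2009-08-12T09:30:00"},
    {"author": "carol", "repeatcount": 1, "published": "2009-08-12T11:15:00"},
]


def sort_by_activity(items):
    """Most active hashtag users first."""
    return sorted(items, key=lambda i: i["repeatcount"], reverse=True)


def sort_by_recency(items):
    """Most recently published first."""
    return sorted(items, key=lambda i: i["published"], reverse=True)


print([u["author"] for u in sort_by_activity(users)])
print([u["author"] for u in sort_by_recency(users)])
```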

You can now grab the RSS feed from the pipe and consume it in anything that accepts RSS, pull the JSON feed into your own web page if you’re a little more adventurous, or grab a Google homepage widget from the pipe’s homepage: Hashtag Twitterers Pipe

To tune the pipe to your own needs, you can also clone it from there and then modify your own version of it to your heart’s content:-)

Where Next With The Hashtagging Twitterers List?

This post is a holding position, so it’s probably gonna be even more cryptic than usual…

In Who’s Tweeting Our Hashtag?, I described a recipe for generating a list of people who had been tweeting, twittering or whatever, using a particular hashtag.

So what’s next on my to do list with this info?

Well, first of all I thought it’d be interesting to try to plot a graph of connections between the followers of everyone on the list, to see how large the hashtag audience might be.

Using a list of about 60 or so twitterers, captured yesterday, I called the Twitter API http://twitter.com/followers/ids/USERNAME.xml function for each one to pull down an XML list of all of their followers by ID number, and topped it up with the user info (http://twitter.com/users/show/USERNAME.xml) for each person on the original list; this info meant I could in turn spot the ID for each of the hashtagging twitterers amongst the followers lists.

It’s easy enough to transform these lists into the dot format that can be plotted by GraphViz, but the 10,000 or so edges generated from the followers lists were too much for my version of GraphViz to cope with.
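
For what it’s worth, the transformation can be sketched in a few lines of Python. This is not the code I actually ran, just an illustration: it assumes the follower lists have already been parsed into a dict mapping each twitterer to their follower IDs, and it draws each edge from the person followed to the follower (the direction the tweets travel).

```python
def followers_to_dot(followers, name="hashtaggers"):
    """Return dot source for a directed graph of follower relations.

    followers maps each twitterer to an iterable of follower ids
    (an assumed, already-parsed structure). An edge 'user -> follower'
    means the follower receives that user's tweets.
    """
    lines = ["digraph {} {{".format(name)]
    for user in sorted(followers, key=str):
        for follower in sorted(str(f) for f in followers[user]):
            lines.append('  "{}" -> "{}";'.format(user, follower))
    lines.append("}")
    return "\n".join(lines)


print(followers_to_dot({"a": ["1", "2"], "b": ["1"]}))
```

Flip the two names in the edge line if you would rather have the arrows point from follower to followed.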

So instead, I thought I’d just try to plot a subgraph, such as the graph of people who were following a minimum specified number of people in the original hashtag twittering list. So for example, the graph of people who were following at least five of the people who’d used the particular hashtag.

I hacked a piece of code to do this, but it’s far from ideal and I’m not totally convinced it works properly… Ideally what I want is a simple (efficient) utility that will accept a .dot file and prune it, removing nodes that have less than a specified degree. (If you know of such a tool, please post a link to it in the comments:-)
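
To pin down what I mean by pruning, here is a minimal Python sketch that works on a plain list of edges rather than on a .dot file (parsing the file is left out, and this is not the code I hacked together). It makes a single pass, with degrees counted once on the original graph.

```python
from collections import Counter


def prune_by_degree(edges, min_degree):
    """Drop every edge that touches a node of degree below min_degree.

    edges is a list of (source, target) pairs. Degree here is total
    degree (in plus out), computed once on the unpruned graph.
    """
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    return [(s, d) for s, d in edges if s in keep and d in keep]
```

One caveat: removing nodes can push their neighbours below the threshold, so a single pass may leave some low degree nodes behind. Repeating the pruning until nothing changes would give a stricter result, if that is what is wanted.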

Here’s the first graph I managed to plot:

If my code is working, an edge points to a person if that person is following at least, err, lots of the other people [that is: lots of other people who used the hashtag]. So under the assumption that the code is working, this graph shows one person at the centre of the graph who is following lots of people who have tweeted the hashtag. Any guesses who that person might be? People who have edges directed towards them in this sort of plot are people who are heavily following the people using a particular hashtag. If you’re a conference organiser, I’m guessing that you’d probably want to appear in this sort of graph?

(If the code isn’t working, I’m not sure what the hell it is doing, or what the graph shows?!;-)

One other thing I thought I’d look at was the people who are following lots of people on the hashtagging list who haven’t themselves used the hashtag. These are the people to whom the event is being heavily amplified.

So for example, here we have a chart that is constructed as follows. The hashtag twitterers list is constructed from a sample of the most recent 500 opened09 hashtagged tweets around about the time stamp of this post and contains people who are in that list at least 3 times.

The edges on the chart are directed towards people who are not on the hashtag list but who are following more than 13 of the people who are on the list.
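
The selection rule can be sketched in Python as follows. This is an illustration rather than the code behind the chart, and it assumes that the twitterers and their followers have already been mapped into the same identifier space (which is what the user info lookup was for).

```python
from collections import Counter


def amplification_audience(followers, min_followed):
    """Find listeners who follow more than min_followed hashtag twitterers.

    followers maps each hashtag twitterer to a set of follower ids,
    using the same kind of id for twitterers and followers (assumed).
    People who are themselves on the hashtag list are excluded.
    """
    hashtaggers = set(followers)
    counts = Counter()
    for ids in followers.values():
        for follower in set(ids):
            if follower not in hashtaggers:
                counts[follower] += 1
    return {f: n for f, n in counts.items() if n > min_followed}
```

Called with min_followed=13, this would pick out the people the edges in the chart are directed towards, along with how many of the listed twitterers each of them follows.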

Hmmmm… anyway, that’s more than enough confusion for now… I’m going to try not to tinker with this any more for a bit, because a holiday beckons and this could turn into a mindf**k project… However, when I do return to it, I think I’m going to have a go at attacking it with a graph/network toolkit, such as NetworkX, and see if I can do a proper bit of network analysis on the resulting graphs.

Preliminary Thoughts on Visualising the OpenEd09 Twitter Network

The wealth of activity on the #opened09 Twitterstream keeps stealing my attention, and has got me thinking about how we might start to analyse the structure of the network around the hashtag, in part so we can understand information flow through that part of the open education network better.

So what’s come to mind? Here are some early sketches that I’m using as a foil to see what sort of questions they bring to my mind about the structure of the network. Note that this is a ‘first principles’/grass roots/bare bones approach – I’m not going to use any formal social network analysis techniques because I don’t know anything about them (yet….;-) But by starting with some doodles, the questions that arise may lead me to appreciating why the formal SNA tools and approaches are so useful in practical, easy to understand, human terms, and thus provide me with the motivation to learn how to use them…

First up, is there any value in understanding the structure of the network of people who are twittering with the #opened09 hashtag? Partly because of rate limiting on the Twitter API, last night I grabbed a sample of the most recent #opened09 hashtagged tweets, and then filtered them down to the most active twitterers over that period (people using the hashtag more than three times in the sample, I think it turned out to be). Then I pulled down their follower lists from the Twitter API, and constructed a graph of who followed who in that set. Here’s the Graphviz plot of that graph:

opened09 - follower relations between active twitterers

So what questions does this bring to mind? Well, off the top of my head:

– who is the most connected person in this graph? Does that tell us anything useful? Would we expect the event organiser to be the most connected?
– is there anyone in the network who isn’t very connected to other people? Why? Are they a different ‘sort’ of user? Are they new to the network?
– does the connectedness of people within the graph change over the course of the event? (I think that the Twitter API returns a list of followers in reverse chronological order, so we could approximate the growth by comparing the above graph with one that ignored the most recent 50(?!) followers of each person.)
– are all Twitterers equal? Should we treat users who only ever use the hashtag as part of an RT differently when constructing this graph?
– is there any value in representing the number of followers each person has within the above graph? Or the number of people they follow? Or some function of the two? What about the number of times they used the hashtag in the sample period (or the number of times they RT the hashtag, or a function of the two) – should that be reflected too?
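
As a first stab at the "most connected" question, here is a minimal Python sketch that counts, for each person in the set, how many others in the set follow them and how many they follow. Treating the sum of the two as "connectedness" is just one possible choice, and the input structure (a dict of follower sets sharing one identifier space) is an assumption.

```python
def connection_counts(followers):
    """Return {user: (followed_by, following)}, counting only set members.

    followers maps each person in the set to the ids of their followers.
    """
    members = set(followers)
    followed_by = {u: 0 for u in members}
    following = {u: 0 for u in members}
    for user, ids in followers.items():
        for follower in set(ids):
            if follower in members and follower != user:
                followed_by[user] += 1
                following[follower] += 1
    return {u: (followed_by[u], following[u]) for u in members}


def most_connected(followers):
    """Return the member with the largest followed_by + following total.

    Ties are broken alphabetically so the result is repeatable.
    """
    counts = connection_counts(followers)
    return max(sorted(counts), key=lambda u: sum(counts[u]))
```

The same counts would also show up anyone who is barely connected to the rest of the set, which is the second question in the list.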

What’s the POINT of asking these questions? How about this – as individuals, can we identify members of the community who we don’t know? (IE is this sort of graph a good basis as a friend recommender? Would a big poster of this sort of graph be a good thing to post in the event coffee area? What would people look for in it if we did?)

Okay, next up: there’s always talk of things like twitter being used to amplify an event to ‘virtual participants’. How big might that audience be? And who comprises it? Are there people not at the event who effectively amplify it further?

How about a plot of the simple reach of the twitters, as a treemap?

Hmm… I’m not sure about that… Just because you have a big following doesn’t mean it’s a big relevant or interested following? (The number of people RTing the hashtag in your audience might?)

Instead, how about this sort of graphic to help frame some questions:

opened11-amplificationNet

This one shows, for the most active opened09 hashtag twitterers, the people who follow more than 12 (maybe?; or more than 11?) of them. The named individuals are heavy opened09 twitterers, the numbers are the Twitter IDs of the people who are seeing the event amplified to them. (Note that these people may also have tweeted the hashtag, only not so heavily. Maybe I need a stop list that removes people from this ‘amplification graph’ who have used the hashtag? That way, we can identify ‘leaves’ on the #opened09 tree – that is, people who received some number of #opened09 tweets but who never used the hashtag?)

So what questions does it bring to mind:

– are there people receiving large numbers of opened09 tweets who are unknown to the community?
– do the opened twitterers fall into cliques or reasonably well clustered groups around sets of followers who aren’t tweeting? (Would a cluster analysis be an interesting thing to do here?)
– if we lower the sampling threshold that specifies the minimum number of heavy twitterers that a ‘listener’ is following, how does the size of the listening audience grow? Is this interesting? Does the number of people that a listener follows influence how likely they are to see opened09 tweets? (eg if I follow 20 opened09 heavy twitterers, and only 50 people in all, my traffic may be dominated by opened09 folk; if I follow 500, or 1000, or 2000, then that traffic is likely to be diluted?)
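
The last of those questions is easy enough to sketch in Python. The first function counts how many listeners there are at each threshold (here meaning "follows at least this many heavy twitterers", which is my assumption about where to draw the line); the second is the crude dilution measure from the example, the fraction of the accounts a listener follows that are heavy opened09 twitterers. The input structure is assumed, as before.

```python
from collections import Counter


def audience_size_by_threshold(followers, thresholds):
    """Return {threshold: number of listeners following at least that many}.

    followers maps each heavy twitterer to a set of follower ids;
    people who are themselves heavy twitterers are not counted.
    """
    heavy = set(followers)
    counts = Counter(
        f for ids in followers.values() for f in set(ids) if f not in heavy
    )
    return {t: sum(1 for n in counts.values() if n >= t) for t in thresholds}


def hashtag_share(heavy_followed, total_followed):
    """Fraction of the accounts a listener follows that are heavy twitterers."""
    return heavy_followed / total_followed


print(hashtag_share(20, 50))    # 20 of 50 followed accounts
print(hashtag_share(20, 2000))  # 20 of 2000 followed accounts
```

So following 20 heavy twitterers out of 50 accounts in all means two fifths of the accounts in your stream are opened09 folk, whereas out of 2000 it is one in a hundred. That says nothing about how often each account tweets, of course.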

And the POINT? Can we get a feeling for the audience the event is being amplified to? Are there members of that audience who seem to be a member of the community but aren’t really known to the community? Can we find the lurkers and pull them in with a personal invite (and is this even ethical?)

Just by the by, looking at RT networks could also be interesting – that is, looking at patterns of RTing across the network. Maybe a graph showing people who RTd hashtagged tweets, as well as the path back to the original tweet? (This brings to mind some of @mediaczar’s work looking at Twitter in a PR context – which is exactly what event amplification is, right?)

So having got some questions in mind (if you have more, please add them as comments below), I’ve got some sort of rationale for having a look at some formal graph theory and social network analysis stuff. This looks like it could be a good place to start: M.E.J. Newman – The mathematics of networks [PDF].

[UPDATE: I guess the heuristic I have in mind with respect to the charts and SNA is this: are there features from the visualisation that jump out at me that the SNA tools can also pick out?]