Dangers of a Walled Garden…

Reading a recent Economist article (The value of friendship) about last week’s announcement that Facebook is to float as a public company, and being amazed as ever at how these valuations, err, work, I recalled a couple of observations from a @currybet post about the Guardian Facebook app (“The Guardian’s Facebook app” – Martin Belam at news:rewired). The first related to using Facebook apps to (only partially successfully) capture the attention of folk on Facebook and get them to refocus it on the Guardian website:

We knew that 77% of visits to the Guardian from facebook.com only lasted for one page. A good hypothesis for this was that leaving the confines of Facebook to visit another site was an interruption to a Facebook session, rather than a decision to go off and browse another site. We began to wonder what it would be like if you could visit the Guardian whilst still within Facebook, signed in, chatting and sharing with your friends. Within that environment could we show users a selection of other content that would appeal to them, and tempt them to stay with our content a little bit longer, even if they weren’t on our domain.

The second thing that came to mind related to the economic/business models around the Facebook app itself:

The Guardian Facebook app is a canvas app. That means the bulk of the page is served by us within an iFrame on the Facebook domain. All the revenue from advertising served in that area of the page is ours, and for launch we engaged a sponsor to take the full inventory across the app. Facebook earn the revenue from advertising placed around the edges of the page.

I’m not sure whether Facebook runs CPM (cost per thousand) display-based ads, where advertisers pay per impression, or follows the Google AdWords model, where advertisers pay per click (PPC), but it got me wondering… A large number of folk on Facebook (and Twitter) share links to third party websites external to Facebook. As Martin Belam points out, the return rate back to Facebook for folk visiting third party sites from Facebook seems very high – folk seem to follow a link from Facebook, consume that item, then return to Facebook. Facebook makes an increasing chunk of its revenue from ads it sells on Facebook.com (though with the amount of furniture and Facebook open graph code it’s getting folk to include on their own websites, it presumably wouldn’t be so hard for them to roll out their own ad network to place ads on third party sites?), so keeping eyeballs on Facebook is presumably in their commercial interest.

In Twitter land, where the VC folk are presumably starting to wonder when the money tap will start to flow, I notice “sponsored tweets” are starting to appear in search results:

Another twitter search irrelevance

Relevance still appears to be quite low, possibly because they haven’t yet got enough ads to cover a wide range of keywords or prompts:

Dodgy twitter promoted tweet

(Personally, if the relevance score was low, I wouldn’t place the ad, or I’d serve an ad tuned to the user, rather than the content, per se…)

Again, with Twitter, a lot of sharing results in users being taken to external sites, from which they quickly return to the Twitter context. Keeping folk in the Twitter context for images and videos, through pop-up viewers or content embedded in the client, is also a strategy pursued by many Twitter clients.

So here’s the thought, though it’s probably a commercially suicidal one: at the moment, Facebook and Twitter and Google+ all automatically “linkify” URLs (though Google+ also takes the strategy of previewing the first few lines of a single linked to page within a Google+ post). That is, given a URL in a post, they turn it into a link. But what if they turned that linkifier off for a domain unless a fee was paid to turn it back on? Or what if the linkifier was turned off once the number of clickthrus on links to a particular domain, or page within a domain, exceeded a particular threshold, and could only be turned on again at a metered, CPM rate? (Memories here of different models for getting folk to pay for bandwidth, because what we have here is access to bandwidth out of the immediate Facebook, Twitter or Google+ context.)
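Just to make the thought experiment concrete, here’s a toy sketch of what such a “metered linkifier” might look like; the free quota, the data structures and so on are entirely made up for illustration:

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")

def linkify(text, paid_domains, clickthrus, free_quota=1000):
    """Turn URLs into links, but only for domains that have paid a fee
    or haven't yet burned through a free clickthrough quota."""
    def repl(m):
        url = m.group(0)
        domain = urlparse(url).netloc
        entitled = domain in paid_domains or clickthrus.get(domain, 0) < free_quota
        # non-entitled domains get left as plain, unclickable text
        return '<a href="%s">%s</a>' % (url, url) if entitled else url
    return URL_RE.sub(repl, text)
```

A heavily-shared domain that hadn’t paid up would find its URLs left as plain text, while paid-up domains stay clickable however much traffic they attract.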

As a revenue model, the losses associated with irritating users would probably outweigh any revenue benefits, but as a thought experiment, it maybe suggests that we need to start paying more attention to how these large attention-consuming services are increasingly trying to cocoon us in their context (anyone remember AOL, or to a lesser extent Yahoo, or Microsoft?), rather than playing nicely with the rest of the web.

PS Hmmm… “app”. One default interpretation of this is “app on a phone”, but a “Facebook app” is an app that runs on the Facebook platform… So for any given app, calling it an “app” implies that that particular variant means “software application that runs on a proprietary platform”, where the platform might actually be a combination of hardware and software platforms (e.g. the Facebook API and an Android phone)?

Social Media Interest Maps of Newsnight and BBCQT Twitterers

I grabbed independent samples of 1500 recent users of the #newsnight and #bbcqt hashtags within a minute or two of each other about half an hour ago. Here’s who’s followed by 25 or more of the recent hashtaggers in each case. Can you distinguish the programmes each audience interest projection map relates to?

Here’s the first one – are these folk followed by 25 or more of the folk who recently used the #bbcqt or the #newsnight hashtag?

#bbcqt 1500 forward friends 25 25

Here’s the second one – are these folk followed by 25 or more of the folk who recently used the #bbcqt or the #newsnight hashtag?

#newsnight 1500 forward friends projection 25 25

The answer is only a click away…

PS I’ve got a couple of scripts in the pipeline that should be able to generate data that I can use to generate this sort of differencing word cloud, the idea being I should be able to identify at a glance accounts that different hashtag communities both follow, and accounts that they differently follow…

UPDATE: so here’s a quick first pass at comparing the audiences. I’m not sure how reliable the method is, but it’s as follows:

– for each hashtag, grab 1500 recent tweets;
– grab the list of folk the hashtagging users follow, and retain a list (the ‘interest list’) of folk followed by at least 25 of the hashtaggers;
– filter the hashtagger list so that it only contains hashtaggers who follow at least 25 people (this cuts out brand new users and newly created spam accounts);
– count the number of filtered hashtaggers that follow each person in the interest list, and normalise by dividing through by the total number of filtered hashtaggers.

To recap: for each tag, we now have a list of folk who were popularly followed by users of that tag, along with a number between 0 and 1 for each of them describing proportionally how much of the hashtagging sample follows them.

(Note that there may be all sorts of sampling errors… I guess I need to qualify reports with the number of unique folk tweeting in the twitter sample captured. I maybe also need to improve sampling so rather than searching for 1500 tweets, I generate a sample of 1000 unique users of the tag?)
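For what it’s worth, the counting/normalising step described above can be sketched in Python; the data structures and thresholds here are illustrative, not the actual scripts I’m running:

```python
from collections import Counter

def interest_proportions(friends, min_followed=25, min_following=25):
    """friends maps each hashtagger to the set of accounts they follow.

    Returns, for each account followed by at least min_followed of the
    filtered hashtaggers, the proportion of filtered hashtaggers
    following it (a number between 0 and 1).
    """
    # drop brand new users/spam accounts following fewer than min_following
    filtered = {u: f for u, f in friends.items() if len(f) >= min_following}
    counts = Counter(a for f in filtered.values() for a in f)
    n = len(filtered)
    return {a: c / n for a, c in counts.items() if c >= min_followed}
```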

I then load these files into R and run through the following process:

#Multiply this normalised follower proportion by 1000 and round down to get an integer between 0 and 1000 representing a score relative to the proportion of filtered hashtaggers who follow each person in the interest list.
counts_newsnight$normIn=as.integer(counts_newsnight$inNorm*1000)
counts_bbcqt$normIn=as.integer(counts_bbcqt$inNorm*1000)

#Another filtering step: we're going to plot similarities and differences between folk followed by at least 25% of the corresponding filtered hashtaggers
newsnight=subset(counts_newsnight,select=c(username,normIn),subset=(inNorm>=0.25))
bbcqt=subset(counts_bbcqt,select=c(username,normIn),subset=(inNorm>=0.25))

#Now generate a dataframe
qtvnn=merge(bbcqt,newsnight,by="username",all=T)
colnames(qtvnn)=c('username','bbcqt','newsnight')

#Replace the NA cell values (where, for example, someone in the bbcqt list is not in the newsnight list)
qtvnn[is.na(qtvnn)] <- 0

That generates a dataframe that looks something like this:

      username bbcqt newsnight
1    Aiannucci   414       408
2  BBCBreaking   455       464
3 BBCNewsnight   317       509
4  BBCPolitics     0       256
5   BBCr4today     0       356
6  BarackObama   296       334

Thanks to Josh O’Brien on Stack Overflow, I can recast this data frame into a term.matrix that plays nicely with the latest version of the R wordcloud package.

library(wordcloud)
mat <- as.matrix(qtvnn[-1])
dimnames(mat)[1] <- qtvnn[1]
comparison.cloud(term.matrix = mat)
commonality.cloud(term.matrix = mat)

Here’s the result – commonly followed folk:

And differentially followed folk (at above the 25% level, remember…)

So from this what can we say? Both audiences have a general news interest, into pop politics and perhaps satirical comedy, maybe leaning to the left? The Question Time audience is a more casual audience, more minded to following celebrities, whereas the Newsnight audience is a bit more into following notable media folk (journalists, editors) and also political news. (I’d be keen to hear any other readings of these maps – please feel free to leave a comment containing your interpretations/observations/reading:-)

UPDATE2: to try to get a handle on what the word clouds might be telling us from an alternative visual perspective on the data, rather than inspecting the actual code for example, here’s a scatterplot showing how the follower proportions compare directly:

Comparison of who #newsnight and #bbcqt hashtaggers follow

ggplot(na.omit(subset(qtvnn,bbcqt>0 & newsnight>0))) + geom_text(aes(x=bbcqt,y=newsnight,label=username,angle=45),size=4) + xlim(200,600) + ylim(200,600) + geom_abline(intercept=0, slope=1,colour='grey')

Here’s another view – this time plotting folk who are followed [at at least the 25% level] by users of one tag but not by users of the other:

hashtag comparison - folk not followed by other tag

I couldn’t remember/didn’t have Google to hand to find the best way of reshaping the data for this, so I ended up with a horrible horrible hack…

nn=data.frame(typ='newsnight',subset(qtvnn,select=c(username,newsnight),subset=(newsnight>0 & bbcqt==0)))
qt=data.frame(typ='bbcqt',subset(qtvnn,select=c(username,bbcqt),subset=(newsnight==0 & bbcqt>0)))
colnames(nn)=c('typ','name','val')
colnames(qt)=c('typ','name','val')
qtnn=rbind(nn,qt)
ggplot()+geom_text(data=qtnn,aes(x=typ,y=val,label=name),size=3)

I think this is all starting to get to the point where I need to team up with a proper developer and get *all* the code properly written and documented before any errors that are currently there get baked in too deeply…

Getting Started With Twitter Analysis in R

Earlier today, via the aggregating R-Bloggers service, I saw a post on Using Text Mining to Find Out What @RDataMining Tweets are About. The post provides a walkthrough of how to grab tweets into an R session using the twitteR library, and then do some text mining on them.

I’ve been meaning to have a look at pulling Twitter bits into R for some time, so I couldn’t but have a quick play…

Starting from @RDataMining’s lead, here’s what I did… (Notes: I use R in an R-Studio context. If you follow through the example and a library appears to be missing, from the Packages tab search for the missing library and import it, then try to reload the library in the script. The # denotes a commented out line.)

require(twitteR)
#The original example used the twitteR library to pull in a user stream
#rdmTweets <- userTimeline("psychemedia", n=100)
#Instead, I'm going to pull in a search around a hashtag.
rdmTweets <- searchTwitter('#mozfest', n=500)
# Note that the Twitter search API only goes back 1500 tweets (I think?)

#Create a dataframe based around the results
df <- do.call("rbind", lapply(rdmTweets, as.data.frame))
#Here are the columns
names(df)
#And some example content
head(df,3)

So what can we do out of the can? One thing is to look at who was tweeting most in the sample we collected:

counts=table(df$screenName)
barplot(counts)

# Let's do something hacky:
# Limit the data set to show only folk who tweeted twice or more in the sample
cc=subset(counts,counts>1)
barplot(cc,las=2,cex.names =0.3)

Now let’s have a go at parsing some tweets, pulling out the names of folk who have been retweeted or who have had a tweet sent to them:

#Whilst tinkering, I came across some errors that seemed
# to be caused by unusual character sets
#Here's a hacky defence that seemed to work...
df$text=sapply(df$text,function(row) iconv(row,to='UTF-8'))

#A helper function to remove @ symbols from user names...
trim <- function (x) sub('@','',x)

#A couple of tweet parsing functions that add columns to the dataframe
#We'll be needing this, I think?
library(stringr)
#Pull out who a message is to
df$to=sapply(df$text,function(tweet) str_extract(tweet,"^(@[[:alnum:]_]*)"))
df$to=sapply(df$to,function(name) trim(name))

#And here's a way of grabbing who's been RT'd
df$rt=sapply(df$text,function(tweet) trim(str_match(tweet,"^RT (@[[:alnum:]_]*)")[2]))

So for example, now we can plot a chart showing how often a particular person was RT’d in our sample. Let’s use ggplot2 this time…

require(ggplot2)
ggplot()+geom_bar(aes(x=na.omit(df$rt)))+opts(axis.text.x=theme_text(angle=-90,size=6))+xlab(NULL)

Okay – enough for now… if you’re tempted to have a play yourself, please post any other avenues you explored with in a comment, or in your own post with a link in my comments;-)

Fishing for OU Twitter Folk…

Just a quick observation inspired by the online “focus group” on Twitter yesterday around the #twitterou hashtag (a discussion for OU folk about Twitter usage): a few minutes in to the discussion, I grabbed a list of the folk who had used the tag so far (about 10 or so people at the time), pulled down a list of the people they followed to construct a graph of hashtaggers->friends, and then filtered the resulting graph to show folk with node degree of 5 or more.

twitterOU - folk followed by 5 or more folk using twitterou before 2.10 or so today

Because a large number of OU Twitter folk follow each other, the graph is quite dense, which means that if we take a sample of known OU users and look for people that a majority of that sample follow, we stand a reasonable chance of identifying other OU folk…

Doing a bit of List Intelligence (looking up the lists that a significant number of hashtag users were on), I identified several OU folk Twitter lists, most notably @liamgh/planetou and @guyweb/openuniversity.

Just for completeness, it’s also worth pointing out that simple community analysis of followers of a known OU person might also turn up OU clusters, e.g. as described in Digging Deeper into the Structure of My Twitter Friends Network: Librarian Spotting. I suspect if we did clique analysis on the followers, this might also identify ‘core’ members of organisational communities that could be used to seed a snowball discovery mechanism for more members of that organisation.

PS hmmm… maybe I need to do a post or two on how we might go about discovering enterprise/organisation networks/communities on Twitter…?

Segmented Communications on Twitter via @-partner Messaging

As this blog rarely attracts comments, it can be quite hard for me to know who, if anyone, regularly reads it (likely known suspects and the Googlebot aside). The anonymous nature of feed reader subscriptions also makes it tricky to know who (if anyone) is reading the blog at all…

Twitter is slightly different in this regard, because for the majority of accounts, the friends and followers lists are public; which means it’s possible to “position” a particular account in terms of the interests of the folk it follows and who follow it.

Whilst I was putting together A Couple More Social Media Positioning Maps for UK HE Twitter Accounts, I considered including a brief comment on how the audience of a popular account will probably segment into different interest groups, and whether or not there was any mileage in trying to customise messages to particular segments without alienating the other parts of the audience.

Seeing @eingang’s use yesterday of a new (to me) Twitter convention of sending hashtagged messages to @hidetag, so that folk following the hashtag would see the tweet, but Michelle’s followers wouldn’t necessarily see the tagged tweets (no-one should follow @hidetag, NO_ONE ;-), it struck me that we might be able to use a related technique to send messages that are only visible to a particular segment of the followers of a Twitter account…

How so?

Firstly, you need to know that public Twitter messages sent to a particular person by starting the message with an @name are generally only visible in the stream of folk who follow both the sender and @name (identifying this population was one of the reasons I put together the Who Can See Whose Conversations In-stream on Twitter? tool).
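The rule can be stated as a one-liner; here `followers` is an assumed mapping from an account to the set of accounts following it:

```python
def sees_reply(viewer, sender, addressee, followers):
    """True if a public '@addressee ...' tweet from sender shows up
    in viewer's home stream: viewer must follow both parties."""
    return (viewer in followers.get(sender, set())
            and viewer in followers.get(addressee, set()))
```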

Secondly, you need to do a bit of social network analysis. (In what follows, I assume a directed graph where an edge from A to B means that A follows B, or equivalently, B is a friend of A.) A quick and dirty approach might be to use in-degree and out-degree, or maybe the HITS algorithm/authority and hub values, as follows: identify the audience segment you want to target by looking for clusters in how your followers follow each other, then do a bit of network analysis on that segment to look for Authority nodes or nodes that are followed by a large number of people in that segment who also follow you. If you now send a message to that Authority/high in-degree node, it will be seen in-stream by that user, as well as those of your followers who also follow that Authority account.
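The quick-and-dirty in-degree version of that targeting step might be sketched like this (the proper HITS treatment would want a graph library; `follows` maps each segment member to the set of accounts they follow, and both structures are assumptions for the sake of the sketch):

```python
from collections import Counter

def segment_authority(segment, follows):
    """Return the account most followed from within the segment,
    i.e. the highest in-degree node in the segment's subgraph."""
    indeg = Counter(a for m in segment
                      for a in follows.get(m, set()) if a in segment)
    return max(indeg, key=indeg.get) if indeg else None
```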

This approach can be seen as a version of co-branding/brand partnership: conversational co-branding/conversational brand partnerships. Here’s how it may work: brand X has an audience that segments into groups A, B and C. Suppose that company Y is an authority in segment B. If X and Y form a conversational brand partnership, X can send messages ostensibly to Y that also reach segment B. For a mutually beneficial relationship, X would also have to be an authority in one of Y’s audience segments (for example, segment P out of segments P, Q, and R.) Ideally, P and B would largely overlap, meaning they can have a “sensible” conversation and it will hit both their targeted audiences…

For monitoring discussions within a particular segment, it strikes me that we could watch the messages seen by an individual with a large Hub value/out-degree (that is, someone who follows a large number of (influential) folk within the segment). By tapping into the Hub’s stream, we get some sort of sampling of the conversations taking place within the segment.

These ideas are completely untested (by me) of course… But they’re something I may well start to tinker with if an appropriate opportunity arises…

Risk Assessment: Corporate Acquisitions Can Kill APIs

So it seems that my to-do list just got shorter as Twitter acquire BackType and as a result “will discontinue the BackType product and API services”.

Bah…:-(

On my roadmap (err, such as it is!;-), one thing I was hoping to do was start exploring in more detail the structure of communities around a shared link, with a view to exploring some of the actual dynamics of link sharing across Twitter networks. My early forays into this have tended to use BackType, as for example in Visualising Ad Hoc Tweeted Link Communities, via BackType.

The simple recipe I’d started out with was based around the following steps:

– given the URL, look up who’s tweeted it via the BackType API;
– for each tweeter of the link, grab the list of people they follow (i.e. their friends);
– plot the “inner” network showing which of the people who tweeted the link follow each other.

This gave an easy way in to identifying a set of folk who had expressed an interest in a link by virtue of sharing it, this set then acting as the starting point for a community analysis.
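Once the tweeter list and their friends lists are in hand, the inner-network step reduces to something like the following (the data structures here are assumptions on my part, not BackType’s actual response format):

```python
def inner_network(tweeters, friends):
    """Directed edges (a, b) where both a and b tweeted the link
    and a follows b - the 'inner' network of the sharing community."""
    tset = set(tweeters)
    return [(a, b) for a in tset
                   for b in friends.get(a, set())
                   if b in tset and b != a]
```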

Another approach I started to explore (but never blogged?!) was looking at networks of folk who had shared one of the links recently shortened by a particular bit.ly user. So for example, this graph (captured some time ago) used the BackType API to find who had tweeted one or more of 15 or so links that @charlesarthur had shortened using bit.ly, and then plotted friend connections between them:

follower connections between folk tweeting one or more of 15 links also recently shortened on bitly by charlesarthur

Unfortunately, now that the BackType API has gone (when I try to call it I get a “Limit exceeded” error message), the key ingredient from those two original recipes is no longer available…:-(

A Map of My Twitter Follower Network

Twitter may lay claim to millions of users, but we each tend to inhabit only a small part of it… I follow 500 or so people, and am followed by 3000 or so “curated” followers (I block maybe 20-50 a week, and not all of them obvious spam accounts, in part because I see the list of folk who follow me as a network that defines several particular spheres of interest, and I don’t want to drown out signal with noise).

So here’s a map of the connected component of the graph of how my Twitter followers follow each other; it excludes people who aren’t followed by anyone in the graph (which may include folk who do follow me but who have private accounts).

The layout is done in Gephi using the Force Atlas 2 layout. It’s quite by chance that the layout resembles the Isle of Wight… or a heart? Yes, maybe it’s a great big heart:-)

My twitter follower net - connected component

By running the HITS statistic over the graph, we can get a feel for who the influential folk are; sizing and labeling nodes by Authority, we get this (click through to see a bigger version):

My twitter follower network

Here’s an annotated version, as I see it (click through to see a bigger version):

My annotated twitter follower network

If you’d like me to spend up to 20 mins making a map for you, I’ll pick up on an idea from Martin Hawksey and maybe do two or three maps for a donation to charity (in particular, to Ovacome). In fact, if you feel as if you’ve ever benefited from anything posted to this blog, why not give them a donation anyway…? Donate to ovacome.

Identifying the Twitterati Using List Analysis

Given absolutely no-one picked up on List Intelligence – Finding Reliable, Trustworthy and Comprehensive Topic/Sector Based Twitter Lists, here’s an example of what the technique might be good for…

Seeing the tag #edusum11 in my feed today, and not being minded to follow it, I used the list intelligence hack to see:

– which lists might be related to the topic area covered by the tag, based on looking at which Twitter lists folk recently using the tag appear on;
– which folk on twitter might be influential in the area, based on their presence on lists identified as maybe relevant to the topic associated with the tag…

Here’s what I found…

Some lists that maybe relate to the topic area (username/list, number of folk who used the hashtag appearing on the list, number of list subscribers), sorted by number of people using the tag present on the list:

/joedale/ukedtech 6 6
/TWMarkChambers/edict 6 32
/stevebob79/education-and-ict 5 28
/mhawksey/purposed 5 38
/fosteronomo/chalkstars-combined 5 12
/kamyousaf/uk-ict-education 5 77
/ssat_lia/lia 5 5
/tlists/edtech-995 4 42
/ICTDani/teched 4 33
/NickSpeller/buzzingeducators 4 2
/SchoolDuggery/uk-ed-admin-consultancy 4 65
/briankotts/educatorsuk 4 38
/JordanSkole/jutechtlets 4 10
/nyzzi_ann/teacher-type-people 4 9
/Alexandragibson/education 4 3
/danielrolo/teachers 4 20
/cstatucki/educators 4 13
/helenwhd/e-learning 4 29
/TechSmithEDU/courosalets 4 2
/JordanSkole/chalkstars-14 4 25
/deerwood/edtech 4 144

Some lists that maybe relate to the topic area (username/list, number of folk who used the hashtag appearing on the list, number of list subscribers), sorted by number of people subscribing to the list (a possible ranking factor for the list):
/deerwood/edtech 4 144
/kamyousaf/uk-ict-education 5 77
/SchoolDuggery/uk-ed-admin-consultancy 4 65
/tlists/edtech-995 4 42
/mhawksey/purposed 5 38
/briankotts/educatorsuk 4 38
/ICTDani/teched 4 33
/TWMarkChambers/edict 6 32
/helenwhd/e-learning 4 29
/stevebob79/education-and-ict 5 28
/JordanSkole/chalkstars-14 4 25
/danielrolo/teachers 4 20
/cstatucki/educators 4 13
/fosteronomo/chalkstars-combined 5 12
/JordanSkole/jutechtlets 4 10
/nyzzi_ann/teacher-type-people 4 9
/joedale/ukedtech 6 6
/ssat_lia/lia 5 5
/Alexandragibson/education 4 3
/NickSpeller/buzzingeducators 4 2
/TechSmithEDU/courosalets 4 2

Other ranking factors might include the follower count, or factors from some sort of social network analysis, of the list maintainer.
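The two orderings shown above amount to no more than a sort over (list, tagger count, subscriber count) tuples; something like:

```python
def rank_lists(lists, by="taggers"):
    """lists: iterable of (list_name, n_taggers_on_list, n_subscribers).
    Rank by how many recent hashtaggers appear on each list, or by the
    list's subscriber count."""
    key = {"taggers": lambda l: l[1], "subscribers": lambda l: l[2]}[by]
    return sorted(lists, key=key, reverse=True)
```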

Having got a set of lists, we can then look for people who appear on lots of those lists to see who might be influential in the area. Here’s the top 10 (user, number of lists they appear on, friend count, follower count, number of tweets, time of arrival on twitter):

['terryfreedman', 9, 4570, 4831, 6946, datetime.datetime(2007, 6, 21, 16, 41, 17)]
['theokk', 9, 1564, 1693, 12029, datetime.datetime(2007, 3, 16, 14, 36, 2)]
['dawnhallybone', 8, 1482, 1807, 18997, datetime.datetime(2008, 5, 19, 14, 40, 50)]
['josiefraser', 8, 1111, 7624, 17971, datetime.datetime(2007, 2, 2, 8, 58, 46)]
['tonyparkin', 8, 509, 1715, 13274, datetime.datetime(2007, 7, 18, 16, 22, 53)]
['dughall', 8, 2022, 2794, 16961, datetime.datetime(2009, 1, 7, 9, 5, 50)]
['jamesclay', 8, 453, 2552, 22243, datetime.datetime(2007, 3, 26, 8, 20)]
['timbuckteeth', 8, 1125, 7198, 26150, datetime.datetime(2007, 12, 22, 17, 17, 35)]
['tombarrett', 8, 10949, 13665, 19135, datetime.datetime(2007, 11, 3, 11, 45, 50)]
['daibarnes', 8, 1592, 2592, 7673, datetime.datetime(2008, 3, 13, 23, 20, 1)]

The algorithms I’m using have a handful of tuneable parameters, which means there’s all sorts of scope for running with this idea in a “research” context…

One possible issue that occurred to me was that identified lists might actually cover different topic areas – this is something I need to ponder…

Cobbling Together a Searchable Twitter Friends/Followers Contact List in Google Spreadsheets

Have you ever found yourself in the situation where you want to send someone a Twitter message but you can’t remember their Twitter username, although you do know their real name? Or where you can’t remember their twitter username or their real name, but you do remember who they work for, or some other biographical fact about them that might appear in their Twitter biography? If that sounds familiar, here’s a trick that may help…

… a searchable Twitter friends and followers contact list in Google Spreadsheets.

It’s based on Martin Hawksey’s rather wonderful Export Twitter Followers and Friends using a Google Spreadsheet (I have to admit – Martin has left me way behind now when it comes to tinkering with Google Apps Script…!) To get started, you’ll need a Google docs account, and then have to indulge in a quick secret handshake between Google docs and Twitter, but Martin’s instruction sheet is a joy to follow:-) Follow the *** Google Spreadsheet to Export Twitter Friends and Followers *** link on Martin’s page, then come back here once you’ve archived your Twitter friends and/or followers…

..done that? Here’s how to make the contact list searchable… I thought it should have been trivial, but it turned out to be quite involved!

The first thing I did was create a drop down list to let the user select Friends or Followers as the target of the search. (Martin’s application loads friends and followers into different sheets.)

The next step was to generate a query. To search for a particular term on a specified sheet we can use a QUERY formula that takes the following form:

=query(Friends!B:E,"select B,C,D,E where D contains 'JISC'")

Friends! specifies the sheet we want to search over; B:E says we want to pull columns B, C, D and E from the Friends sheet into the current sheet; the select statement will display results over four columns (B, C, D and E) from Friends for rows where the entry in column D contains the search term JISC.

To pull in the search term from cell D1 we can use a query of the form:

=query(Friends!B:E,concatenate("select B,C,D,E where D contains '",D1,"'"))

The =concatenate formula constructs the search query. Make sure you use the right sort of quotes when constructing the string – Google Spreadsheets seems to prefer the use of double quotes wherever possible!

To search over two columns, (for example, the real name and the description columns of the twitter friends/follower data) we can use a query of the form:

=query(Followers!B:E,concatenate("select B,C,D,E where C contains '",D1,"' or D contains '",D1,"'"))

Again – watch out for the quotes – the result we want from the concatenation is something like:

=query(Followers!B:E,"select B,C,D,E where C contains 'Jisc' or D contains 'Jisc'")

so we have to explicitly code in the single quote in the concatenation formula.

Unfortunately, the query formula is case sensitive, which can cause the search to fail because we haven’t taken (mis)use of case into account in our search term. This means we need to go defensive in the query formulation – in the following example, I force everything to upper case – search corpus as well as search terms:

=query(Followers!B:E,concatenate("select B,C,D,E where upper(C) contains upper('",D1,"') or upper(D) contains upper('",D1,"')"))

The final step is to define the sheet we want to search – Friends! or Followers! – depending on the setting of cell B1 in our search sheet. I had idly thought I could use a concatenate formula to create this, but concatenate returns a string and we need to define a range. In the end, the workaround I adopted was an if statement that chooses a query with an appropriate range set explicitly/hardwired within the formula, depending on whether we are searching Friends or Followers. Here’s the complete formula, which I put into cell E1.

=if(B1="Friends",query(Friends!B:E,concatenate("select B,C,D,E where upper(C) contains upper('",D1,"') or upper(D) contains upper('",D1,"')")),query(Followers!B:E,concatenate("select B,C,D,E where upper(C) contains upper('",D1,"') or upper(D) contains upper('",D1,"')"))))

I now have a query sheet defined that allows me to search over my friends or followers, as required, according to their real name or a search term that appears in their biography description.
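For comparison, the same case-insensitive two-column search is easy enough outside a spreadsheet too. Here’s a rough Python equivalent over the exported rows, assuming columns named real_name and description (the actual column names in Martin’s export may well differ):

```python
def search_contacts(rows, term):
    """rows: iterable of dicts, e.g. from csv.DictReader over an export.
    Return rows whose real name or bio contains term, ignoring case."""
    term = term.upper()
    return [row for row in rows
            if term in row["real_name"].upper()
            or term in row["description"].upper()]
```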

More Pivots Around Twitter Data (little-l, little-d, again;-)

I’ve been having a play with Twitter again, looking at how we can do the linked data thing without RDF, both within a Twitter context and also (heuristically) outside it.

First up, hashtag discovery from Twitter lists. Twitter lists can be used to collect together folk who have a particular interest, or be generated from lists of people who have used a particular hashtag (as Martin H does with his recipe for Populating a Twitter List via Google Spreadsheet … Automatically! [Hashtag Communities]).

The thinking is simple: grab the most recent n tweets from the list, extract the hashtags, and count them, displaying them in descending order. This gives us a quick view of the most popular hashtags recently tweeted by folk on the list:

Popular recent tags from folk on a twitter list

This is only a start, of course: it might be that a single person has been heavily tweeting the same hashtag, so a sensible next step would be to also take into account the number of people using each hashtag in ranking the tags. It might also be useful to display the names of folk on the list who have used the hashtag?
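Both the raw count and the distinct-user refinement fall out of a single pass over the tweets. A sketch, where the (screen_name, text) tuples are an assumed input format rather than what the Twitter API hands back:

```python
import re
from collections import Counter, defaultdict

TAG_RE = re.compile(r"#\w+")

def tag_popularity(tweets):
    """tweets: list of (screen_name, text). Rank hashtags by the number
    of distinct users tweeting them, then by raw occurrence count, so a
    lone heavy tweeter can't dominate the ranking."""
    counts, users = Counter(), defaultdict(set)
    for who, text in tweets:
        for tag in TAG_RE.findall(text.lower()):
            counts[tag] += 1
            users[tag].add(who)
    return sorted(counts, key=lambda t: (len(users[t]), counts[t]), reverse=True)
```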

I also updated a previous toy app that makes recommendations of who to follow on twitter based on a mismatch between the people you follow (and who follow you) and the people following and followed by another person – follower recommender (of a sort!):

The second doodle was inspired by discussions at Dev8D relating to a possible “UK HE Developers’ network”, and relies on an assumption – that the usernames people use on Twitter might be used by the same person on Github. Again, the idea is simple: can we grab a list of Twitter usernames for people that have used the dev8d hashtag (that much is easy) and then look up those names on Github, pulling down the followers and following lists from Github for any IDs that are recognised, in order to identify a possible community of developers on Github from the seed list of dev8d hashtagging Twitter names? (It also occurs to me that we could pull down the projects Git-folk are associated with (in order to identify projects Dev8D folk are committing to) and the developers who also commit or subscribe to those projects.)
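The name-matching step might look something like this; it uses Github’s public user lookup endpoint (a 404 meaning no such user), though unauthenticated calls are heavily rate limited, so treat it as illustrative rather than something to run over a big seed list:

```python
from urllib.request import urlopen
from urllib.error import HTTPError

def on_github(name, opener=urlopen):
    """True if the username resolves via Github's user API."""
    try:
        opener("https://api.github.com/users/%s" % name)
        return True
    except HTTPError as e:
        if e.code == 404:
            return False
        raise

def match_usernames(twitter_names, opener=urlopen):
    """Filter Twitter screen names down to ones that also exist on
    Github; opener is injectable so this can be tested offline."""
    return [n for n in twitter_names if on_github(n, opener)]
```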

follower/following connections on github using twitter usernames that tweeted dev8d hashtag

As the above network shows, it looks like we get some matches on usernames…:-)