My Twitter Community Grabbing Code – newt.py

In a series of recent posts, I've shown various network graphs and custom search engines generated from Twitter data: communities around hashtags, users on lists, the friends of an individual, and so on.

Last weekend, I tried to pull the code together into some sort of integrated but extensible mess for my own tinkering. The core code can be found here: newt.py [UPDATE: more recent – and even more cluttered code – at https://github.com/psychemedia/newt]. It may or may not work for you, work correctly, or work at all. I may or may not update that file (though if I do, I will try not to change function names or the types they return, nor add or remove side-effects, of which there may be many in the most unlikely of places). The functions are all over the place in the file and almost completely undocumented. They also demonstrate my cut'n'paste-from-others understanding of Python. Whatever.

To use the code, you will need to install tweepy and add your own Twitter API (OAuth) keys to the functions at the top of the newt.py file. There are routines in there to generate Gephi GDF files and Google Custom Search Engine (CSE) configuration files.

Here are a handful of scripts that I’ve been using to generate various output files.

The first (listfromTwapperkeepr.py) has a go at grabbing archived tweets (by hashtag) from Twapperkeeper. Command line usage is:
python listfromTwapperkeepr.py TAG START END LIMIT
where TAG is the hashtag (eg rswebsci), START and END are dates (in the format YYYY-MM-DD, e.g. 2010-09-23, err, I think?! Whatever it is that Twapperkeeper expects ;-), and LIMIT is the minimum number of tweets a particular person must have in the grabbed archive for that user to be added to the list. The generated Twitter list is added to the authenticated user's account (i.e. the user associated with the OAuth keys) under the name TAG. If the list already exists, it should just get updated with users not already on it (but no users will be removed).
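The script below reads its four arguments straight from sys.argv, so a malformed date or a missing LIMIT only shows up later as a confusing error. As a hedged sketch, the same arguments could be validated with the standard argparse module. This assumes, as the example date '2010-09-23' in one of the later scripts suggests, that Twapperkeeper wants YYYY-MM-DD dates; parse_args and iso_date are hypothetical helpers, not part of newt.py.

```python
import argparse
from datetime import datetime


def iso_date(text):
    """argparse type: accept a YYYY-MM-DD date and return it normalised (zero-padded).

    Assumption: Twapperkeeper expects YYYY-MM-DD dates.
    """
    try:
        return datetime.strptime(text, "%Y-%m-%d").date().isoformat()
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected a YYYY-MM-DD date, got {text!r}")


def parse_args(argv=None):
    """Parse TAG START END LIMIT (hypothetical helper, not part of newt.py)."""
    parser = argparse.ArgumentParser(
        description="Build a Twitter list from a Twapperkeeper hashtag archive (sketch)")
    parser.add_argument("tag", help="hashtag without the leading #, e.g. rswebsci")
    parser.add_argument("start", type=iso_date, help="first day of the archive, YYYY-MM-DD")
    parser.add_argument("end", type=iso_date, help="last day of the archive, YYYY-MM-DD")
    parser.add_argument("limit", type=int, help="minimum tweets in the archive to be listed")
    args = parser.parse_args(argv)
    # Normalised ISO date strings compare correctly as plain strings.
    if args.end < args.start:
        parser.error("END must not be earlier than START")
    return args
```

In a script, args = parse_args() would replace the four sys.argv lookups; a bad date makes argparse print a usage message and exit rather than failing part way through the archive request.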

import sys,newt

def report(m):
  newt.report(m,True)

#----------------------------------------------------------------
#user settings
tag=sys.argv[1]
start=sys.argv[2]
end=sys.argv[3]
limit=int(sys.argv[4])
#----------------------------------------------------------------

twsn={}

twsn= newt.getTwapperkeeperArchiveTweeters(twsn,tag,start,end)

report("List members:")

tw=[]

tws={}
for i in twsn:
  tws[i]=twsn[i]['count']
  if tws[i]>=limit:
    tw.append(i)

#Print out the twitterers in order of who tweeted most
for i in sorted(tws, key=tws.get, reverse=True):
  msg=i+' '+str(tws[i])
  report(msg)

api=newt.getTwitterAPI()
user=api.auth.get_username()
newt.addManyToListByScreenName(api,user,tag,tw)
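The selection logic in that script mixes counting, filtering and reporting in one loop. Pulled out into plain functions (hypothetical names, not newt.py's own), it looks roughly like this. The archive shape {screen_name: {'count': n}} is taken from the script; users_to_add is only a sketch of the "add new members, remove nobody" behaviour described above, not of how newt.addManyToListByScreenName actually does it.

```python
def select_tweeters(archive, limit):
    """Return screen names with at least `limit` archived tweets, most prolific first.

    `archive` maps screen name -> {'count': n}, the shape the script reads
    from newt.getTwapperkeeperArchiveTweeters.
    """
    counts = {name: info["count"] for name, info in archive.items()}
    ranked = sorted(counts, key=counts.get, reverse=True)
    return [name for name in ranked if counts[name] >= limit]


def users_to_add(existing_members, candidates):
    """Keep only candidates not already on the list; nobody is ever removed.

    Twitter screen names are case-insensitive, so compare them lower-cased.
    """
    existing = {name.lower() for name in existing_members}
    return [name for name in candidates if name.lower() not in existing]
```

Keeping the ranking and the threshold in one function means the "who tweeted most" report and the list membership are always derived from the same counts.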

The second script (basicListNet.py) grabs users from a Twitter list, then plots out a network showing which members of the list friend other members of the list. Usage is:
python basicListNet.py USER LIST

The script generates a Gephi GDF file plus two Google CSE definition files, one plain and one weighted ;-) In the CSE generation calls, XXX should be replaced by the label for the desired Google CSE (available from the Advanced settings page for that search engine in the Google CSE manager).

import sys,newt

def report(m):
  newt.report(m,True)

api=newt.getTwitterAPI()

#----------------------------------------------------------------
#user settings
user=sys.argv[1]
list=sys.argv[2]
#----------------------------------------------------------------
tw={}

tw=newt.listDetailsByScreenName({},api.list_members,user,list)

report("List members:")
for i in tw:
  report(tw[i].screen_name)
  
newt.gephiOutputFile(api,list, tw)
newt.googleCSEDefinitionFile("XXX",list, tw)
newt.googleCSEDefinitionFileWeighted("XXX",list, tw)
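The network in basicListNet.py is the subgraph induced by the list: only "follows" relationships where both ends are list members are kept. Here is a minimal sketch of that filtering and of writing the result in Gephi's GDF format. It assumes the friends of each member have already been fetched into a plain dict; member_edges and to_gdf are illustrative names, not newt.gephiOutputFile's actual implementation, which may well record more node attributes.

```python
def member_edges(friends):
    """Directed edges (a, b) meaning list member a follows list member b.

    `friends` maps each member's screen name to the set of screen names they
    follow; follows pointing outside the member set are dropped.
    """
    members = set(friends)
    return sorted(
        (a, b)
        for a, followed in friends.items()
        for b in followed
        if b in members and b != a
    )


def to_gdf(friends):
    """Render members and member-to-member edges as a GDF document for Gephi."""
    lines = ["nodedef>name VARCHAR,label VARCHAR"]
    for name in sorted(friends):
        lines.append(f"{name},{name}")
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,directed BOOLEAN")
    for a, b in member_edges(friends):
        lines.append(f"{a},{b},true")
    return "\n".join(lines) + "\n"
```

The same function covers basicUserNet.py too: there the "members" are simply a user's friends rather than the members of a list.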

The third script (basicUserNet.py) grabs the friends list for a specified user and then generates the graph showing which of their friends are friends of each other. Usage is:
python basicUserNet.py USER

import sys,newt

def report(m):
  newt.report(m,True)

api=newt.getTwitterAPI()

#----------------------------------------------------------------
#user settings
user=sys.argv[1]
#----------------------------------------------------------------
tw={}

tw=newt.getTwitterFriendsDetailsByIDs(api,user)

report("List members:")
for i in tw:
  report(tw[i].screen_name)
  
newt.gephiOutputFile(api,user, tw)
newt.googleCSEDefinitionFile("XXX",user, tw)
newt.googleCSEDefinitionFileWeighted("XXX",user, tw)
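Both basicListNet.py and basicUserNet.py finish by writing Google CSE definition files. As a rough sketch of what such a file can contain, the functions below build a CSE annotations document that tags a set of URL patterns with the label that replaces XXX. Which URLs newt actually uses, and how its weighted variant computes weights, is not documented here; this sketch assumes the patterns come from users' profile home page URLs and that weighting maps onto an optional score attribute. site_pattern and cse_annotations are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def site_pattern(url):
    """Turn a home page URL into a CSE URL pattern, e.g. 'blog.example.com/*'."""
    parts = urlparse(url)
    return parts.netloc + parts.path.rstrip("/") + "/*"


def cse_annotations(label, sites, scores=None):
    """Build a CSE annotations document tagging each URL pattern with `label`.

    `scores` optionally maps a pattern to a weight (assumed range -1..1),
    standing in for the 'weighted' variant of the definition file.
    """
    root = ET.Element("Annotations")
    for site in sites:
        ann = ET.SubElement(root, "Annotation", about=site)
        if scores and site in scores:
            ann.set("score", str(scores[site]))
        ET.SubElement(ann, "Label", name=label)
    return ET.tostring(root, encoding="unicode")
```

Building the XML with ElementTree rather than string concatenation takes care of escaping any odd characters that turn up in user-supplied URLs.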

And the fourth script (basicTwapperkeeperNet.py) gets an archive from Twapperkeeper and generates the network of archived users (those with at least LIMIT tweets in the archive) who have friended each other. Usage is:
python basicTwapperkeeperNet.py TAG START END LIMIT

import sys,newt

def report(m):
  newt.report(m,True)

api=newt.getTwitterAPI()

#----------------------------------------------------------------
#user settings
tag=sys.argv[1]
start=sys.argv[2] #'2010-09-23'
end=sys.argv[3]
limit=int(sys.argv[4])
#----------------------------------------------------------------

twsn={}

twsn= newt.getTwapperkeeperArchiveTweeters(twsn,tag,start,end)

report("List members:")
tw=[]
tws={}
for i in twsn:
  tws[i]=twsn[i]['count']
  if tws[i]>=limit:
    tw.append(i)

tw=newt.getTwitterUsersDetailsByScreenNames(api,tw)

#Print out the twitterers in order of who tweeted most
for i in sorted(tws, key=tws.get, reverse=True):
  msg=i+' '+str(tws[i])
  report(msg)

newt.gephiOutputFile(api,tag, tw)
newt.googleCSEDefinitionFile("XXX",tag, tw)
newt.googleCSEDefinitionFileWeighted("XXX",tag, tw)

The following script is experimental, and shows how to extend and merge data from several sources. One use case, for example, would be taking lists of MPs on Twitter, one list per party, and creating a Gephi output file in which each twitterer's node definition is annotated with their party.

import newt

def report(m):
  newt.report(m,True)

api=newt.getTwitterAPI()

#----------------------------------------------------------------
#user settings
user='ousefulapi'
list='cam23'
#----------------------------------------------------------------
tw={}

tw=newt.listDetailsByID(tw,api.list_members,user,list)
twx=newt.extendUserList(tw,["typ1 labour","typ2 sads"])
tw2=newt.listDetailsByID({},api.list_members,user,'alpsp')
tw2x=newt.extendUserList(tw2,["typ1 sdghg","typ2 sads"])
twx=newt.mergeDicts([twx,tw2x],True)

report("List members:")
twd=newt.deExtendUserList(twx)
for i in twd:
  report(twd[i].screen_name)
    
'''
#testing
ttx={}
for t in tw:
  ttx[t]={}
  ttx[t]['user']=tw[t]
  ttx[t]['classVals']={}
  ttx[t]['classVals']['typ1']='aser'
  ttx[t]['classVals']['typ2']=34
'''
ttx=twx
newt.gephiOutputFileExtended(api,'test', ttx,['typ1 VARCHAR','typ2 INT'])
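The MPs-by-party idea boils down to wrapping each user record with a dictionary of class values and then emitting those values as extra typed GDF node columns. The commented-out testing block shows the {'user': ..., 'classVals': {...}} shape; the functions below are a hypothetical re-creation of that idea, not newt's extendUserList, mergeDicts or gephiOutputFileExtended. User records are plain dicts with a screen_name key here, standing in for tweepy User objects, and because the meaning of mergeDicts' True flag isn't documented, this merge simply lets later groups overwrite earlier ones.

```python
def extend_users(users, class_vals):
    """Wrap each user as {'user': ..., 'classVals': {...}} from "key value" strings."""
    vals = dict(item.split(" ", 1) for item in class_vals)
    return {uid: {"user": user, "classVals": dict(vals)} for uid, user in users.items()}


def merge_extended(*groups):
    """Merge extended dicts; a user appearing in several groups keeps the last one."""
    merged = {}
    for group in groups:
        merged.update(group)
    return merged


def extended_gdf(extended, columns):
    """GDF node section with extra typed columns, e.g. ["typ1 VARCHAR", "typ2 INT"]."""
    names = [col.split()[0] for col in columns]
    lines = ["nodedef>name VARCHAR,label VARCHAR," + ",".join(columns)]
    for uid in sorted(extended):
        rec = extended[uid]
        extra = [str(rec["classVals"].get(n, "")) for n in names]
        lines.append(",".join([str(uid), rec["user"]["screen_name"]] + extra))
    return "\n".join(lines) + "\n"
```

For the MPs case, each party's list would get its own extend_users call (e.g. ["typ1 labour"]) before the groups are merged and written out.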

Although the code sort of works, the representation it uses is a bit messy. So I'm going to give up on it and start afresh, probably using NetworkX. In the new version (which I shall probably call newtx), node and edge data will be added to a NetworkX graph as soon as it is loaded. Then I can run network statistics on the graph in Python, rather than having to use Gephi for them, as well as generate visual outputs using matplotlib.
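A minimal sketch of that plan, assuming the friend relationships have already been fetched into a plain dict: the induced follower network goes straight into a NetworkX DiGraph, and statistics such as density and in-degree centrality then come from NetworkX itself rather than from Gephi. build_graph is a hypothetical name; nothing here is newtx code.

```python
import networkx as nx


def build_graph(friends):
    """Directed 'follows' graph restricted to the given members.

    `friends` maps each member's screen name to the set of names they follow.
    """
    g = nx.DiGraph()
    g.add_nodes_from(friends)
    for a, followed in friends.items():
        g.add_edges_from((a, b) for b in followed if b in friends and b != a)
    return g


friends = {"alice": {"bob"}, "bob": {"alice", "carol"}, "carol": {"bob"}}
g = build_graph(friends)
print(nx.density(g))                 # edges / (n * (n - 1)) for a directed graph
print(nx.in_degree_centrality(g))    # in-degree / (n - 1) for each member
```

For the visual side, NetworkX can still hand the same graph to Gephi via nx.write_gexf, or draw it directly with matplotlib through nx.draw.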

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
