A couple of weeks ago I saw a great example of an open learning blog post from @katy_bird: Generating a word cloud (or not) from a Twitter hashtag. It described the trials and tribulations of trying to satisfy a request to generate a wordcloud based on tweets associated with a particular Twitter hashtag. A seemingly simple task, you might think, but things are never that easy… If you read the post, you’ll see Katy identified several problems, or stumbling blocks, along the way, as well as how she addressed them. There’s also a bit of reflection on the process as a whole.
Reading the post the first time (and again, just now) completely set me up for the day. It had a little bit of everything: a goal statement, the identification of a set of problems that cropped up along the way, some commentary on how those problems were tackled, and some reflection on the process as a whole. The post thus captures a problem discovery process, as well as the steps taken to try to solve each problem (although the full documentation is lacking; one thing I have learned over the years is to use something like a gist on GitHub to keep a copy of any code I generate to solve a problem, linked to from the associated blog post, so that I and others can reuse it). The post captures a glimpse back at a moment in time – when Katy didn’t know how to generate a wordcloud – from the joyful moment at which she has just learned how to do so. More importantly, the post describes the learning problems that became evident whilst trying to achieve the goal, in such a way that they can act as hooks on which others can hang alternative or additional ways of solving the problem, or offer to act as a mentor.
By identifying the learning journey and the problems discovered along the way, Katy’s record of her learning strategy also provides an authentic, learner-centric perspective on what’s involved in trying to create a wordcloud around a Twitter hashtag.
Reading the post again has also prompted me to blog this recipe, largely copied from the RDataMining post Using Text Mining to Find Out What @RDataMining Tweets are About, for generating a word cloud around a Twitter hashtag using R (I use RStudio; the recipe requires at least the twitteR and tm libraries):
require(twitteR)

searchTerm='#dev8d'

#Grab the tweets
rdmTweets <- searchTwitter(searchTerm, n=500)
#Use a handy helper function to put the tweets into a dataframe
tw.df=twListToDF(rdmTweets)

##Note: there are some handy, basic Twitter related functions here:
##https://github.com/matteoredaelli/twitter-r-utils
#For example:
RemoveAtPeople <- function(tweet) {
  gsub("@\\w+", "", tweet)
}
#Then for example, remove @'d names
tweets <- as.vector(sapply(tw.df$text, RemoveAtPeople))

##Wordcloud - scripts available from various sources; I used:
#http://rdatamining.wordpress.com/2011/11/09/using-text-mining-to-find-out-what-rdatamining-tweets-are-about/
#Load the text mining library
require(tm)

#Call with eg: tw.c=generateCorpus(tw.df$text)
generateCorpus= function(df, my.stopwords=c()){
  #The following is cribbed and seems to do what it says on the can
  tw.corpus= Corpus(VectorSource(df))
  # remove punctuation
  tw.corpus = tm_map(tw.corpus, removePunctuation)
  #normalise case
  tw.corpus = tm_map(tw.corpus, tolower)
  # remove stopwords
  tw.corpus = tm_map(tw.corpus, removeWords, stopwords('english'))
  tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords)
  tw.corpus
}

wordcloud.generate=function(corpus, min.freq=3){
  require(wordcloud)
  doc.m = TermDocumentMatrix(corpus, control = list(minWordLength = 1))
  dm = as.matrix(doc.m)
  # calculate the frequency of words
  v = sort(rowSums(dm), decreasing=TRUE)
  d = data.frame(word=names(v), freq=v)
  #Generate the wordcloud
  wc=wordcloud(d$word, d$freq, min.freq=min.freq)
  wc
}

print(wordcloud.generate(generateCorpus(tweets, 'dev8d'), 7))

##Generate an image file of the wordcloud
png('test.png', width=600, height=600)
wordcloud.generate(generateCorpus(tweets, 'dev8d'), 7)
dev.off()

#We could make it even easier if we hide away the tweet grabbing code.
eg:

tweets.grabber=function(searchTerm, num=500){
  require(twitteR)
  rdmTweets = searchTwitter(searchTerm, n=num)
  tw.df=twListToDF(rdmTweets)
  as.vector(sapply(tw.df$text, RemoveAtPeople))
}

#Then we could do something like:
tweets=tweets.grabber('ukgc12')
wordcloud.generate(generateCorpus(tweets), 3)
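As a quick sanity check of the @-stripping step in the recipe, here’s the RemoveAtPeople regex run over a made-up tweet (rather than a live search result):

```r
# The @-stripping helper from the recipe above
RemoveAtPeople <- function(tweet) { gsub("@\\w+", "", tweet) }

# A made-up tweet, standing in for a live search result
RemoveAtPeople("@dev8d loving the wordcloud demo from @someone")
# -> " loving the wordcloud demo from "
```

Note that the @names are removed but the spaces either side of them are left intact; the tm punctuation/whitespace steps tidy those up later.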
Here’s the result:
PS for an earlier (once broken, now patched) route to sketching a wordcloud from a Twitter search using Wordle, see How To Create Wordcloud from a Twitter Hashtag Search Feed in a Few Easy Steps.
Great article, thanks! As for the visualization part, some time ago I made an open-source plugin to make word clouds like yours in pure HTML (so that they can contain links and be embedded into webpages and styled easily). You may find it interesting: https://github.com/DukeLeNoir/jQCloud
Hi, thanks for sharing this script!
The following line caused an error for me:
tw.df=twListToDF(rdmTweets)
The error was: Error: could not find function "twListToDF"
I changed the command to the following which resolved the issue:
tw.df=do.call('rbind', lapply(rdmTweets, as.data.frame))
I’m only new to R and lost as to why I encountered this error. The twitteR package had been installed and loaded.
Any advice would be greatly appreciated.
Looking forward to learning more and putting this script to use.
Thank you.
Johann
@johann Hmm.. I thought that was a twitteR function? Have you got the latest version of the package installed? (Thanks for posting the workaround, btw :-)
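For what it’s worth, the workaround posted above can be sketched on toy data – it simply row-binds a list of per-tweet data frames into one, which is the same shape of result twListToDF is meant to produce:

```r
# Toy stand-ins for the tweet objects returned by searchTwitter
rdmTweets <- list(
  data.frame(text = "first tweet",  id = 1),
  data.frame(text = "second tweet", id = 2)
)

# The workaround: bind the per-tweet data frames into a single data frame
tw.df <- do.call('rbind', lapply(rdmTweets, as.data.frame))

nrow(tw.df)  # one row per tweet
```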
Hi,
In order to run this script, I need the "Rcpp" package. However, I couldn't install it. Please help.
@raj you should be able to install this package from CRAN I think? Or maybe it requires a more recent version of R than the one you are running before it will install?