Following on from A Tool Chain for Plotting Twitter Archive Retweet Graphs – Py, R, Gephi, here’s a quick summary view over #UKGC12 tweets saved in a Google Spreadsheet archive as developed by Martin Hawksey, generated from an R script (R code available here; #ukgc12 tweet archive here)…
(I did mean to tidy these up, add in titles and so on, but it’s late and I’m really tired :-( )
So for example, an ordered bar chart showing who was @’d most by hashtagged tweets:
And a scatterplot showing the number of tagged tweets to and from particular individuals, sized by the number of times each person’s tweets were RT’d:
(Hmmm..strikes me I could use a fourth dimension (colour) to capture the number of RTs issued by each person too…? I wonder if I can also tie the angle of each label to a parameter value?!)
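For what it’s worth, the ordered bar chart can be sketched along these lines — a minimal version assuming ggplot2 and stringr are installed and that the archive has been pulled into a data frame called `tweets` with a `text` column (the column name is my guess at the spreadsheet layout, not necessarily what the actual script uses):

```r
# Count @mentions in the tweet text and plot an ordered bar chart
library(ggplot2)
library(stringr)

ats <- unlist(str_extract_all(tolower(tweets$text), "@[a-z0-9_]+"))
atCounts <- as.data.frame(table(ats))
names(atCounts) <- c("user", "n")

# Only show users @'d more than twice, ordered by mention count
ggplot(subset(atCounts, n > 2), aes(x = reorder(user, n), y = n)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  xlab("@'d user") + ylab("Number of mentions")
```

`reorder(user, n)` is what gives the ordered bars; `coord_flip()` makes the long usernames readable.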
I also had a quick peek at folk who were using the tag and/or were heavily followed by tag users (nodes sized according to betweenness centrality):
You can view a dynamic version of the conversation graph around the tag using Martin’s TAGSExplorer (about).
PS See the first comment below from Ben Marwick for a link to a text analysis script in R that can be easily tweaked to use archived tweets. When I get a chance, I’ll try to wrap this into a Sweave script (cf. How Might Data Journalists Show Their Working? Sweave for the automated generation of PDF and HTML reports.).
16 thoughts on “A Quick View Over a MASHe Google Spreadsheet Twitter Archive of UKGC12 Tweets”
I’ve been finding your R code for analysing tweets really handy, thanks for sharing it! For those interested in content analysis of tweets, I’ve put together a little bit of R code on text mining (including clustering by multiscale bootstrap resampling) and topic modeling (using latent Dirichlet allocation) of tweets over here: https://github.com/benmarwick/AAA2011-Tweets (which includes some of your previous code on retweet summaries).
@ben wonderful – thanks for linking to that… Though I think your R code/knowledge probably puts mine to shame! Putting together (i.e. cribbing someone else’s! ;-) text analysis/topic modeling routines has been on my to do list for some time. I guess another step in the chain might be to find where shortened links ultimately resolve to, and then produce stats that show the number of ultimate URLs, the number of different shortened forms of each URL etc?
Have you posted any of the plots that script generates anywhere?
@ben when generating wordlists and pulling out stop terms, I wonder if it would make sense to also use a stoplist that contains twitter user names? I guess a downside of this might be that it can be handy to have user names associated with topic lists? (Hmm… which also makes me wonder: if we do keep names in the mix, might it also make sense to append the name of a user to the tweet text before analysis so we can see which senders are associated with topical items?)
@Tony, thanks for your kind words. My code is almost entirely cobbled from others and stackoverflow… I can’t claim much originality! I’ve popped the plots up here: http://imgur.com/a/BxWxA#0 The final figure showing 23 topics over time is a bit of a cheat, all the plots were done in R but I used Inkscape to get the tweets-per-hour bars along the bottom (I turned the R barplot sideways and then laid each bar end-to-end). The figures are also in a manuscript currently under review. Great suggestions about URL analysis and usernames in stoplists. I left them in as another way of finding out who were the high-impact tweeters. By the way I like the way you plot twitter data with the usernames as data points, very efficient! Thanks for the igraph post too (you’re so productive it’s hard to keep up). Some of my colleagues use the sna package for statistical network analysis, have you tried that?
@ben yeah:-) – so you too, then, see R as arcane voodoo magic? Heh heh, that makes three: you, me and Martin Hawksey:-)
Been thinking I need to get up to speed in using github as a social tool… would it make sense to create a repo as a focus for a set of scripts around hashtag analysis using Twitter search and R, so we three can maybe: 1) (if necessary) get comfortable with social use of Git; 2) start working collaboratively on some Sweave scripts, maybe, for generating canned reports around hashtags?
Re: the sna package – not tried it yet; I suspect @mhawksey will beat me to it if I’m not fast enough!
Love the idea of an R twitter script recipe repo. Had a quick look at R’s sna package; think I’ll be sticking with igraph for now, but I look forward to being convinced otherwise. I need to start processing this archive https://docs.google.com/spreadsheet/ccc?key=0AqGkLMU9sHmLdGVxUXV2cHpiRzN4UzByS19zS3FUSVE&hl=en_GB#gid=5 so you should both look forward to your scripts being reused very soon (or you might want to just beat me to it ;)
I’ve got my eye on the MLA12 tweet archive, actually. I’d love to get started with Sweave. Tony, would it be too much to ask you to post a bit of a how-to-get-started-with-Sweave-and-RStudio? Or direct me to your favorite introduction? I’m happy to work with github. One thing it might be interesting to combine our methods on is a comparison of the tweet archives of a small number of major academic conferences, what do you think?
Hi Ben – I think a shared effort could be really handy, and I think Martin Hawksey is interested in this approach too.
As to Sweave, you need to get LaTeX installed (err, at least… I think I worked out what I needed from the error messages!) and then RStudio picks it up. In RStudio, if you open a new Sweave document, you can then write your LaTeX with embedded R chunks – I have an example linked out from https://blog.ouseful.info/2011/11/01/how-might-data-journalists-show-their-working-sweave/ and there are also some scripts (that barely work!) here: https://github.com/psychemedia/f1DataJunkie/tree/master/raceReports
When you’re working on the Sweave doc (.Rnw suffix?), RStudio gives you a couple of toolbar options for running/compiling the document.
I think the latest version of RStudio also integrates with git somehow, but I’m not sure how… yet!;-)
@ben, @martin I set up a git repo at https://github.com/psychemedia/Twitter-Backchannel-Analysis and have started trying to figure out how to work with it in RStudio (eg http://blog.rstudio.org/2012/01/25/rstudio-v0-95-released/ and http://support.rstudio.org/help/kb/advanced/version-control-getting-started ) but I’m a real git n00b so apols if the repo goes a bit crazy.
I guess there are a couple of approaches we could take? One is to all collaborate on the same repo; the other is for us all to clone the repo, make our own commits to that, then as required merge them into the master (note that I have no real idea how to do any of this atm;-)
Thought I’d try out the repo by committing as a contributor (Tony needs to add you on github). Realise wordclouds have already been done, but it was more a case of chucking something at the repo to try and work out structure and workflow.
You’ll see I’ve already created a TAGS-related folder; not sure what the other folders would be – ‘live data’ maybe. Thought it would be useful to include example output. Again, just making it up as I go: using the same script name as the example image name, adding ex00 in case a script generates more than one output.
I’m not precious about any of this, so happy if it’s ripped up and we start again.
PS opted to use the Git GUI rather than RStudio’s version control integration.
I added a doodles directory just because – a place where ropey scripts/ideas/works-in-progress can be put. I also added a file for core functions, which I think would simplify matters, and also posted it as an issue to start exploring how we might communicate in a rather more sensible place than here…;-) Although here has the advantage that we document our fumbling learnings around how to go about social coding?!