OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

A Quick View Over a MASHe Google Spreadsheet Twitter Archive of UKGC12 Tweets

Following on from A Tool Chain for Plotting Twitter Archive Retweet Graphs – Py, R, Gephi, here’s a quick summary view over #UKGC12 tweets saved in a Google Spreadsheet archive, as developed by Martin Hawksey, generated from an R script (R code available here; #ukgc12 tweet archive here)…

(I did mean to tidy these up, add in titles etc., but it’s late and I’m realllly tired:-(

So for example, an ordered bar chart showing who was @’d most by hashtagged tweets:

Tweets to an individual
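The R script behind these charts isn't reproduced in this post. As a rough illustration of the counting step only, here's a minimal Python sketch (not the original code) that tallies @-mentions and orders them ready for a bar chart. It assumes each archived tweet is available as a (sender, text) pair; the function name `mention_counts` and the sample data are hypothetical, and "RT @user" prefixes are counted as mentions too.

```python
import re
from collections import Counter

# Twitter handles are letters, digits and underscores; the lookbehind
# skips e-mail addresses such as "me@example.com".
MENTION = re.compile(r"(?<!\w)@([A-Za-z0-9_]+)")


def mention_counts(tweets):
    """Count how often each user is @-mentioned.

    `tweets` is an iterable of (sender, text) pairs (an assumed layout).
    Returns (username, count) pairs ordered from most to least mentioned;
    ties keep the order in which names were first seen.
    """
    counts = Counter()
    for _sender, text in tweets:
        # Lower-case so that @Foo and @foo are tallied together.
        counts.update(name.lower() for name in MENTION.findall(text))
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical sample tweets, not taken from the #ukgc12 archive.
    sample = [
        ("alice", "Great talk from @bob #ukgc12"),
        ("carol", "RT @bob: Great talk #ukgc12"),
        ("bob", "thanks @Alice and @carol #ukgc12"),
    ]
    for name, n in mention_counts(sample):
        print(f"{name:10} {'#' * n}")  # crude text bar chart
```

The ordered list maps directly onto an ordered bar chart: one bar per name, in the order returned.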

And here’s a scatterplot showing the number of tagged tweets to and from particular individuals, with each point sized by how many times that person’s tweets were RT’d:

ukgc2012 tweeps
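Again as a sketch rather than the original R code, the three quantities plotted here (tweets sent, tweets addressed to a person, and retweets of their tweets) could be tallied as below, under the same (sender, text) assumption. A tweet counts as "to" someone if it @-mentions them, and only old-style "RT @user" prefixes are recognised as retweets; both are simplifying assumptions, and `person_stats` is a hypothetical name.

```python
import re
from collections import defaultdict

MENTION = re.compile(r"(?<!\w)@([A-Za-z0-9_]+)")
# Old-style retweets start "RT @user"; treating only this prefix as a
# retweet is an assumption about how the archive records RTs.
RT_PREFIX = re.compile(r"^RT @([A-Za-z0-9_]+)", re.IGNORECASE)


def person_stats(tweets):
    """Per-user counts of tweets sent ("from"), tweets mentioning them
    ("to") and retweets of their tweets ("rts").

    `tweets` is an iterable of (sender, text) pairs (an assumed layout).
    """
    stats = defaultdict(lambda: {"from": 0, "to": 0, "rts": 0})
    for sender, text in tweets:
        stats[sender.lower()]["from"] += 1
        rt = RT_PREFIX.match(text)
        if rt:
            stats[rt.group(1).lower()]["rts"] += 1
            # Don't also count the retweeted author as being addressed.
            text = text[rt.end():]
        for name in MENTION.findall(text):
            stats[name.lower()]["to"] += 1
    return dict(stats)
```

In a chart like the one above, "from" and "to" would give each person's x and y position and "rts" the size of their label.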

(Hmmm... it strikes me I could use a fourth dimension (colour) to capture the number of RTs issued by each person too…? I wonder if I could also tie the angle of each label to a parameter value?!)

I also had a quick peek at the folk who were using the tag and/or were heavily followed by tag users (nodes sized according to betweenness centrality):

Connections between recent users of the #ukgc12 hashtag and the folk they tend to follow (node size: betweenness centrality)
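The post doesn't say which tool computed these betweenness scores (the earlier post in this series uses Gephi). For anyone curious what the measure involves, here's a minimal pure-Python sketch of Brandes' algorithm for an unweighted, directed graph, where an edge A → B might stand for "A follows B" (an assumption about how the follower graph is encoded). A node's score sums, over every ordered pair of other nodes, the fraction of shortest paths between them that pass through it; the scores here are unnormalised, and `betweenness` is a hypothetical helper name.

```python
from collections import deque


def betweenness(graph):
    """Brandes' betweenness centrality for an unweighted, directed graph.

    `graph` maps each node to an iterable of its successors. Returns a
    dict of unnormalised scores for every node that appears in the graph.
    """
    nodes = set(graph)
    for succs in graph.values():
        nodes.update(succs)
    bc = dict.fromkeys(nodes, 0.0)

    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording each node's shortest-path predecessors.
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


if __name__ == "__main__":
    # Hypothetical follower edges: x and y both follow h, h follows z.
    print(betweenness({"x": ["h"], "y": ["h"], "h": ["z"]}))
```

This runs in O(VE) time, so for real follower graphs a tested library routine (such as igraph's `betweenness()` in R) would be the sensible choice; the sketch is only meant to show what the node sizes represent.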

You can view a dynamic version of the conversation graph around the tag using Martin’s TAGSExplorer (about).

PS See the first comment below from Ben Marwick for a link to a text analysis script in R that can be easily tweaked to use archived tweets. When I get a chance, I’ll try to wrap this into a Sweave script (cf. How Might Data Journalists Show Their Working? Sweave for the automated generation of PDF and HTML reports.).

Written by Tony Hirst

January 21, 2012 at 12:31 am

Posted in Anything you want, Rstats


16 Responses


  1. I’ve been finding your R code for analysing tweets really handy, thanks for sharing it! For those interested in content analysis of tweets, I’ve put together a little bit of R code on text mining (including clustering by multiscale bootstrap resampling) and topic modeling (using latent Dirichlet allocation) tweets over here: https://github.com/benmarwick/AAA2011-Tweets (which includes some of your previous code on retweet summaries).

    Ben

    January 22, 2012 at 6:59 am

    • @ben wonderful – thanks for linking to that… Though I think your R code/knowledge probably puts mine to shame! Putting together (i.e. cribbing someone else’s!;-) text analysis/topic modeling routines has been on my to-do list for some time. I guess another step in the chain might be to find where shortened links ultimately resolve to, and then produce stats that show the number of ultimate URLs, the number of different shortened forms of each URL, etc.?

      Have you posted any of the plots that script generates anywhere?

      Tony Hirst

      January 22, 2012 at 11:52 am

  2. @ben when generating wordlists and pulling out stop terms, I wonder if it would make sense to also use a stoplist that contains twitter user names? I guess a downside of this might be that it can be handy to have user names associated with topic lists? (Hmm… which also makes me wonder: if we do keep names in the mix, might it also make sense to append the name of a user to the tweet text before analysis so we can see which senders are associated with topical items?)

    Tony Hirst

    January 22, 2012 at 12:20 pm

    • @Tony, thanks for your kind words. My code is almost entirely cobbled together from other people’s code and StackOverflow… I can’t claim much originality! I’ve popped the plots up here: http://imgur.com/a/BxWxA#0 The final figure showing 23 topics over time is a bit of a cheat: all the plots were done in R, but I used Inkscape to get the tweets-per-hour bars along the bottom (I turned the R barplot sideways and then laid each bar end-to-end). The figures are also in a manuscript currently under review. Great suggestions about URL analysis and usernames in stoplists. I left usernames in as another way of finding out who the high-impact tweeters were. By the way, I like the way you plot twitter data with the usernames as data points, very efficient! Thanks for the igraph post too (you’re so productive it’s hard to keep up). Some of my colleagues use the sna package for statistical network analysis; have you tried that?

      Ben

      January 27, 2012 at 8:32 pm

      • @ben yeah:-) – so you too, then, see R as arcane voodoo magic? Heh heh, that makes three: you, me and Martin Hawksey:-)

        Been thinking I need to get up to speed in using github as a social tool… would it make sense to create a repo as a focus for a set of scripts around hashtag analysis using Twitter search and R, so we three can maybe: 1) (if necessary) get comfortable with social use of Git; 2) start working collaboratively on some Sweave scripts, maybe, for generating canned reports around hashtags?

        Re: the sna package – not tried it yet; I suspect @mhawksey will beat me to it if I’m not fast enough!

        Tony Hirst

        January 28, 2012 at 12:59 am

  3. [...] I could get anything back.I forgot about pursuing this angle until that is I saw Tony Hirst’s A Quick View Over a MASHe Google Spreadsheet Twitter Archive of UKGC12 Tweets in which he uses the statistical computing and graphing tool ‘R’ to read a spreadsheet of [...]

  4. Love the idea of an R twitter script recipe repo. Had a quick look at R SNA; I think I’ll be sticking with igraph for now, but look forward to being convinced otherwise. I need to start processing this archive https://docs.google.com/spreadsheet/ccc?key=0AqGkLMU9sHmLdGVxUXV2cHpiRzN4UzByS19zS3FUSVE&hl=en_GB#gid=5 so you should both look forward to your scripts being reused very soon (or you might want to just beat me to it ;)

    Martin
    @mhawksey

    mhawksey

    January 29, 2012 at 11:35 am

  5. I’ve got my eye on the MLA12 tweet archive, actually. I’d love to get started with Sweave. Tony, would it be too much to ask you to post a bit of a how-to-get-started-with-Sweave-and-RStudio? Or direct me to your favorite introduction? I’m happy to work with github. One thing that might be interesting to combine our methods on is a comparison of the tweet archives of a small number of major academic conferences. What do you think?

    Ben

    February 1, 2012 at 5:44 am

    • Hi Ben – I think a shared effort could be really handy, and I think Martin Hawksey is interested in this approach too.

      As to Sweave, you need to get LaTeX installed (err, at least... I think I worked out what I needed from any error messages!) and then RStudio picks it up. In R, if you open a new Sweave document, you can then write your LaTeX script – I have an example linked out from http://blog.ouseful.info/2011/11/01/how-might-data-journalists-show-their-working-sweave/ and there are also some scripts (that barely work!) here: https://github.com/psychemedia/f1DataJunkie/tree/master/raceReports

      When you’re working on the Sweave doc (.Rnw suffix?), RStudio gives you a couple of toolbar options for running/compiling the document.

      I think the latest version of RStudio also integrates with git somehow, but I’m not sure how… yet!;-)

      Tony Hirst

      February 1, 2012 at 10:05 am

    • @ben, @martin I set up a git repo at https://github.com/psychemedia/Twitter-Backchannel-Analysis and have started trying to figure out how to work with it in RStudio (eg http://blog.rstudio.org/2012/01/25/rstudio-v0-95-released/ and http://support.rstudio.org/help/kb/advanced/version-control-getting-started ) but I’m a real git n00b so apols if the repo goes a bit crazy.

      I guess there are a couple of approaches we could take? One is for us all to collaborate on the same repo; the other is for each of us to clone the repo, make our own commits to that, then as required copy them to the master (note that I have no real idea how to do any of this atm;-)

      Tony Hirst

      February 1, 2012 at 1:01 pm

    • Thought I’d try out the repo by committing as a contributor (Tony needs to add you on github). Realise wordclouds have already been done, but it was more a case of chucking something at the repo to try and work out structure and workflow.

      You’ll see I’ve already created a TAGS-related folder; not sure what the other folders would be, ‘live data’ maybe. Thought it would be useful to include example output. Again, just making it up as I go: each example image is named after the script that produced it, with ex00 added in case a script generates more than one output.

      I’m not precious about any of this, so happy if it’s ripped up and we start again

      Martin

      PS: opted to use Git GUI rather than the RStudio package

      mhawksey

      February 1, 2012 at 2:30 pm

      • I added a doodles directory just because: a place where scripts/ideas/works in ropey progress can be put. I also added a file for core functions, which I think would simplify matters, and also posted it as an issue to start exploring how we might communicate in a rather more sensible place than here…;-) Although here has the advantage that we document our fumbling learnings around how to go about social coding?!

        Tony Hirst

        February 1, 2012 at 3:26 pm

  6. [...] reading Tony Hirst’s Ouseful Blog, I came across his mention of Martin Hawksey’s TAGSExplorer tool.  Hawksey has written an interesting post about how to ‘archive event hashtags and create an [...]

  7. [...] as the backchannel is ‘short term low value’ and needs post-processing. Tony Hirst has fiddled about with the tweets using Martin Hawksey’s TAGSExplorer and there’s also a Storify from Brian Kelly for [...]

  8. [...] A quick view over a MASHe Google spreadsheet Twitter archive – @psychemedia does stuff which blows my mind! [...]


