Last day of the holidays today, so I thought I’d try not to spend the day learning stuff from the web, but do something else instead, like read a book, or watch a film, clean the greenhouse, try out a new recipe, or maybe even get a (late) head start on some marking, start looking at putting a bid together, or draft a paper for some tinpot journal or other, (because I really, really, really need to generate some ‘research’ income, publish something academically credible, find some way, any way, of justifying my salary…).
But it never works out like that, does it…? Because I saw a couple of tweets from Martin, and they led, as ever, to a bit of playing…
So for example, from this post of Martin’s – Free (and rebuild) the tweets! Export TwapperKeeper archives using Google Refine – I realised that Google Refine offers all sorts of import options I hadn’t really noticed before (like the ability to import data from an XML (incl. Atom/RSS), RDF or JSON feed). Which means I really need to have a quick play with that…
…for example, by seeing if I can load in some data directly from a Twitter search using a URL of the form:
http://search.twitter.com/search.json?q=from:mhawksey&rpp=5&include_entities=true&result_type=mixed
Ooh… magic:-)
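For anyone who'd rather poke at the same feed from a script, here's a minimal Python sketch of what that URL is doing. The function names and the sample payload are made up for illustration, and I'm assuming the search response wraps the tweets in a "results" list with "from_user" and "text" fields, so treat it as a sketch rather than a recipe:

```python
import json
from urllib.parse import urlencode


def twitter_search_url(query, rpp=5):
    """Build a search URL of the same form as the one pasted above."""
    params = {
        "q": query,
        "rpp": rpp,
        "include_entities": "true",
        "result_type": "mixed",
    }
    # safe=":" keeps the colon in "from:mhawksey" readable rather than %3A
    return "http://search.twitter.com/search.json?" + urlencode(params, safe=":")


# Hypothetical, heavily trimmed response, only here to show the assumed shape
sample_response = (
    '{"results": [{"from_user": "mhawksey", '
    '"text": "Free (and rebuild) the tweets!"}]}'
)


def tweet_rows(payload):
    """Flatten the assumed "results" list into (user, text) rows."""
    return [(t["from_user"], t["text"]) for t in json.loads(payload)["results"]]


print(twitter_search_url("from:mhawksey"))
print(tweet_rows(sample_response))
```

Pasting the URL it prints into Refine's import dialogue should be equivalent to typing it by hand; the row-flattening bit is roughly the step Refine does for you when you pick the record node.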
Looking around various Google Refine works in progress, I also noticed that there appears to be a Python library for running Google Refine project scripts, which means you can presumably automate the Google Refine process? (Note to self – I still haven’t found a noddy tutorial to help me get started with running the Gephi Toolkit from Jython).
Or how about this tweet, “forwarded” by Martin in the sense that he replied to a tweet from @SuButcher and included me in the response, so I could look up the thread being replied to: “@mhawksey I want to use a Googledocs form to populate a google map with twitter users – see http://t.co/uJGC78b8 and http://t.co/hLbBEUXY”. That got me idly wondering about the quickest way to geocode data in a Google spreadsheet, and remembering the Google Gadget that will do just this (Google Spreadsheet geocoding map gadget – a quick play suggests you can set the range to a whole column (e.g. B:B), but setting the range from a specific row to the end of the column using the format B2:B (e.g. for mapping results from a Google form, excluding the title row) doesn’t seem to work?!).
Googling around, I also found this handy trick for getting the lat/long of a point out of Google maps in a recipe for Geocoding with Google Spreadsheets:
– CSV output for Google maps point geocoder: http://maps.google.com/maps/geo?output=csv&q=PLACENAME_OR_ADDRESS
And hence in the Google Spreadsheet context:
=ImportData(CONCATENATE("http://maps.google.com/maps/geo?output=csv&q=","LOCATION"))
where “LOCATION” might equally be a reference to a cell that contains a location.
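The same trick can be sketched outside the spreadsheet too. In the Python below, the function names are my own and the sample row is invented; I'm assuming the CSV that comes back is a single "status,accuracy,latitude,longitude" line with a status of 200 meaning success, so check that against a real response before relying on it:

```python
import csv
from urllib.parse import urlencode


def geocode_url(location):
    """Build the CSV geocoder URL for a place name or address."""
    params = {"output": "csv", "q": location}
    return "http://maps.google.com/maps/geo?" + urlencode(params)


def parse_geocode_csv(line):
    """Pull (lat, lng) out of one CSV row.

    Assumes the row is status,accuracy,latitude,longitude and that a
    status of 200 means the lookup worked; anything else returns None.
    """
    status, _accuracy, lat, lng = next(csv.reader([line]))
    if status != "200":
        return None
    return float(lat), float(lng)


# Invented sample row, not a real geocoder response
sample_line = "200,4,52.0406,-0.7594"

print(geocode_url("Milton Keynes, UK"))
print(parse_geocode_csv(sample_line))
```

The urlencode step is the bit the spreadsheet formula glosses over: a location with spaces or commas in it needs escaping before it goes into the q= parameter.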
So… day wasted… again… may as well spend the rest of it doing F1DataJunkie stuff now…!
PS seems I was unwittingly sort of involved in the publication path of that Google Maps/CSV hack I linked to above… That’s why I do what I do… unforeseen consequences.
Bizarrely, when I’ve used Twitter Search, Google Refine seems to round both id and id_str, e.g. “166937349609364481” becomes 166937349609364480, whether quoted or unquoted – does this happen for you? It seems to be related to the length of the string (when it’s over 16 characters long).
@tom I seem to remember seeing something in a slightly different context about the new long form Twitter IDs causing problems… I also seem to remember there was some way of getting IDs back as strings rather than numerics for cases where long numbers couldn’t be handled…? This maybe? https://groups.google.com/forum/?fromgroups#!topic/twitter-api-announce/ccJeIN3T4_A
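A quick way to see the sort of rounding being described, assuming the culprit is the ID getting held as a 64-bit floating point number somewhere along the way (that's a guess at the cause, not something confirmed for Refine). The payload is a made-up two-field stand-in for a tweet:

```python
import json

tweet_id = "166937349609364481"

# Integers above 2**53 can't all be represented exactly as doubles
print(int(tweet_id) > 2**53)   # True
print(int(float(tweet_id)))    # 166937349609364480

# Made-up stand-in for a tweet carrying both numeric and string IDs
payload = '{"id": 166937349609364481, "id_str": "166937349609364481"}'

# Python's json keeps big integers exact by default, so parse_int=float
# is used here to mimic a parser that treats every number as a double
lossy = json.loads(payload, parse_int=float)

print(int(lossy["id"]) == int(tweet_id))   # False: the numeric id got rounded
print(lossy["id_str"] == tweet_id)         # True: the string id survives
```

So if the string version of the ID can be kept as text all the way through, it should come out intact; the trouble only starts when something converts it to a number.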