Archive for May 2009
Email By Reference, Not By Value
Last week, I did a fair bit of driving around the UK (one of the reasons why this blog was quiet) which meant that I got a chance to catch up with a backlog of unlistened-to podcasts, in particular a whole set of presentations from my IT Conversations subscription. One episode that especially caught my attention was one of Jon Udell’s Interviews with Innovators (one of the best podcast series I know): Computational Thinking for Everyone, with Joan Peckham.
(For those of you who haven’t come across the idea of computational thinking, check out this earlier interview with Jeannette Wing, the blurb for which describes computational thinking as “ways of thinking and problem-solving that involve algorithms and data structures and levels of abstraction and refactoring [and that] aren’t just for computer scientists, they’re really for everybody”. See also: Computational Thinking, Communications of the ACM, March 2006/Vol. 49, No. 3 (PDF).)
One of the items that came up was the idea of passing variables by reference, rather than by value. (If this means nothing to you, check out Pass by Reference vs. Pass by Value. [If you can find a simpler explanation, ideally as a CC licensed OER, please link to it in the comments below*.])
Now this by chance got me thinking about email, and the painful way people insist on mailing the same document to a cc list of recipients, who then all reply with documents attached containing their comments on the original doc, and so on. Email used that way is passing by value. If a document is a variable, when we pass it as an attachment we pass it by value. So when a recipient changes the value, they have to return it (again by value) if they want the originator to see the changes they have made. They have to return it… More cc’ing, more attachments…
But instead, what if we pass a link (a reference) to a Google doc? In that case, anyone can change the doc, and everyone else sees the consequences of those changes. I add a comment to the doc, you can see it. And I didn’t have to email the doc to you as an attachment. We were all passed a reference, and when any of us makes a change to the thing that was referenced, whenever anyone else looks at the doc, they see the changes I made. Passing by reference.
Wouldn’t it be much easier if we passed documents by reference, not by value?
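For anyone who would rather see the distinction than read about it, here is a minimal sketch in JavaScript. It is only an analogy: the "document" is a made-up object, and the function names sendAsAttachment and sendAsLink are invented for illustration. Copying the object stands in for passing by value; handing over the same object stands in for passing by reference.

```javascript
// A "document" is just an object with some text and a list of comments.
const original = { text: "Draft agenda", comments: [] };

// By value: the recipient gets their own copy (like an email attachment).
function sendAsAttachment(doc) {
  return { text: doc.text, comments: [...doc.comments] };
}

// By reference: the recipient gets a handle on the same object (like a link).
function sendAsLink(doc) {
  return doc;
}

// Bob comments on his attached copy; the originator never sees it
// unless Bob mails the copy back.
const attachment = sendAsAttachment(original);
attachment.comments.push("Bob: move item 2 up");

// Carol comments via the link; the change lands on the original itself.
const link = sendAsLink(original);
link.comments.push("Carol: add AOB");

console.log(original.comments);   // only Carol's comment is here
console.log(attachment.comments); // only Bob's comment is here
```

Bob's comment is stranded in his copy, which is exactly the "more cc'ing, more attachments" problem; Carol's comment is visible to everyone holding the link.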
And then came the next thought – the old, old idea I had about wikimail is actually all about passing the contents of the body of an email by reference, not by value. Long time readers may remember my wikimail (aka read/write email) ramblings from some time ago (and also this possiprobably broken wikimail GM script for GMail), but if not, here’s a recap: the body of the email is a wiki page.
That’s it.
When I send you an email, I write a wikipage and send it to you. When you open the email, you actually open on a wiki page; so you can reply to me by typing in the page and sending me a wikimail reply.
Thinking about the usability of that, maybe rather than getting a new reply message in my email box, I should get a blue flag ‘recently changed’ notification on the “original” email, rather than a red flag (unread) reply message in my inbox? Alternatively, keeping tabs on the ‘recent changes’ feed to a wikimail page I’d sent (or received) would alert me to ‘responses’. (These change alerts could be tweeted to me, maybe?)
And just as media wiki pages have a content tab and a discussion tab, I guess wikimail messages could have a similar split personality?
Anyway, anyway, when I got back home, I saw announcements all over the place about Google Wave. I don’t know if it’s anything like wikimail (I haven’t had a chance to look at the info/watch the movie at all yet), but if not, I think there’s still scope there for reinventing mail by reference, rather than by value…
* This suggests a strategy to me for releasing OERs: having somewhere where I can request a resource that addresses a particular topic. OER publishers can then use that list to provide information about what materials the community needs and what they are likely to reuse… In this case, I’d like a mini tutorial on pass by value vs. pass by reference, in the abstract (i.e. not tied to the syntax of a particular language).
Here’s something to be going on with:
(Bah, WordPress can’t decipher a predetermined start time – watch the video starting 4 mins 19 seconds in…)
PS for a thought provoking initial critique of Google Wave, see Tim O’Reilly’s Google Wave: What Might Email Look Like If It Were Invented Today?.
First Steps Towards a Generic Google Spreadsheets Query Tool, or At Least, A Guardian Datastore Interactive Playground (aka the Guardian Datastore Explorer)
So I had another bit of a tinker, and came up with some code that’s breaking all over the place, but I think there’s enough of a vision there to have something to say, so I’ll say it…
How’s about a generic query’n'viz tool for the Guardian datastore? My first (and maybe last) attempt at a back of an envelope, sometimes it works, sometimes it doesn’t, bare bones rapid prototype of just such a thing can be found here.
In my original post on Making it a Little Easier to Use Google Spreadsheets as a Database (Hopefully!), I sketched out a simple form for helping create calls to a Google spreadsheet using the Google visualisation API query language. I then extended this to try to make the query building a little more transparent: Last Night’s Update to the “Google Spreadsheets as a Database” Demo. Today’s step is to see how we can make it easier to pull in spreadsheets from the datastore collection as a whole.
So referring to the image below, if you select a spreadsheet from the drop down list and click preview, you should get a preview of the column headings from that spreadsheet:
(The new link is to an original Guardian blog post announcing or describing the data.)
The list items are pulled in from a tag on my delicious account, which actually bookmarks the original data blog posts. The URI for the spreadsheet is added to the end of the bookmark description, and keyed with a –:
ISSUE 1: Sometimes the spreadsheet doesn’t load… I don’t know if this is down to something I’m (not) doing or not (if you’ve seen this sort of error and know a cause/fix, please post a comment below).
I’ve found if you just keep canceling the alert and clicking “Preview” the file loads in the end…
Scroll down on the page, and you can now start to build a query:
(See Last Night’s Update to the “Google Spreadsheets as a Database” Demo for more on this.)
Another new feature is the ability to preview results using various chart types, rather than just use a data table:
(Oh yes – the “bookmark” link should also allow you to share the current view with other people. At least, it shares the spreadsheet ID and the query, but not the view type…)
I haven’t implemented chart labeling, or the ability to set what values are used for what bit of the chart, so chart component default rules apply. By juggling the queries (including changing the order of columns that appear in the various text boxes), you can sometimes get a reasonable chart out.
Of course, you can always just grab the CSV URL and then visualise the data in something like Many Eyes Wikified.
The chart components I used are all taken from the Google Visualisation API, so they play nicely with the Google data source representation that holds the data values.
So, that’s where it’s at, and that’s probably me done with it now… (I think I can see what’s possible so my fun is done…) And if you haven’t got an inkling of what it is I think I can see, it’s this:
A front end to the Guardian data store that lets readers:
- select a data set from the datastore (and maybe get a chance to view the original story from the datablog; I guess this could be pulled in from the Guardian OpenPlatform API?)
- write queries on that dataset to generate particular views of the data;
- generate CSV and HTML preview view URLs for a particular query, so the data can be shared with other people (turning different views on subsets of the data into social objects);
- generate quick visualisation previews of different views of the data.
Nice to haves? Maybe links to stories that also mention the data, again via the OpenPlatform API? A collection of different bookmarks/views that use the same spreadsheet, so readers can share their own views of the data (the sort of social thing that Many Eyes Wikified offers). An opportunity to accept comments on a dataset? etc etc
All told, I reckon it’s taken less than 20 hours of solo effort (except for a bit of 3rd party bug spotting ;-), plus time to write the blog posts, to get this far (but the state of the code shows that: it’s truly scrappy). A fair amount of that time was spent learning how to do stuff and looking at exemplar code on Google AJAX APIs Code Playground. Of course, there are bugs’n'issues all over the place, but as people bring them to my attention, I tend to see if there’s a quick fix…
PS (I think) I’ve just noticed a Google data source wrapper for Pachube (Google Visualization API for Pachube history), which means that as well as pulling in Guardian datastore content from Google spreadsheets (as well as other publishers’ content on Google spreadsheets), this ‘interface’ could also be applied to Pachube data. (If you know of anyone else who exposes the Google visualisation/data source API, please post a link below.)
PPS search key: I also call this the Guardian Datastore Explorer…
Last Night’s Update to the “Google Spreadsheets as a Database” Demo
[If you're looking for my November 2010 Guardian Datastore/Government Spending Data explorer, see the post: Government Spending Data Explorer]
Yesterday I posted about Making it a Little Easier to Use Google Spreadsheets as a Database (Hopefully!), demoing a simple interface for constructing URIs to query Google spreadsheets using the Google query language.
One of the problems the interface presents is reconciling the column headings with the column letters (the first column is, by default, column ‘A’; the second is ‘B’, and so on).
So I made a few tweaks to generate this version…
Firstly, when you preview the spreadsheet column headings at the top of the screen, a ‘headings by column label’ is also produced:
Along with that is a list box containing the column headings. (Note that these headings are pulled down from the spreadsheet whose key you entered. So if you use another spreadsheet key, you’ll probably get different headings…;-)
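As a rough illustration of the ‘headings by column label’ idea, here is a minimal sketch that pairs each heading with its column letter. The function names and the example headings are made up for illustration; this is not the code behind the demo.

```javascript
// Convert a zero-based column index to a spreadsheet column letter:
// 0 gives "A", 25 gives "Z", 26 gives "AA", and so on.
function columnLetter(index) {
  let letters = "";
  let n = index + 1; // work with a 1-based count
  while (n > 0) {
    const remainder = (n - 1) % 26;
    letters = String.fromCharCode(65 + remainder) + letters;
    n = Math.floor((n - 1) / 26);
  }
  return letters;
}

// Pair each heading with the letter of the column it sits in.
function labelHeadings(headings) {
  return headings.map((heading, i) => ({ column: columnLetter(i), heading }));
}

// Hypothetical headings, for illustration only.
const labelled = labelHeadings(["Name", "Constituency", "Party"]);
console.log(labelled);
```

The letters are what go into the query; the headings are only there to help a person pick the right letter.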
As you select headings, two things happen: firstly, the column labels are added to the text box whose contents actually go towards producing the query:
(You’ll notice that the selection box is a multiple selection box).
Secondly, the ‘where’ selection box is populated using columns that have been selected:
(Note that it is quite possible to construct queries of the form select A,B,C where D=42, but I took the gamble that for a simple(?!) interface, it’s reasonable to expect users to be selecting columns that they’re also searching on; because the actual query is constructed based on the contents of the text boxes, the user is free to edit those as they will. The list boxes are just supposed to make things a little easier…)
When choosing items in the where selection box, I add some possible conditional options to the text box for that column:
If you choose multiple where conditions, they are wrapped in brackets and an and/or prompt is used to join the separate statements:
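To make the assembly step concrete, here is a minimal sketch of how the pieces from the text boxes might be joined into a query string. The function name buildQuery and its parameters are assumptions for illustration rather than the code the demo actually uses.

```javascript
// Build a query string from the selected column letters and any where-conditions.
// joiner is "and" or "or", i.e. the choice made at the and/or prompt.
function buildQuery(columns, conditions = [], joiner = "and") {
  let query = "select " + columns.join(",");
  if (conditions.length === 1) {
    query += " where " + conditions[0];
  } else if (conditions.length > 1) {
    // Wrap each condition in brackets so the and/or grouping is unambiguous.
    const wrapped = conditions.map((condition) => "(" + condition + ")");
    query += " where " + wrapped.join(" " + joiner + " ");
  }
  return query;
}

console.log(buildQuery(["A", "B", "C"], ["D=42"]));
console.log(buildQuery(["B", "C", "I"], ["I!=23083", "I>=0"], "and"));
```

Because the result is just text, it can still be edited by hand before the query is run, which matches the way the form treats the list boxes as helpers rather than the source of truth.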
Another couple of minor tweaks:
- single quote marks weren’t being encoded properly in the created CSV and HTML URIs (thanks to an email from Andy Cotgreave for tipping me off to that one with an example query that wasn’t working for him); hopefully this is fixed now (encodeURIComponent doesn’t encode single quotes – but escape() does).
- you can now just paste a Google spreadsheet URI into the ‘key’ entry box and the key should be parsed out (thanks to Simon Dickson for that suggestion in a comment to the previous post).
And here’s the regular expression I used:
if (/key=([^&]*)/.test(key)) key=/key=([^&]*)/.exec(key)[1];
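The two tweaks above can be sketched as a pair of small helper functions. The names extractKey and encodeQuery are invented here, and the example spreadsheet URI is only an illustrative shape (the key is the one used elsewhere on this blog). The second helper sticks with encodeURIComponent and patches up the single quotes afterwards, which is an alternative to escape() rather than what the demo does.

```javascript
// Accept either a bare spreadsheet key or a full URI containing key=...
// and return just the key. Same regular expression as above.
function extractKey(input) {
  const match = /key=([^&]*)/.exec(input);
  return match ? match[1] : input.trim();
}

// encodeURIComponent leaves single quotes untouched,
// so encode them explicitly as %27 afterwards.
function encodeQuery(query) {
  return encodeURIComponent(query).replace(/'/g, "%27");
}

// Illustrative URI shape only; the real one may carry other parameters.
const pasted = "http://spreadsheets.google.com/ccc?key=phNtm3LmDZEObQ2itmSqHIA&hl=en";
console.log(extractKey(pasted));
console.log(extractKey("phNtm3LmDZEObQ2itmSqHIA"));
console.log(encodeQuery("select B where C contains 'John'"));
```

Note that the pattern stops at the next &, so anything else tacked on after the key without an & in between would be swept up as part of the key.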
That’s it for now, so back to work…
PS A couple of gotchas… Firstly, if you own a spreadsheet that isn’t publicly viewable and you use this tool, if you’re logged in to your Google account it will probably work fine, but it won’t work for anyone who doesn’t have permission to see that sheet. Secondly, if you own a spreadsheet, it’s tempting to put ‘metadata’ in rows below the table. That means this metadata appears in data meaningful columns. So I’m starting to think that ‘metadata’ such as where the data was sourced from, possibly ownership and contact details for the data owner etc should be placed in a separate column at the right hand side of the table?
Making it a Little Easier to Use Google Spreadsheets as a Database (Hopefully!)
It was nice to see a couple of people picking up on my post about using Google Spreadsheets as a database (e.g. Using Google Spreadsheets as a database (no, it really is very interesting, honest) over at the Online Journalism blog), but it struck me that the URL hacking involved might still deter some people.
(Btw, the only list of keywords I’ve found to date for the query language is on the official documentation pages, and even then it isn’t complete…)
So – I spent an hour or so last night putting together a first attempt at a form based interface for writing the queries and constructing the URLs.
The form starts with a slot for the key of the spreadsheet you want to query – clicking on the preview button should display the column headings:
This preview can be used to help you select the columns you want to interrogate or return in your query, counting left-to-right: A, B, C and so on.
Next up are some hints on example queries:
and then there is the query form itself:
I’ve made a start on separating out the different bits of query, but there’s clearly lots more that could be done. For example, an optional “order by” slot could be provided (with a post-qualifying asc or desc selection), or the select entry box could be turned into a multiple selection listbox displaying the column headers, (but I only gave myself an hour, right?;-) [Note to self: lots of handy functions here - Google Visualization API Reference]
Anyway, once you make the query, links to the URIs of the HTML preview and CSV versions of the query are automatically generated, and the HTML table of results is displayed:
The CSV URI can then be used to import the data into a Many Eyes Wikified data page, for example.
Anyway, hopefully this makes it a little easier to get people started with these queries. A good place to start looking for spreadsheets is on the Guardian DataBlog.
Note that this “work” also ties in strongly to the idea of “data journalism” (hashtag: #datajourn) which I’d be interested in hearing your thoughts about…
And a Tweet Later – Querying Shadow Cabinet Expenses on Google Spreadsheets with the Google Query Language
6 minutes past 5 this evening:
(There was a problem with the publishing/privacy settings – maybe the tech team would like to post a comment saying what, exactly – that meant while the spreadsheet was viewable in preview form and as CSV, it was impossible to run Google visualisation API query language queries against it.)
Half an hour later:
And ten minutes after that:
So what? So this:
(That’s a test query using the Google visualisation API query language on the Shadow Cabinet expenses (via Google the Shadow Cabinet’s expenses).)
And with a little tweaking, we can get a summary of the expenses by Shadow Cabinet member – run a select A,C,sum(E) group by A,C query on the spreadsheet:
(If you prefer to see the full total by member, use this query: select A,sum(E) group by A)
Now take the CSV output version of the query, pipe it into a Many Eyes Wikified data page and plot it as an interactive tree map:
Or if you prefer a bubble chart?
Or maybe a matrix chart?
(I think that Many Eyes Wikified updates its data pages from live data feeds, so hopefully the linked to visualisation should remain pretty much up to date? [All the visualisations can be reached from this Many Eyes Wikified page.])
Isn’t this fun?:-) So why don’t you have a go?????
PS for a more comprehensive review of what’s possible with the query language, I’ve posted a wide selection of examples here: Using Google Spreadsheets as a Database with the Google Visualisation API Query Language.
Using Google Spreadsheets as a Database with the Google Visualisation API Query Language
Wouldn’t it be handy if we could treat all the public spreadsheets uploaded to Google docs as queryable tables in a database? Well, it appears that you can do so, at least at an individual spreadsheet level: Introducing the Google Visualization API.
Over the weekend, I started exploring the Google Visualisation API Query Language, which is reminiscent of SQL (if that means anything to you!). This language provides a way of interrogating a data source such as a public online Google spreadsheet and pulling back the results of the query as JSON, CSV, or an HTML table.
Got that? I’ll say it again: the Google Visualisation API Query Language lets you use a Google spreadsheet like a database (in certain respects, at least).
Queries are defined on a spreadsheet in the following way:
http://spreadsheets.google.com/tq?tq=QUERY&key=SPREADSHEET_ID
Although defined, by default, to return JSON data from a query, wrapped in a pre-defined (and fixed?) callback function (google.visualization.Query.setResponse()), it is also possible to display the results of a query as an HTML table (which is “useful”, as the documentation says, “for debugging”). The trick here is to add another argument to the URL: tqx=out:html, so for example a query would now be defined along the lines of:
http://spreadsheets.google.com/tq?tqx=out:html&tq=QUERY&key=SPREADSHEET_ID
Using the Guardian datastore’s MPs expenses spreadsheet 2007-8 as an example, we can write quite a wide variety of queries, which I’ll show below in their ‘HTML preview’ form.
(In a ‘real’ situation, you are more likely to retrieve the data as JSON and then process it as an object. Or, as I will also demonstrate, take the results of the query as CSV output (tqx=out:csv rather than tqx=out:html) and pull it directly into a service such as Many Eyes Wikified.)
The generic URL is of the form: http://spreadsheets.google.com/tq?tqx=out:html&tq=QUERY&key=phNtm3LmDZEObQ2itmSqHIA.
In the examples, I will just show the unencoded select statement, but the link will be the complete, well-formed link.
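If hand-encoding the links gets tedious, a small helper can do it. Here is a minimal sketch that follows the URL pattern given above; the function name buildQueryUrl and its parameters are made up for illustration, and the endpoint is simply the one quoted in this post, so treat it as a pattern rather than a guarantee.

```javascript
// The endpoint quoted in this post.
const BASE_URL = "http://spreadsheets.google.com/tq";

// key:   the spreadsheet key
// query: the unencoded select statement
// out:   "html" or "csv"; leave it out for the default JSON response
function buildQueryUrl(key, query, out) {
  // encodeURIComponent does not touch single quotes, so handle them separately.
  const encoded = encodeURIComponent(query).replace(/'/g, "%27");
  const tqx = out ? "tqx=out:" + out + "&" : "";
  return BASE_URL + "?" + tqx + "tq=" + encoded + "&key=" + key;
}

const key = "phNtm3LmDZEObQ2itmSqHIA";
const statement = "select B,C,I where I=23083";
console.log(buildQueryUrl(key, statement, "html"));
console.log(buildQueryUrl(key, statement, "csv"));
console.log(buildQueryUrl(key, "select *"));
```

The same select statement gives the HTML preview, the CSV version, or the default JSON response, depending only on the third argument.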
So here we go:
- show everything – fetch the whole table: select * (in a lot of computer languages, ‘*’ often refers to ‘everything and anything’);
- just show some particular columns, but again for everyone: fetch just columns B (surname), C (first name) and I (total additional costs allowance): select B,C,I
- only show the names of people who have claimed the maximum additional costs allowance (£23,083): fetch just columns B, C and I where the value in column I is 23083: select B,C,I where I=23083 (column I is the additional costs allowance column);
- How many people did claim the maximum additional costs allowance? Select the people who claimed the maximum amount (23083) and count them: select count(I) where I=23083
- So which people did not claim the maximum additional costs allowance? Display the people who did not claim total additional allowances of 23083: select B,C,I where I!=23083 (using <> for ‘not equals’ also works); NB here’s a more refined take on that query: select B,C,I where (I!=23083 and I>=0) order by I
- search for the name, party (column D) and constituency (column E) of people whose first name contains Joan or is recorded as John (rather than “Mr John”, or “Rt Hon John”): select B,C,D,E where (C contains 'Joan' or C matches 'John')
- only show the people who have claimed less than £100,000 in total allowances: select * where F<100000
- what is the total amount of expenses claimed? Fetch the summed total of entries in column I (i.e. the total expenses claimed by everyone): select sum(I)
- So how many MPs are there? Count the number of rows in an arbitrary column: select count(I)
- Find the average amount claimed by the MPs: select sum(I)/count(I)
- Find out how much has been claimed by each party (column D): select D,sum(I) where I>=0 group by D (Setting I>=0 just ensures there is something in the column)
- For each party, find out how much (on average) each party member claims: select D,sum(I)/count(I) where I>=0 group by D
To create your own queries, just hack around the URIs.
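If the group by queries feel a bit opaque, it may help to see what one of them computes. The sketch below emulates select D,sum(I) where I>=0 group by D in plain JavaScript over a handful of toy rows. The rows are invented purely for illustration and are not taken from the Guardian spreadsheet, and the function name sumByGroup is made up.

```javascript
// Toy rows standing in for spreadsheet rows: D is the party, I is the amount.
// Invented values, for illustration only.
const rows = [
  { D: "Party X", I: 100 },
  { D: "Party Y", I: 250 },
  { D: "Party X", I: 300 },
  { D: "Party Y", I: null }, // an empty cell
];

// Emulates: select D,sum(I) where I>=0 group by D
function sumByGroup(data, groupCol, valueCol) {
  const totals = new Map();
  for (const row of data) {
    const value = row[valueCol];
    // Mirror "where I>=0": skip rows where the cell is empty or negative.
    if (typeof value !== "number" || value < 0) continue;
    const soFar = totals.get(row[groupCol]) || 0;
    totals.set(row[groupCol], soFar + value);
  }
  return totals;
}

const totals = sumByGroup(rows, "D", "I");
console.log([...totals.entries()]);
```

Each distinct value in the grouping column ends up with one row of output, carrying the sum of the value column over the rows that survived the where clause.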
[Many Eyes Wikified is no more…]
One other trick is to grab a CSV output, rather than an HTML output, and pull it into Many Eyes Wikified, and then visualise it within that environment – so we grab the data (in this case, using select D,sum(I) where I>=0 group by D, i.e. the total amount of additional costs allowance claims by party):
to give this:
and then visualise it in an appropriate way:
So to recap this final case, then, we are running a query on the original spreadsheet that calculates the total additional costs allowance claims per party, and emits the results as CSV. These results are imported into Many Eyes Wikified, and displayed therein.
Now I’m pretty sure that Many Eyes Wikified will continue (how often?) to synch data from a potentially changing data source, which means we should be able to use a similar approach to plot a running total of claims from the Shadow Cabinet Expenses spreadsheet…
…but, at the time of writing at least, it seems as if the publication/privacy settings on that spreadsheet are set such that access via the query language is denied…:-(
Anyway – that was a quick intro to the Google Visualisation API Query Language – so go play… ;-)
PS so what other spreadsheets might make for some interesting queries?
PPS @adrianshort has made a valuable point about how easy it is for a publisher to change the order of rows in a spreadsheet, and hence make a nonsense of your query. (Also, I think the approach I’m taking sort of assumes a simple, regular spreadsheet where row 1 is for headers, then the data, and ideally no other text e.g. in cells below the table describing the data in the table.) So always check… ;-)
PPPS If the first row in the table defines column headings, and there are then intervening lines (maybe spaces) before the data starts, putting offset N (where N is a number) will skip that many rows before displaying the data.
Something else I noticed on the order by setting: this can be of the form order by COL asc (to sort in ascending order, which is the default) or order by COL desc (to sort in descending order).
Querying a Google Spreadsheet of MPs’ Expenses Data: So Who Claimed for “biscuits”?
Yesterday, the Guardian published a spreadsheet to their Data Store containing all the MPs’ expenses revelations to date in a spreadsheet form (“MPs’ expenses in the news: all the revelations, as a spreadsheet“)*.
So it struck me that I should be able to find a way of easily searching that data to find just those MPs who had, for example, been claiming for biscuits…
[If you don't want to read how it's done, cut straight to the MPs' Expenses Search to find who's been claiming for biscuits]
* I actually found the link to the story just now from a known item search on Google, site limited to the Guardian domain and restricted in time to the last 24 hours using one of the new Google search options (I remembered seeing the story on the Guardian website somewhere yesterday). Note to self: this is a really handy trick for searching over recent content on a particular site:-)
(To tidy those search results even more, and remove the RSS feed results, just add -inurl:feedarticle to the search terms… i.e. exclude results that have feedarticle in the URL.)
Anyway, the question was, how to search the data in the spreadsheet. Now I had a half memory from HTML Tables and the Data Web of Google releasing a query language that would allow you to query data in a “data table” object embedded in a web page – the Google Query Language – which it turns out can be used to interrogate anything defined as a Google visualisation API data source…
…and it just so happens that Google spreadsheets are so defined: Using a Google Spreadsheet as a Data Source.
So this means that I should be able to use the Google visualisation API query language to run a query on a Google Spreadsheet; like the MPs’ expenses data spreadsheet; like asking it for who’s claimed for biscuits…
So here’s what I want to do:
1) create a data table that pulls data in from a Google spreadsheet;
2) actually, that’s not strictly true – I want to run a query on the spreadsheet that pulls in some of the data from the spreadsheet (in particular, just the rows that satisfy the query);
3) I want to display the results in a table using the visualisation API libraries (so then I don’t have to write any code to display the data myself; and more than that, I don’t even need to understand how the data has been returned from the spreadsheet).
Okay – so the ambitious next step is to try to write a test query on the spreadsheet by trying to make sense of the Google documentation, which is never as helpful as it might be.
No joy, so in the end, I copied and pasted some example code from the closest working example to what I wanted from Google’s interactive AJAX APIs Playground – an example of just getting data into a web page from a spreadsheet using the Google visualisation API libraries:
Okay – so what this example does is run a query on a spreadsheet and plot the data as a map. Just seeing the code isn’t much help though – what libraries do I need to load to run it? So I exported the whole example into a standalone worked example, did a View Source, and copied the code wholesale.
Good, I now have a canned example that pulls in data from a spreadsheet. Next step – I want to display a data table, not a map.
Again, the API Playground comes in handy – check out the table example and see what bits of the code need changing:
Change the demo code so it displays the data from the example spreadsheet as a table rather than a map, and check it works. It does. Good… So now change the spreadsheet key and see if it works to display the expenses data. It does. Good again.
Okay, now I can start to write some test queries. The AJAX API playground provides a crib again, this time in the form of the Using the Query Language example:
(Hmmm… maybe I should have just worked from this example from the start? Ah well, never mind, note to self: teach the changes required from just this example next time…)
Now it’s fun time… writing the query. The query language documentation suggests only equivalence style relations are possible, but I want to use a condition along the lines of “select * where M LIKE ‘%biscuits%’” – that is, give me [select] all the columns in a row [*] where [where] column M [M] contains [LIKE] the word ‘biscuits’ [‘%biscuits%’].
Typing a suitably encoded test query URL (there’s a tool to encode the query string on the query language documentation page) into the browser location bar didn’t work :-( BUT, it turned up an informative error message that described some phrases the query language does support, or at least, that are expected by the spreadsheet:
So let’s try contains rather than LIKE… which works…
Okay, so now the long and the short of it is, I know how to write queries.
So for example, here’s searching the name column (so you can search for your MP by name):
var query = 'select * where A contains "' + q + '"'; // e.g. search for Huhne
Here’s searching the constituency column (so you can search for your MP by constituency):
var query = 'select * where B contains "' + q + '"'; // e.g. Edinburgh
And here’s searching several columns for a particular item:
var query = 'select * where (M contains "' + q + '" OR O contains "' + q + '" OR Q contains "' + q + '" OR S contains "' + q + '" OR U contains "' + q + '" OR V contains "' + q + '")';
Add it all together, and what have you got? A way of searching to see who’s been claiming for biscuits:
Note that searches are case sensitive…(anyone know if there’s a way round this?)
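One possible way round the case sensitivity, offered as a hedged sketch rather than a tested fix: the current query language reference lists a lower() scalar function, so lowercasing both the column and the search term should give a case-insensitive match. Whether lower() was available against the spreadsheet at the time of writing is an assumption. The function name buildSearchQuery is invented here; it also generalises the multi-column query above so the column list isn't hand-typed.

```javascript
// Build a "contains" search across one or more columns.
// With caseInsensitive set, wrap each column in lower() and lowercase the term
// (assumes the data source supports the lower() scalar function).
function buildSearchQuery(term, columns, caseInsensitive = false) {
  // Drop double quotes so the term cannot close the quoted literal early.
  const cleaned = term.replace(/"/g, "");
  const needle = caseInsensitive ? cleaned.toLowerCase() : cleaned;
  const clauses = columns.map((col) => {
    const target = caseInsensitive ? "lower(" + col + ")" : col;
    return target + ' contains "' + needle + '"';
  });
  return "select * where (" + clauses.join(" OR ") + ")";
}

console.log(buildSearchQuery("biscuits", ["M", "O", "Q", "S", "U", "V"]));
console.log(buildSearchQuery("Biscuits", ["M", "O"], true));
```

The second call produces a query of the form select * where (lower(M) contains "biscuits" OR lower(O) contains "biscuits"), so a search for Biscuits, biscuits or BISCUITS should all find the same rows.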
So there you have it: an MPs’ expenses search engine via Google Spreadsheets :-)
MindMap Navigation for Online Courses
We’re now a couple of weeks in to a new course (T151) and whilst I’m wary of posting too much about it just at the moment, there are some spinoff thoughts I do want to capture here.
The course is, in part, based on a model of weekly Topic Explorations, where I pose four or five questions and then provide a list of resources for the students to explore, as guided by the questions. An 800 word or so piece then captures some of my observations about the topic.
The structure was informed by a model my colleague John Naughton had used on a different course, and also resembles that of David Wiley’s Blogs, Wikis and New Media course.
One of the questions that came up in a course forum a day or two ago was the course legacy, in terms of access to course materials. The resources I link to from each topic exploration are all web based resources, although some of them are authentication required subscription journal articles, with access provided via the OU Library libezproxy service (the links are also constructed around DOIs, wherever possible).
As part of the Week 0 activities for the course, I provided a quick overview of social bookmarking services, suggesting that students could bookmark those resources that were useful to them, with the advantage that these resource links would still be available once the course had finished and access to the course materials on the VLE withdrawn. (Why we can’t provide a Moodle export version of the materials for students to put in their own Moodle installation at the end of a course, I don’t know? Eg I think NineHub lets you import Moodle courses into their 1-click setup hosted Moodle installations?)
One idea I did entertain was just bookmarking and tagging all the resources so that they could be pulled into the course automatically via an appropriate feed, or alternatively pulled by students into their own space, wherever that might be. The feed powered approach would also make a WiggLE possible ;-)
That’s still possibly on the cards, but instead I began considering another possibility: delivering the course via an interactive mindmap.
One of the advantages that this offers, also picked up in a forum post, is that it addresses the issue of how and where to take notes: you can take them on the Mindmap. That is, the Mindmap becomes a navigation surface, and a note taking service.
So for example, here’s a fragment of David Wiley’s course, mindmap style (created using Freemind) showing in particular the first week’s resources (see the original course material here):
The red arrows identify links – click on a link and the corresponding page will open in a web browser. The course can be viewed and navigated in a far more powerful way than a hierarchical website, because multiple nodes at different levels, and multiple leaves of the tree can be viewed (or collapsed) at once. The mindmap tool also allows the user to rearrange the spatial layout to suit their own needs. And of course, if they are viewing the mindmap in an interactive mindmap editor, they can add notes as subnodes to any of the resources.
Over the next few days, I think I’ll do T151 in mindmap form, and maybe offer it up as a resource. After all, the course is going out in pilot form, so it’d be foolish not to… ;-)
Searching By Looking Elsewhere
A couple of weeks or so ago, I got an email requesting a link to something I’d spoken about at a department meeting some time ago (the Gartner hype cycle, actually). Now normally I’d check my delicious bookmarks for a good link, or maybe even run a Google web search, but instead I ran a search for ‘gartner hypecycle 2008’ on Google Images…
…which is when it struck me that searching Google Images may on occasion lead to better quality, or more relevant, results than doing a normal web search, particularly if you use a level of indirection. In particular, it can often lead to a web document or post that provides some sort of analysis around a topic. (Remember, Google image search links to the web pages that contain the images that are displayed in the image search results, not just the images.)
So for example, a web search for games console sales chart [web search] turns up a different set of results to an image search for games console sales chart [image search]. And here’s where my gut feeling comes in about using the fact that documents contain images as a filter – if people have gone to the trouble of including a relevant image in something they have published, their post may be more considered on a particular topic than one that doesn’t. That is, the inclusion of a relevant image can be used as a valuable ranking term when searching for results. Essentially, you are running an advanced, search limited query around an image document type.
Note that it’s often sensible, when sharing image queries, to make the search a ‘safe’ (i.e. adult content filtered) one: in Google, just add &safe=active to the end of the URL.
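If you are sharing a lot of these links, the same tweak can be scripted. The following is only a sketch: the make_safe helper is a made-up name, and the example image search URL is illustrative rather than a documented address.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def make_safe(url):
    """Return url with safe=active as its final query parameter."""
    parts = urlsplit(url)
    # Drop any existing safe= setting so the parameter is not duplicated.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "safe"
    ]
    params.append(("safe", "active"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Illustrative image search URL (an assumption, not taken from the post).
shared = make_safe("http://images.google.com/images?q=games+console+sales+chart")
```

Running make_safe on a link that already carries the parameter leaves it unchanged, so it is harmless to apply it to every link before sharing.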
(The image search approach also lets me quickly scan the results for one that appears to contain the sort of chart data I want. Supporting visual filtering is one reason why some search engines have experimented with including an image from each linked to page in the search engine results listing.)
Limiting searches by document type can also be achieved in a normal web search, of course. For example, if you are looking for a report on knife crime in UK cities, then it might be reasonable to suspect that the most relevant documents were published as PDFs – so limit on that:
If you’d rather use the normal Google search box as a command line, the search query is: uk+knife+crime+report+filetype:pdf
If you’re looking for actual data, it might make sense to search on spreadsheet documents: uk knife crime statistics filetype:xls
As well as variously using the keyword ‘chart’ or ‘statistics’, the word ‘data’ or ‘table’ can also help tune results, particularly when running an image search. Remember, the point may not necessarily be to find a chart or set of data directly. Instead, it may be to use the fact that a document contains a chart or a table to limit the results you get back (assuming that documents or posts containing charts, tables, etc., are likely to be more considered on a particular topic simply because the author has gone to the trouble of including a chart or a table).
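The filetype: queries above are easy to generate if you want to try the same search against several document types. This is a minimal sketch: the helper names and the search URL base are assumptions for illustration, and only the filetype: operator itself comes from the queries shown above.

```python
from urllib.parse import urlencode


def filetype_query(terms, filetype):
    """Join search terms and append a filetype: restriction."""
    return " ".join(list(terms) + ["filetype:" + filetype])


def search_url(query, base="http://www.google.com/search"):
    """Build a search URL; the base address is an assumed default."""
    return base + "?" + urlencode({"q": query})


pdf_query = filetype_query(["uk", "knife", "crime", "report"], "pdf")
xls_query = filetype_query(["uk", "knife", "crime", "statistics"], "xls")
pdf_url = search_url(pdf_query)
```

Note that urlencode escapes the colon in the query string, which a search box would normally do for you behind the scenes.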
Increasingly, I find I’m also using Youtube to search for particular items of BBC content. Note that my motivation here is not necessarily to use the video clip I have found directly, mainly because a lot of BBC related footage on Youtube has not been put there by the BBC – i.e. it is more likely to be copyright infringing content uploaded by an individual.
Instead, I am making use of:
1) the segmenting of video clips that individuals have done (chopping a 3 minute clip out of an hour long documentary, for example);
2) the user provided metadata around the clip – the title they have given it, the description text, the tags used to annotate it;
3) the automatically generated ‘related video’ service provided by Youtube,
to help me deep search into BBC content so that I can quickly find a clip that can then be obtained in a rights approved manner, without having to wade through hours and hours of video searching for a clip I want to use.
That is, it is possible to use Youtube as a great big index of BBC ‘deep clips’, in the sense that they are clipped from deep within a longer programme, to locate a particular clip that can then be obtained in a rights cleared fashion: searching Youtube to find something that I will then go elsewhere for.
So the take home message from this post? The best place to search for a particular resource may not be the obvious one.
Filter Tweets by Language
I’ve been so bored not tinkering lately that when I saw the following tweet today from @tsimonite, I just had to have a 5 minute play:
So here was my thought process:
- hmm, Google has a language detection API, so we can just pass the tweets through that and filter on language; it’s probably easiest for a proof of concept to just filter a person’s Twitter RSS feed…
- now where’s the API? Search for google language detection api and turn up Google AJAX Language API documentation.
- ah, buggrit, too much like hard work – I’m sure I built a pipe round this before: search for language detection yahoo pipe… This’ll do nicely:
Paul Donnelly’s Language Detector Pipe:
(For reference, the call to the API is simply:
http://ajax.googleapis.com/ajax/services/language/detect?q=TEXT_STRING&v=1.0
So not that hard at all! ;-)
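For anyone who would rather script it than pipe it, the call can be assembled along these lines. This sketch only builds the request URL from the pattern quoted above; it makes no request and assumes nothing about the shape of the response, which this post does not describe. The detect_url name is made up.

```python
from urllib.parse import urlencode

# Endpoint as quoted above.
DETECT_ENDPOINT = "http://ajax.googleapis.com/ajax/services/language/detect"


def detect_url(text, version="1.0"):
    """Build the detection request URL for a piece of text."""
    # urlencode takes care of escaping spaces and punctuation in the text.
    return DETECT_ENDPOINT + "?" + urlencode({"q": text, "v": version})


url = detect_url("bonjour tout le monde")
```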
Clone the pipe, and pop it into another pipe that accepts a Twitter RSS feed:
Hmm… that’s probably too hard for most people to use, expecting people to be able to find someone’s RSS twitter feed… so make it easy – create the feed URL from a Twitter username:
Here’s the pipe: English tweets only.
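The logic of the pipe boils down to two steps: build a feed URL from a username, then keep only the items whose detected language matches. Here is a rough sketch of those two steps outside of Pipes. Everything named here is hypothetical: the feed URL template is a placeholder (the real Twitter feed address is not given in this post), and toy_detect is a stand-in for a real language detection call.

```python
def feed_url(username, template="http://example.com/feeds/{username}.rss"):
    """Build a feed URL from a username (placeholder template, an assumption)."""
    return template.format(username=username)


def filter_by_language(tweets, detect, wanted="en"):
    """Keep only the tweets that detect() labels with the wanted language code."""
    return [tweet for tweet in tweets if detect(tweet) == wanted]


def toy_detect(text):
    """Stand-in detector for illustration only: spots a few French words."""
    french_words = {"bonjour", "merci", "le", "la"}
    words = set(text.lower().split())
    return "fr" if words & french_words else "en"


tweets = ["hello world", "bonjour le monde"]
english_only = filter_by_language(tweets, toy_detect)
```

Passing the detector in as a function means the toy version can be swapped for a call to a real detection service without touching the filter itself.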
(Just by the by, it took considerably longer to write this post than it took to get from seeing the tweet to publishing the pipe…)
PS What would be really nice would be for a Twitter client, such as Tweetdeck, to offer its own plugin architecture/API, so I could create a little routine based on something like the above to provide a language filter for incoming tweets, maybe on a per column basis, as required (or not).
It would then be easy to wire in things like Tweetspeech, which speaks your tweets out aloud, or a simple tweet translator:
Then of course there are all the geo related plugins you could do, such as a simple Tweetmap, or feed annotation services like serendipitwitterous or serendipitwitternews.









































