Following on from the previous posts in which I looked at pulling data tables out of Wikipedia into a Google spreadsheet and accessing financial data within Google spreadsheets, in this post I’ll show you some of the things you can do with the Google spreadsheet formula “=GoogleLookup”.

If you’ve ever entered a search such as “capital of brazil”, you might have noticed that the Google web search engine will try to treat your query as a factual question, and answer it as such:

Why’s this useful? One thing it seems to be good for is as a way of doing lots of factual look-ups in one go. How so?

Create a new Google spreadsheet and type the following into cell B2:

=GoogleLookup($A2, B$1)

In cell A2 enter the word France and in cell B1 the word Population.

If you hover over cell B2, now populated with a number corresponding to the population of France, a tooltip will pop up showing the provenance of the data (i.e. where it came from), with a link to the data source.

Select cell B2 (the one containing the formula) and drag it using the small square in the bottom right hand corner of the cell across the row to cell D2. Highlight cells B2 to D2 and drag the small square marker in the bottom right hand corner of cell D2 down to cell D5. The formula will be copied into cells B2 to D5. The “$” symbols in the formula anchor various aspects of the formula so that it will look for the entity in column A of the current row, and the attribute in row 1 of the current column.

If you now add appropriate values into row 1 and column A, the GoogleLookup formula will look up the corresponding entity/attribute values (i.e. the corresponding facts).
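The effect of those anchors can be sketched in Python: every data cell pairs the entity from column A of its row with the attribute from row 1 of its column. (The fact store and values below are made up for illustration — the real lookups go to Google’s fact engine.)

```python
# Toy stand-in for the relative GoogleLookup fill: entities run down
# column A, attributes along row 1, and each data cell looks up the
# (entity, attribute) pair. The fact values here are illustrative.
facts = {
    ("France", "Population"): "64,473,140",
    ("France", "Capital"): "Paris",
    ("Germany", "Population"): "82,217,800",
    ("Germany", "Capital"): "Berlin",
}

entities = ["France", "Germany"]        # column A, from row 2 down
attributes = ["Population", "Capital"]  # row 1, from column B across

# Equivalent of copying a formula like =GoogleLookup($A2, B$1)
# across the whole data range:
grid = [[facts.get((entity, attr), "") for attr in attributes]
        for entity in entities]

for entity, row in zip(entities, grid):
    print(entity, row)
```

Unknown (entity, attribute) pairs simply come back empty, which is much how the spreadsheet behaves too.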

Although I’ve had a quick look around for a comprehensive directory of allowable Google Lookup attribute values (and the sort of entity they should be paired with), I haven’t managed to find one. So I’ve put together a spreadsheet at http://spreadsheets.google.com/ccc?key=p1rHUqg4g421BB8IrYonydg where I’ll attempt to collate all the allowable lookup attributes I can find, along with appropriate exemplar entity values.

Using the ‘relative lookup formula’ approach shown above, the tables will show attribute values as column headings, and example entities as row headings.

At the moment I’ve started putting Geography, Business and Chemistry lookup collections together:

In order to do a “bulk fact lookup”, enter a list of entity values as row labels in a spreadsheet, a set of attribute (fact) values as column headings, enter an appropriate ‘relativised’ GoogleLookup formula into the first table data cell, and then drag it across the set of table data cells: voila – a bulk query made in a single shot across multiple entity and attribute values :-)

If you come across any more GoogleLookup attributes that seem to work, post a couple of example entity/attribute pairs in a comment to this post and I’ll add them to the =GoogleLookup formula attribute directory spreadsheet.

PS Looking at the sample spreadsheet just now, I notice that quite a few cells in the table may be unpopulated for particular entity values. So in a read/write web world, it should be possible to enter a value into the cell and have it logged, err, somewhere? Such as in a Google consensual fact engine?! ;-)

Following on from Data Scraping Wikipedia With Google Spreadsheets, here’s a quick post showing how you can use another handy Google spreadsheet formula, =GoogleFinance.

This function will pull in live – and historical – price data for a stock.

Although I noticed this formula yesterday as I was exploring the “importHTML” formula described in the Wikipedia crawling post, I didn’t have time to have a play with it; but after a quick crib of HOWTO – track stocks in Google Spreadsheets, it struck me that here was something I could have fun with in a motion chart (you know, one of those Hans Rosling Gapminder charts….;-)

NB For the “official” documentation, try here: Google Docs Help – Functions: GoogleFinance.

- Stock quotes and other data may be delayed up to 20 minutes. Information is provided “as is” and solely for informational purposes, not for trading purposes or advice. For more information, please read our Stock Quotes Disclaimer.
- You can enter 250 GoogleFinance functions in a single spreadsheet; two functions in the same cell count as two.

So – let’s have some fun…

Fire up a new Google spreadsheet from http://docs.google.com, give the spreadsheet a name, and save it, and then create a new sheet within the spreadsheet (click on the “Add new sheet” button at the bottom of the page). Select the new sheet (called “Sheet2” probably), and in cell A1 add the following:

=GoogleFinance(“AAPL”, “all”, “1/1/2008”, “10/10/2008”, “WEEKLY”)

In case you didn’t know, AAPL is the stock ticker for Apple. (You can find stock ticker symbols for other companies on many finance sites.)

The formula will pull in the historical price data for Apple at weekly intervals from the start of 2008 to October 10th. (“all” in the formula means that all historical data will be pulled in on each sample date: opening price, closing price, high and low price, and volume.)

(If this was the live – rather than historical – data, it would be updated regularly, automatically…)

I’m not sure a bar chart or a scatter chart is quite right for historical stock pricing… so how about a line chart:

Et voila:

If you want to embed this image, you can:

If I was using live pricing data, I think the image would update with the data…?

Now create another new sheet in your spreadsheet, and into cell A1 of this new sheet (Sheet3) paste the following:

This will pull in the historical price data for IBM.

Create two or three more new sheets, and in cell A1 of each pull in some more stock data (e.g. MSFT for Microsoft, YHOO for Yahoo, and GOOG…)

Now click on Sheet1, which should be empty. Fill in the following title cells by hand across cells A1 to G1:

Now for some magic…

That key value – the value between “key=” and “&hl=en_GB#” – is important: to all intents and purposes it’s the name of the spreadsheet. Generally, the key will be the characters between “key=” and an “&” or the end of the URL; the “&” means “and here’s another variable” – it’s not part of the key.
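Extracting the key from a spreadsheet URL is a one-liner with a regular expression; here’s a quick Python sketch of the rule just described:

```python
import re

def spreadsheet_key(url):
    """Return the characters between 'key=' and the next '&' (or the
    end of the URL) - to all intents and purposes, the spreadsheet's name."""
    match = re.search(r"key=([^&#]+)", url)
    return match.group(1) if match else None

url = "http://spreadsheets.google.com/ccc?key=p1rHUqg4g423seyxs3O31LA&hl=en_GB#"
print(spreadsheet_key(url))
```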

In cell B2, enter the following:

=ImportRange(“YOURKEY”, “Sheet2!A2:F42”)

For example, using my spreadsheet’s key:

=ImportRange(“p1rHUqg4g423seyxs3O31LA”, “Sheet2!A2:F42”)

What ImportRange does is pull in a range of cell values from another spreadsheet. In this case, I’m pulling in the AAPL historical price data from Sheet2 (but using a different spreadsheet key, I could pull in data from a different spreadsheet altogether, if I’ve made that spreadsheet public).

In cell A2, enter the ticker symbol AAPL; highlight cell A2, click on the square in the bottom right hand corner and drag it down the column – when you release the mouse, the AAPL stock ticker should be added to all the cells below, labelling each row of the imported data. Then, in the B column of the next free row, import the data from Sheet3:

These rows will need labeling “IBM”.

Import some more data if you like and then… let’s create a motion chart (info about motion charts).

Highlight all the cells in sheet1 (all the imported data from the other sheets) and then from the Insert menu select Gadget; from the Gadget panel that pops up, we want a motion chart:

Configure the chart, and have a play [DEMO]:

Enjoy (hit the play button… :-)

PS And remember, you could always export the data from the spreadsheet – though there are probably better API powered ways of getting hold of that data…

PPS and before the post-hegemonic backlash begins (the .org link is broken btw? or is that the point?;-) this post isn’t intended to show how to use the Google formula or the Motion Chart well or even appropriately, it’s just to show how to use it to get something done in a hacky mashery way, with no heed to best practice… the post should be viewed as a quick corridor conversation that demonstrates the tech in a casual way, at the end of a long day…

PS for a version of this post in French, see here: Créer un graphique de mouvement à partir de Google Docs.

Prompted in part by a presentation I have to give tomorrow at an OU eLearning community session (I hope some folks turn up – the 90 minute session on Mashing Up the PLE – RSS edition is the only reason I’m going in…), and in part by Scott Leslie’s compelling programme for a similar duration Mashing Up your own PLE session (scene setting here: Hunting the Wily “PLE”), I started having a tinker with using Google spreadsheets for data table screenscraping.

So here’s a quick summary of (part of) what I found I could do.

The Google spreadsheet function =importHTML(“URL”,”table”,N) will scrape a table from an HTML web page into a Google spreadsheet. The URL of the target web page and the target table element both need to be in double quotes. The number N identifies the N’th table in the page (counting starts at 0) as the target table for data scraping.
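Under the hood, =importHTML is just pulling the N’th &lt;table&gt; element out of the page. Here’s a rough, stdlib-only Python sketch of the same idea (the sample HTML is a made-up miniature of a Wikipedia-style table; Python lists are 0-indexed, so tables[0] picks the first table):

```python
from html.parser import HTMLParser

class TableGrabber(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text."""
    def __init__(self):
        super().__init__()
        self.tables, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

# Made-up sample standing in for the Wikipedia page:
html = """<table><tr><th>City</th><th>Population</th></tr>
<tr><td>London</td><td>7556900</td></tr></table>"""

p = TableGrabber()
p.feed(html)
print(p.tables[0])
```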

So for example, have a look at the following Wikipedia page – List of largest United Kingdom settlements by population (found using a search on Wikipedia for uk city population – NOTE: URLs (web addresses) and actual data tables may have changed since this post was written, BUT you should be able to find something similar…):

Grab the URL, fire up a new Google spreadsheet, and start to enter the formula “=importHTML” into one of the cells:

Autocompletion works a treat, so finish off the expression:

=ImportHtml(“http://en.wikipedia.org/wiki/List_of_largest_United_Kingdom_settlements_by_population”,”table”,1)

And as if by magic, a data table appears:

All well and good – if you want to create a chart or two, why not try the Google charting tools?

Where things get really interesting, though, is when you start letting the data flow around…

So for example, if you publish the spreadsheet you can liberate the document in a variety of formats:

As well as publishing the spreadsheet as an HTML page that anyone can see (and that is pulling data from the Wikipedia page, remember), you can also get access to an RSS feed of the data – and a host of other data formats:

See the “More publishing options” link? Lurvely :-)

Let’s have a bit of CSV goodness:

Why CSV? Here’s why:

Lurvely… :-)

(NOTE – Google spreadsheets’ CSV generator can be a bit crap at times and may require some fudging (and possibly a loss of data) in the pipe – here’s an example: When a Hack Goes Wrong… Google Spreadsheets and Yahoo Pipes.)
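If you’d rather consume the published CSV outside of a pipe, Python’s csv module does the job; the data below is a made-up stand-in for the sheet’s CSV publishing URL output:

```python
import csv
import io

# Stand-in for the spreadsheet's published CSV output - in practice you'd
# fetch this from the sheet's CSV publishing URL. Values are illustrative.
published = "Rank,Settlement,Population\n1,London,7556900\n2,Birmingham,984333\n"

rows = list(csv.reader(io.StringIO(published)))
header, data = rows[0], rows[1:]
print(header)
print(data)
```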

Unfortunately, the *’s in the element names mess things up a bit, so let’s rename them (don’t forget to dump the original header row of the feed; alternatively, tweak the CSV URL so it starts with row 2). We might as well create a proper RSS feed too, by making sure we at least have a title and a description element in there:

Make the description a little more palatable using a regular expression to rewrite the description element, and work some magic with the location extractor block (see how it finds the lat/long co-ordinates, and adds them to each item?;-):

DEPRECATED…. The following image is the OLD WAY of doing this and is not to be recommended…

…DEPRECATED

Geocoding in Yahoo Pipes is done more reliably through the following trick – replace the Location Builder block with a Loop block, into which you should insert a Location Builder block.

The location builder will look to a specified element for the content we wish to geocode:

The Location Builder block should be configured to output the geocoded result to the y:location element. NOTE: the geocoder often assumes US town/city names. If you have a list of town names that you know come from a given country, you may wish to annotate them with a country identifier before you try to geocode them. A regular expression block can do this:

This block says: in the title element, grab a copy of everything – .* – into a variable – (.*) – and then replace the contents of the title element with its original value – $1 – followed by “, UK”: $1, UK

Note that this regular expression block would need to be wired in BEFORE the geocoding Loop block. That is, we want the geocoder to act on a title element containing “Cambridge, UK” for example, rather than just “Cambridge”.
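The same annotation trick, sketched with Python’s re module (the ^(.*)$ pattern plays the role of the Pipes regex, and count=1 makes sure we only append “, UK” once):

```python
import re

# Grab the whole title into group 1 and replace it with "group 1, UK"
# so the geocoder sees an unambiguously UK placename.
def annotate_uk(title):
    return re.sub(r"^(.*)$", r"\1, UK", title, count=1)

print(annotate_uk("Cambridge"))
```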

Lurvely…

And to top it all off:

And for the encore? Grab the KML feed out of the pipe:

…and shove it in a Google map:

So to recap, we have scraped some data from a Wikipedia page into a Google spreadsheet using the =importHTML formula, published a handful of rows from the table as CSV, consumed the CSV in a Yahoo pipe and created a geocoded KML feed from it, and then displayed it in a Google map.

Kewel :-)

PS If you “own” the web page that a table appears on, there is actually quite a lot you can do to either visualise it, or make it ‘interactive’, with very little effort – see Progressive Enhancement – Some Examples and HTML Tables and the Data Web for more details…

PPS for a version of this post in German, see: http://plerzelwupp.pl.funpic.de/wikitabellen_in_googlemaps/. (Please post a linkback if you’ve translated this post into any other languages :-)

PPPS Once you have scraped the data into a Google spreadsheet, it’s possible to treat it as a database using the QUERY spreadsheet function. For more on the QUERY function, see Using Google Spreadsheets Like a Database – The QUERY Formula and Creating a Winter Olympics 2010 Medal Map In Google Spreadsheets.

## Visualising the OU Twitter Network

Readers of any prominent OU bloggers will probably have noticed that we appear to have something of a Twitter culture developing within the organisation (e.g. “Twitter, microblogging and living in the stream“). After posting a few Thoughts on Visualising the OU Twitter Network…, I couldn’t resist the urge to have a go at drawing the OpenU twittergraph at the end of last week (although I had hoped someone else on the lazyweb might take up the challenge…) and posted a few teaser images (using broken code – oops) via twitter.

Anyway, I tidied up the code a little, and managed to produce the following images, which I have to say are spectacularly uninteresting. The membership of the ‘OU twitter network’ was identified using a combination of searches on Twitter for “open.ac.uk” and “Open University”, coupled with personal knowledge. Which is to say, the membership list may well be incomplete.

The images are based on a graph that plots who follows whom. If B follows A, then B is a follower and A is followed. In the network graphs, an arrow goes from A to B if A is followed by B (so the arrows point to the people who follow you). The graph was constructed by making calls to the Twitter API for the names of the people an individual followed, for each member of the OU Twitter network. An edge appears in the graph if a person in the OU Twitter network follows another person in the OU Twitter network. (One thing I haven’t looked at is whether there are individuals followed by a large number of OpenU twitterers who aren’t in the OpenU twitter network… which might be interesting…)
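As a sketch of that construction (the account names and following sets below are made up; in the real thing each following set came from a Twitter API call):

```python
# following[u] stands in for the set of accounts u follows, as pulled
# from the Twitter API; names are invented for illustration.
network = {"alice", "bob", "carol"}
following = {
    "alice": {"bob", "dave"},   # dave isn't in the OU network
    "bob": {"alice"},
    "carol": {"alice", "bob"},
}

# An edge goes from A to B if A is followed by B, keeping only edges
# where both ends are members of the OU network.
edges = sorted((followed, follower)
               for follower, follows in following.items()
               for followed in follows
               if followed in network and follower in network)
print(edges)
```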

Wordle view showing who in the network has the most followers (the word size is proportional to the number of followers, so the bigger your name, the more people there are in the OU network that follow you). As Stuart predicted, this largely looks like a function of active time spent on Twitter.

We can compare this with a Many Eyes tag cloud showing how widely people follow other members of the OU network (the word size is proportional to the number of people in the OU network that the named individual follows – so the bigger your name, the more people in the OU network you follow).

Note that it may be interesting to scale this result according to the total number of people a user is following:

@A’s OU network following density = (number of people @A follows in the OU Twitter network) / (total number of people @A follows)

Similarly, maybe we could also look at:

@A’s OU network follower density = (number of people in the OU Twitter network following @A) / (total number of people following @A)
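A quick sketch of those two measures as Python functions (the counts in the examples are illustrative, not real network figures):

```python
def following_density(in_network_follows, total_follows):
    """Fraction of everyone @A follows who are in the OU Twitter network."""
    return in_network_follows / total_follows

def follower_density(in_network_followers, total_followers):
    """Fraction of @A's followers who are in the OU Twitter network."""
    return in_network_followers / total_followers

print(following_density(30, 120))   # 0.25
print(follower_density(45, 300))    # 0.15
```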

(In the tag clouds, the number of people following is less than the number of people followed; I think this is in part because I couldn’t pull down the names of who a person was following for people who have protected their tweets?)

Here’s another view of people who actively follow other members of the OU twitter network:

And who’s being followed?

These treemaps uncover another layer of information if we add a search…

So for example, who is Niall following/not following?

And who’s following Niall?

I’m not sure how useful a view of the OU Twittergraph is in itself, though?

Maybe more interesting is to look at the connectivity between people who have sent each other an @message. So for example, here’s how Niall has been chatting to people in the OU twitter network (a link goes from A to B if @A sends a tweet to @B):

We can also compare the ‘active connectivity’ of several people in the OU Twitter network. For example, who is Martin talking to, (and who’s talking to Martin) compared with Niall’s conversations?

As to why I’m picking on Niall…? Well, apart from making the point that by engaging in ‘public’ social networks, other people can look at what you’re doing, it’s partly because thinking about this post on ‘Twitter impact factors’ kept me up all night: Twitter – how interconnected are you?.

The above is all “very interesting”, of course, but I’m not sure how valuable it is, e.g. in helping us understand how knowledge might flow around the OU Twitter network? Maybe I need to go away and start looking at some of the social network analysis literature, as well as some of the other Twitter network analysis tools, such as Twinfluence (Thanks, @Eingang:-)

PS Non S. – Many Eyes may give you a way of embedding a Wordle tagcloud…?

## Google Personal Custom Search Engines?

A couple of days ago, I gave a talk about possible future library services, and in passing mentioned the way that my Google search results are increasingly personalised. Martin picked up on this in a conversation over coffee, and then in a blog post (“Your search is valuable to us“):

This made me think that your search history is actually valuable, because the results you get back are a product of the hours you have invested in previous searches and the subject expertise in utilising search terms. So, if you are an expert in let’s say, Alaskan oil fields, and have been researching this area for years, then the Google results you get back for a search on possible new oil fields will be much more valuable than the results anyone else would get.

[I]f you can assemble and utilise the expert search of a network of people, then you can create a socially powered search which is very relevant for learners. Want to know about really niche debates in evolution? We’ve utilised Richard Dawkins, Steve Jones and Matt Ridley’s search history to give you the best results. Or if you prefer, the search is performed as the aggregation of a specialist community.

There are more than a few patents in this area of course (you can get a feel for what the search engines are (thinking about) doing in this area by having a read through these SEO by the SEA posts on “search+history+personal”), but I was wondering:

how easy would it be to expose my personal search history reranking filter (or whatever it is that Google uses) as a custom search engine (under my privacy controls, of course)?

As Martin says (and as we discussed over coffee), you’d want to disable further personalisation of your CSE by users who aren’t you (to get round the Amazon equivalent of Barbie doll and My Little Pony “items for you” recommendations I seem to get after every Christmas!), but exposing the personal search engine would potentially be a way of exposing a valuable commodity.

In the context of the Library, rather than going to the Library website and looking up the books by a particular author, or going to ORO and looking up a particular author’s papers, you might pull their personalised search engine off the shelf and use that for a bit of topic related Googling…

In a comment to Martin’s post, Laura suggests “Aren’t the search results that the expert actually follows up and bookmarks more powerful? Every academic should be publishing the RSS feeds for their social bookmarks, classified by key terms. The user can choose to filter these according to the social rating of the URL and aggregate results from a group of experts according to their reputation in their field and their online expertise in finding valuable sources.”

I guess this amounts to a social variant of the various “deliSearch” search engines out there, that let you run a search over a set of bookmarked pages or domains (see Search Hubs and Custom Search at ILI2007, for example, or these random OUseful posts on delicious powered search etc)?

At times like these, I sometimes wish I’d put a little more effort into searchfeedr (example: searching some of my delicious bookmarks tagged “search’ for items on “personal search”). I stopped working on searchfeedr before the Google CSE really got going, so it’d be possible to build a far more powerful version of it now…

Anyway, that’s maybe something else to put on the “proof-of-concept to do” list…

PS Note to self – also look at “How Do I?” instructional video search engine to see how easy it would be to embed videos in the results…

A couple of weeks ago, I popped the Stanza ebook reader application on my iPod Touch (it’s been getting some good reviews, too: iPhone Steals Lead Over Kindle). I didn’t add any ebooks to it, but it did come with a free sample book, so when I was waiting for a boat on my way home last week, I had a little play and came away convinced that I would actually be able to read a long text from it.

So of course, the next step was to have a go at converting OpenLearn courses to an ebook format and see how well they turned out…

There are a few ebook converters out there, such as the Bookglutton API that will “accept HTML documents and generates EPUB files. Post a valid HTML file or zipped HTML archive to this url to get an EPUB version as the response” for example, so it is possible to upload a download(!) of an OpenLearn unit ‘print version’ (a single HTML page version of an OpenLearn unit) or upload the zipped HTML version of a unit (although in that case you have to meddle with the page names so they are used in the correct order when generating the ebook).

The Stanza desktop app – free as a beta download at the moment, but set to be (affordable) payware later this year – can also handle epub generation (in fact, it will output an ebook in all manner of formats).

The easiest way I’ve found to generate ebooks though is, of course, feed powered:-) Sign up for an account with Feedbooks, click on the news icon (err…?!) and then add a feed (err…?!)

(Okay, so the interface is a little hard to navigate at times… No big obvious way to “Add feed here”, for example, that uses a version of the feed icon as huge visual cue, but maybe that’ll come…)

Getting the ebook in Stanza on the iPod Touch/iPhone is also a little clunky at the moment, although once it’s there it works really well. Whilst there is a route directly to Feedbooks from the app (as well as feed powered ebooks, Feedbooks also acts as a repository for a reasonable selection of free ebooks that can be pulled into the iPhone Stanza app quite easily), the only way I could find to view my RSS powered feedbooks was to enter the URL; and on the iPod, the feedbook URLs were hard to find: logging in to my account on the Feedbooks site and clicking the ebook link just gave an error as the iPod tried to open a document format it couldn’t handle – and Safari wouldn’t show me the URL in the address bar (it redirected somewhere).

Anyway, user interface issues aside, the route to ebookdom for the OpenLearn materials is essentially a straightforward one – grab a unit content RSS feed, paste it into Feedbooks to generate an ePub book, and then view it in Stanza. The Feedbooks folks are working on extending their API too, so hopefully better integration within Stanza should be coming along shortly.

Once the feedbook has been synched to the Stanza iPod app, it stays there – no further internet connection required. One neat feature of the app is that each book in your collection is bookmarked at the place you left off reading it, so you could have several OpenLearn units on the go at the same time, accessing them all offline, and being able to click back to exactly the point where you left it.

At the moment the ebooks that Feedbooks generates don’t contain images, so it might not be appropriate to try to read every OpenLearn unit as a Feedbooks ebook. There are also issues where units refer out to additional resources – external readings in the form of linked PDFs, or audio and video assets, but for simple text dominated units, the process works really well.

(I did wonder if Feedbooks replaced images from the OpenLearn units with their alt text, or transclusion of linked to longdesc descriptions, but apparently not. No matter though, as it seems that many OpenLearn images aren’t annotated with description text…)

If you have an iPhone or iPod Touch, and do nothing else this week, get Stanza installed and have a play with Feedbooks…

## Continuous Group Exercise Feedback via Twitter?

Yesterday I took part in a session with Martin Weller and Grainne Conole pitching SocialLearn to the Library (Martin), exploring notions of a pedagogy fit for online social learning (Grainne) and idly wondering about how the Library might fit in all this, especially if it became ‘invisible’ (my bit: The Invisible Library):

As ever, the slides are pretty meaningless without me rambling over them… but to give a flavour, I first tried to set up three ideas of ‘invisibleness’:

– invisibility in everyday life (random coffee, compared to Starbucks: if the Library services were coffee, what coffee would they be, and what relationship would, err, drinkers have with them?);

– positive action, done invisibly (the elves and the shoemaker);

– and invisible theatre (actors ‘creating a scene’ as if it were real (i.e. the audience isn’t aware it’s a performance), engaging the audience, and leaving the audience to carry on participating (for real) in the scenario that was set up).

And then I rambled a bit a some webby ways that ‘library services’, or ‘information services’ might be delivered invisibly now and in the future…

After the presentations, the Library folks went into groups for an hour or so, then reported back to the whole group in a final plenary session. This sort of exercise is pretty common, I think, but it suddenly struck me that it could be far more interesting if the ‘reporter’ on each table was actually twittering during the course of the group discussion. This would serve to act as a record for each group, might allow ‘semi-permeable’ edges to group discussions (although maybe you don’t want groups to be ‘sharing’ ideas), and would let the facilitator (my experience is that there’s usually a facilitator responsible whenever there’s a small group exercise happening!) eavesdrop on every table at the same time, and maybe use that as a prompt for wandering over to any particular group to get them back on track, or encourage them to pursue a particular issue in a little more detail.