Archive for the ‘Tinkering’ Category
Home again after a few very enjoyable days away at IWMW2008 in Aberdeen, and I feel like I need a way of saying thank you to the web managers’ community for allowing an academic in…heh heh ;-)
So I spent half an hour or so (no… really…;-) on the train back from the airport putting together a front end for an HEI feed autodiscovery pipe that I knocked up in one of the presentations yesterday (I was giving the talk that was on at the time my full partial attention, of course ;-) that picks up on some of the conversation that was in the air at the end of the innovation competition session (I didn’t win, of course…;-(
The context is/was a comment from Mike Ellis that HEIs et al. could start opening up their data by offering RSS feeds of news releases, job/recruitment ads and event listings, because there’s no reason not to…. So my, err, gift(?!) back to the IWMW community is a little something where UK HEI web managers can proudly show off how they’ve taken up the challenge and published a whole suite of autodiscoverable RSS feeds from their home pages ;-): UK HEI autodiscoverable feeds.
(Cynics may say that the page actually names and shames sites that don’t offer any autodiscoverable feeds; I couldn’t possibly comment… ;-)
Anyway, the pipework goes like this…
First of all I grab a feed of UK HEI homepages… There isn’t an official one, of course, so as a stopgap I’ve scraped a dodgy secondary source (safe in the expectation that Mike Ellis will have an authoritative, hacked feed available from studentviews.net sometime soon…)
All that’s required then is to pull out the link in each item, that hopefully corresponds to the HEI homepage, and use that as the focus for feed autodiscovery:
Any feed URLs that are autodiscovered are then added as elaborations to the corresponding HEI feed item. Although these elaborations aren’t exposed in the RSS feed output from the pipe, they are available in the pipe’s JSON output, so the half-hour (offline) hack on the train earlier this afternoon consumes the JSON feed and gives a quick and dirty show’n’tell display of which institutions have autodiscoverable feeds on their homepage: UK HEI autodiscoverable feeds.
Looking at a couple of comments to the post Nudge: Improving Decisions About RSS Usage, (in which Brian Kelly tabulated the provision of RSS feeds from Scottish HEIs), it seems that publicly highlighting the lack of support for feed autodiscovery can encourage people to look at their pages and add the feature… So I wonder: when IWMW comes around next year, will the phrase No autodiscoverable feeds… be missing from the UK HEI autodiscoverable feeds page, possibly in part because that page exists?!
(Note that if you use this page to test a homepage you’ve added feed autodiscovery to, there is cacheing going on everywhere so you may not see any change in the display for an hour or so… I’ll try and post a cache-resistant feed autodiscovery page over the next few days; in the meantime, most browsers glow somewhere if they load an HTML page containing autodiscoverable feeds…)
In Back from Behind Enemy Lines, Without Being Autodiscovered(?!), I described a simple service that displays the autodiscoverable RSS feeds from UK HEI homepages (it went down over the weekend as the screenscraping broke, but it’s back now and some of the ‘issues’ with some of the linkscraping has been fixed ;-)
Over the weekend, I tweaked the code and created a parallel service that displays the ‘Page Not Found’ (HTML error code 404) splash page for UK HEIs using thumbnails generated using websnapr.
You can find the service here: UK HEI “Page Not Found” pages
The page takes in a list of UK HEI homepage URLs, generates a nonsense URL off each domain, and uses that nonexistent page URL as the basis for the thumbnail screenshot.
PS Brian Kelly pinged me with a note that he’s had a UK HEI 404 viewer around for just about forever… University 404 pages rolling demo… Just by the by, the script that Brian used to scroll through the pages was the inspiration for the original “deliShow” version of feedshow (about feedshow).
I haven’t posted for a few days (nothing to write about, sigh….) so here’s a cheap’n’lazy post reusing a couple of old visual demos (edupunk chatter, More Hyperbolic Tree Visualisations – delicious URL History: Users by Tag) to look at what’s happening around the use of the CCK08 tag that’s being used to annotate – in a distributed way – the Connectivism and Connective Knowledge online course…
For example, here’s a view of people who have been using the cck08 tag on delicious:
People twittering mentions of cck08:
And here’s how people have been tagging the Connectivism and Connective Knowledge course homepage on delicious (along with te people who’ve been using those tags).
The next step is to move from hierarchical info displays (such as the above) to mining networks – grous of people who are talking about the same URLs on delicious and twitter, and maybe even blogging about CCK08 too…
It’s been a long – and enjoyable – day today (err, yesterday, I forgot to post this last night!), so just a quick placeholder post, that I’ll maybe elaborate on with techie details at a later date, to show one way of making some use of the metadata that appears in the ORO/eprints resource splash pages (as described in ORO Goes Naked With New ePrints Server): a Yahoo SearchMonkey ORO augmented search result – ORO Reference Details (OUseful).
The SearchMonkey extension – which when “installed” in your Yahoo profile, will augment ORO results in organic Yahoo search listings with details about the publication the reference appears in, the full title (or at least, the first few characters of the title!), the keyowrds used to describe the reference and the first author, along with links to a BibTeX reference and the document download (I guess I could also add a link in there to a full HTML reference?)
The SearchMonkey script comes in two parts – a “service” that scrapes the page linked to from the results listing:
And a “presentation” part, that draws on the service to augment the results:
Jane’s list of “100+ (E-)Learning Professionals to follow on Twitter” (which includes yours truly, Martin and Grainne from the OpenU :-) has been doing the rounds today, so in partial response to Tony Karrer asking “is there an equivalent to OPML import for twitter for those of us who don’t want to go through the list and add people one at a time?”, I took an alternative route to achieving a similar effect (tracking those 100+ e-learning professionals’ tweets) and put together a Yahoo pipe to produce an aggregated feed – Jane’s edutwitterers pipe…
Scrape the page and create a semblance of a feed of the edutwitterers:
Tidy the feed up a bit and make sure we only include items that link to valid twitter RSS feed URLs (note that the title could do with a little more tidying up…) – the regular expression for the link creates the feed URL for each edutwitterer:
Replace each item in the edutwitterers feed with the tweets from that person:
From the pipe, subscribe to the aggregated edutwitters’ feed.
Note, however, that the aggregated feed is a bit slow – it takes time to pull out tweets for each edutwitterer, and there is the potential for feeds being cached all over the place (by Yahoo pipes, by your browser, or whatever you happen to view the pipes output feed etc. etc.)
A more efficient route might be to produce an OPML feed containing links to each edutwitterer’s RSS feed, and then view this as a stream in a Grazr widget.
Creating the OPML file is left as an exercise for the reader (!) – if you do create one, please post a link as a comment or trackback… ;-) Here are three ways I can think of for creating such a file:
- add the feed URL for each edutwitter as a separate feed in an Grazr reading list (How to create a Grazr (OPML) reading list). If you don’t like/trust Grazr, try OPML Manager;
- build a screenscraper to scrape the usernames and then create an output OPML file automatically;
- view source of Jane’s orginal edutwitterers page, cut out the table that lists the edutwitterers, paste the text into a text editor and work some regular ecpression ‘search and replace’ magic; (if you do this, how about posting your recipe/regular expressions somewhere?!;-)
Enough – time to start reading Presentation Zen…
Having been tipped off about about a Netvibes page that the Library folks are pulling together about how to discover video resources (Finding and reusing video – 21st century librarianship in action, methinks? ;-) I thought I’d have a look at pulling together an OU iTunes OPML bundle that could be used to provide access to OU iTunes content in a Grazr widget (or my old RadiOBU OpenU ‘broadcast’ widget ;-) and maybe also act as a nice little container for viewing/listening to iTunes content on an iPhone/iPod Touch.
To find the RSS feed for a particular content area in iTunesU, navigate to the appropriate page (one with lists of actual downloadable content showing in the bottom panel), make sure you have the right tab selected, then right click on the “Subscribe” button and copy the feed/subscription URL (or is there an easier way? I’m not much of an iTunes user?):
You’ll notice in the above case that as well as the iPod video (mp4v format?), there is a straight video option (.mov???) and a transcript. I haven’t started to think about how to make hackable use of the transcripts yet, but in my dreams I’d imagine something like these Visual Interfaces for Audio/Visual Transcripts! ;-) In addition, some of the OU iTunesU content areas offer straight audio content.
Because finding the feeds is quite a chore (at least in the way I’ve described it above), I’ve put together an OU on iTunesU OPML file, that bundles together all the separate RSS from the OU on iTunesU area (to view this file in an OPML widget, try here: OU iTunesU content in a Grazr widget).
The Grazr widget lets you browse through all the feeds, and if you click on an actual content item link, iit should launch a player (most likely Quicktime). Although the Grazr widget has a nice embedded player for MP3 files, it doesn’t seem to offer an embedded player for iTunes content (or maybe I’m missing something?)
You can listen to the audio tracks well enough in an iPod Touch (so the same is presumably true for an iPhone?) using the Grazr iphone widget – but for some reason I can’t get the iPod videos to play? I’m wondering if this might be a mime-type issue? or maybe there’s some other reason?
(By the by, it looks like the content is being served from an Amazon S3 server… so has the OU bought into using S3 I wonder? :-)
For completeness, I also started to produce a handcrafted OPML bundle of OU Learn Youtube playlists, but then discovered I’d put together a little script ages ago that will create one of these automatically, and route each playlist feed through a feed augmentation pipe that adds a link to each video as a video enclosure:
Why would you want to do this? Because if there’s a video payload as an enclosure, Grazr will provide an embedded player for you… as you can see in this screenshot of Portable OUlearn Youtube playlists widget (click through the image to play with the actual widget):
These videos will play in an iPod Touch, although the interaction is a bit clunky, and actually slight cleaner using the handcrafted OPML: OUlearn youtube widget for iphone.
PS it’s also worth remembering that Grazr can embed Slideshare presentations, though I’m pretty sure these won’t work on the iPhone…
Readers of any prominent OU bloggers will probably have noticed that we appear to have something of Twitter culture developing within the organisation (e.g. “Twitter, microblogging and living in the stream“). After posting a few Thoughts on Visualising the OU Twitter Network…, I couldn’t resist the urge to have a go at drawing the OpenU twittergraph at the end of last week (although I had hoped someone else on the lazyweb might take up the challenge…) and posted a few teaser images (using broken code – oops) via twitter.
Anyway, I tidied up the code a little, and managed to produce the following images, which I have to say are spectacularly uninteresting. The membership of the ‘OU twitter network’ was identified using a combination of searches on Twitter for “open.ac.uk” and “Open University”, coupled with personal knowledge. Which is to say, the membership list may well be incomplete.
The images are based on a graph that plots who follows whom. If B follows A, then B is a follower and A is followed. In the network graphs, an arrow goes from A to B if A is followed by B (so in the network graph, the arrows point to people who follow you. The graph was constructed by making calls to the Twitter API for the names of people an individual followed, for each member of the OU Twitter network. An edge appears in the graph if a person in the OU twitter network follows another person in the OU Twitter network. (One thing I haven’t looked at is to see whether there are individuals followed by a large number of OpenU twitterers who aren’t in the OpenU twitter network… which might be interesting…)
Wordle view showing who in the network has the most followers (the word size is proportional to the number of followers, so the bigger your name, the more people there are in the OU network that follow you). As Stuart predicted, this largely looks like a function of active time spent on Twitter.
We can compare this with a Many Eyes tag cloud showing how widely people follow other members of the OU network (the word size is proportional to the number of people in the OU network that the named individual follows – so the bigger your name, the more people in the OU network you follow).
Note that it may be interesting to scale this result according to the total number of people a user is following:
@A’s OU network following density= (number of people @A follows in OU Twitter network)/(total number of people @A follows)
Similarly, maybe we could also look at:
@A’s OU network follower density= (number of people in OU Twitter network following @A)/(total number of people following @A)
(In the tag clouds, the number of people following is less than the number of people followed; I think this is in part because I couldn’t pull down the names of who a person was following for people who have protected their tweets?)
Here’s another view of people who actively follow other members of the OU twitter network:
And who’s being followed?
These treemaps uncover another layer of information if we add a search…
So for example, who is Niall following/not following?
And who’s following Niall?
I’m not sure how useful a view of the OU Twittergraph is itself, though?
Maybe more interesting is to look at the connectivity between people who have sent each other an @message. So for example, here’s how Niall has been chatting to people in the OU twitter network (a link goes from A to B if @A sends a tweet to @B):
We can also compare the ‘active connectivity’ of several people in the OU Twitter network. For example, who is Martin talking to, (and who’s talking to Martin) compared with Niall’s conversations?
As to why am I picking on Niall…? Well, apart from making the point that by engaging in ‘public’ social networks, other people can look at what you’re doing, it’s partly because thinking about this post on ‘Twitter impact factors’ kept me up all night: Twitter – how interconnected are you?.
The above is all “very interesting”, of course, but I’m not sure how valuable it is, e.g. in helping us understand how knowledge might flow around the OU Twitter network? Maybe I need to go away and start looking at some of the social network analysis literature, as well as some of the other Twitter network analysis tools, such as Twinfluence (Thanks, @Eingang:-)
PS Non S. – Many Eyes may give you a way of embedding a Wordle tagcloud…?)
Prompted in part by a presentation I have to give tomorrow as an OU eLearning community session (I hope some folks turn up – the 90 minute session on Mashing Up the PLE – RSS edition is the only reason I’m going in…), and in part by Scott Leslie’s compelling programme for a similar duration Mashing Up your own PLE session (scene scetting here: Hunting the Wily “PLE”), I started having a tinker with using Google spreadsheets as for data table screenscraping.
So here’s a quick summary of (part of) what I found I could do.
The Google spreadsheet function =importHTML(“”,”table”,N) will scrape a table from an HTML web page into a Google spreadsheet. The URL of the target web page, and the target table element both need to be in double quotes. The number N identifies the N’th table in the page (counting starts at 0) as the target table for data scraping.
So for example, have a look at the following Wikipedia page – List of largest United Kingdom settlements by population (found using a search on Wikipedia for uk city population – NOTE: URLs (web addresses) and actual data tables may have changed since this post was written, BUT you should be able to find something similar…):
Grab the URL, fire up a new Google spreadsheet, and satrt to enter the formula “=importHTML” into one of the cells:
Autocompletion works a treat, so finish off the expression:
And as if by magic, a data table appears:
All well and good – if you want to create a chart or two, why not try the Google charting tools?
Where things get really interesting, though, is when you start letting the data flow around…
So for example, if you publish the spreadsheet you can liberate the document in a variety of formats:
As well publishing the spreadsheet as an HTML page that anyone can see (and that is pulling data from the WIkipedia page, remember), you can also get access to an RSS feed of the data – and a host of other data formats:
See the “More publishing options” link? Lurvely :-)
Let’s have a bit of CSV goodness:
Why CSV? Here’s why:
(NOTE – Google spreadsheets’ CSV generator can be a bit crap at times and may require some fudging (and possibly a loss of data) in the pipe – here’s an example: When a Hack Goes Wrong… Google Spreadsheets and Yahoo Pipes.)
Unfortunately, the *’s in the element names mess things up a bit, so let’s rename them (don’t forget to dump the original row of the feed (alternatively, tweak the CSV URL so it starts with row 2); we might as well create a proper RSS feed too, by making sure we at least have a title and description element in there:
Make the description a little more palatable using a regular expression to rewrite the description element, and work some magic with the location extractor block (see how it finds the lat/long co-ordinates, and adds them to each item?;-):
Geocoding in Yahoo Pipes is done more reliably through the following trick – replace the Location Builder block with a Loop block into which you should insert a Location Builder Block
The location builder will look to a specified element for the content we wish to geocode:
The Location Builder block should be configured to output the geocoded result to the y:location element. NOTE: the geocode often assumes US town/city names. If you have a list of town names that you know come from a given country, you may wish to annotate them with a country identify before you try to geocode them. A regular expression block can do this:
This block says – in the title element, grab a copy of everything – .* – into a variable – (.*) – and then replace the contents of the title element with it’s original value – $1 – as well as “, UK” – $1, UK
Note that this regular expression block would need to be wired in BEFORE the geocoding Loop block. That is, we want the geocoder to act on a title element containing “Cambridge, UK” for example, rather than just “Cambridge”.
And to top it all off:
And for the encore? Grab the KML feed out of the pipe:
So to recap, we have scraped some data from a wikipedia page into a Google spreadsheet using the =importHTML formula, published a handful of rows from the table as CSV, consumed the CSV in a Yahoo pipe and created a geocoded KML feed from it, and then displayed it in a
PS If you “own” the web page that a table appears on, there is actually quote a lot you can do to either visualise it, or make it ‘interactive’, with very little effort – see Progressive Enhancement – Some Examples and HTML Tables and the Data Web for more details…
PPS for a version of this post in German, see: http://plerzelwupp.pl.funpic.de/wikitabellen_in_googlemaps/. (Please post a linkback if you’ve translated this post into any other languages :-)
PPPS this is neat – geocoding in Google spreadsheets itself: Geocoding by Google Spreadsheets.
PPPS Once you have scraped the data into a Google spreadsheet, it’s possible to treat it as a database using the QUERY spreadsheet function. For more on the QUERY function, see Using Google Spreadsheets Like a Database – The QUERY Formula and Creating a Winter Olympics 2010 Medal Map In Google Spreadsheets.
Following on from Data Scraping Wikipedia With Google Spreadsheets, here’s a quick post showing how you can use another handy Google spreadsheet formula:
=GoogleFinance(“symbol”, “attribute”, “start_date”, “end_date”, “interval”)
This function will pull in live – and historical – price data for a stock.
Although I noticed this formula yesterday as I was exploring the “importHTML” formula described in the Wikipedia crawling post, I didn’t have time to have a play with it; but after a quick crib of HOWTO – track stocks in Google Spreadsheets, it struck me that here was something I could have fun with in a motion chart (you know, one of those Hans Rosling Gapminder charts….;-)
NB For the “official” documentation, try here: Google Docs Help – Functions: GoogleFinance)
# Stock quotes and other data may be delayed up to 20 minutes. Information is provided “as is” and solely for informational purposes, not for trading purposes or advice. For more information, please read our Stock Quotes Disclaimer.
# You can enter 250 GoogleFinance functions in a single spreadsheet; two functions in the same cell count as two.
So – let’s have some fun…
Fire up a new Google spreadsheet from http://docs.google.com, give the spreadsheet a name, and save it, and then create a new sheet within the spreadsheet (click on the “Add new sheet” button at the bottom of the page). Select the new sheet (called “Sheet2″ probably), and in cell A1 add the following:
=GoogleFinance(“AAPL”, “all”, “1/1/2008″, “10/10/2008″, “WEEKLY”)
In case you didn’t know, AAPL is the stock ticker for Apple. (You can find stock ticker symbols for other companies on many finance sites.)
The formula will pull in the historical price data for Apple at weekly intervals from the start of 2008 to October 10th. (“all” in the formula means that all historical data will be pulled in on each sample date: opening price, closing price, high and low price, and volume.)
(If this was the live – rather than historical – data, it would be updated regularly, automatically…)
It’s easy to plot this data using a Google chart or a Google gadget:
I’m not sure a bar chart or scatter chart are quite right for historical stock pricing… so how about a line chart:
If you want to embed this image, you can:
If I was using live pricing data, I think the image would update with the data…?
Now create a few more sheets in your spreadsheet, and into cell A1 of this new sheet (sheet3) paste the following:
=GoogleFinance(“IBM”, “all”, “1/1/2008″, “10/10/2008″, “WEEKLY”)
This will pull in the historical price data for IBM.
Create two or three more new sheets, and in cell A1 of each pull in some more stock data (e.g. MSFT for Microsoft, YHOO for Yahoo, and GOOG…)
Now click on Sheet1, which should be empty. Fill in the following title cells by hand across cells A1 to G1:
Now for some magic…
Look at the URL of your spreadsheet in the browser address bar – mine’s “http://spreadsheets.google.com/ccc?key=p1rHUqg4g423seyxs3O31LA&hl=en_GB#”
That key value – the value between “key=” and “&hl=en_GB#” is important – to all intents and purposes it’s the name of the spreadsheet. Generally, the key will be the characters between “key=” and an “&” or the end of the URL; the “&” means “and here’s another variable” – it’s not part of the key.
In cell B2, enter the following:
YOURKEY is, err, your spreadsheet key… So here’s mine:
What ImportRange does is pull in a range of cell values from another spreadsheet. In this case, I’m pulling in the AAPL historical price data from Sheet2 (but using a different spreadsheet key, I could pull in data from a different spreadsheet altogether, if I’ve made that spreadsheet public).
In cell A2, enter the ticker symbol AAPL; highlight cell A2, click on the square in the bottom right hand corner and drag it down the column – when you release the mouse, the AAPL stock ticker should be added to all the cells above. Label each row from the imported data, and then in the next row, B column, import the data from Sheet 3:
These rows will need labeling “IBM”.
Import some more data if you like and then… let’s create a motion chart (info about motion charts.
Highlight all the cells in sheet1 (all the imported data from the other sheets) and then from the Insert menu select Gadget; from the Gadget panel that pops up, we want a motion chart:
Configure the chart, and have a play [DEMO]:
Enjoy (hit the play button… :-)
PS And remember, you could always export the data from the spreadsheet – though there are probably better API powered ways of getting hold of that data…
PPS and before the post-hegemonic backlash begins (the .org link is broken btw? or is that the point?;-) this post isn’t intended to show how to use the Google formula or the Motion Chart well or even appropriately, it’s just to show how to use it to get something done in a hacky mashery way, with no heed to best practice… the post should be viewed as a quick corridor conversation that demonstrates the tech in a casual way, at the end of a long day…
PS for a version of this post in French, see here: Créer un graphique de mouvement à partir de Google Docs.
Following on from a the previous posts in which I looked at pulling data tables out of wikipedia into a Google spreadsheet and accessing financial data within Google spreadsheets, in this post I’ll show you some of the things you can do with the Google spreadsheet formula “=GoogleLookup”.
If you’ve ever entered a search such as “capital of brazil“, you might have noticed that the Google web search engine will try to treat your query as a factual question, and answer it as such:
The Google spreadsheet =GoogleLookup(“entity”; “attribute”) formula provides a way of asking this sort of factual query within a Google spreadsheet:
Why’s this useful? One thing it seems to be good for is as a way of doing lots of factual look-ups in one go. How so?
Create a new Google spreadsheet and type the following into cell B2:
In cell A2 enter the word France and in cell B1 the word Population.
If you hover over cell A2, now populated with a number corresponding to the population of France, a tooltip will pop up showing the provenance of the data (i.e. where it came from), with a link to the data source.
Select cell A2 (the one containing the formula) and drag it using the small square in the bottom right hand corner of the cell across the row to cell C5. Highlight cells A2 to D2 and drag the small square marker in the bottom right hand corner of cell D2 across to cell D5. The formula will be copied into cells A2 to D5. The “$” symbols in the formula anchor various aspects of the formula so that it will look for the entity in column A of the current row, and the attribute in row 1 of the current column.
If you now add appropriate values into row 1 and column A, the GoogleLookup formula will be lookup the corresponding entity/attribute values (i.e. the corresponding facts).
Although I’ve had a quick look around for a comprehensive directory of allowable Google Lookup attribute values (and the sort of entity they should be paired with), I haven’t managed to find one. So I’ve put together a spreadsheet at http://spreadsheets.google.com/ccc?key=p1rHUqg4g421BB8IrYonydg where I’ll attempt to collate all the allowable lookup attributes I can find, along with appropriate exemplar entity values.
Using the ‘relative lookup formula’ approach shown above, the tables will show attribute values as column headings, and example entities as row headings.
At the moment I’ve started putting together Geography, Business and Chemistry lookup collections together:
In order to do a “bulk fact lookup”, enter a list of row label entity values into a spreadsheet, a set of attribute (fact) values as column headings, enter an appropriate ‘relativised GoogleLookup formula” into the first table data cell, and then drag it across the set of table data cells: voila – a bulk query made in a single shot across multiple entity and attribute values:-)
If you come across any more GoogleLookup attributes that seem to work, post a couple of example entity/attribute pairs in a comment to this post and I’ll add them to the =GoogleLookup formula attribute directory spreadsheet.
PS Looking at the sample spreadsheet just now, I notice that quite a few cells in the table may be unpopulated for particular entity values. So in a read/write web world, it should be possible to enter a value into the cell and have it logged, err, somewhere? Such as in a Google consensual fact engine?! ;-)