Mashup Mayhem BCS (Glasgow Branch) Young Professionals Talk

On Monday I gave a presentation for the BCS Glasgow branch at the invite of Daniel Livingstone, who I met in the mashup mart session at the CETIS bash last year.

I’d prepared some slides – even rehearsed a couple of the mashups I was going to do – and then fell apart somewhat when the IE6 browser I was using on the lectern PC failed to play nicely with either Pageflakes or Yahoo Pipes. (I had intended to use my own laptop, but the end of the projector cable was locked away…)

“Why not use Firefox Portable?” came a cry from the floor (and I did, in the end, thanks to Daniel…). And indeed, why not? When I was in the swing of doing regular social bookmarking sessions, often in IT training suites, I always used the local machines, and I always used Portable Firefox.

But whilst I’ve started “playing safe” by uploading at least a basic version of the slides I intend to use to Slideshare before I leave home on the way to a presentation, I’ve stopped using Portable Firefox on a USB key even if I am taking the presentation off one… (There is always a risk that “proxy settings” are required when you use your own browser, of course, but a quick check beforehand usually sorts that…)

So note to self – get back in the habit of taking everything on a USB key, as well as doing the Slideshare backup, and ideally prepping links in a feed somewhere (I half did that on Monday) so they can be referred to via a live bookmark or feedshow.

Anyway, some of the feedback from the session suggested handouts would have been handy, so here are handouts of a sort – a set of repurposed slides in which I’ve taken some of the bits that hopefully worked on Monday, along with a little bit of extra visual explanation added in. The slides probably still don’t work as a standalone resource, but that’s what the talking’s for, right?!;-)

There are also some relevant URLs collected together under the glasgowbcs tag on my delicious account: http://delicious.com/psychemedia/glasgowbcs.

Referrer Traffic from Amazon – WTF?!

As it happens, I have been known to look at my blog stats from time to time (!), and today I noticed something odd in the referrer stats:

A referral from Amazon. WTF?

The link goes to a book detail page for a book about Wikipedia:

Scrolling down a bit, I found this:

A blog post syndicated in the product page from one of the book’s authors that linked to my post on Data Scraping Wikipedia with Google Spreadsheets.

My immediate thought – is there any way we can blog info about courses that use set textbooks back into the related Amazon product page?

Amazon “Edge Services” – Digital Manufacturing

When is a web service not a web service? When it’s an edge service, maybe?

Last night I was pondering the Amazon proposition, which at first glance broadly seems to break down into a retail side and a web services side.

The retail bit splits down further: physical goods and digital downloads, shipped by Amazon; and marketplace goods, where products from other retailers are listed (using Amazon ecommerce webservices, I guess) and Amazon takes a cut from each sale.

It was while I was looking at the digital downloads that the idea of “edge services” came to mind – web services that result in physical world actions (if you’re familiar with the Terminator movies, think: “Skynet manufacturing”;-) [It seems an appropriate phrase has already been coined: direct digital manufacturing, (DDM) – “the process of going directly from an electronic digital representation of a part to the final product [for example] via additive manufacturing”. See also “Digital Manufacturing — Bridging Imagination and Manufacturing“.]

But first let’s set the scene: just what is Amazon up to in the digital download space?

Quite a lot, as it happens – here’s what they offer directly under the Amazon brand, for example (on the Amazon.com domain):
Amazon MP3 Downloads store – a DRM-free music downloads site;
Amazon Video on Demand Store – for movie and TV downloads;
Amazon e-books and docs – download e-books and electronic documents (“eDocs”);
the Kindle store – if you haven’t heard about it already, Kindle is Amazon’s consumer electronics play, an e-book reader with wireless connectivity and a direct line back to the Amazon store;
and, just this week (and what prompted this post initially), Amazon bought up Reflexive, a company that among other things is into the online game distribution business.

And although it doesn’t quite fit into the “digital download” space, don’t forget the person-in-the-machine product Amazon Mechanical Turk, a web service for farming out piece work to real people.

But that’s not all – here, for example, are the companies that I know about that are in the Amazon Group of Companies:
IMDb – The Internet Movie Database (which apparently is now streaming movies and TV programmes for free);
Audible – audio book downloads;
Booksurge – book printing on-demand (just by-the-by, in the UK, Amazon’s Milton Keynes fulfilment centre is about to go into the PoD business – press release);
CreateSpace – PoD plus, I guess? Create print-on-demand books, DVDs and CDs, backed up by online audio and video distribution services.

(Amazon also own Shelfari, a site for users to organise and manage their own online bookshelves, and have a stake in LibraryThing, another service in the same vein, through the acquisition of second-hand, rare and out-of-print book retailer Abebooks.)

UPDATE: And they’ve just bought the Stanza e-book reader.

So here’s where it struck me: Amazon is increasingly capable of turning digital bits into physical stuff. This is good for warehousing, of course – the inventory in a PoD driven distribution service is blanks, not one or two copies of as many long tail books as you can fit in the warehouse – though of course the actual process of PoD is possibly a huge bottleneck. And it takes Amazon from retailer to manufacturer? Or to a retailer with an infinite inventory?

If this is part of the game plan, then maybe we can expect Amazon to buy up the following companies (or companies like them) over the next few months:
MOO.com – personalised business card printing, which is also moving into more general card printing. Upload your photos (or import them from services like flickr) and then print ’em out… photobox does something similar, though it maybe prints onto a wider range of products than MOO currently does?
Spreadshirt – design (and sell) your own printed T-shirts;
Ponoko, or Shapeways – upload your CAD plans and let their 3D printers go to work fabricating your design;
Partybeans – personalised “candy boxes”. Put your own image on a tin containing your favourite sweets:-)

(For a few more ideas, see Money on Demand: New Revenue Streams for Online Content Publishers.)

That said, Amazon built up its retail operation based on reviews and recommendations (“people who bought this, also bought that”). The recommendation engine was (is) one way of surfacing long tail products to potential purchasers. And I’m not convinced that the long tail rec engine will necessarily work on ‘user-generated’ content (although maybe it will scale across to that?!). But if you run an inventoryless operation, then does it matter?! Because maybe you can resell uploaded, user-contributed content to friends and family anyway, and get several sales for the price of one upload that way?

Or maybe they’ll move into franchising POD and fab machines, and scale-up manufacturing that way? One thing I keep noticing at conferences and events is that coffee increasingly comes in Starbucks labeled dispensers (Starbucks – For Business). So maybe we’ll start seeing Amazon branded POD and fab machines in our libraries, bookstores and catalogue shops? (Espresso, anyone? Blackwell brews up Espresso: “Blackwell is introducing an on-demand printer the Espresso Book Machine to its 60-store chain after signing an agreement with US owner On Demand Books.”) Also on the coffee front – brand your latte.

A few further thoughts:
– if Amazon is deliberately developing a digital manufacturing capacity to supplement its retail operation (and find ways of reducing stock levels of “instanced products”, i.e. particular books, or particular DVDs), then is the next step moving into the design and user-contributed content business? Like photo-sharing, or video editing..? How’s the Yahoo share price today, I wonder?! ;-)
– if Amazon starts (privacy restrictions allowing, and if it doesn’t already do so) to use services like Shelfari and IMDb (through its playlists) to feed its recommendation engine, and encourages the growth of consumer curated playlists, will it have another go at pushing the Your Media Library service, or will it happily exploit verticals like IMDb and Shelfari?
– will companies running on Amazon webservices that are offering “edge services” start to become acquisition targets? After all, if they’re already running on Amazon infrastructure, that makes integration easier, right? Because the Amazon website itself is built on top of those services (and is itself actually a presentation layer for lots of loosely coupled web services already?) (And if you go into conspiracy mode, was the long term plan always to use Amazon webservices as a way of fostering external business innovation that might then be bought up and rolled up into Amazon itself?!)

There’s a book in there somewhere, I think?!

PS another riff on services at the edge, AWS satellite ground stations: Instead of building your own ground station or entering in to a long-term contract, you can make use of AWS Ground Station on an as-needed, pay-as-you-go basis. You can get access to a ground station on short notice in order to handle a special event: severe weather, a natural disaster, or something more positive such as a sporting event. If you need access to a ground station on a regular basis to capture Earth observations or distribute content world-wide, you can reserve capacity ahead of time and pay even less. AWS Ground Station is a fully managed service. You don’t need to build or maintain antennas, and can focus on your work or research.

Mashup Reuse – Are You Lazy Enough?

Late on Friday night, I picked up a challenge (again on Twitter) from Scott Leslie:

After a little more probing, the problem turned out to be to do with pulling a list of URLs from a page on the Guardian blog together into a playlist: The 50 greatest arts videos on YouTube.

As Scott suggested, it would have been far more useful to provide the list as a Youtube playlist. But they didn’t… So was there an easy way of creating one?

Now it’s quite possible that there is a way to programmatically create a playlist via the Youtube gdata API, but here’s a workaround that uses a Grazr widget as a player for a list of Youtube URLs:

So let’s work back from this widget to see how it was put together.

The Grazr widget works by loading in the URL of an OPML or RSS feed, or the URL of an HTML page that contains an autodiscoverable feed:

The URL loaded into the widget is this:

http://pipes.yahoo.com/pipes/pipe.run?_id=crKH_KOc3RGjaN6YPxJ3AQ&_render=rss

If we load this URL into our normal browser (and then maybe also “View Source” from the browser Edit menu, or “Page Source” from the browser View menu), this is what the Grazr widget is consuming:

If you know anything about Grazr widgets, then you’ll maybe know that if the feed contains a media enclosure, Grazr will try to embed it in an appropriate player…

So where does the feed the Grazr widget is feeding on actually come from? The answer: a Yahoo pipe. This pipe, in fact:

Let’s have a look at how it works – click on the “Edit Pipe” button (or maybe Clone the pipe first to get your own copy of it – though you’ll have to log in and/or create a Pipes user account using your Yahoo account first…):

And here’s what we find:

What this pipe does is Fetch a CSV file from another URL and rename the pipe’s internal representation of the data that was loaded in from the CSV file in such a way that the pipe now represents a valid, if not too informative, RSS feed:

The Loop element is used to run another pipe function (TinyURL to full (preview) URL), whose name suggests that it returns the target (original) URL from a TinyURL:

This pipe block is actually one I created before (and remembered creating) – if you inspect the debug output of the block, you’ll see the TinyURLs have been unpacked to Youtube video page URLs.

(If you want to explore how it works, you can find it here: TinyURL to full (preview) URL.)
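
In case you're wondering what that block is actually doing behind the scenes, here's a minimal Python sketch of the same trick (just an illustration, not the pipe itself): expand a short URL by letting the HTTP library follow the redirect. The example TinyURL is made up.

```python
import urllib.request

def expand_tinyurl(short_url):
    """Follow the redirect from a TinyURL (or any other URL shortener)
    and return the full target URL."""
    # urlopen follows redirects by default; geturl() gives the final URL
    with urllib.request.urlopen(short_url) as response:
        return response.geturl()

# Hypothetical example - any TinyURL'd YouTube link would do
print(expand_tinyurl("http://tinyurl.com/example"))
```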

The final bit of the pipe renames the attribute that was added to the pipe’s internal feed representation as an enclosure.url, and then rewrites the links from URLs that point to a Youtube video (splash) page to the URL of the video asset itself (a Flash swf file).

So that’s how the pipe works – it takes a CSV file input from somewhere on the web, and generates an RSS feed with a video file enclosure that can be played in a Grazr widget.

So where does the CSV file come from? If we look at the URL that the CSV block in the pipe is loading in, we can find out:

http://spreadsheets.google.com/pub?key=p1rHUqg4g421ms8AMsu2-Tw&output=csv&gid=0&range=A30:A82

Here’s the spreadsheet itself: http://spreadsheets.google.com/ccc?key=p1rHUqg4g421ms8AMsu2-Tw

And the cells we want are cells A30:A82:

TinyURLs :-)

But where are they coming from?

Hmm, they are CONTINUEing in from cell A1:

The little orange square in the top right of cell A1 in the spreadsheet shows a formula is being used in that cell…

So let’s see what the formula is:

Here it is:
=ImportXML("http://www.guardian.co.uk/technology/2008/aug/31/youtube.jazz","//a")

I recognise that URL! ;-) So what this formula does is load in the The 50 greatest arts videos on YouTube page from the Guardian website, and then pull out all the anchor tags – all the <a> tags… which happen to include the links to the movies which we found in cells A30:A82.
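
Just for comparison, here's roughly what that link scrape looks like outside the spreadsheet: a short Python/lxml sketch that applies the same //a XPath to the Guardian page. Only the page URL comes from the formula above; everything else is illustration.

```python
import urllib.request
import lxml.html  # pip install lxml

URL = "http://www.guardian.co.uk/technology/2008/aug/31/youtube.jazz"

html = urllib.request.urlopen(URL).read()
doc = lxml.html.fromstring(html)
doc.make_links_absolute(URL)

# The spreadsheet formula's "//a" XPath: grab every anchor element's href
links = [a.get("href") for a in doc.xpath("//a") if a.get("href")]

# The video links on the page were TinyURLs, so filter for those
tinyurls = [link for link in links if "tinyurl.com" in link]
print(len(tinyurls), "TinyURLs found")
```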

Just looking back at the original page, here’s what it looked like:

And here’s what it really looks like if we view the page source:

So to recap, what appears to be going on is this:

A Google spreadsheet loads in the Guardian web page as an XML document, and strips out the URLs. The top 50 video URLs appear in contiguous cells as TinyURLs. These are published as a CSV file and consumed by a Yahoo pipe. The pipe takes the TinyURLs in the CSV feed, creates an RSS feed from them, unpacks them to their full Youtube URL form, and adds a link to the actual Youtube video asset as a feed enclosure. The feed is then rendered in a Grazr widget that automatically loads an embedded video player when it sees the Youtube video enclosure.
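
For anyone who'd rather see that flow as code than as boxes and arrows, here's a rough Python sketch of the back end of the chain: pull the published CSV range (the URL is the one shown above), expand each TinyURL, and spit out a bare-bones RSS feed with the old-style YouTube Flash URL as an enclosure. It's an illustration of the pattern rather than a faithful port of the pipe; in particular, the watch-page-to-Flash-URL rewrite is an assumption about the URL formats involved.

```python
import csv
import io
import urllib.request
from xml.sax.saxutils import escape

# The published spreadsheet range from the post (it may no longer resolve)
CSV_URL = ("http://spreadsheets.google.com/pub?key=p1rHUqg4g421ms8AMsu2-Tw"
           "&output=csv&gid=0&range=A30:A82")

def expand(short_url):
    # The cells hold TinyURLs; urlopen follows the redirect to the real page
    with urllib.request.urlopen(short_url) as response:
        return response.geturl()

def flash_url(watch_url):
    # Rewrite a YouTube watch page URL into the old-style Flash asset URL
    # (assumes URLs of the form http://www.youtube.com/watch?v=VIDEOID)
    video_id = watch_url.split("v=")[-1].split("&")[0]
    return "http://www.youtube.com/v/" + video_id

def csv_to_rss(csv_url):
    raw = urllib.request.urlopen(csv_url).read().decode("utf-8")
    items = []
    for row in csv.reader(io.StringIO(raw)):
        if not row or not row[0].strip():
            continue
        page = expand(row[0].strip())
        item = ("<item><title>%s</title><link>%s</link>"
                "<enclosure url=\"%s\" type=\"application/x-shockwave-flash\"/>"
                "</item>") % (escape(page), escape(page), escape(flash_url(page)))
        items.append(item)
    return ("<rss version=\"2.0\"><channel>"
            "<title>The 50 greatest arts videos on YouTube</title>"
            "<link>%s</link><description>Scraped playlist</description>%s"
            "</channel></rss>") % (escape(csv_url), "".join(items))

print(csv_to_rss(CSV_URL))
```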

So how did I put this “mashup” together?

Firstly, I looked at the original page that contained the links Scott was asking about (remember Scott?… strains of Alice’s Restaurant etc etc ;-), and knowing that I could use a Grazr widget as a player for a feed that contained Youtube movie enclosures, all I had to do was get the URLs into a feed… and so I looked at the 50 top videos page for clues as to whether the links were in a form I could do something with, maybe using the Yahoo pipes ‘import HTML’ page scraping block; but the page didn’t look that friendly, so then I View source‘d. And the page structure didn’t look overly helpful either; but the links were there, so rather than look at the page too closely, I thought (on the off chance) I’d see what they looked like if I just link scraped the page. And knowing that the Google importXML function (with the //a XPATH) is a link scraper, I gave it a try; and the TinyURLs were all blocked together, so I knew I could use ’em by publishing that block of cells via a CSV file. And I remembered I’d created a TinyURL decoder block in Yahoo pipes some time ago, and I remembered creating a Youtube enclosure pipe before too, so I could crib that as well. And so it fell together…

And it fell together because I’d built reusable bits before, and I remembered where I could find them, so I took the lazy solution and wired them up together.

And I think that’s maybe what we need more of if mashups are going to become more widely used – more laziness… Many mashups are bespoke one-offs because it’s getting easier to build “vertical” disposable mashup solutions. But sometimes they’re even easier to put together if they’re made out of Lego… ;-)

See also: Mashup Recycling: Now this is Green IT! and The Revolution Starts (near) Here

Getting an RSS Feed Out of a Google Custom Search Engine (CSE)

Alan posted me a tweet earlier today asking me to prove my “genius” credentials (heh, heh;-):

As far as I know, Google CSEs don’t offer an RSS output (yet: Google websearch doesn’t either, though rumour has it that it will, soon… so maybe CSEs will open up with opensearch too?)

So here’s a workaround…

If you make a query in a Google CSE – such as the rather wonderful How Do I? instructional video CSE ;-) – you’ll notice in the URL an argument that says &cx=somEGobbleDYGookNumber234sTUfF&cof….


The characters between cx= and either the end of the URL or an ampersand (&) are the ID of the CSE. In the case of How Do I?, the ID is 009190243792682903990%3Aqppoopa3lxa – almost; the “%3A” is a web-safe encoding of the character “:”, so the actual CSE ID is 009190243792682903990:qppoopa3lxa. But we can work round that, and work with the encoded CSE ID cut straight from the URL.
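
If you want to do that extraction programmatically rather than by eyeballing the URL, here's a quick Python sketch; the example results URL is just the pattern described above, not a real page.

```python
from urllib.parse import urlparse, parse_qs

def cse_id_from_url(results_url):
    """Pull the cx= argument out of a Google CSE results page URL.
    parse_qs decodes the %3A back to ':' for us."""
    return parse_qs(urlparse(results_url).query)["cx"][0]

# Illustrative results URL built from the pattern described above
url = "http://www.google.com/cse?cx=009190243792682903990%3Aqppoopa3lxa&q=video"
print(cse_id_from_url(url))  # 009190243792682903990:qppoopa3lxa
```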

Using the Google AJAX search API, you can create a query on any CSE that will return a result using the JSON format (a javascript object that can be loaded into a web page). The Google AJAX search API documentation tells you how: construct a Google AJAX web search query using the root http://ajax.googleapis.com/ajax/services/search/web?v=1.0 and add a few extra arguments to pull in results from a particular CSE: Web Search Specific Arguments.
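
By way of illustration, here's roughly what such a call might look like from Python: build the query URL with the CSE ID and search terms, and pull back the JSON. The cx argument and the responseData.results field names are how I remember the AJAX Search API working, so double-check them against the documentation rather than taking this sketch as gospel.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "http://ajax.googleapis.com/ajax/services/search/web"

def cse_search(cse_id, query):
    params = urllib.parse.urlencode({
        "v": "1.0",    # API version argument, required
        "q": query,    # the search terms
        "cx": cse_id,  # restrict results to this Custom Search Engine
    })
    with urllib.request.urlopen(API_ROOT + "?" + params) as response:
        data = json.load(response)
    # Results appeared under responseData.results, each with fields such as
    # titleNoFormatting, url and content - check the raw JSON to be sure
    return (data.get("responseData") or {}).get("results", [])

for result in cse_search("009190243792682903990:qppoopa3lxa", "wordpress"):
    print(result.get("titleNoFormatting"), result.get("url"))
```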

JSON isn’t RSS, but we can get it into RSS quite easily, using a Yahoo pipe…

Just paste in the ID of a CSE (or the whole results URL), add your query, and subscribe to the results as an RSS feed from the More Options menu:

The pipe works as follows…

First up, create a text box to let a user enter a CSE ID cut and pasted from a CSE results page URL (this should work if you paste in the whole of the URL of the results page from a query made on your CSE):

Then create the search query input box, and along with the CSE ID use it to create a URL that calls the Google AJAX API:

Grab the JSON data feed from the Google AJAX Search API and translate the results so that the pipe will output a valid RSS feed:

And there you have it – an RSS feed for a particular query made on a particular Google CSE can be obtained from the Get as RSS output on the pipe’s More Options menu.

Viewing Campaign Finance Data In a Google Spreadsheet via the New York Times Campaign Data API

It’s all very well when someone like the New York Times opens up an API to campaign finance data (announced here), but only the geeks can play, right? It’s not like they’re making the data available in a spreadsheet so that more people have a chance to try and do something with it, is it? Err, well, maybe this’ll help? A way of getting data out from the NYT API and into an online Google spreadsheet, where you can do something with it… (you can see an example here: campaign data in a Google Spreadsheet (via The New York Times Campaign Data API)).

Following up on a comment to Visualising Financial Data In a Google Spreadsheet Motion Chart by Dan Mcquillan: “FWIW, there’s a lot of campaigning potential in these tools”, I just had a quick tinker with another Google spreadsheet function – importXML – that pulls XML data into a Google spreadsheet, and in particular pointed it at the newly released New York Times Campaign Finance API.

To use this API you need to get yourself a key, which requires you to register with the New York Times, and also provide a URL for your application: create a new Google spreadsheet, and use the URL for it (it’ll look something like http://spreadsheets.google.com/ccc?key=p1rHUqg4g4223vNKsJW8GcQ&hl=en_GB) as the URL for your NYT API app.

First up, enter “API KEY” as a label in a cell – I used G22 – and paste your New York Times Campaign Finance API key into the cell next to it – e.g. H22. (My spreadsheet was cobbled together in scratchpad mode, so the cells are all over the place!)

In cell G23 enter the label “Names URL” and in cell H23 the following formula, which constructs a URL for the “Candidate Summaries”:
=CONCATENATE("http://api.nytimes.com/svc/elections/us/v2/president/2008/finances/totals.xml?api-key=",H22)

(The formula concatenates a NYT API calling URL with your API key.)

We’re now going to pull in the list of candidates: in e.g. cell A24 enter the formula:
=ImportXML(H23,"//candidate")

H23, remember, is a reference to the cell that contains the URL for the API call that grabs all the candidate summaries. The =ImportXML grabs the XML result from the API call, and the "//candidate" bit pulls out the "candidate" elements from that XML. If you look at the XML from the webservice, you'll see it actually contains several of these:

The spreadsheet actually handles all of these and prints out a summary table for you:

All we did was configure one cell to grab all this information, remember: cell A24, which contains =ImportXML(H23,"//candidate").
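
For anyone who'd rather poke at the API outside the spreadsheet, here's a minimal Python sketch of what that ImportXML call is doing: fetch the totals XML and pull out the //candidate elements. You'll need your own API key, and the child element names are whatever the feed actually returns, so inspect the XML rather than trusting my print line.

```python
import urllib.request
import xml.etree.ElementTree as ET

API_KEY = "YOUR_NYT_API_KEY"  # register with the NYT to get one
TOTALS_URL = ("http://api.nytimes.com/svc/elections/us/v2/president/2008/"
              "finances/totals.xml?api-key=" + API_KEY)

# Fetch the candidate summaries XML and apply the "//candidate" XPath,
# much as the ImportXML formula does
tree = ET.parse(urllib.request.urlopen(TOTALS_URL))
for candidate in tree.iterfind(".//candidate"):
    # Dump each candidate record's child elements so you can see what's there
    print({child.tag: child.text for child in candidate})
```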

We’re now going to pull in more detailed campaign info for each candidate using the “Candidate Details” API request, which looks something like http://api.nytimes.com/svc/elections/us/v2/president/2008/finances/candidates/romney.xml

(Lowercasing the candidate’s surname in the URL doesn’t seem to be strictly necessary, by the way – it works okay with “Romney.xml”, for example.)

To construct the URLs for each candidate, let’s write a formula to construct the URL for one of them.

In my spreadsheet, the Candidate Summary import in cell A24 built a table that filled cells A24:E42, so we’re going to start working in additional columns alongside that table.

You’ll notice that the URL for the detailed report just requires the candidate’s surname, whereas the candidate summary provides the forename as well.

In cell F24, add the following formula: =SPLIT(A24, ",")

What this does is split the contents of cell A24 (e.g. Obama, Barack) at the comma, and populate two cells (in columns F and G) as a result:

If you highlight the cell (F24), click on the little blue square in the bottom right hand corner, and drag down, you should be able to magically copy the formula and so split all the candidates’ names.

Now what we need to do is construct the URL that will pull back the detailed campaign information for each candidate. We’ll build the URLs in the next free column along (column H in my spreadsheet – the split name is across columns F and G).

In cell H24, enter the following formula (all on one line – split here for convenience):

=CONCATENATE("http://api.nytimes.com/svc/elections/us/v2/president/2008/
finances/candidates/",F24,".xml?api-key=",H$22)

H$22 refers to the cell that contains your NYT API key. The $ is there to anchor the row number in the formula, so when you drag the cell to copy it, the copied formula still refers to that exact same cell.

Highlight cell H24 and drag the little blue square down to complete the table and construct URLs for each candidate.

Good stuff :-)
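
For comparison, here's the same split-and-concatenate step as a quick Python sketch: take a "Surname, Forename" string from the summaries, keep the surname, and glue it into the candidate details URL. The example names are just that, examples.

```python
API_KEY = "YOUR_NYT_API_KEY"
DETAILS_ROOT = ("http://api.nytimes.com/svc/elections/us/v2/president/2008/"
                "finances/candidates/")

def details_url(summary_name, api_key=API_KEY):
    """'Obama, Barack' -> .../candidates/obama.xml?api-key=...
    (the SPLIT and CONCATENATE formulas rolled into one)."""
    surname = summary_name.split(",")[0].strip().lower()
    return "%s%s.xml?api-key=%s" % (DETAILS_ROOT, surname, api_key)

# Example names in the form they come back from the candidate summaries
for name in ["Obama, Barack", "McCain, John"]:
    print(details_url(name))
```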

Now we can construct another formula to import the full campaign details for each candidate.

In cell A2 in my spreadsheet, I entered the following:

=ImportXML(H24,"//candidate")

H24 is the URL for the candidate details API call for the first candidate (Barack Obama in my case).

Hopefully, you should get a row of data pulled into the spreadsheet:

The XML file responsible looks something like this:

Use your eyes and work out what the contents of each cell might refer to:

Click on cell A2, and drag the blue square down to A20. The contents of cell A2 will be copied in a relative way, and should now be importing detailed information for all the candidates.

Now you have the data in a form you can play with :-)

And it should update whenever the New York Times updates the data it exposes through the API.

(I would publish the spreadsheet in all its glory, but then I’d have to give away my API key… Ah, what the heck – here it is: Spreadsheet of Campaign Finances, courtesy of the New York Times. (Maybe I should read the T&C to check this is a fair use…!))

For other New York Times Campaign Data API functions, see the documentation. You should know enough now to be able to work out how to use it…

PS the “//candidate” XPATH stuff can be tricky to get your head round. If anyone wants to post as a comment a list of XPATH routines to pull out different data elements, feel free to do so :-)

If you make use of the data in any interesting ways, please link back here so we can see what you’re getting up to and how you’re using the data…

Link Love for Martin – “I Heart Twitter” Video

Just released, a follow up to Martin’s Edupunk response to my Changing Expectations video:

A Twitter Love Song (Weller)

Now available on Youtube…

PS (Weller)? Hmmm – any relation?!

PPS So it’ll be my turn again… hmm – you’ve been practising with Camtasia, haven’t you Martin? Maybe I’ll have to see what I can do with Jing and Jumpcut or Jaycut?