Mashup Reuse – Are You Lazy Enough?

Late on Friday night, I picked up a challenge (again on Twitter) from Scott Leslie:

After a little more probing, the problem turned out to be to do with pulling a list of URLs from a page on the Guardian blog together into a playlist: The 50 greatest arts videos on YouTube.

As Scott suggested, it would have been far more useful to provide the list as a Youtube playlist. But they didn’t… So was there an easy way of creating one?

Now it’s quite possible that there is a way to programmatically create a playlist via the Youtube gdata API, but here’s a workaround that uses a Grazr widget as a player for a list of Youtube URLs:

So let’s work back from this widget to see how it was put together.

The Grazr widget works by loading in the URL of an OPML or RSS feed, or the URL of an HTML page that contains an autodiscoverable feed:

The URL loaded into the widget is this:

http://pipes.yahoo.com/pipes/pipe.run?_id=crKH_KOc3RGjaN6YPxJ3AQ&_render=rss

If we load this URL into our normal browser, (and then maybe also “View Source” from the browser Edit menu, or “Page Source” from the browser View menu) this is what the Grazr widget is consuming:

If you know anything about Grazr widgets, then you’ll maybe know that if the feed contains a media enclosure, Grazr will try to embed it in an appropriate player…

So where is the feed the Grazr widget is feeding on actually come from? The answer, a Yahoo pipe. This pipe in fact:

Let’s have a look at how it works – click on the “Edit Pipe” button (or maybe Clone the pipe first to get your own copy of it – though you’ll have to log in and/or create a Pipes user account using your Yahoo account first…):

And here’s what we find:

What this pipe does is Fetch a CSV file from another URL and rename the pipe’s internal representation of the data that was loaded in from the CSV file in such a way that the pipe now represents a valid, if not too informative, RSS feed:

The Loop element is used to run another pipe function (TinyURL to full (preview) URL), whose name suggests that it returns the target (original) URL from a TinyURL:

This pipe block is actually one I created before (and remembered creating) – if you inspect the debug output of the block, you’ll see the TinyURLs have been unpacked to Youtube video page URLs.

(If you want to explore how it works, you can find it here: TInyurl to full (preview) URL.)

The final bit of the pipe renames the attribute that was added to the pipe’s internal feed representation as an enclosure.url, and then rewrites the links from URLs that point to a Youtube video (splash) page to the URL of the video asset itself (a Flash swf file).

So that’s how the pipe works – it takes a CSV file input from somewhere on the web, and generates an RSS feed with a video file enclosure that can be played in a Grazr widget.

So where does the CSV file come from? If we look at the URL that the CSV block is loading in in the pipe we can find out:

http://spreadsheets.google.com/pub?key=p1rHUqg4g421ms8AMsu2-Tw
&output=csv&gid=0&range=A30:A82

Here’s the spreadsheet itself: http://spreadsheets.google.com/ccc?key=p1rHUqg4g421ms8AMsu2-Tw

And the cells we want are cells A30:A82:

TinyURLs :-)

But where are they coming from?

Hmm, they are CONTINUEing in from cell A1:

The little orange square in the top right of cell A1 in the spreadsheet shows a formula is being used in that cell…

So let’s see what the formula is:

Here it is:
=ImportXML(
http://www.guardian.co.uk/technology/2008/aug/31/youtube.jazz”,”//a”)

I recognise that URL! ;-) So what this formula does is is load in the The 50 greatest arts videos on YouTube page from the Guardian website, and then pull out all the anchor tags – all the <a> tags… which happen to include the links to the movies which we found at cells A30:A82.

Just looking back at the original page, here’s what it looked like:

And here’s what it really looks like if we view the page source:

So to recap, what appears to be going on is this:

A Google spreadsheet loads in the Guardian web page as an XML document, and strips out the URLs. The top 50 video URLs appear in contiguous cells as TinyURLs. These are published as a CSV file and consumed by a Yahoo pipe. The pipe takes the TinyURLs in the CSV feed, creates an RSS feed from them, unpacks them to their full Youtube URL form, and adds a link to the actual Youtube video asset as a feed enclosure. The feed is then rendered in a Grazr widget that automatically loads an embedded video player when it sees the Youtube video enclosure.

So how did I put this “mashup” together?

Firstly, I looked at the original page that contained the links that Scott (remember Scott?… strains of Alice’s Restaurant etc etc ;-), and knowing that I could use a Grazr widget as a player for a feed that contained Youtube movie enclosures all I had to do was get the URLs into a feed… and so I looked at the 50 top videos for cluses as to whether the links were in a form I could do something with, maybe using the Yahoo pipes ‘impoort HTML’ page scraping block; but the page didn’t look that friendly, so then I View source‘d. And the page structure didn’t look overly helpful either; but the links were there so rather than look t the page too closely, I though (on the off chance) I’d see what they looked like if I just link scraped the page. And knowing that the Google importXML function (with the //a XPATH) is a link scraper, I gave it a try; and the TinyURLs were all blocked together, so I knew I could use ’em by publishing that block of cells via a CSV file. And I remembered I’d created a TinyURL decoder block in Yahoo pipes some time ago, and I remembered creating a Youtube enclosure pipe before too, so I could crib that as well. And so it fell together…

And it fell together because I’d built reuable bits before, and I remembered where I could find them, so l took the lazy solution and wired them up together.

And I think that’s maybe what we need more of it mashups are going to become more widely used – more laziness… Many mashups are bespoke one-off’s because it’s getting easier to build “vertical” disposable mashup solutions. But sometimes there’s even easier to put together if they’re made out of Lego… ;-)

See also: Mashup Recycling: Now this is Green IT! and The Revolution Starts (near) Here

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

6 thoughts on “Mashup Reuse – Are You Lazy Enough?”

  1. Very cool; I couldn’t figure out the first trick, the google spreadsheet one; as a generic technique, that is VERY handy, especially b/c google spreadsheets themselves can then be published in so many different formats. Well done Tony, and more important, thanks for this tutorial – I learn so much from many of your posts, but even more in a case like this where the original context of the problem was clear to me because, well, I asked for it!

  2. If a page contains a lot of embedded youtube movies, then if you load the page in to a Google spreadsheet using “=importHTML” you can get hold of the Youtube video URLs with this XPATH expression:

    //object/param[@name=’movie’]/@value

    1. Isn’t that XPath, or is SPATH something I haven’t heard about?

      [Yes – XPATH – fixed now…]

Comments are closed.

%d bloggers like this: