Feed Autodiscovery With YQL

Just a quick follow-up to the post on using Beautiful Soup for RSS feed autodetection – it struck me that I should be able to do a similar thing with YQL:

[Image: YQL feedautodetect]

Remember, feed autodiscovery relies on the web page containing the following construction in the HTML <head> element:
<link rel="alternate" type="application/rss+xml" href="FEED_URL" title="FEED_NAME" />

So to try and autodetect the feed in a web page, we can use the following YQL statement:

select * from html where url="http://news.google.co.uk" and
xpath='//link[@rel="alternate" and @type="application/rss+xml"]'
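For comparison, here is a rough local equivalent of that XPath test, sketched in Python using only the standard library. The names FeedLinkParser and find_feeds are made up for this illustration, and the matching is slightly looser than the XPath (attribute values are compared case-insensitively, and rel may contain several tokens), so treat it as a sketch rather than an exact translation of the YQL query:

```python
from html.parser import HTMLParser


class FeedLinkParser(HTMLParser):
    """Collect <link rel="alternate"> elements whose type is a feed MIME type."""

    def __init__(self, feed_types=("application/rss+xml",)):
        super().__init__()
        self.feed_types = feed_types
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are also routed here by default.
        if tag != "link":
            return
        attr = dict(attrs)
        # Looser than the XPath: case-insensitive, rel may hold several tokens.
        rel_tokens = (attr.get("rel") or "").lower().split()
        link_type = (attr.get("type") or "").lower()
        if "alternate" in rel_tokens and link_type in self.feed_types:
            self.feeds.append({"href": attr.get("href"), "title": attr.get("title")})


def find_feeds(html, feed_types=("application/rss+xml",)):
    """Return a list of {"href", "title"} dicts for feed links found in html."""
    parser = FeedLinkParser(feed_types)
    parser.feed(html)
    return parser.feeds
```

Passing feed_types=("application/rss+xml", "application/atom+xml") should pick up Atom feeds as well as RSS ones.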

[Image: Feed autodetection in YQL]

We can then generalise this and create a query alias that allows us to pass in a URL and get any autodetected feeds back. That is, use the query:

select * from html where url=@url and
xpath='//link[@rel="alternate" and @type="application/rss+xml"]'

We can look for Atom feeds too:
select * from html where url=@url and xpath='//link[@rel="alternate" and (@type="application/rss+xml" or @type="application/atom+xml")]'

In this case, I’ve used the argument url for the original page URL, and specified the query alias feedautodetect, which means I can run a query remotely as follows:

http://query.yahooapis.com/v1/public/yql/psychemedia/feedautodetect
?url=PAGE_URL&format=json

The format=json switch forces the query to provide the response using JSON.
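If you want to build that alias URL programmatically, something like the following Python sketch should do it. The function name build_feedautodetect_url is hypothetical, and the code only constructs the URL (with the page address properly URL-encoded); it does not call the service or assume anything about the shape of the response:

```python
from urllib.parse import urlencode

# Query alias endpoint as given in the post.
ALIAS_ENDPOINT = "http://query.yahooapis.com/v1/public/yql/psychemedia/feedautodetect"


def build_feedautodetect_url(page_url, fmt="json"):
    """Return the alias URL with the url and format parameters encoded."""
    return ALIAS_ENDPOINT + "?" + urlencode({"url": page_url, "format": fmt})
```

For example, build_feedautodetect_url("http://news.google.co.uk") fills in PAGE_URL with the encoded form of the Google News address.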

Easy:-)

PS …though of course I expect that @hapdaniel knows an even more elegant/powerful/efficient way of doing this?;-)

Author: Tony Hirst