Posts Tagged ‘feed autodetection’
For what it’s worth, I’ve posted a demo showing a couple of feed autodiscovery/autodetection tricks that let you autodiscover feeds in remote pages via a couple of online services: the Google feed api, and YQL (Feed Autodiscovery With YQL).
I’ve also added in a routine that uses the Google feed api to look up historical entries on an RSS feed. As soon as Google is alerted to a feed (presumably by anyone or any means), it starts cacheing entries. The historical entries API lets you grab up to 250 of the most recent entries from a feed, irrespective of how many items the feed itself currently contains…
PS Just by the by, I added a Scraperwiki view to my UK HEI autodiscovered feeds Scraperwiki. I added a little bit of logic to try to pull out feeds on a thematic basis too…
On the to do list is to create some OPML output views so you can easily subscribe to, or display, batches of the feeds in one go.
I guess I should also add a table to the scraper to start logging the number of feeds that are autodiscoverably out there over time?
It’s that time of year again when Brian’s banging on about IWMW, the Instituional[ised?] Web Managers’ Workshop, and hence that time of year again when he reminds me* about my UK HE Feed Autodiscovery app that trawls through various UK HEI home pages (the ones on .ac.uk, rather than the one you get by searching for a uni name in Google;-)
* that is, tells me the script is broken and, by implication, gently suggests that I should fix it…;-)
As ever, most universities don’t seem to be supporting autodiscoverable feeds (neither are many councils…), so here are a few thoughts about what feeds you might link to, and why…
– news feeds: the canonical example. News feeds can be used to pipe news around various university websites, and also syndicate content to any local press or hyperlocal news sites. If every UK HEI published a news feed that was autodiscoverable as such, it would be trivial to set up a UK universities aggregated newswire.
– research announcements: I was told that one reason for putting out press releases was simply to build up an institutional memory/archive of notable events. Many universities run research newsletters that remark on awarded grants. How about a “funded research” feed from each university detailing grant awards and other research funding. Again, at a national level, this could be aggregated to provide a research funding newswire, as well as contribtuing data to local archives of research funding success.
– jobs: if every UK HEI published a jobs/vacancies RSS feed, it would trivial to build an aggregator and let people roll their own versions of jobs.ac.uk.
– events: universities contribute a lot to local culture through public talks and exhibitions. Make it easy for the local press and hyperlocal news sites to syndicate this info, and add events to their own aggregated “what’s on” calendars. (And as well as RSS, give ‘em an iCal feed for your events.)
– recent submissions to local repository: provide a feed listing recent submissions to the local research output/paper repository (and/or maybe a feed of the most popular downloads); if local feeds are you thing, the library quite possibly makes things like recent acquisition feeds available…
– YouTube uploads: you might was well add an autodiscoverable feed to your university’s recent uploads on YouTube. If nothing else, it contributes an informal ownership link to the web for folk who care about things like that.
– your university Twitter feed: if you’ve got one. I noticed Glasgow Caledonian linked to their Twitter feed through an autodiscoverable link on their university homepage.
– tenders: there’s a whole load of work going on in gov at the moment regarding transparency as it relates to procurement and tendering. So why not get open with your procurement and tendering data, and increase the chances of SMEs finding out what you’re tendering around. If the applications have to go through a particular process, no problem: link to the appropriate landing page in each feed item.
– energy data: releasing this data may well become a requirement in the not so far off future, so why not get ahead of the game, e.g. as Lincoln are starting to do (Lincoln U energy data)? If everyone was publishing energy data feeds, I’m sure DevCSI hackday folk would quickly roll together something like the aggregating service built by college student @issyl0 out of a Rewired State hack that pulls together UK gov department energy data: GovSpark
– XCRI-CAP course marketing data feeds: JISC is giving away shed loads of cash to support this, so pull your finger out and get the thing published.
– location data: got a KML feed yet? If not, why not? e.g. Innovations in Campus Mapping
PS the backend of my RSS feed autodiscovery app (founded: 2008) is a Yahoo pipe. Just because, I thought I’d take half an hour out to try and build something related on Scraperwiki. The code is here: UK University Autodiscoverable RSS feeds. Please feel free to improve or, fork it, etc. University homepage URLs are identified by scraping a page on the Universities UK website, but I probably should use a feed from the JISC Monitoring Unit (e.g. getting UK University location/contact data).
PPS this could be handy for some folk – the code that runs the talks@cam events site: http://source.caret.cam.ac.uk/svn/projects/talks.cam/. (Thanks Laura:-) – does it do feeds nicely now?! Related: Keeping Up With Events, a quickly hacked app from my Arcadia project that (used to) aggregate Cambridge events feeds.)
Just a quick follow up to the post on using Beautiful Soup for RSS feed autodetection – it struck me that I should be able to do a similar thing with YQL:
Remember, feed autodiscovery relies on web page containing the following construction in the HTML <head> element:
<link rel=”alternate” type=”application/rss+xml” href=”FEED_URL” title=”FEED_NAME” />
So to try and autodetect the feed in a web page, we can use the following YQL statement:
select * from html where url="http://news.google.co.uk" and
xpath='//link[@rel="alternate" and @type="application/rss+xml"]'
We can then generalise this and create a query alias that allows us to pass in a URL and get any autodetected feeds back:
That is, use the query:
select * from html where url=@url and
xpath='//link[@rel="alternate" and @type="application/rss+xml"]'
We can look for atom feeds too:
select * from html where url=@url and xpath='//link[@rel="alternate" and (@type="application/rss+xml" or @type="application/atom+xml")]'
In this case, I’ve used the argument url for the original page URL, and specified the query alias feedautodetect, which means I can run a query remotely as follows:
The format=json switch forces the query to provide the response using JSON (example).
PS …though of course I expect that @hapdaniel knows an ever more elegant/powerful/efficient way of doing this?;-)