Around the time of the IWMW event last year, I put together a couple of quick pages that published the 404/page not found error pages for all the UK HEI homepages I could find (UK HEI “Page Not Found” Error Pages) and all the autodiscoverable RSS feeds that could be found on the HEI web homepages (Back from Behind Enemy Lines, Without Being Autodiscovered(?!)).
(Rather tellingly, some of the 404 pages are still, err, rather basic, and many of the sites still haven’t quite got the idea of the utility of this RSS malarkey yet…)
So given that I’ve started poking around various government department websites, here’s a page that pulls back images of their 404/page not found pages, as well as links to any RSS feeds that are autodiscoverable from the department’s home web page: UK Government Department webpage auditor.
The list of department homepage URLs comes from a Yahoo pipe, the UK Gov Dept Website Audit pipe, which scrapes the links out of the HTML of the central government department sites page on the Number10 website. (If there’s a more authoritative list somewhere, feel free to post a link in the comments to this post.)
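For anyone who’d rather do the link scraping step outside of Pipes, here’s a rough Python sketch of the same idea. It is not what the pipe does internally: the names `LinkCollector` and `department_homepages` are made up for illustration, and the rule of keeping only links that point away from the index page’s own host is my assumption about how to separate department links from site navigation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def department_homepages(index_html, index_url):
    """Return de-duplicated off-site http(s) links found in index_html.

    Assumption: department sites live on a different host from the
    index page, so same-host links are treated as navigation and dropped.
    """
    collector = LinkCollector(index_url)
    collector.feed(index_html)
    own_host = urlparse(index_url).netloc
    seen = set()
    result = []
    for link in collector.links:
        parsed = urlparse(link)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and the like
        if parsed.netloc == own_host or link in seen:
            continue
        seen.add(link)
        result.append(link)
    return result
```

You’d fetch the index page HTML however you like and pass it in along with the page’s own URL; the list that comes back will probably still need a bit of hand pruning.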
The pipe then annotates each department item with a non-existent page link and tries to autodiscover any RSS feeds that are linked to from the department homepage.
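If you want to try those two steps without the pipe, something along these lines should be close. It’s a minimal sketch under stated assumptions, not the pipe’s actual logic: feed autodiscovery here means looking for `<link rel="alternate">` tags with an RSS or Atom MIME type in the homepage HTML, and the “non-existent page” is just a made-up slug tacked on to the site root (`missing_page_url`, `autodiscover_feeds` and the slug are all hypothetical names of mine).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used to advertise feeds in <link> tags
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate" ...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").lower().strip()
        href = attr.get("href", "")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the homepage URL
            self.feeds.append(urljoin(self.base_url, href))


def autodiscover_feeds(html, base_url):
    """Return the feed URLs advertised in the given homepage HTML."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


def missing_page_url(homepage_url, slug="this-page-should-not-exist-404-test"):
    """Build a URL on the same site that (hopefully) doesn't exist."""
    return urljoin(homepage_url, "/" + slug)
```

Both functions work on strings, so fetching the homepage is left to you. The URL from `missing_page_url` is the one you’d hand to a thumbnailing service to get a picture of the 404 page.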
The pipe output feed is then loaded into the auditor webpage, which pulls in a thumbnail for each 404 page from the Thummer service. (I actually hit this quite hard over the weekend… Sorry, Matt… However, the thumbnail generating code is available from the site, so if anyone fancies hosting a copy and maybe setting up a tracking service so we can see how government department website 404 pages change over the coming weeks, that’d be a neat thing to do..;-)
So what sorts of feed might be good to find on a Government department website? (It’s worth remembering you can link to several.) Typical offerings include news feeds and job ads. As of a week or two ago, a quick win has become available for grabbing the job ads from the Civil Service Job Service API on the Civil Service (beta) Developers page. And if that’s too hard, Steph Gray’s Civil Service jobs, your way describes a service he knocked together in no time that will “[g]enerate an RSS feed of jobs from any specific department”: Government Jobs Direct. So for example, here’s a Jobs feed from DIUS that could be made autodiscoverable from the DIUS homepage? ;-)
I’d quite like to see a feed of current consultations (and maybe one with a full list of recent consultations, both open and closed). As a quick win, the maintainers of the department websites could even just link to a feed of consultations being held by their department from Tell Them What You Think. For example, here’s where you can grab a feed of recent consultations from the Home Office:
(Harry, have you thought of making the feeds autodiscoverable from those pages too?;-)
Okay, that’s more than enough for now – I’ve probably already done more than enough to cause a few people grief this morning ;-) Just to recap: here’s a link to the UK Government Department 404 and feed autodiscovery page.
PS digging through my Pipes collection, I found another one to do with feed autodiscovery from government websites: Autodiscover Government Consultation feeds. This uses a pipe that grabs a list of Government Department consultation websites (via TellThemWhatYouThink) and then runs those pages through a feed autodiscovery routine.
When I get a chance, I’ll add this info to the auditor web page…
8 thoughts on “404 “Page Not Found” Error pages and Autodiscoverable Feeds for UK Government Departments”
Ouch. Yes. It’s on my very long list of things I should do :(
Great work. This is going to be very useful for something I’m working on.
Do you know where I could get a list of local authorities’ names/URLs? I could scrape them from here but I’d rather not have to.
@Adrian I’ve been pondering that too… the http://www.direct.gov.uk/en/Dl1/Directories/Localcouncils/AToZOfLocalCouncils/DG_A-Z_LG page is a bit of a bind, because you have to follow each link to get the council link.
This site: http://www.gwydir.demon.co.uk/uklocalgov/localtxt.htm is a little easier to hack (rather than grabbing live data, I’d be tempted to pull the HTML into a text editor, then have a reg exp conversation with it, along with a bit of cutting, to produce a list I could dump into a Google or Zoho spreadsheet).
This page is maybe machine scrapeable without too much grief?
Let me know if you publish a list/data feed somewhere… :-)
I’m inclined to scrape DirectGov as I hope (perhaps naively) that it’s the most authoritative source. The index/detail page layout is probably quite tricky to handle in Pipes but it’ll be easy enough in Ruby.
Results will be online as soon as it’s done in a suitable mashable format.
“RSS usage in Whitehall websites”