Fragment – Virtues of a Programmer, With a Note On Web References and Broken URLs

Ish-via @opencorporates, I came across the “Virtues of a Programmer” in a Nieman Lab post by Brian Boyer on Hacker Journalism 101, where they are referenced from a Wikipedia page and stated as follows:

  • Laziness: I will do anything to work less.
  • Impatience: The waiting, it makes me crazy.
  • Hubris: I can make this computer do anything.

I can buy into those… Whilst also knowing (from experience) that any of the above can lead to a lot of, erm, learning.

For example, whilst you might think that something is definitely worth automating, the practical reality may turn out rather differently.

The reference has (currently) disappeared from the Wikipedia page, but we can find it in the Wikipedia page history:

[Screenshot: Larry Wall – Wikipedia (old revision)]

The date of the Nieman Lab article lets us locate the corresponding revision in the page history:

[Screenshot: Larry Wall – Revision history – Wikipedia]

So here’s one example of a linked reference to a web resource that we know is subject to change, but that also has a mechanism for linking to a particular instance of the page.

Academic citation guides tend to suggest that URLs are referenced along with the date that the reference was (last?) accessed by the person citing it, but I’m not sure any guidance is given on securing the retrievability of that resource, as it was accessed, at a later date. (I used to bait librarians a lot for not getting digital in general and the web in particular. I think they still don’t…;-)

This is an issue that also hits us with course materials, when links are made to third party references by URI, rather than more indirectly via a DOI.

I’m not sure to what extent the VLE has tools for detecting link rot (certainly, it used to; now it’s more likely that we get broken link reports from students failing to access a particular resource…) or for mitigating broken links.

One of the things I’ve noticed from Wikipedia is that it has a couple of bots for helping maintain link integrity: InternetArchiveBot and Wayback Medic.

Bots help preserve link availability in several ways:

  • if a link is part of a page, that link can be submitted to an archiving site such as the Wayback Machine (or, if it’s a UK resource, the UK National Web Archive);
  • if a link is spotted to be broken (an HTTP 404 status code), requests for it can be redirected to the archived link.

One of the things I think we could do in the OU is add an attribute to the OU-XML template that points to an “archive-URL”, and tie this in with a service that automatically makes sure that linked pages are archived somewhere.
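Purely as a sketch of what that might look like, the following hypothetical fragment adds an archive-url attribute alongside the original link (the attribute name and markup here are illustrative guesses, not actual OU-XML schema):

```xml
<!-- Hypothetical markup: "archive-url" is not a real OU-XML attribute -->
<a href="https://example.com/third-party-resource"
   archive-url="https://web.archive.org/web/20190301000000/https://example.com/third-party-resource">
  Third party resource
</a>
```

In presentation, a resolver could try the href first and fall back to the archive-url if the original returns an error.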

If a course link rots in presentation, students could be redirected to the archived link, perhaps via a splash screen (“The original resource appears to have disappeared – using the archived link”) as well as informing the course team that the original link is down.

Having access to the original copy can be really helpful when it comes to trying to find out:

  • whether a simple update to the original URL is required (for example, the page still exists in its original form, just at a new location, perhaps because of a site redesign); or,
  • whether a replacement resource needs to be found, in which case, being able to see the content of the original resource can help identify what sort of replacement resource is required.

Does that count as “digital first”, I wonder???

Author: Tony Hirst

I'm a lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...

3 thoughts on “Fragment – Virtues of a Programmer, With a Note On Web References and Broken URLs”

  1. Would change ‘laziness’ to ‘intelligent idleness’ … I had a boss once who insisted that I be intelligently idle; he explained that this meant he wanted me to get a result with the minimum amount of effort so that I could get through more work.

  2. Nice to see the problem of link rot addressed again.

    I remember bleating on about it in a review I did quite a while ago for Convergence: The International Journal of Research into New Media Technologies (Vol 9, Iss 3, Sept 2003, pp. 98–100) of a book by Roy Rada, “Understanding Virtual Universities”. Many of the refs in the book were given just as URLs, and several had already rotted away in the two years between publication and review.

    I think my solution was to put the responsibility on the publishers to maintain a database of web refs and the source so that readers of their books / journals would never suffer a 404! Naively simplistic, of course, in its expectation that publishers would lay out the money for such an enterprise. But in a university course situation, one might reasonably expect students not to be inconvenienced by link rot and for the university/college to make sure that all the refs they supplied in their course material were ‘findable’.

    PS – the author’s name in Convergence was given as Simon Roe, some sort of printer’s error I guess, it was me wot rote it…

    1. I’m not sure what the current OU practice is? The Library used to provide a managed links service (ROUTES), although I don’t recall if that incorporated link archiving and if so, how that archived material was used?

      Certainly, for presentation courses, the first we seem to hear about broken links is when students raise them in the forums, or via an AL. Which is not brilliant, though it does show at least some of them are trying to click through on a link, which is a stat we never see. (To my knowledge, we have no reports to MTs that say whether any of the links included in online course material are ever clicked at all. Which I have ranted about many times!)
