Academic referencing is designed, in part, to support the retrieval of material that is being referenced, as well as recognising provenance.
The following guidance, taken from the OU Library’s Academic Referencing Guidelines, is, I imagine, typical:
That page appears in an OU hosted Moodle course (OU Harvard guide to citing references) that requires authentication. So whilst stating the provenance, it won’t necessarily support the retrieval of content from that site for most people.
Where an (n.d) — no date — citation is provided, it also becomes hard for someone checking the page in the future whether or not the content has changed, and if so, which parts.
Looking at the referencing scheme for organisational websites, there’s no suggestion that authentication is required is listed in the citation (the same is true in the guidance for citing online newspaper articles).
I also didn’t see guidance offhand for how to reference pages where the page presentation is likely customised by “an algorithm” according to personal preferences or interaction history; placement of things like ads are generally dynamic, and often personalised (personalisation may be based on multiple things, such as the cookie state of the browser with which you are looking at a page, or the history of transactions (sites visited) from the IP address you are connecting to a site from).
This doesn’t matter for static content, but it does matter if you want to reference something like a screenshot / screencapture, for example showing the results of a particular search on a web search engine. In this case, adding a date and citing the page publisher (that is, the web search engine, for example) is about as good as you can get, but it misses a huge amount of context. The fact that you got extremist results might be because your web history reveals you to be a raging fanatic, and the fact that you grabbed the screenshot from the premises of your neo-extremist clubhouse just added more juice to the search. One partial solution to disabling personalisation features might be to run a search in a “private” browser session where cookies are disabled, and cite that fact, although this still won’t stop IP address profiling and browser fingerprinting.
I’ve pondered related things before, eg when asking Could Librarians Be Influential Friends? And Who Owns Your Search Persona?, as well as in a talk given 10 years ago and picked up at the time by Martin Weller on his original blog site (Your Search Is Valuable To Us; or should that be:
Weller, M. (2008) 'Your Search Is Valuable To Us' *The Ed Techie*, 9 September [Blog] Available at http://nogoodreason.typepad.co.uk/no_good_reason/2008/10/your-search-is-valuable-to-us.html (Accessed 26 September 2018).?).
Most of the time, however, web references are to static content, so what role does the Accessed on date play here? I can imagine discussions way back when, when this form was being agreed on (is there a history of the discussion that took place when formulating and adopting this form?) where someone said something like “what we need is to record the date the page was accessed on and capture it somewhere“, and then the second part of that phrase was lost or disregarded as being too “but how would we do that?”…
One of the issues we face in maintaining OU courses, where content starts being written 2 years before a course start and is expected to last for 5+ years of presentation, is maintaining the integrity of weblinks. Over that period of time, you might expect pages to change in a couple of ways, even if the URL persists and the “content” part remains largely the same:
- the page style (that is, the view as presented) may change;
- the surrounding navigation or context (for example, sidebar content) may change.
But let’s suppose we can ignore those. Instead, let’s focus on how we can try to make sure that the a student can follow a link to the resource we intend.
One of the things I remember from years ago were conversations around keeping locally archived copies of webpages and presenting those copies to students, but I’m not sure this ever happened. (Instead, there was a short of middle ground compromise of running link checkers, but I think that was just to spot 404 page not found errors rather than checking a hash made on the content you were interested in, which would be difficult.)
At one point, I religiously kept archived copies of pages I referenced in course materials so that if the page died, I could check back on my own copy to see what the sense of the page now lost was so I could find a sensible alternative, but a year or two off course production and that practice slipped.
Back to the (Accessed DATE) clause. So what? In Fragment – Virtues of a Programmer, With a Note On Web References and Broken URLs I mentioned a couple of Wikipedia bots that check link integrity on Wikipedia (see also: Internet Archive blog: More than 9 million broken links on Wikipedia are now rescued). These can perform actions like archiving web pages, checking links are still working, and changing broken links to point to an archived copy of the same link. I hinted that it would be useful if the VLE offered the same services. They don’t, at least, not going by reports from early starters to this year’s TM351 presentation who are already flagging up broken links (do we not run a link checker anymore? (I think I asked that in the Broken URLs post a year ago, too…?)
Which is where (Accessed DATE) comes in. If you do accede to that referencing convention, why not make sure that that an archived copy of that page, ideally made on that date. Someone chasing the reference can then see what you accessed, and perhaps if they are visiting the page somewhen in the future, see how the future page compares with the original. (This won’t help with authentication controlled content or personalised page content though.)
An easy way of archiving a page in a way that others can access it is to use the Internet Archive’s Wayback Machine (for example, If You See Something, Save Something – 6 Ways to Save Pages In the Wayback Machine).
From the Wayback Machine homepage, you can simply add a link to a page you want to archive:
SAVE NOW (note, this is saving a different page; I forgot to save the screenshot of the previous one, even though I had grabbed it. Oops…):
and then you have access to the archived page, on the date it was accessed:
A more useful complete citation would now be
Weller, M. (2008) 'Your Search Is Valuable To Us' *The Ed Techie*, 9 September [Blog] Available at http://nogoodreason.typepad.co.uk/no_good_reason/2008/10/your-search-is-valuable-to-us.html (Accessed 26 September 2018. Archived at https://web.archive.org/web/20180926102430/http://nogoodreason.typepad.co.uk/no_good_reason/2008/10/your-search-is-valuable-to-us.html).
Two more things…
Firstly, my original OUseful.info blog was hosted on an OU blog server; when that was decommissioned, I archived the posts on a subdomain of
.open.ac.uk I’d managed to grab. That subdomain was deleted a few months ago, taking with it the original blog archive. Step in the Wayback Machine. It didn’t have a full copy of the original blog site, but I did manage to retrieve quite a few of the pages using this wayback machine downloader using the command
wayback_machine_downloader http://blogs.open.ac.uk/Maths/ajh59 or for a slightly later archive
wayback_machine_downloader http://ouseful.open.ac.uk/blogarchive. I made the original internal URLs relative (
find . -name '*.html' | xargs perl -pi -e 's/http:\/\/blogs.open.ac.uk\/Maths\/ajh59/./g'), (or as appropriate for
http://ouseful.open.ac.uk/blogarchive), used a similar approach to remove tracking scripts from the pages, uploaded the pages to Github (psychemedia/original-ouseful-blog-archive), enabled the repo as a Github pages site, and the pages are now at https://psychemedia.github.io/original-ouseful-blog-archive/pages/ It looks like the best archive is at the UK Web Archive, but I can’t see a way of getting a bulk export from that? https://www.webarchive.org.uk/wayback/archive/20170623023358/http://ouseful.open.ac.uk/blogarchive/010828.html
Secondly, bots; VLE bots… Doing some maintenance on TM351, I notice it has callouts to other OU courses, including TU100, which has been replaced by TM111 and TM112. It would be handy to be able to automatically discover references to other courses made from within a course to support maintenance. Using some OU-XML schema markup to identify such references would be sensible? The OU-XML document source structure should provide a veritable playground for OU bots to scurry around. I wonder if there are any, and if so, what do they do?
PS via Richard Nurse, reminding me that Memento is also useful when trying to track down original content and/or retrieve content for broken link pages from the Internet Archive: UK Web Archive Mementos search and time travel.
Richard also comments that “OU modules are being web archived by OU Archive – 1st, mid point and last presentation -only have 1 in OUDA currently https://www.open.ac.uk/library/digital-archive/web/warc:11589fbe-f596-4730-ac81-98eb10d4042e … staff login only – on list to make them more widely available but prob only to staff given 3rd party rights in OU courses“. Interesting…
PPS And via Herbert Van de Sompel, a list of archives accessed via time travel, as well as a way of decorating web links to help make them a bit more resilient: Robust Links – Link Decoration.
By the by, Richard, Kevin Ashley and @cogdog/Alan also point me to the various browser extensions that make life easier adding pages to archives or digging into their history. Examples here: Memento tools. I’m not sure what advice the OU Library gives to students about things like this; certainly my experience of interactions with students, academics and editors alike around broken links suggests that not many of them are aware of the Internet Archive / UK web Archive, Wayback Machine, etc etc?
OUseful.info – where the lede is usually buried…