According to the Alzheimer’s Society, in Alzheimer’s disease – one of the most common forms of dementia – memory lapses tend to be one of the first symptoms sufferers become aware of, along with “difficulty recalling recent events and learning new information”.
One of the things I have been aware of for some time, but only recently started trying to pay more attention to, is how Google search increasingly responds to many of my tech-related web queries with results that are dated 2013 and 2014. In addition, the majority of traffic to my blog is directed to a few posts that are themselves several years old, and that were shared – through blog links and links from other public websites – at the time they were posted.
(I also note that Google web search is increasingly paranoid. If I run queries that use search limits, for example site:, inurl: or filetype:, it often interrupts the search with a dialog asking if I am a robot.)
So I’m wondering, has Google web search, and the web more generally, got a problem?
Google’s early days of search were characterised by a couple of things that helped promote its use, and that I remember from discovering it via the MetaCrawler web search engine, which aggregated results from several other web search engines: one was that the results were relevant, the other was that the Google search engine results came back quickly.
Part of Google’s secret sauce at the time was PageRank, an algorithm that mined the link structure of the web – how websites linked to pages on other sites – to try to work out which pages the web thought were important. The intuition was that folk link to things they are (generally) happy to recommend, or reference as in some way authoritative, and so by mining all these links you could rank pages, and websites, according to how well referenced they were, and how well regarded the linking sites were in turn.
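To make that intuition a bit more concrete, here is a minimal sketch of the textbook power-iteration form of PageRank, run over a tiny made-up link graph. The page names, the damping value of 0.85 and the number of iterations are all assumptions for illustration; this is certainly not how Google computes anything today.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links out to.
    Returns a dict mapping each page to its share of the total rank.
    """
    # Collect every page, including ones that are only ever linked to.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: everyone links to an old post, nobody links to the new one.
example_links = {
    "old_post": ["reference"],
    "reference": ["old_post"],
    "blog_a": ["old_post"],
    "blog_b": ["old_post"],
    "new_post": ["old_post"],
}

ranks = pagerank(example_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

In this toy graph the new post links out but receives no links, so it never rises above the baseline share, however relevant its content might be.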
Since its early days, Google has added many more ranking factors (that is, decision criteria it uses to decide which results to put at the top of a search results listing for a particular query) to its algorithm.
To the extent that Google can generate a significant proportion of a website’s traffic from its search results pages, this led to many websites engaging in “search engine optimisation”, where they try to identify Google’s secret ranking factors and maximise their webpages’ scores against them. It also means that the structural properties of web content itself may be being shaped by Google, or at least by web publishers’ ideas of what the Google search engine favours.
If it is true that many of the pages from 2013 or 2014 are the most appropriate web results for the technical web searches I run, this suggests that the web may have a problem: as a memory device, new memories are not being laid down (new, relevant content is not being posted).
On the other hand, it may be that content is still being laid down (I still post regularly to this blog, for example), but it is being overlooked – or forgotten – by the gateways that mediate our access to it, which for many is Google web search.
To the extent that Google web search still uses PageRank, this may reflect a problem with the web. If other well regarded sites don’t link to a particular web page, then the link structure of the web that gives sites their authority (based, as it is in PageRank, on the quality of links incoming from other websites) is impoverished, and the old PageRank factors, deeply embedded in the structure of the web that holds over from 2013 or 2014, may dominate. Add into the mix that one other ranking factor is likely to be the number of times a link is followed from the Google search results listing (which in turn is influenced by how high up the results the link appears), and you can start to see how a well told story, familiar in the telling, keeps on being retold: the old links dominate.
If the new “memories” are still being posted into the web, then why aren’t they appearing in the search results? Some of them may do, at least in the short term. Google’s web crawlers never sleep, so content is being regularly indexed, often shortly after it is posted. (I still remember a time when it could take days for a web page to be indexed by the search engines; nowadays it can be near instant.) If a ranking factor is recency (as well as relevance), a new piece of content can get a boost if a search to which it is relevant is executed soon after the content is posted.
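The click feedback loop can be sketched as a toy simulation. Everything here is an assumption made for illustration: the page names, the starting scores, and in particular the rule that a result in position k picks up 1/k of a click per round. It is not a description of any real ranking system.

```python
def simulate_click_feedback(initial_scores, rounds=100):
    """Toy model: results are ranked by score, and higher positions
    attract more clicks, which are then added back onto the score."""
    scores = dict(initial_scores)
    for _ in range(rounds):
        # Rank the pages by their current score, best first.
        ranking = sorted(scores, key=scores.get, reverse=True)
        for position, page in enumerate(ranking):
            # Assumed position bias: 1 click for the top result,
            # 1/2 for the second, 1/3 for the third, and so on.
            scores[page] += 1.0 / (position + 1)
    return scores


# Hypothetical starting scores: the old posts begin only slightly ahead.
initial = {"post_2013": 10.0, "post_2014": 9.0, "post_new": 8.5}
final = simulate_click_feedback(initial)

for page in sorted(final, key=final.get, reverse=True):
    print(f"{page}: {initial[page]:.1f} -> {final[page]:.1f}")
```

Under these assumptions the ordering never changes, and a small initial lead for the old posts grows a little more with every round.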
Recently posted content may also get a boost from social media shares (maybe?), in which a link to a piece of content is quickly – and easily – shared via a social network. The “half-life” of links shared on such media is not very long, links typically being shared soon after they are first seen, and then forgotten about.
Such sharing causes a couple of problems when it comes to laying down structural “web memories”. For example, links shared on any given social media site may not be indexable, either usefully, or at all, either in whole, or in part, by the public web search engines, for several reasons:
- shares are often “ephemeral”, in that they may disappear (to all intents and purposes) from the social network after a short period of time. (Just try searching for a link you saw shared on Twitter three or four weeks ago, if you can remember one from that far back…);
- the sheer volume of links shared on global social networks can be overwhelming;
- the authority of people sharing links may be suspect, and the fact that links are shared by large numbers of unauthoritative actors may swamp signal in noise. (There is also the issue of the number of false actors on social media – easily created bot accounts, for example, slaved to sharing or promoting particular sorts of content.)
Whilst it’s never been easier to “share” a link, or highlight it (through “favouriting” or “liking”), the lack of effort in doing so is mirrored by the lack of interest reflected in the deeper structure of the web. If you don’t add your recommendations to the structural web, or contribute content to it, it starts to atrophy. However, if you take the time to make a permanent mark in the structure of the web, by posting a blog post to a lasting, public domain, with a persistent URL that others can link to, and in turn embed your content in a contextually meaningful way by linking to other posts that you value as useful context for your own post, you can help build new memories and help the web keep digital dementia at bay.
See also: “The Web Began Dying in 2014, Here’s How” and “The Web We Lost” /via @charlesarthur. And another: “Indeed, it seems that Google IS forgetting the old Web”.