Another Nail in the Coffin of “Google Ground Truth”?
So we all know that the Google web search engine famously (and not just apocryphally) returns different results from its different national representations (google.com, google.co.uk, google.cn, etc.)…
…and hopefully we all know that if you are signed in to Google when you run a search, the default settings are such that Google will record your search and search results click-thru behaviour using Google Web History, and then in turn potentially use this intelligence to tweak your personal search results…
…and depending on how much you’ve been paying attention, you may know that Google Search Wiki lets you “customize search by re-ranking, deleting, adding, and commenting on search results. With just a single click you can move the results you like to the top or add a new site. You can also write notes attached to a particular site and remove results that you don’t feel belong. These modifications will be shown to you every time you do the same search in the future.”
Well now it seems that Google is experimenting with Google Preferred Sites, which lets selected guinea pigs “set your Google Web Search preferences so that your search results match your unique tastes and needs. Fill in the sites you rely on the most, and results from your preferred sites will show up more often when they’re relevant to your search query” (see the official support page here: Preferences: Preferred sites).
So the next time you give someone directions to a website using an instruction of the form “just google whatever, and it’ll be the first or second result”, bear in mind that it might not be…
(For what it’s worth, I run a cookie-free, never-logged-in-to-Google browser to compare the results I get from my logged in’n’personalised Google results page with a raw “organic” Google results page.)
A Final Nail in the Coffin of “Google Ground Truth”?
I’ve written before about how Google’s personalisation features threaten the notion of some sort of “Google Ground Truth”, the ability for two different individuals in different locations to enter the same term into the Google search box, and get back similar results (e.g. Another Nail in the Coffin of “Google Ground Truth”?).
So what threats are there? Google Personalised Search for logged in Google users is one obvious source of differences, as are regional differences from the different national search engines (e.g. google.ca versus google.co.uk).
With more and more browsers becoming location aware, I wonder whether we will increasingly see regional, or even hyperlocal, differences in standard web search based on browser location (something that presumably already exists in the local search engines).
Social signals (links from your friends or amplified by them) and real time signals also act as potential sources of difference for personalised ranking factors.
And for users engaged in a search session, the ranking of results you see in the third search in a session may even be influenced by the terms (and results you clicked on?!) in the first or second queries of that session.
Anyway, it seems that as of the weekend, there is another threat – perhaps a final threat – to that notion: Personalized Search for everyone:
Previously, we only offered Personalized Search for signed-in users, and only when they had Web History enabled on their Google Accounts. What we’re doing today is expanding Personalized Search so that we can provide it to signed-out users as well. This addition enables us to customize search results for you based upon 180 days of search activity linked to an anonymous cookie in your browser. It’s completely separate from your Google Account and Web History (which are only available to signed-in users). You’ll know when we customize results because a “View customizations” link will appear on the top right of the search results page. Clicking the link will let you see how we’ve customized your results and also let you turn off this type of customization.
Chris Lott also made a very perceptive comment:
PS It also looks like Google are looking for even more traffic data to help feed their stats collection’n'analysis engines: Introducing Google Public DNS
PPS it seems that Google just announced real time search results integration into the Google homepage. It’s still rolling out, but here’s a preview of what the integration looks like:
Read more at Relevance meets the real-time web. Exciting times…
PPPS Seems like there’s no global, or necessarily even national, ground truth in Google Suggest results either: Google localised Suggest…
Filter Bubbles, Google Ground Truth and Twitter EchoChambers
As the focus for this week’s episode [airs live Tues 21/6/11 at 19.32 UK time, or catch it via the podcast] in the OU co-produced season of programmes on openness with Click (radio), the BBC World Service radio programme formerly known as Digital Planet, we’re looking at one or two notions of diversity…
If you’re a follower of pop technology, over the last week or two you will probably have already come across Eli Pariser’s new book, The Filter Bubble: What The Internet Is Hiding From You, or his TED Talk on the subject:
Eli Pariser, “The Filter Bubble”, TED Talks
It could be argued that this is the Filter Bubble in action… how likely is it, for example, that a randomly selected person on the street would have heard of this book?
To support the programme, presenter Gareth Mitchell has been running an informal experiment on the programme’s Facebook page: Help us with our web personalisation experiment!! The idea? To see what effect changing personalisation settings on Google has on a Google search for the word “Platform”. (You can see results of the experiment from Click listeners around the world on the Facebook group wall… Maybe you’d like to contribute too?)
It might surprise you to learn that Google results pages – even for the same search word – do not necessarily always give the same results, something I’ve jokingly referred to previously as “the end of Google Ground Truth”, but is there maybe a benefit to having very specifically focussed web searches (that is, very specific filter bubbles)? I think in certain circumstances there may well be…
Take education, or research, for example. Sometimes, we want to get the right answer to a particular question. In times gone by, we might have asked a librarian for help, if not to suggest a particular book or reference source, at least to help us find one that might be appropriate for our needs. Nowadays, it’s often easier to turn to a web search engine than it is to find a librarian, but there are risks in doing that: after all, no-one really knows what secret sauce is used in the Google search ranking algorithm that determines which results get placed where in response to a particular search request. The results we get may be diverse in the sense that they are ranked in part by the behaviour of millions of other search engine users, but from that diversity do we just get – noise?
As part of the web personalisation/search experiment, we found that for many people, changing personalisation settings had no noticeable effect on the first page of results returned for a search on the word “platform”. But for some people, there were differences… From my own experience of making dozens of technology (and Formula One!) related searches a day, the results I get back for those topics when I’m logged in to Google are very different to when I have disabled the personalised results. As far as my job goes, I have a supercharged version of Google that is tuned to return particular sorts of results – code snippets, results from sources I trust, and so on. In certain respects, the filter bubble is akin to my own personal librarian. In this particular case, the filter bubble (I believe) works to my benefit.
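If you want to put a number on how different two first pages of results are, here is a minimal sketch in Python. It assumes you have copied the result URLs from each page by hand into two lists; the function name and the example URLs are made up for illustration, and this is not how the Click experiment itself was scored.

```python
def compare_results(logged_in, logged_out):
    """Compare two ranked lists of result URLs returned for the same query."""
    set_in, set_out = set(logged_in), set(logged_out)
    shared = set_in & set_out
    union = set_in | set_out
    # Jaccard overlap: 1.0 means the same set of results, 0.0 means nothing in common.
    overlap = len(shared) / len(union) if union else 1.0
    # Rank shift for each shared URL (positive = ranked lower when logged out).
    shifts = {url: logged_out.index(url) - logged_in.index(url) for url in shared}
    return overlap, shifts


# Hypothetical first-page results for the same search word.
personalised = ["a.example/platform", "b.example/rail", "c.example/shoes"]
organic = ["b.example/rail", "a.example/platform", "d.example/oil"]

overlap, shifts = compare_results(personalised, organic)
print(f"overlap={overlap:.2f}", shifts)
```

An overlap near 1.0 with small rank shifts would correspond to the "no noticeable effect" case; a low overlap suggests personalisation is doing real work for that account.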
Indeed, I’ve even wondered before whether a “trained” Google account might actually be a valuable commodity: Could Librarians Be Influential Friends? And Who Owns Your Search Persona?. Being able to be an effective searcher requires several skills, including the phrasing of the search query itself, the ability to skim results and look for signals that suggest a result is reliable, and the ability to refine queries. (For a quick – and free – mini-course on how to improve your searching, check out the OU Library’s Safari course.) But I think it will increasingly rely on personalisation features…which means you need to have some idea about how the personalisation works in order to make the most of its benefits and mitigate the risks.
To take a silly example: if Google search results are in part influenced by the links you or your friends share on Twitter, and you follow hundreds of spam accounts, you might rightly expect your Google results to be filled with spam (because your friends have recommended them, and you trust your friends, right? That’s one of the key principles of why social search is deemed to be attractive.)
As well as the content we discover through search engines, content discovered through social networks is becoming of increasing importance. Something I’ve been looking at for some time is the structure of social networks on Twitter, in part as a “self-reflection” tool to help us see where we might be situated in a professional social sense based on the people we follow and who follow us. Of course, this can sometimes lead to incestuous behaviour, where the only people talking about a subject are people who know each other.
For example, when I looked at the connection of people chatting on Twitter about Adam Curtis’ All Watched Over By Machines of Loving Grace documentary, I was surprised to see it defined a large part of the UK’s “technology scene” that I am familiar with from my own echochamber…
So what do I mean by echochamber? In the case of Twitter, I take it to refer to a group of people chatting around a topic (as for example, identified by a hashtag) who are tightly connected in a social sense because they all follow one another anyway… (To see an example of this, for a previous OU/Click episode, I posted a simple application (it’s still there), to show the extent to which people who had recently used the #bbcClickRadio hashtag on Twitter were connected.)
As far as diversity goes, if you follow people who only follow each other, then it might be that the only ideas you come across are ideas that keep getting recycled by the same few people… Or it might be the case that a highly connected group of people shows a well defined special interest group on a particular topic….
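One crude way of putting a number on "tightly connected" is the density of the follow graph among the people using a hashtag: the fraction of all possible follow links between them that actually exist. The sketch below assumes the follow data has already been collected into a plain dictionary (the account names are invented); it is not the code behind the application mentioned above.

```python
def follow_density(users, follows):
    """Fraction of possible follow links among `users` that actually exist.

    `follows` maps each account to the set of accounts it follows.
    """
    users = set(users)
    n = len(users)
    if n < 2:
        return 0.0
    # Count only links that stay inside the group, ignoring self-follows.
    links = sum(len(follows.get(u, set()) & (users - {u})) for u in users)
    return links / (n * (n - 1))


# Hypothetical hashtag users and who they follow.
hashtag_users = {"ann", "bob", "cat"}
follows = {
    "ann": {"bob", "cat", "zed"},
    "bob": {"ann", "cat"},
    "cat": {"ann"},
}
print(follow_density(hashtag_users, follows))
```

A density close to 1 means almost everyone in the conversation already follows everyone else, which is the echochamber case; a value near 0 suggests the hashtag is bringing together people who are otherwise unconnected.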
To get a feel for what we can learn about our own filter bubbles in Twitterspace, I had a quick look at Gareth Mitchell’s context (@garethm on Twitter). One of the dangers of using public apps is that anyone can do this sort of analysis of course, but the ethics around my using Gareth as a guinea pig in this example is maybe the topic of another programme…!
So, to start with, let’s see how tightly connected Gareth’s Twitter friends are (that is, to what extent do the people Gareth follows on Twitter follow each other?):
The social graph showing how @garethm’s friends follow each other
The nodes represent people Gareth follows, and they have been organised into coloured groups based on a social network analysis measure that tries to identify groups of tightly interconnected individuals. The nodes are sized according to a metric known as “Authority”, which reflects the extent to which people are followed by other members of the network.
A crude first glance at the graph suggests a technology (purple) and science (fluorine-y yellowy green) cluster to me, but Gareth might be able to label those groups differently.
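For the curious, here is a rough sketch of how an "Authority"-style score can be calculated. It assumes the metric is the authority score from the HITS algorithm (the post doesn't say exactly how the figure was produced), and the follow data is a made-up dictionary rather than Gareth's actual network.

```python
def hits_authority(follows, iterations=50):
    """Approximate HITS authority scores for a follow graph.

    `follows` maps each account to the set of accounts it follows,
    so an edge u -> v means "u follows v".
    """
    nodes = set(follows) | {v for vs in follows.values() for v in vs}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the accounts that follow you.
        auth = {n: 0.0 for n in nodes}
        for u, vs in follows.items():
            for v in vs:
                auth[v] += hub[u]
        # Hub: sum of the authority scores of the accounts you follow.
        hub = {u: sum(auth[v] for v in follows.get(u, ())) for u in nodes}
        # Normalise so the scores stay bounded from one round to the next.
        a_norm = sum(auth.values()) or 1.0
        h_norm = sum(hub.values()) or 1.0
        auth = {n: s / a_norm for n, s in auth.items()}
        hub = {n: s / h_norm for n, s in hub.items()}
    return auth


# Hypothetical friends network: ann and cat both follow bob, bob follows ann.
friends = {"ann": {"bob"}, "cat": {"bob"}, "bob": {"ann"}}
scores = hits_authority(friends)
print(max(scores, key=scores.get))
```

In this toy network "bob" gets the largest node, because he is followed by the accounts that do the most following within the group.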
Something else I’ve started to explore is the extent to which other people might see us on Twitter. One way of doing this is to look at who follows you; another is to have a peek at what lists you’ve been included on, along with who else is on those lists. Here’s a snapshot of some of the lists (that actually have subscribers!) that Gareth is listed on:
The flowers are separate lists. People who are on several lists are caught on the spiderweb threads connecting the list flowers… In a sense, the lists are filter bubbles defined by other people into which Gareth has been placed. To the left in the image above, we see there are a few lists that appear to share quite a few members: convergent filters?!
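The "spiderweb threads" can be found with nothing more than set intersections. A minimal sketch, assuming the list memberships have already been fetched into a dictionary of sets (the list names and members here are invented), and leaving out the person whose lists we are looking at, since he is on all of them by construction:

```python
from itertools import combinations


def shared_members(lists, exclude=None):
    """Return the members shared by each pair of lists that overlap.

    `lists` maps a list name to the set of accounts on that list;
    `exclude` is an account to ignore (e.g. the person all the lists include).
    """
    overlaps = {}
    for name_a, name_b in combinations(sorted(lists), 2):
        common = (lists[name_a] & lists[name_b]) - {exclude}
        if common:
            overlaps[(name_a, name_b)] = common
    return overlaps


# Hypothetical lists that all include @garethm.
lists = {
    "science": {"garethm", "ann", "bob"},
    "tech": {"garethm", "bob", "cat"},
    "radio": {"garethm"},
}
print(shared_members(lists, exclude="garethm"))
```

Pairs of lists that share lots of members are the candidate "convergent filters".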
In order to try looking outside these filter bubbles, we can get an overview of the people that Gareth’s friends follow that Gareth doesn’t follow (these are the people Gareth is likely to encounter via retweets from his friends):

Who @garethm’s friends follow that @garethm doesn’t follow…
My original inspiration for this was to see whether or not this group of people would make sense as recommendations for who to follow, but if we look at the most highly followed people, we see this may not actually make sense (unless you want to follow celebrities!;-)

Popular friends of Gareth’s that he doesn’t follow…
By way of a passing observation, it’s also worth noting that the approach I have taken to constructing the “my friends’ friends who aren’t my friends” graph tends to place “me” at the centre of the universe, surrounded by folk who are just a friend of a friend away…
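The construction itself is simple enough to sketch. The following assumes the follow data is already in a dictionary of sets (the names are invented) and counts, for each account I don't follow, how many of my friends do follow it; it is an illustration of the idea rather than the code used to draw the graphs above.

```python
from collections import Counter


def friends_of_friends(me, follows):
    """Count how many of my friends follow each account I don't follow myself."""
    my_friends = follows.get(me, set())
    counts = Counter()
    for friend in my_friends:
        for candidate in follows.get(friend, set()):
            # Skip myself and anyone I already follow.
            if candidate != me and candidate not in my_friends:
                counts[candidate] += 1
    return counts


# Hypothetical follow data.
follows = {
    "me": {"ann", "bob"},
    "ann": {"bob", "celeb", "dee"},
    "bob": {"celeb", "me"},
}
print(friends_of_friends("me", follows).most_common())
```

Ranking candidates by this count, rather than by their overall follower numbers, is one way to avoid the list being swamped by celebrities, although a celebrity followed by most of your friends will still float to the top.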
For extended interviews and additional material relating to the OU/Click series on openness, make sure you visit Click (#bbcClickRadio) on OpenLearn.
Google Impact…? The “Google Suggest” Factor
So apparently, OU promotion process means I get feedback on my promotion case before it goes to the full committee… Here’s what I need to address:
- put references in proper OU CV style;
- don’t write career history or list stuff done in the promotion case, instead list impact and significant contributions; [but for a case based around digital engagement, that can be hard to judge…? I wonder whether the ability to drive traffic to the OU would count? I wonder if I could create a traffic blip on an OU web page anyway? If you fancy taking part in an ad hoc, not really experiment to see if I can drive traffic to the OU, please click through here: OU Accreditations and Partnerships…]
What other sorts of impact are there? Eponymous laws? Google impact… Hmmm…. How about a Google Suggest factor…?
Let’s see… (whilst I made these searches in a browser that wasn’t logged in to Google, and had cookies cleaned, I’m not suggesting any Google ground truth in these “results”.)
(the actor/voiceover artist isn’t me… ;-)
So what ingredients might go into a “Google Suggest” Impact Factor?
Number of correct mentions? Number of incorrect mentions? Explicit association with host university, or subject area?
And what might a Google Suggest Factor measure? Personal discoverability? Personal associations? Personal specialism areas?
One thing I didn’t manage to do was find any phrases that autosuggested a name from a term in the following way:
i.e. term firstname surname
So what does Google Suggest think about you?! ;-)
Related, in a roundabout sort of way: Where is the Open University Homepage?
Could Librarians Be Influential Friends? And Who Owns Your Search Persona?
Every so often, I’ve posted about the erosion of a universal Google ground truth as Google rolls out personalisation features that tweak the ranking of search results presented to you based on what Google knows about you. So with a recent announcement from Bing about its search integration with Facebook, I started wondering: could academic subject librarians (in a professional capacity) start to influence the search results of their charges (students, researchers, academics), simply by developing a strong persona as seen by the search engines, and friending their patrons in a public way also visible to the search engines?
So what exactly did Bing announce? Search Engine Land’s Danny Sullivan described it in Bing, Now With Extra Facebook: See What Your Friends Like & People Search Results as follows: “Bing is now making use of it to show new “Liked By Your Friends” matches and Facebook-powered people search results.” Liked results (when they appear), are currently presented in a specially marked out “liked by your Facebook friends” listing (Danny’s post shows some screenshot examples). However:
[o]utside the Liked Results, Facebook’s data is not being used to reshape the “regular” results, the listings found from crawling the web. Rather, traditional ranking factors such as the content on the pages and how people link to them is used — similar to what Google does.
…
Like Results are also unique to each person. What I get depends upon who my friends are. Someone else, with a different set of friends, will see different links suggested.
…
One thing is certain. If you haven’t been paying attention to Facebook like buttons, get moving. There’s already some direct benefit in search, and chances are this will grow.
So, the question that immediately came to my mind was: if librarians become Facebook friends of their patrons, and start “Liking” high quality resources they find on the web, might they start influencing the results that are presented to their patrons on particular searches?
That is: could librarians take on a role of “influential friends” in a particular topic area, much as a subject librarian helps guide a patron in a traditional library? Or how about recasting the idea of the “embedded librarian” as a librarian who is embedded in the network, and whose role is essentially to provide SEO services for content they want to help their patrons discover? (This relates to the question: if discovery happens elsewhere, how can librarians influence that discovery? Is SEO of other people’s content in some way akin to a weak form of collection development?!)
Where else might this line of thinking take us? If the Goog can track folk signed into a Google Apps for Edu domain, such as open.ac.uk, could that network of people be used to influence search results somehow…?
Just by the by, here are a couple of other examples of how content published or curated by one person might appear in or influence* the results of a person they are socially or organisationally connected to:
- Explore Interesting, Personal Photos on Yahoo! Search describes how their “new ‘Facebook Album Search beta’ feature, [allows you to] find public albums from the friends and family you’re connected to on Facebook (after you have linked your Yahoo! and Facebook accounts)”.
- Is Google Custom Search Influencing Google Web Search? starts to consider how the curation of a custom search engine might influence the discovery or ranking of sites and pages listed in the CSE in the general web search context. (Or by extension of the above, maybe CSEs curated by trusted sources in a Google Apps for Edu domain could be used to provide additional ranking factors to searches run by logged in members of that domain?) If CSEs do influence rankings, maybe CSE development is a form of collection development that can influence the search results of others at a distance (i.e. on Google web search?!)
*I think this is a distinction worth bearing in mind as things play out: the ability for one person to publish content that is directly favourably ranked in another person’s results, versus the ability for one person to directly influence the ranking of third party content that appears in another person’s results.
Search Histories, Personas and Profiles as Intellectual Capital
Given the above, let us suppose that an individual can gain influence over the search results of people they are connected to by virtue of the way they have “touched” the web. If we consider the actual searches made by an individual themselves, these may also have value (as for example when a search engine tunes the results it displays for you based on your personal search history). I’ve touched on this before, e.g. in the context of a discussion I had with Martin Weller a couple of years ago (Your search is valuable to us) that crystallised out for me an idea that I keep coming back to – that your profile as a search engine user is something of value not only to an individual, but potentially also to an institution or a service. Which is to say: the combination of what a search engine knows about you (incl social circle, things you search for, click on, search history, etc etc) and how it uses that information to tweak your personal search engine ranking factors defines a “search engine persona”, which is a valuable knowledge commodity.
I think this question then follows: should institutions develop role-based personas that run searches, Like things on the web and so on, that are the “property” of the institution and inhabited by individuals employed to the role (a user employed as web-embedded Science librarian must use the weblibrarian_science account for example), or should the liking, research librarian search history and so on be carried out by individuals using their personal Google accounts? In the former case, when an individual leaves the role, they also leave behind the persona they have developed and the machine advantage it brings (e.g. in terms of personal search recommendations).
Time was when academics used to leave behind valuable collections of books and papers (valuable in the sense of being a particular collection). We’re now getting to a stage where if you work with machines that learn from your actions, that learning is valuable. So who has a right to it? (I think it wouldn’t be too hard to push this argument into the realm of transhumanism and “downloading”?!)
PS It seems that Google+ may now be influencing personalised search results, tweaking them to include public Google+ updates from members of your Google+ Circles: The latest update to Google Social Search: Public Google+ Posts
PPS related…. although this may not be a true story (/via @charlesarthur), could we expect to start seeing things like this? Bruce Willis May Sue Apple Over Right To Bequeath His iTunes Library
Invisible Library Support – Now You Can’t Afford Not to be Social?
If you live by pop tech feed or Twitter, you’ve probably heard that Google is rolling out a new style of socially powered search results. If not, or if you’re still not clear about what it entails, read Phil Bradley’s post on the matter: Why Google Search Plus is a disaster for search.
Done that? If not, why not? This post isn’t likely to make much sense at all if you don’t know the context. Here’s the link again: Why Google Search Plus is a disaster for search
So the starting point for this post is this: Google is in the process of rolling out a new web search service that (optionally) offers very personal search results that contain content from folk that Google thinks you’re associated with, and that Google is willing to show you based on license agreements and corporate politics.
Think about this for a minute…. in the totally personalised view, folk will only see content that their friends have published or otherwise shared…
In Could Librarians Be Influential Friends?, I wondered aloud whether it made sense for librarians and other folk involved with providing support relating to resource discovery and recommendation to start a) creating social network profiles and encouraging their patrons to friend them, and b) recommending resources using those profiles in order to start influencing the ordering/ranking of results in patrons’ search results based on those personal recommendations. The idea here was that you could start to make invisible frictionless recommendations by influencing the search engine results returned to your patrons (the results aren’t invisible, because your profile picture may appear by the result, showing that you recommend it. They’re frictionless in the sense that having made the original recommendation, you no longer have to do any work in trying to bring it to the attention of your patron – the search engines take care of that for you (okay, I know that’s a simplistic view;-)). [Hmm.. how about referring to it as recommendation mode support?]
(Note that there is a complementary form of support to the approach which I’ve previously referred to as Invisible Library Tech Support (responsive mode support?; which I guess is also frictionless, at least from the perspective of the patron) in which librarians friend their patrons or monitor generic search terms/tags on Q&A sites and then proactively respond to requests that users post into their social networks more generally.)
With the aggressive stance Google now seems to be taking towards pushing social circle powered results, I think we need to face up to the fact – as Phil Bradley pointed out – that if librarians want to make sure they’re heard by their patrons, they’re going to need to start setting up social profiles, getting their patrons to friend them, and start making content and resource recommendations just anyway in order to make them available as resources that are indexed by patrons’ personal search engines. The same goes for publishers of OERs, academic teaching staff, and “courses”.
If we think of Google social search as searching over custom search engines bound by resources created and recommended by members of a user’s social circle, if you want to make (invisible) recommendations to a user via their (personalised) web search results, you’re going to need to make sure that the resources/content you want to recommend are indexed by their personal search engines. Which means: a) you need to friend them; and b) you need to share that content/those resources in that social context.
(Hmmm…this makes me think there may be something in the course custom search engine approach after all… Specifically, if the course has a social profile, and recommends the links contained within the course via that profile, they become part of the personalised search index of students following that course profile?)
Just by the by, as another example of Google completely messing things up at the moment, I notice that when I share links to posts on this blog via Google+, they don’t appear as trackbacks to the post in question. Which means that if someone refers to a post on this blog on Google+, I don’t know about it… whereas if they blog the link, I do…
See also my chronologically ordered posts on the eroding notion of “Google Ground Truth”.
[Invisible vs frictionless (and various notions of that word) is all getting a bit garbled; see eg @briankelly's Should Higher Education Welcome Frictionless Sharing and my comments to it for a little more on this...]
PS I’ve been getting increasingly infuriated by the clutter around, and lack of variation within, Google search results lately, so I changed my default search engine to Bing. The results are a bit all over the place compared to the Google results I tend to get, but this may be down in part to personalisation/training. I am still making occasional forays to Google, but for now, Bing is it… (because Bing is not Google…)
PPS Hah – just noticed: Google Search Plus doesn’t mean plus in the sense of search more, it means search Google+, which is less, or minus the wider world view…;-)
PPPS I keep meaning to blog this, and keep forgetting: Turn[ing] off [Google] search history personalization, in particular: “If you’ve disabled signed-out search history personalization, you’ll need to disable it again after clearing your browser cookies. Clearing your Google cookie clears your search settings, thereby turning history-based customizations back on.” Which is to say, when you disable personalisation, you don’t disable personalisation against your Google account, you disable it only insofar as it relates to your current cookie ID?
Google Lock-In Lock-Out
As John Naughton feels obliged to remind folk every now and again, the web is not the internet. Because we all know that for many people, Facebook apparently is. Or Google is.
And as anyone following my tweets over the last year or two will know, I’ve started finding Google more and more irksome.
It’s not just that the one or two people I know who use Google Plus (Google+?) are now all but lost to me as sources of neat ideas because I don’t do Gooplus and it doesn’t do RSS…
It’s not just because Google is shutting down the Google Reader backbone that powers a lot of RSS and Atom syndication feed services (and leaves me wondering: how long is Feedburner for this world? Maybe it’s time to start moving your feeds and trying to get folk off that piece of infrastructure…)…
It’s not just that geocoding done within Fusion Tables is not exported – if you look at a KML feed from Google Fusion Tables, you’ll find there’s no lat-long data there. To get a geo-view, you need to stay in Google Fusion Tables or wire the feed into Google Earth, which will then “initiate geocoding of location descriptions while viewing [the] KML file”…
It’s not just that Google is deprecating gadgets from spreadsheets, which as Martin points out means that if I want to visualise data in a spreadsheet all I’m going to be left with is Google’s crappy charts…
It’s not just that Google moved away from using CalDAV to support calendar interoperability… (announcement: “CalDAV API will become available for whitelisted developers, and will be shut down for other developers on September 16, 2013. Most developers’ use cases are handled well by Google Calendar API, which we recommend using instead.”) [UPDATE: seems Google may have had a rethink, though I'm thinking not really for the reason given... Making Google’s CalDAV and CardDAV APIs available for everyone]
It’s not just that Google is moving away from using the XMPP instant messaging protocol (and not, I think, making a move towards using MQTT?)…
It’s not just that Google will be using your photos to create photos you never took and presumably offer them up via your image gallery in favour of photos it thinks aren’t up to scratch…
Though I’m sure that Google wouldn’t start pushing images in just the WebP image format so that you’d feel obliged to use Chrome…
And also in the browser, I’m sure Google wouldn’t start using Google Public DNS as a Chrome default setting. (Is the same true of Chromebook? Presumably folk connected to Google Fiber use Google Public DNS?) But does it use SPDY as a default? How about on Android?
It’s not just that Google will tag your social media posts using tags you might never use yourself, and in doing so alter the externalised memory embodied by that post…
It’s not just that as web search gets increasingly personalised and localised, we lose any sense of Google ground truth; I’m not quite sure how the info-skills trainers are going to address this when training a motley crew of different learners to discover a particular resource other than by using known-item search strategies (which sort of misses the point). Or maybe it’s right that a cohort of students should all get different results when they run ostensibly the same search?
Hmmm.. thinks: if personalised/localised search could be reduced to raw search phrase (whatever I put in the search box) plus a set of invisible search limits that reflect the personalisation/localisation tweaks applied to my search, how might my hidden/invisible search limits compare with yours?
It’s not just that Google uses tax efficient corporate structures to minimise its tax bill, because lots of companies do that…
It’s not just any one of these things, taken on its own merits… it’s all of them taken together…
“Embrace, extend, extinguish”… where have we heard that before?
Drip; drip; drip…
PS see also M. Wunsch on The Great Google Goat Rodeo
PPS Although not an open standard, I forgot this one – Google dropped support for the closed Microsoft ActiveSync protocol (see also Google Sync End of Life)