Course Librarians and Search Assist…

For all their success in attracting universities to adopt Google Apps (Tradition meets technology: top universities using Apps for Education), it’s not obvious to me how – or even if – Google is actually doing much around search signal detection and innovation in an educational context?

I’ve floated this a couple of times before (eg Could Librarians Be Influential Friends? And Who Owns Your Search Persona? and Integrating Course Related Search and Bookmarking?), but with yet another announcement from Google about how they’re incorporating social signals into search rankings (Hide sites from anywhere in the world: “We’ve … started incorporating data about sites people have blocked into our general search ranking algorithms to help users find more high quality sites.”), I’m going to raise it again…

To what extent are course and subject librarians setting up course/subject personas that engage in recommending and sharing high quality links in an appropriate social context, and encouraging students to follow those accounts in order to benefit from personalisation of search results based on social signals?

Furthermore, to what extent might the development of search personas represent the creation of a “scholarly agent” that can be used to offer “search assist” to followers of that agent/persona?

I don’t find it that hard to imagine myself taking a course, following the course recommender on a social network (an account that might send out course related reminders as well as relevant links), with an icon depicting my university and the associated course, that on occasion appeared to “recommend” links to me when I was searching for topics relating to my course. (In the normal scheme of things, it wouldn’t actively be recommending links to me, of course. For that, I’d need to subscribe to something like Subscribed Links, as mentioned in Integrating Course Related Search and Bookmarking?.)

University Search Engine Sitelinks and (Rich) Snippets

A long time ago, I started running with the idea that an organisation’s homepage on the web was the dominant search engine’s search results page for the most common search term associated with that organisation. So for example, the OU’s effective home page is not http://www.open.ac.uk, but the Google search results page for open university. (You can test the extent to which this claim is supported by checking out your website logs, and comparing how many direct visitors you get to the “official” homepage for your website compared to the amount of traffic referred to your site from google for common search terms used to find your site. You do know what those search terms are, right?!;-)
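(If you want to check, a few lines of script over the access log will do it. Here's a rough sketch, with the caveat that the log path, the "combined" log format and the field layout are all assumptions about your particular setup:)

# Rough sketch: count direct homepage visits versus Google-referred visits,
# and tally the search terms used, from an Apache combined-format access log.
# The log path and format are assumptions - adjust for your own server.

import re
from collections import Counter
from urllib.parse import urlparse, parse_qs

LOGFILE = "access.log"  # hypothetical path
line_re = re.compile(r'"(?:GET|POST) (?P<path>\S+) [^"]*" \d+ \S+ "(?P<referrer>[^"]*)"')

direct_home, search_referred, terms = 0, 0, Counter()

with open(LOGFILE) as f:
    for line in f:
        m = line_re.search(line)
        if not m:
            continue
        path, referrer = m.group("path"), m.group("referrer")
        ref_host = urlparse(referrer).netloc
        if path == "/" and referrer == "-":
            direct_home += 1          # typed-in/bookmarked visits to the homepage
        elif "google." in ref_host:
            search_referred += 1      # visits referred from a Google results page
            q = parse_qs(urlparse(referrer).query).get("q", [""])[0]
            if q:
                terms[q.lower()] += 1

print(direct_home, "direct homepage visits")
print(search_referred, "Google-referred visits")
print(terms.most_common(10))

For the OU example, the interesting comparison would be the number of visits arriving via searches for "open university" versus visits that start at http://www.open.ac.uk itself.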

At one time, the results page might include several links to different institutional web pages, each listed as a separate individual search result item. Then five years or so ago, Google started introducing sitelinks into the search results page, a display technique that would include a list of several links to specific pages within the same domain within the context of a single result item, headed by the top level domain.

OU snippets

Or maybe how about the library?

OU library snippets and sitelinks

[For an overview, see Anatomy Of A Google Snippet and maybe also Meta Description Mutiny! Take Control of Your Text Snippets; if you want to take some responsibility for what appears, see the Google Webmaster Tools posts on sitelinks, Changing a site title and description and Removing snippets and Instant Preview]


Video: Matt Cutts introduces the original snippets

A recent (August 2011) update sees the Goog placing even more focus on the display of sitelinks (The evolution of sitelinks: expanded and improved).

Here are a few things about this update that I think are worth noting, particularly in light of recommendations emerging from the JISC “Linking You” project, which offers best practice guidance on the design of top-level URI schemes for university websites:

Sitelinks will now be full-size links with a URL and one line of snippet text—similar to regular results—making it even easier to find the section of the site you want. We’re also increasing the maximum number of sitelinks per query from eight to 12. …

In addition, we’re making a significant improvement to our algorithms by combining sitelink ranking with regular result ranking to yield a higher-quality list of links. This reduces link duplication and creates a better organized search results page. Now, all results from the top-ranked site will be nested within the first result as sitelinks, and all results from other sites will appear below them. The number of sitelinks will also vary based on your query—for example, [museum of art nyc] shows more sitelinks than [the met] because we’re more certain you want results from http://www.metmuseum.org.

So what do we learn from this?

  • Sitelinks will now be full-size links with a URL and one line of snippet text—similar to regular results: so check your results listing and see if the “one line of snippet text” for each displayed result makes sense. (For some ideas about how to influence snippet text, see e.g. the Google Webmaster Tools links above.)
  • We’re … increasing the maximum number of sitelinks per query from eight to 12: what links would you like to appear in the sitelinks list, compared to what links actually do appear? Would consensus in how UK HEIs architect top-level URLs (as, for example, recommended by Linking You) provide uniformity of display of results in the Google SERPs space? Would consensus open Google up to discussion relating to the most effective way of displaying search results for UK HEI sitelink results?
  • we’re making a significant improvement to our algorithms by combining sitelink ranking with regular result ranking to yield a higher-quality list of links: which is to say – SEO, and URI path design, may play a role in determining what links get displayed as sitelinks.
  • This reduces link duplication and creates a better organized search results page. That is, better organised, as defined by Google. If you want to influence the way links to your site are displayed as sitelinks, you need to figure out how – you don’t control how Google provides this top level navigation to your site as sitelinks, but you may be able to influence the display through good website design….
  • The number of sitelinks will also vary based on your query

If you have pages that mainly contain lists of items, these may also be handled differently in the context of snippets: New snippets for list pages

See also:
– Rich snippets microdata – if Google handled edu microdata, what would it describe…?
– Google Webmaster tools: Rich Snippets testing tool – “enter a web page URL to see how it may appear in search results”

PS I wonder how sitelink displays interact with rich snippets…?

PPS There’s a great write up of the Linking You project that I’ve just come across here: Lend Me Your Ears Dear University Web Managers!. Go and read it…. now… Hmmm.. thinks… what would a similar exercise for local council websites look like?

Getting Library Catalogue Searches Out There…

As a long time fan of custom search engine offerings, I keep wondering why Google doesn’t seem to have much active interest in this area. Google Custom Search updates are few and far between, and typically go unreported by the tech blogs. Perhaps more surprisingly, Custom Search Engines don’t appear to have much, if any, recognition in the Google Apps for Education suite, although I think they are available with a Google Apps for Education ID?

One of the things I’ve been mulling over for years is the role that automatically created course related search engines might have to play as part of a course’s VLE offering. The search engine would offer search results either over a set of web domains linked to from the actual course materials, or simply boost results from those domains in the context of a “normal” set of search results. I’ve recently started thinking that we could also make use of “promoted” results to highlight specific required or recommended readings when a particular topic is searched for (for example, Integrating Course Related Search and Bookmarking?).

During an informal “technical” meeting around three JISC funded resource discovery projects at Cambridge yesterday (Comet, Jerome, SALDA; disclaimer: I didn’t work on any of them, but I was in the area over the weekend…), there were a few brief mentions of how various university libraries were opening up their catalogues to the search engine crawlers. So for example, if you do a site: limited search on the following paths:

– sabre.sussex.ac.uk/vufindsmu/Record/
– jerome.library.lincoln.ac.uk/catalogue/
– webcat.hud.ac.uk/catlink/bib/
– search.lib.cam.ac.uk/

you can get (partial?) search results, with a greater or lesser degree of success, from the Sussex, Lincoln, Huddersfield and Cambridge catalogues respectively.

In a Google custom search engine context, we can tunnel in a little deeper in an attempt to return results limited to actual records:

– sabre.sussex.ac.uk/vufindsmu/Record/*/Description
– jerome.library.lincoln.ac.uk/catalogue/*
– webcat.hud.ac.uk/catlink/bib/*
– search.lib.cam.ac.uk/?itemid=*

I’ve added these to a new Catalogues tab on my UK HE library website CSE (about), so we can start to search over these catalogues using Google.

I’m not sure how useful or interesting this is at the moment, except maybe to the library systems developers, who can compare how informatively their library catalogue content is indexed and displayed in Google search results relative to other libraries’… (so for example, I noticed that Google appears to be indexing the “related items” that Huddersfield publishes on a record page, meaning that if a search term appears in a related work, you might get a record that at first glance appears to have little to do with your search term, in effect providing a “reverse related work” search (that is, search on related works and return items that have the search term as the related work)).

Searching UK HE library catalogues via a Google CSE

But it’s a start… and with the addition of customised rankings, might provide a jumping off point for experimenting with novel ways of searching across UK HE catalogues using Google indexed content. (For example, a version of the CSE on the cam.ac.uk domain might boost the Cambridge results; within an institution, works related to a particular course through mention on a reading list might get a boost if a student on that course runs a search… and so on.)

PS A couple of other things that may be worth pondering… could Google Apps for Education account holders be signed up to Subscribed Links offering customised search results in the main Google domain relating to a particular course? (That is, define subscribed link profiles for each course, and automatically add those subscriptions to an Apps for Edu user’s account based on the courses they’re taking?) Or I wonder if it would be possible to associate subscribed links with public access browsers in some way?

And how about finding some way of working with Google to open up “professional” search profiles, where for example students are provided with “read only” versions of the personalised search results of an expert in a particular area who has tuned, through personalisation, a search profile that is highly specialised in a particular subject area, e.g. as mentioned in Google Personal Custom Search Engines? (see also Could Librarians Be Influential Friends? And Who Owns Your Search Persona?).

If anyone out there is working on ways of using Google customised and personalised search as a way of delivering “improved” search results in an educational context, I’d love to hear more about what you’re getting up to…

Tweaking Ranking Factors in the Course Detective Custom Search Engine

This is a note-to-self as much as anything, relating to the Course Detective custom search engine, which searches over UK HE course prospectus web pages: to what extent might we be able to use data such as the student satisfaction survey results (as made available via Unistats) to boost search results around particular subjects in line with student satisfaction ratings, or employment prospects, for particular universities?

It’s possible to tweak rankings in Google CSEs in a variety of ways: we can BOOST (improve the ranking of), FILTER (limit results to members of a given set of) or ELIMINATE (exclude) sites appearing in the search results listing. In the simplest case, we assign a BOOST, FILTER or ELIMINATE weight to a label, and then apply labels to annotations so that they benefit from the corresponding customisation. We can further refine the effect of the modification by applying a score to each annotation. The product of the score and weight values determines the overall ranking modification applied to each result for a label applied to an annotation.
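(By way of a toy illustration of that product rule – my own sketch, not anything from Google’s documentation, and the idea of summing contributions over several matching labels is my assumption:)

# Toy illustration (mine, not Google's published formula) of how a label's
# weight and an annotation's score might combine: each matching label
# contributes weight * score to a result's ranking adjustment.

def ranking_adjustment(label_weights, annotation):
    """label_weights: {label_name: weight} as set in the CSE context file.
    annotation: {"score": float, "labels": [label names]} for the matched page."""
    return sum(label_weights.get(lbl, 0.0) * annotation["score"]
               for lbl in annotation["labels"])

label_weights = {"satisfaction_boost": 0.8}           # hypothetical BOOST label
annotation = {"score": 0.6, "labels": ["satisfaction_boost"]}
print(ranking_adjustment(label_weights, annotation))  # 0.48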

So here’s what I’m thinking:

– define labels for things like achievement or satisfaction that apply a boost to a result;
– allow users to apply a label to a search;
– for each university annotation (that is, the listing that identifies the path to the pages for a particular university’s online prospectus), add a label with a score modifier determined by the achievement or satisfaction value, for example, for that institution;
– for refinement labels that tweak search rankings within a particular subject area, define labels corresponding to those subject areas and apply score modifiers to each institution based on, for example, the satisfaction level with that subject area. (Note: I’m not sure if the same path can have several different annotations applied to it with different scores?)

For example, an annotation file typically contains a fragment that looks like:

<Annotations>
  <Annotation about="webcast.berkeley.edu/*" score="1">
    <Label name="university_boost_highest"/>
    <Label name="lectures"/>
  </Annotation>

  <Annotation about="www.youtube.com/ucberkeley/*" score="1">
    <Label name="university_boost_highest"/>
    <Label name="videos_boost_mid"/>
    <Label name="lectures"/>
  </Annotation>
</Annotations>

I don’t know if this would work:

<Annotations>
  <Annotation about="example.com/prospectus/*" score="1">
    <Label name="chemistry"/>
  </Annotation>
  <Annotation about="example.com/prospectus/*" score="0.5">
    <Label name="physics"/>
  </Annotation>
</Annotations>

That said, if the URLs are nicely structured, we might be able to do something like:

<Annotations>
  <Annotation about="example.com/prospectus/chemistry/*" score="1">
    <Label name="chemistry"/>
  </Annotation>
  <Annotation about="example.com/prospectus/physics/*" score="0.5">
    <Label name="physics"/>
  </Annotation>
</Annotations>

albeit at the cost of having to do a lot more work in terms of identifying appropriate URI paths.
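If the satisfaction data is available in machine readable form, the annotation files themselves could be generated rather than written by hand. Here’s a rough sketch of the sort of thing I mean – the prospectus paths, label names and the percentage-to-score scaling are all made up for illustration:

# Rough sketch: turn per-subject satisfaction figures into CSE annotation
# entries. Paths, labels and the crude pct/100 scaling are all assumptions,
# not anything taken from Unistats itself.

from xml.sax.saxutils import quoteattr

# hypothetical (prospectus path, subject label, % satisfaction) triples
satisfaction = [
    ("example.ac.uk/prospectus/chemistry/*", "chemistry", 92),
    ("example.ac.uk/prospectus/physics/*", "physics", 78),
]

def annotation(path, label, pct):
    score = round(pct / 100.0, 2)   # crude scaling of satisfaction to a score
    return ('  <Annotation about=%s score="%s">\n'
            '    <Label name="%s"/>\n'
            '  </Annotation>' % (quoteattr(path), score, label))

print("<Annotations>")
for path, label, pct in satisfaction:
    print(annotation(path, label, pct))
print("</Annotations>")

The resulting file could then be loaded into the CSE definition in the usual way.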

I also need to start thinking a bit more about how to apply refinements and ranking adjustments in course based CSEs.

Autocuration Signals in My Personalised Google Search Results

I spotted this for the first time last night:

Auto-curation signals in my search results

I had actually read the post in the Google Reader context (so Google knew that), but I wonder: if I hadn’t read the post, would it still have shown up like that?

As far as personalised ranking signals go:

– does the fact that I subscribe to the feed in Google Reader affect the rank of items from that feed in my personalised search results?
– if I have read the post in Google reader, does that also affect the rank of that specific post in my personalised search results?

If I have shared a link – through Google+, or Twitter, for example – is the ranking of those links positively affected in my personalised search results? That is, might social search actually be most useful when the Goog picks up on things I have shared myself, and then “reminds” me of them via a ranking boost in my personalised search results when I’m searching on a related topic?

Maybe tweeting and sharing into the void is actually yet another way of invisibly building search refinements into your personalised search context?

Integrating Course Related Search and Bookmarking?

Not surprisingly, I’m way behind on the two eSTEeM projects I put proposals in for – my creative juices don’t seem to have been flowing in those areas for a bit:-( – but as a marking avoidance strategy I thought I’d jot down some thoughts that have been coming to mind about how the custom search project at least might develop (eSTEeM Project: Custom Course Search Engines).

The original idea was to provide a custom search engine that indexes pages and domains that are referenced within a course in order to provide a custom search engine for that course. The OU course T151 is structured as a series of topic explorations using the structure:

– topic overview
– framing questions
– suggested resources
– my reflections on the topic, guided by the questions, drawing on the suggested resources and a critique of them

One original idea for the course was that rather than give an explicit list of suggested resources, we provide a set of links pulled in live from a predefined search query. The list would look as if it was suggested by the course team but it would actually be created dynamically. As instructors, we wouldn’t be specifying particular readings, instead we would be trusting the search algorithm to return relevant resources. (You might argue this is a neglectful approach… a more realistic model might be to have specifically recommended items as well as a dynamically created list of “Possibly related resources”.)

At this point it’s maybe worth stepping back a moment to consider what goes into producing a set of search results. Essentially, there are three key elements:

– the index, the set of content that the search engine has “searched” and from which it can return a set of results;
– the search query; this is run against the index to identify a set of candidate search results;
– a presentation algorithm that determines how to order the search results as presented to the user.
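(By way of a toy illustration of how those three pieces fit together – nothing like a real search engine, obviously, just the index/query/presentation split made concrete:)

# Toy illustration of the three elements: a (fixed) index, a query run
# against it, and a presentation/ranking step that orders the candidates.

index = {   # the index: URL -> words found on the page
    "http://example.org/robots": {"robot", "ai", "arm", "lab"},
    "http://example.org/ai": {"ai", "planning", "search"},
    "http://example.org/gardening": {"soil", "plants"},
}

def search(query, rank_key):
    terms = set(query.lower().split())
    candidates = [url for url, words in index.items() if terms & words]  # query vs index
    return sorted(candidates, key=rank_key)   # presentation algorithm orders the candidates

print(search("ai robot", rank_key=lambda url: url))                  # one ranking...
print(search("ai robot", rank_key=lambda url: -len(index[url])))     # ...another, different order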

If the search engine and the presentation algorithm are fixed, then for a given set of search terms, and a given index, we can specify a search term and get a known set of results back. So in this case, we could use a fixed custom search engine, with known search terms, and return a known list of suggested readings. The search engine would provide some sort of “ground truth” – same answer for the same query, always.

If we trust the sources and the presentation algorithm, and we trust that we have written an effective search query, then if the index is not fixed, or if a personalised ranking algorithm (that we trust) is used as part of the search engine, we would potentially be returning search results that the instructor has not seen before. For example, the resources may be more recent than the last time the instructor searched for resources to recommend, or they better fit the personalisation criteria for the user under the ranking algorithm used as part of the presentation algorithm.

In this case, the instructor is not saying: “I want you to read this particular resource”. They are saying something more along the lines of: “these are potentially the sorts of resource I might suggest you look at in order to study this topic”. (Lots of caveats in there… If you believe in content led instruction, with students referring to specifically referenced resources, I imagine that you would totally rail against this approach!)

At times, we might want to explicitly recommend one or two particular resources, but also open up some other recommendations to “the algorithm”. It struck me that it might be possible to do this within the context of a Google Custom Search approach using “special results” (e.g. Google CSEs: Creating Special Results/Promotions).

For example, Google CSEs support:

promotions: “A promotion is simply an association between a pre-defined set of query terms and a link to a webpage. When a user types a search that exactly matches one of your query terms, the promotion appears at the top of the page.” So by using a specific search term, we can force the return of a specific result as the top result. In the context of a topic exploration, we could thus prepopulate the search form of an embedded search engine with a known search phrase, and use a promotion to force a “recommended reading” link to the top of the results listing.

Promotion links are stored in a separate config file and have the form:

<Promotions>
  <Promotion id="1"
    queries="wanderer, the wanderer" 
    title="Groo the Wanderer" 
    url="http://www.groo.com/"
    description="Comedy. American series illustrated by Sergio Aragonés."
    image_url="http://www.newsfromme.com/images5/groo11.jpg" />
</Promotions>

subscribed links: subscribed links allow you to return results in a specific format (such as text, or text and a link, or other structured results) based on a perfect match with a specific search term. In a sense, subscribed links represent a generalised version of promotions. Subscribed links are also available to users outside the context of a CSE. If a user subscribes to a particular subscribed link file, then if there is an exact match against of one the search phrases in the subscribed link file and a search phrase used by a subscribing user on Google web search (i.e. on google.com or google.co.uk), the subscribed link will be returned in the results listing.

In the simplest case, subscribed links can be defined at the individual link level:

Google subscribed link definition

If your search term is an exact match for the term in the subscribed link definition, it will appear in the main search results page:

Google subscribed links

It’s also possible to define subscribed link definition files, either as simple tab separated docs or RSS/Atom feeds, or using a more formal XML document structure. One advantage of creating subscribed links files for use within a custom search engine is that users (i.e. students) can subscribe to them as a way of augmenting or enhancing their own Google search results. This has the joint effect of increasing the surface area of the course, so that course related recommendations can be pushed to the student for relevant queries made through the Google search engine, as well as providing a legacy offering: students can potentially take away a subscription when they finish the course to continue to receive “academically credible” results on relevant search topics. (By issuing subscription links on a per course presentation basis (or even on a personalised, unique feed per student basis), feeds to course alumni might be customised, for example by removing links to subscription content (or suggesting how such content might be obtained through a subscription to the university library), or occasionally adding in advertising related links: if a student searches using a “course” keyword, make recommendations around that via a subscribed links feed; in the limit, this could even take on the form of a personalised, subscription based advertising channel.)

Another way in which “recommended” links can be boosted in a custom search result listing is through boosting search results via their ranking factors (Changing the Ranking of Your Search Results).

In the case of both subscribed links and boosted search results, it’s possible to create a configuration file dynamically. Where students are bookmarking search results relating to a course, it would therefore be possible to feed these into a course related custom search engine definition file, or a subscribed link file. If subscribed link files are maintained at a personal level, it would also be possible to integrate a student’s bookmarked links into their subscribed links feed, at least for use on Google websearch (probably not in the custom search engine context?). This would support rediscovery of content bookmarked by the student through subscribed link recommendations.
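Sketching what that might look like: assuming bookmarks arrive as (URL, course code) pairs from whatever bookmarking service is in use, an annotations fragment could be generated along these lines (the “coursecode_boost” label naming convention is made up for illustration):

# Sketch only: assumes bookmarks are available as (url, course_code) pairs,
# e.g. pulled from a social bookmarking service's API; the label naming
# convention ("t151_boost") is invented for illustration.

from urllib.parse import urlparse
from xml.sax.saxutils import quoteattr

bookmarks = [
    ("http://example.org/robotics/intro.html", "T151"),
    ("http://example.com/ai/planning/", "T151"),
]

print("<Annotations>")
for url, course in bookmarks:
    u = urlparse(url)
    pattern = u.netloc + u.path + "*"          # turn the bookmark into a CSE pattern
    print('  <Annotation about=%s score="1">' % quoteattr(pattern))
    print('    <Label name="%s_boost"/>' % course.lower())
    print('  </Annotation>')
print("</Annotations>")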

Just by the by, a PR mailing in my inbox today threw up another example of how search and bookmarking might be brought more closely together: SearchTeam (screenshots [pdf]).

The model here is based around defining search contexts that one or more users can contribute to, and then saving out results from a search into a topic based bookmark area. The video suggests that particular results can also be blocked (and maybe boosted? The greyed plus on the left hand side?) – presumably this is a persistent feature, so if you, or another member of your “search team” runs the search, the blocked result doesn’t appear? (Is a list of blocked results and their corresponding search terms available anywhere I wonder?) In common with the clipping blog model used by sites such as posterous, it’s possible to post links and short blog posts into a topic area. Commenting is also supported.

Given that search was Google’s initial big idea, it’s surprising that it seems to play no significant role in Google’s offerings for education through Google Apps. Thinking back, search related topics were what got me into blogging and quick hacks; maybe it’s time to return to that area…

Google Playing the SEO Link Building Game to Drive Uptake Of Google Profiles?

As you’re probably aware by now, yesterday Google announced its Google+ social network. A key part of every social network is a user’s personal profile page, the “social object” that other people can actually connect to.

Google has offered personal profile pages for some time (here’s my rather basic one), but they’ve never really been a part of anything, and they’re not really linkable to – which means there’s little reason for PageRank based search algorithms such as Google’s to return Google Profile pages in the top results if anyone ever searches for you.

(PageRank is the algorithm that gave Google its early edge in the search engine wars; links from one page to another count as “votes” regarding the quality of the page that is linked to. Crudely put, if people link to you, those links contribute to your PageRank and you’re more likely to make it to the top of a search results page.)
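(For the curious, the core of the original idea is simple enough to sketch in a few lines: a toy power-iteration PageRank over a made-up link graph. This is purely illustrative; Google’s production ranking layers many more signals on top.)

# Simplified PageRank over a toy link graph - illustrative only, not Google's
# actual algorithm; "links as votes" flowing round the graph each iteration.

def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}"""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outlinks in links.items():
            if not outlinks:                       # dangling page: share rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / len(pages)
            else:
                for q in outlinks:
                    new[q] += damping * rank[p] / len(outlinks)
        rank = new
    return rank

toy_web = {
    "my_blog": ["google_profile"],      # pages linking to the profile, rel="author"-style
    "article": ["google_profile"],
    "google_profile": ["my_blog"],
}
print(pagerank(toy_web))                # the profile page accumulates the "votes"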

Until now, that is (or at least, until a couple of weeks ago… I missed this announcement at the time it was made…): Authorship markup and web search, a technique for “supporting markup that enables websites to publicly link within their site from content to author pages”.

The method is described as follows:

To identify the author of an article, Google checks for a connection between the content page (such as an article), an author page, and a Google Profile.

A content page can be any piece of content with an author: a news article, blog post, short story …
An author page is a page about a specific author, on the same domain as the content page.
A Google Profile is Google’s version of an author page. It’s how you present yourself to the web and to Google.

In confirming authorship, Google looks for:

Links from the content page to the author page (if the path of links continues to a Google Profile, we can also show Profile information in search results)
A path of links back from your Google Profile to your content.
These reciprocal links are important: without them, anyone could attribute content to you, or you could take credit for any content on the web.
….
The rel=”author” link indicates the author of an article [so for example: <a rel=”author” href=”https://profiles.google.com/tony.hirst/”>Google Profile: Tony Hirst</a>]

Source: Authorship

Here’s why you might be tempted to do this…:

Many of you create great content on the web, and we work hard to make that content discoverable on Google. Today, we will start highlighting the people creating this content in Google.com search results.

Google author identified links

As you can see …, certain results will display an author’s picture and name — derived from and linked to their Google Profile — next to their content on the Google Search results page.

Source: Highlighting content creators in search results; [my emphasis]

So… if you want to assert authorship and be recognised as the author in the Google search results, you need to start linking all your content back to your Google Profile Page…

…and so start feeding PageRank juice to your Google profile page…

…so that when folk search for you on the web, they’re more likely to see that page…

This is a harsh reading, of course: authorship can also be asserted by linking within a domain to a page that you have asserted to Google represents you: The rel=”author” link indicates the author of an article, and can point to .. an author page on the same domain as the content page: Written by <a rel="author" href="../authors/mattcutts">Matt Cutts</a>. The author page should link to your Google Profile using rel=”me”.

(I wonder why <link rel=”author” href=”../authors/mattcutts”/> isn’t supported? Or maybe it is?)

Algorithmically, the assertion of authorship might also help in Google’s fight against spamblogs, which republish content blindly from original sources. That is, by asserting authorship of a page, if someone reposts your content, Google will be able to identify you as the original author and return a link back to your page in the search results listing, rather than the republished page.

I imagine there might also be personal reputation benefits – for example, if people +1 a page you have claimed authorship of, it might give you a “Reputation Rank” boost for the subject area associated with that page?

Filter Bubbles, Google Ground Truth and Twitter EchoChambers

As the focus for this week’s episode [airs live Tues 21/6/11 at 19.32 UK time, or catch it via the podcast] in the OU co-produced season of programmes on openness with Click (radio), the BBC World Service radio programme formerly known as Digital Planet, we’re looking at one or two notions of diversity

If you’re a follower of pop technology, over the last week or two you will probably have already come across Eli Pariser’s new book, The Filter Bubble: What The Internet Is Hiding From You, or his TED Talk on the subject:


Eli Pariser, “The Filter Bubble”, TED Talks

It could be argued that this is the Filter Bubble in action… how likely is it, for example, that a randomly selected person on the street would have heard of this book?

To support the programme, presenter Gareth Mitchell has been running an informal experiment on the programme’s Facebook page: Help us with our web personalisation experiment!! The idea? To see what effect changing personalisation settings on Google has on a Google search for the word “Platform”. (You can see results of the experiment from Click listeners around the world on the Facebook group wall… Maybe you’d like to contribute too?)

It might surprise you to learn that Google results pages – even for the same search word – do not necessarily always give the same results, something I’ve jokingly referred to previously as “the end of Google Ground Truth”, but is there maybe a benefit to having very specifically focussed web searches (that is, very specific filter bubbles)? I think in certain circumstances there may well be…

Take education, or research, for example. Sometimes, we want to get the right answer to a particular question. In times gone by, we might have asked a librarian for help, if not to point us to a particular book or reference source, at least to help us find one that might be appropriate for our needs. Nowadays, it’s often easier to turn to a web search engine than it is to find a librarian, but there are risks in doing that: after all, no-one really knows what secret sauce is used in the Google search ranking algorithm that determines which results get placed where in response to a particular search request. The results we get may be diverse in the sense that they are ranked in part by the behaviour of millions of other search engine users, but from that diversity do we just get – noise?

As part of the web personalisation/search experiment, we found that for many people, the effects of changing personalisation settings had no noticeable effect on the first page of results returned for a search on the word “platform”. But for some people, there were differences… From my own experience of making dozens of technology (and Formula One!) related searches a day, the results I get back for those topics when I’m logged in to Google are very different to when I have disabled the personalised results. As far as my job goes, I have a supercharged version of Google that is tuned to return particular sorts of results – code snippets, results from sources I trust, and so on. In certain respects, the filter bubble is akin to my own personal librarian. In this particular case, the filter bubble (I believe), works to my benefit.

Indeed, I’ve even wondered before whether a “trained” Google account might actually be a valuable commodity: Could Librarians Be Influential Friends? And Who Owns Your Search Persona?. Being able to be an effective searcher requires several skills, including the phrasing of the search query itself, the ability to skim results and look for signals that suggest a result is reliable, and the ability to refine queries. (For a quick – and free – mini-course on how to improve your searching, check out the OU Library’s Safari course.) But I think it will increasingly rely on personalisation features…which means you need to have some idea about how the personalisation works in order to make the most of its benefits and mitigate the risks.

To take a silly example: if Google search results are in part influenced by the links you or your friends share on Twitter, and you follow hundreds of spam accounts, you might rightly expect your Google results to be filled with spam (because your friends have recommended them, and you trust your friends, right? That’s one of the key principles of why social search is deemed to be attractive.)

As well as the content we discover through search engines, content discovered through social networks is becoming of increasing importance. Something I’ve been looking at for some time is the structure of social networks on Twitter, in part as a “self-reflection” tool to help us see where we might be situated in a professional social sense based on the people we follow and who follow us. Of course, this can sometimes lead to incestuous behaviour, where the only people talking about a subject are people who know each other.

For example, when I looked at the connections between people chatting on Twitter about Adam Curtis’ All Watched Over By Machines of Loving Grace documentary, I was surprised to see it defined a large part of the UK’s “technology scene” that I am familiar with from my own echochamber…

#awobmolg echo chamber

So what do I mean by echochamber? In the case of Twitter, I take it to refer to a group of people chatting around a topic (as for example, identified by a hashtag) who are tightly connected in a social sense because they all follow one another anyway… (To see an example of this, for a previous OU/Click episode, I posted a simple application (it’s still there), to show the extent to which people who had recently used the #bbcClickRadio hashtag on Twitter were connected.)

As far as diversity goes, if you follow people who only follow each other, then it might be that the only ideas you come across are ideas that keep getting recycled by the same few people… Or it might be the case that a highly connected group of people shows a well defined special interest group on a particular topic….

To get a feel for what we can learn about our own filter bubbles in Twitterspace, I had a quick look at Gareth Mitchell’s context (@garethm on Twitter). One of the dangers of using public apps is that anyone can do this sort of analysis of course, but the ethics around my using Gareth as a guinea pig in this example are maybe the topic of another programme…!

So, to start with, let’s see how tightly connected Gareth’s Twitter friends are (that is, to what extent do the people Gareth follows on Twitter follow each other?):

@garethm Twitter friends: the social graph showing how @garethm’s friends follow each other

The nodes represent people Gareth follows, and they have been organised into coloured groups based on a social network analysis measure that tries to identify groups of tightly interconnected individuals. The nodes are sized according to a metric known as “Authority”, which reflects the extent to which people are followed by other members of the network.
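(If you want to play along at home, the analysis itself only takes a few lines of code once you have the follower data. Here’s a minimal networkx sketch: it assumes the “who follows whom” relationships among the friends have already been fetched, e.g. via the Twitter API, into a list of pairs, and it uses HITS for the authority measure and greedy modularity for the grouping, which are stand-ins for whatever the graphing tool actually used.)

# Minimal sketch: assumes follower relationships among @garethm's friends are
# already available as (follower, followed) pairs. HITS and greedy modularity
# are stand-ins for the authority/grouping measures used in the pictures.

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"),
         ("dave", "carol")]                 # made-up follower -> followed pairs

g = nx.DiGraph(edges)

# "Authority": the extent to which a person is followed by other members
hubs, authorities = nx.hits(g, max_iter=1000)

# group tightly interconnected individuals (community detection)
groups = greedy_modularity_communities(g.to_undirected())

for i, group in enumerate(groups):
    for person in group:
        print(i, person, round(authorities[person], 3))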

A crude first glance at the graph suggests a technology (purple) and science (fluorine-y yellowy green) cluster to me, but Gareth might be able to label those groups differently.

Something else I’ve started to explore is the extent to which other people might see us on Twitter. One way of doing this is to look at who follows you; another is to have a peek at what lists you’ve been included on, along with who else is on those lists. Here’s a snapshot of some of the lists (that actually have subscribers!) that Gareth is listed on:

@garethm listspace

The flowers are separate lists. People who are on several lists are caught on the spiderweb threads connecting the list flowers… In a sense, the lists are filter bubbles defined by other people into which Gareth has been placed. To the left in the image above, we see there are a few lists that appear to share quite a few members: convergent filters?!

In order to try looking outside these filter bubbles, we can get an overview of the people that Gareth’s friends follow that Gareth doesn’t follow (these are the people Gareth is likely to encounter via retweets from his friends):

Who @garethm’s friends follow that @garethm doesn’t follow…

My original inspiration for this was to see whether or not this group of people would make sense as recommendations for who to follow, but if we look at the most highly followed people, we see this may not actually make sense (unless you want to follow celebrities!;-)

Recommendations based on friends of @garethm’s friends
Popular friends of Gareth’s that he doesn’t follow…

By way of a passing observation, it’s also worth noting that the approach I have taken to constructing the “my friends’ friends who aren’t my friends” graph tends to place “me” at the centre of the universe, surrounded by folk who are just a friend of a friend away…
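(The construction itself is just set arithmetic; a sketch, assuming a get_friends() helper wrapping the relevant Twitter API call, and with a toy graph standing in for real data:)

# Sketch of the "friends of my friends who aren't my friends" construction -
# plain set arithmetic, assuming a get_friends(user) helper that wraps the
# relevant Twitter API call (not shown; rate limits apply in practice).

def friends_of_friends(me, get_friends):
    my_friends = set(get_friends(me))
    candidates = set()
    for friend in my_friends:
        candidates.update(get_friends(friend))
    # drop people already followed, and "me" - which is why "me" always ends
    # up at the centre of the resulting picture
    return candidates - my_friends - {me}

# toy stand-in for the API call
toy_graph = {"me": ["a", "b"], "a": ["b", "c"], "b": ["c", "d", "me"]}
print(friends_of_friends("me", lambda u: toy_graph.get(u, [])))   # {'c', 'd'}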

For extended interviews and additional material relating to the OU/Click series on openness, make sure you visit Click (#bbcClickRadio) on OpenLearn.

Twitter Makes a Move Towards Social Search… Time for some Twitter Gardening?

From the number of tweets that are starting to appear in my Google search results, it’s maybe surprising that Twitter’s own search offering has never really been the subject of much attention. A recent update sees the introduction of personalisation into the Twitter search experience, as described on the Twitter Engineering blog: The Engineering Behind Twitter’s New Search Experience.

A couple of things that jumped out at me from that report:

To support relevance filtering and personalization, we needed three types of signals:
Static signals, added at indexing time
Resonance signals, dynamically updated over time
Information about the searcher, provided at search time

At query time, a Blender server parses the user’s query and passes it along with the user’s social graph to multiple Earlybird servers. These servers use a specialized ranking function that combines relevance signals and the social graph to compute a personalized relevance score for each Tweet. The highest-ranking, most-recent Tweets are returned to the Blender, which merges and re-ranks the results before returning them to the user.

Twitter is most powerful when you personalize it by choosing interesting accounts to follow, so why shouldn’t your search results be more personalized too? They are now! Our ranking function accesses the social graph and uses knowledge about the relationship between the searcher and the author of a Tweet during ranking. Although the social graph is very large, we compress the meaningful part for each user into a Bloom filter, which gives us space-efficient constant-time set membership operations. As Earlybird scans candidate search results, it uses the presence of the Tweet’s author in the user’s social graph as a relevance signal in its ranking function.
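(To get a feel for the Bloom filter trick mentioned there – space-efficient, probabilistic set membership – here’s a toy version; it has nothing to do with Twitter’s actual implementation, it just shows the idea of compressing a follow graph into a small bit array you can test against.)

# Toy Bloom filter for "is this tweet's author in the searcher's social graph?"
# membership tests - illustrative only, not Twitter's implementation.

import hashlib

class BloomFilter:
    def __init__(self, size_bits=1 << 16, hashes=4):
        self.size, self.hashes = size_bits, hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        for i in range(self.hashes):
            h = hashlib.sha1(("%d:%s" % (i, item)).encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# compress the searcher's (possibly large) social graph into a few KB...
graph = BloomFilter()
for user_id in ["12345", "67890"]:        # ids of accounts the searcher follows
    graph.add(user_id)

# ...then use membership as a relevance signal when scoring candidate tweets
print("12345" in graph, "99999" in graph)   # True, (almost certainly) False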

I don’t know what the social graph includes, but if you’re an indiscriminate follower of folk on the one hand, and/or you don’t curate your followers to any significant extent (for example, by blocking spambots or doing a bit of Twitter gardening), then your personalised search results may not be as highly tuned as they might be… (Although on the other hand, maybe the diversity of search results that might result from a very, err, diverse follower network is a Good Thing? The tension between diversity and relevance in search results was something we were chatting over yesterday as preparation for the next OU/BBC co-produced episode of Click (BBC World Service radio).)

See also: Brand Association and Your Twitter Followers, Could Librarians Be Influential Friends? And Who Owns Your Search Persona?

PS Here’s another handy tool in a search curation context that I don’t think I’ve blogged about before: trunk.ly (search over links you’ve tweeted, posted to delicious, shared on Facebook etc).

Google Correlate: What Search Terms Does Your Time Series Data Correlate With?

Just a few days over three years ago, I blogged about a site I’d put together to try to crowdsource observations about correlated search trends: TrendSpotting.

One thing that particularly interested me then, as it still does now, was the way that certain search trends reveal rhythmic behaviour over the course of weeks, months or years.

At the start of this year, I revisited the topic with a post on Identifying Periodic Google Trends, Part 1: Autocorrelation (followed by Improving Autocorrelation Calculations on Google Trends Data).
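(For reference, the core of that autocorrelation approach only takes a few lines. Here’s a sketch, with a made-up weekly series standing in for an actual Google Trends export:)

# Sketch of the autocorrelation idea: peaks in the autocorrelation at non-zero
# lags hint at rhythmic behaviour. The series below is made up (roughly annual
# period if the samples are weekly) rather than a real Trends export.

import numpy as np

def autocorrelation(series):
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0..N-1
    return acf / acf[0]                                  # normalise so lag 0 == 1

t = np.arange(3 * 52)
trend = np.sin(2 * np.pi * t / 52) + 0.3 * np.random.randn(len(t))

acf = autocorrelation(trend)
print("strongest non-trivial lag:", int(np.argmax(acf[10:]) + 10))  # ~52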

Anyway, today it seems that Google has cracked the scaling issues with discovering correlations between search trends (using North American search trend data), as well as opening up a service that will identify which search trends correlate most closely with your own uploaded time series data: Correlate (announcement: Mining patterns in search data with Google Correlate).

For the quick overview, check out the Google Correlate Comic.

So what’s on offer? First, enter a search term and see what it’s correlated with:

As well as the line chart, correlations can also be plotted as a scatterplot:

You can also run “spatial correlations”, though at the moment this appears to be limited to US states. (I *think* this works by looking for search terms that are popular in the requested areas and not popular in the other listed areas. To generalise this, I guess you need three things: the total list of areas that work for the spatial correlation query; the areas you want the search volume for the “to be discovered correlated phrase” to be high; the areas you want to the search volume for the “to be discovered correlated phrase” to be low?)

At this point it’s maybe worth remembering that correlation does not imply causation…

A couple of other interesting things to note: firstly, you can offset the data (so shift it a few weeks forwards or backwards in time, as you might do if you were looking for lead/lag behaviour); secondly, you can export/download the data.
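(The offset trick amounts to computing the correlation at a range of lags; a quick sketch, with made-up series rather than real trend data:)

# Quick sketch of the offset/lead-lag idea: slide one series against the other
# and compute the Pearson correlation at each offset (made-up data below).

import numpy as np

def lagged_correlations(a, b, max_lag=8):
    a, b = np.asarray(a, float), np.asarray(b, float)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            x, y = a[:lag], b[-lag:]
        elif lag > 0:
            x, y = a[lag:], b[:-lag]
        else:
            x, y = a, b
        out[lag] = np.corrcoef(x, y)[0, 1]
    return out

base = np.sin(np.linspace(0, 8 * np.pi, 200))
lagged = np.roll(base, 3)                       # second series lags the first
corrs = lagged_correlations(base, lagged)
print(max(corrs, key=corrs.get))                # best-matching offset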

You can also upload your own data to see what terms correlate with it:

(I wonder if they’ll start offering time series analysis features on uploaded data, as well as other trend data, too? For example, frequency analysis or trend analysis? This is presumably going on in the background, though I haven’t read the white paper [PDF] yet…)

As if that’s not enough, you can also draw a curve/trendline and then see what correlates with it (so this is a weak alternative to uploading your own data, right? Just draw something that looks like it…). (h/t to Mike Ellis for first pointing this out to me.)

I’m not convinced that search trends map literally onto the well known “hype cycle” curve, but I thought I’d try out a hype cycle reminiscent curve where the hype was a couple of years ago, and we’re now maybe seeing it start to reach mainstream maturity, with maybe the first inklings of a plateau…

Hmmm… the pr0n industry is often identified as a predictor of certain sorts of technology adoption… maybe the 5ex searchers are too?! (Note that correlated hand-drawn charts are linkable).

So – that’s Google Correlate; nifty, eh?

PS Here’s another reason why I blog… my blog history helps me work out how far in the future I live;-) So currently about three years in the future… how about you?!;-)

PPS I can imagine Google’s ThinkInsights (insight marketing) loving the thought that folk are going to check out their time series data against Google Trends so the Goog can weave that into its offerings… A few additional thoughts leading on from that: 1) when will correlations start to appear in Google AdWords support tools to help you pick adwords based on your typical web traffic patterns or even sales patterns? 2) how far are we off seeing a Google Insights box to complement the Google Search Appliances, that will let you run correlations – as well as Google Prediction type services – onsite without feeling as if you have to upload your data to Google’s servers, instead becoming part of Google’s our-kit-in-your-racks offering; 3) when is Google going to start buying up companies like Prism, and will it then maybe go after the likes of Experian and Dunnhumby to become a company that organises information about the world of people, as well as just the world’s information…?!

PPPS Seems like as well as “traditional” link sharing offerings, you can share the link via your Google Reader account…

Interesting…