For all their success in attracting universities to adopt Google Apps (Tradition meets technology: top universities using Apps for Education), it’s not obvious to me how – or even if – Google is actually doing much around search signal detection and innovation in an educational context?
I’ve floated this a couple of times before (eg Could Librarians Be Influential Friends? And Who Owns Your Search Persona? and Integrating Course Related Search and Bookmarking?), but with yet another announcement from Google about how they’re incorporating social signals into search rankings (Hide sites from anywhere in the world: “We’ve … started incorporating data about sites people have blocked into our general search ranking algorithms to help users find more high quality sites.”), I’m going to raise it again…
To what extent are course and subject librarians setting up course/subject personas that engage in recommending and sharing high quality links in an appropriate social content, and encouraging students to follow those accounts in order to benefit from personalisation of search results based on social signals?
Furthermore, to what extent might the development of search personas represent the creation of a “scholarly agent” that can be used to offer “search assist” to followers of that agent/persona?
I don’t find it that hard to imagine myself taking a course, following the course recommender on a social network (an account that might send out course related reminders as well as relevant links), with an icon depicting my university and the associated course, that on occasion appeared to “recommend” links to me when I was searching for topics relating to my course. (In the normal scheme of things, it wouldn’t actively be recommending links to me, of course. For that, I’d need to subscribe to something like Subscribed Links, as mentioned in Integrating Course Related Search and Bookmarking?.)
A long time ago, I started running with the idea that an organisation’s homepage on the web was the dominant search engine’s search results page for the most common search term associated with that organisation. So for example, the OU’s effective home page is not http://www.open.ac.uk, but the Google search results page for open university. (You can test the extent to which this claim is supported by checking out your website logs, and comparing how many direct visitors you get to the “official” homepage for your website compared to the amount of traffic referred to your site from google for common search terms used to find your site. You do know what those search terms are, right?!;-)
At one time, the results page might include several links to different institutional web pages, each listed as a separate individual search result item. Then five years or so ago, Google started introducing sitelinks into the search results page, a display technique that would include a list of several links to specific pages within the same domain within the context of a single result item, headed by the top level domain.
Or maybe how about the library?
[For an overview, see Anatomy Of A Google Snippet and maybe also Meta Description Mutiny! Take Control of Your Text Snippets; if you want to take some responsibility for what appears, see the Google Webmaster Tools posts on sitelinks, Changing a site title and description and Removing snippets and Instant Preview]
A recent (August 2011) update sees the Goog placing even more focus on the display of sitelinks (The evolution of sitelinks: expanded and improved).
Here are few things about this update that I think are worth noting, particularly in light of recommendations emerging from the JISC “Linking You” project, which offers best practice guidance on the design of top-level URI schemes for university websites:
Sitelinks will now be full-size links with a URL and one line of snippet text—similar to regular results—making it even easier to find the section of the site you want. We’re also increasing the maximum number of sitelinks per query from eight to 12. …
In addition, we’re making a significant improvement to our algorithms by combining sitelink ranking with regular result ranking to yield a higher-quality list of links. This reduces link duplication and creates a better organized search results page. Now, all results from the top-ranked site will be nested within the first result as sitelinks, and all results from other sites will appear below them. The number of sitelinks will also vary based on your query—for example, [museum of art nyc] shows more sitelinks than [the met] because we’re more certain you want results from http://www.metmuseum.org.
So what do we learn from this?
- Sitelinks will now be full-size links with a URL and one line of snippet text—similar: so check your results listing and see if the “one line of snippet text” for each displayed result makes sense. (For some ideas about how to influence snippet text, see e.g. the Google Webmaster Tools links above.)
- We’re … increasing the maximum number of sitelinks per query from eight to 12: what links would you like to appear in the sitelinks list, compared to what links actually do appear? Would consensus in how UK HEIs architect top-level URLs (as, for example, recommended by Linking You, provide uniformity of display of results in the Google SERPs space? Would consensus open Google up to discussion relating to the most effective way of displaying search results for UK HEI sitelink results?
- we’re making a significant improvement to our algorithms by combining sitelink ranking with regular result ranking to yield a higher-quality list of links: which is to say – SEO, and URI path design, may play a role in determining what links get displayed as sitelinks.
- This reduces link duplication and creates a better organized search results page. That is, better orgainised, as defined by Google. If you want to influence the way links to your site are displayed as sitelinks, you need to figure how – you don’t control how Google provides this top level navigation to your site as sitelinks, but you may be able to influence the display through good website design….
- The number of sitelinks will also vary based on your query
If you have pages that mainly contain lists of items, these may also be handled differently in the context of snippets: New snippets for list pages
– Rich snippets microdata – if Google handled edu microdata, what would it describe…?
– Google Webmaster tools: Rich Snippets testing tool – “enter a web page URL to see how it may appear in search results”
PS I wonder how sitelink displays interact with rich snippets…?
PPS There’s a great write up of the Linking You project that I’ve just come across here: Lend Me Your Ears Dear University Web Managers!. Go and read it…. now… Hmmm.. thinks… what would a similar exercise for local council websites look like?
As a long time fan of custom search engine offerings, I keep wondering why Google doesn’t seem to have much active interest in this area? Google Custom Search updates are few and far between, and typically go unreported by the tech blogs. Perhaps more surprisingly, Custom Search Engines don’t appear to have much, if any, recognition in the Google Apps for Education suite, although I think they are available with a Google Apps for education ID?
One of the things I’ve been mulling over for years is the role that automatically created course related search engines might have to play as part of a course’s VLE offering. The search engine would offer search results either over a set of web domains linked to from the actual course materials, or simply boost results from those domains in the context of a “normal” set of search results. I’ve recently started thinking that we could also make use “promoted” results to highlight specific required or recommended readings when a particular topic is searched for (for example, Integrating Course Related Search and Bookmarking?).
During an informal “technical” meeting around three JISC funded reseource discovery projects at Cambridge yesterday (Comet, Jerome, SALDA; disclaimer: I didn’t work on any of them, but I was in the area over the weekend…), there were a few brief mentions of how various university libraries were opening up their catalogues to the search engine crawlers. So for example, if you do a site: limited search on the following paths:
you can get (partial?) search results, with a greater or lesser degree of success, from the Sussex, Lincoln, Huddersfield and Cambridge catalogues respectively.
In a Google custom search engine context, we can tunnel in a little deeper in an attempt to returns results limited to actual records:
I’m not sure how useful or interesting this is at the moment, except to the library systems developers maybe, who can compare how informatively their library catalogue content is indexed and displayed in Google search results compared to other libraries… (so for example, I noticed that Google appears to be indexing the “related items” that Huddersfield publishes on a record page, meaning that if a search term appears in a related work, you might get a record that at first glance appears to have little to do with your search term, in effect providing a “reverse related work” search (that is, search on related works and return items that have the search term as the related work)).
But it’s a start… and with the addition of customised rankings, might provide a jumping off point for experimenting with novel ways of searching across UK HE catalogues using Google indexed content. (For example, a version of the CSE on the cam.ac.uk domain might boost the Cambridge results; within an institution, works related to a particular course through mention on a reading list might get a boost if a student on that course runs a search… and so on…
PS A couple of other things that may be worth pondering… could Google Apps for Education account holders be signed up to to Subscribed Links offering customised search results in the main Google domain relating to a particular course. (That is, define subscribed link profiles for a each course, and automatically add those subscriptions to an Apps for Edu user’s account based on the courses they’re taking?) Or I wonder if it would be possible to associate subscribed links to public access browsers in some way?
And how about finding some way of working with Google to open up “professional” search profiles, where for example students are provided with “read only” versions of the personalised search results of an expert in a particular area who has tuned, through personalisation, a search profile that is highly specialised in a particular subject area, e.g. as mentioned in Google Personal Custom Search Engines? (see also Could Librarians Be Influential Friends? And Who Owns Your Search Persona?).
If anyone out there is working on ways of using Google customised and personalised search as a way of delivering “improved” search results in an educational context, I’d love to hear more about what you’re getting up to…
I spotted this for the first time last night:
I had actually read the post in the Google Reader context (so Google knew that), but I wonder: if I hadn’t read the post, would it still have shown up like that?
As far as personalised ranking signals go:
– does the fact that I subscribe to the feed in Google Reader affect the rank of items from that feed in my personalised search results?
– if I have read the post in Google reader, does that also affect the rank of that specific post in my personalised search results?
If I have shared a link – through Google+, or Twitter, for example – are the ranking of those links positively affected in my personalised search results. That is, might social search actually be most useful when the Goog picks up on things I have shared myself, and then “reminds” me of them via a ranking boost in my personalised search results when I’m searching on a related topic?
Maybe tweeting and sharing into the void is actually yet another way of invisibly building search refinements into your personalised search context?
Not surprisingly, I’m way behind on the two eSTEeM projects I put proposals in for – my creative juices don’t seem to have been flowing in those areas for a bit:-( – but as a marking avoidance strategy I thought I’d jot down some thoughts that have been coming to mind about how the custom search project at least might develop (eSTEeM Project: Custom Course Search Engines).
The original idea was to provide a custom search engine that indexes pages and domains that are referenced within a course in order to provide a custom search engine for that course. The OU course T151 is structured as a series of topic explorations using the structure:
– topic overview
– framing questions
– suggested resources
– my reflections on the topic, guided by the questions, drawing on the suggested resources and a critique of them
One original idea for the course was that rather than give an explicit list of suggested resources, we provide a set of links pulled in live from a predefined search query. The list would look as if it was suggested by the course team but it would actually be created dynamically. As instructors, we wouldn’t be specifying particular readings, instead we would be trusting the search algorithm to return relevant resources. (You might argue this is a neglectful approach… a more realistic model might be to have specifically recommended items as well as a dynamically created list of “Possibly related resources”.)
At this point it’s maybe worth stepping back a moment to consider what goes into producing a set of search results. Essentially, there are three key elements:
– the index, the set of content that the search engine has “searched” and from which it can return a set of results;
– the search query; this is run against the index to identify a set of candidate search results;
– a presentation algorithm that determines how to order the search results as presented to the user.
If the search engine and the presentation algorithm are fixed, then for a given set of search terms, and a given index, we can specify a search term and get a known set of results back. So in this case, we could use a fixed custom search engine, with know search terms, and return a known list of suggested readings. The search engine would provide some sort of “ground truth” – same answer for the same query, always.
If we trust the sources and the presentation algorithm, and we trust that we have written an effective search query, then if the index is not fixed, or if a personalised ranking algorithm (that we trust) is used as part of the search engine, we would potentially be returning search results that the instructor has not seen before. For example, the resources may be more recent than the last time the instructor searched for resources to recommend, or they better fit the personalisation criteria for the user under the ranking algorithm used as part of the presentation algorithm.
In this case, the instructor is not saying: “I want you to read this particular resource”. They are saying something more along the lines of: “these are potentially the sorts of resource I might suggest you look at in order to study this topic”. (Lots of caveats in there… If you believe in content led instruction, with students referring to to specifically referenced resources, I imagine that you would totally rile against this approach!)
At times, we might want to explicitly recommend one or two particular resources, but also open up some other recommendations to “the algorithm”. It struck me that it might be possible to do this within the context of a Google Custom Search approach using “special results” (e.g. Google CSEs: Creating Special Results/Promotions).
For example, Google CSEs support:
– promotions: “A promotion is simply an association between a pre-defined set of query terms and a link to a webpage. When a user types a search that exactly matches one of your query terms, the promotion appears at the top of the page.” So by using a specific search term, we can force the return of a specific result as the top result. In the context of a topic exploration, we could thus prepopulate the search form of an embedded search engine with a known search phrase, and use a promotion to force a “recommend reading” link to the top of the results listing.
Promotion links are stored in a separate config file and have the form:
<Promotions> <Promotion id="1" queries="wanderer, the wanderer" title="Groo the Wanderer" url="http://www.groo.com/" description="Comedy. American series illustrated by Sergio Aragonés." image_url="http://www.newsfromme.com/images5/groo11.jpg" /> </Promotions>
– subscribed links: subscribed links allow you to return results in a specific format (such as text, or text and a link, or other structured results) based on a perfect match with a specific search term. In a sense, subscribed links represent a generalised version of promotions. Subscribed links are also available to users outside the context of a CSE. If a user subscribes to a particular subscribed link file, then if there is an exact match against of one the search phrases in the subscribed link file and a search phrase used by a subscribing user on Google web search (i.e. on google.com or google.co.uk), the subscribed link will be returned in the results listing.
In the simplest case, subscribed links can be defined at the individual link level:
If your search term is an exact match for the term in the subscribed link definition, it will appear in the main search results page:
It’s also possible to define subscribed link definition files, either as simple tab separated docs or RSS/Atom feeds, or using a more formal XML document structure. One advantage of creating subscribed links files for use within in custom search engine is that users (i.e. students) can subscribe to them as a way of augmenting or enhancing their own Google search results. This has the joint effect of increasing the surface area of the course, so that course related recommendations can be pushed to the student for relevant queries made through the Google search engine, as well as providing a legacy offering: students can potentially take away a subscription when then finish the course to continue to receive “academically credible” results on relevant search topics. (By issuing subscription links on a per course presentation basis (or even on a personalised, unique feed per student basis), feeds to course alumni might be customised, or example by removing links to subscription content (or suggesting how such content might be obtained through a subscription to the university library), or occasionally adding in advertising related links (so if a student searches using a “course” keyword, make recommendations around that via a subscribed links feed; in the limit, this could even take on the form of a personalised, subscription based advertising channel).
Another way in which “recommended” links can be boosted in a custom search result listing is through boosting search results via their ranking factors (Changing the Ranking of Your Search Results).
In the case of both subscribed links and boosted search results, it’s possible to create a configuration file dynamically. Where students are bookmarking search results relating to a course, it would therefore be possible to feed these into a course related custom search engine definition file, or a subscribed link file. If subscribed link files are maintained at a personal level, it would also be possible to integrate a student’s bookmarked links in to their subscribed links feed, at least for use on Google websearch (probably not in the custom search engine context?). This would support rediscovery of content bookmarked by the student through subscribed link recommendations.
The model here is based around defining search contexts that one or more users can contribute to, and then saving out results from a search into a topic based bookmark area. The video suggests that particular results can also be blocked (and maybe boosted? The greyed plus on the left hand side?) – presumably this is a persistent feature, so if you, or another member of your “search team” runs the search, the blocked result doesn’t appear? (Is a list of blocked results and their corresponding search terms available anywhere I wonder?) In common with the clipping blog model used by sites such as posterous, it’s possible to post links and short blog posts into a topic area. Commenting is also supported.
To say that search was Google’s initial big idea, it’s surprising that it seems to play no significant role in Google’s offerings for education through Google Apps. Thinking back, search related topics were what got me into blogging and quick hacks; maybe it’s time to return to that area…
As you’re probably aware by now, yesterday Google announced its Google+ social network. A key part of every social network is a user’s personal profile page, the “social object” that other people can actually connect to.
Google has offered personal profile pages for some time, (here’s my rather basic Google Profile: Tony Hirst), but they’ve never really been a part of anything, and they’re not really linkable to – which means there’s little reason for PageRank based search algorithms such as Google’s to return Google Profile pages in the top results for you if anyone ever searches for you.
(PageRank is the algorithm that gave Google its early edge in the search engine wars; links from one page to another count as “votes” regarding the quality of the page that is linked to. Crudely put, if people link to you, those links contribute to your PageRank and you’re more likely to make it to the top of a search results page.)
Until now, that is (or at least, until a couple of weeks ago… I missed this announcement at the time it was made…): Authorship markup and web search, a technique for “supporting markup that enables websites to publicly link within their site from content to author pages”.
The method is described as follows:
To identify the author of an article, Google checks for a connection between the content page (such as an article), an author page, and a Google Profile.
A content page can be any piece of content with an author: a news article, blog post, short story …
An author page is a page about a specific author, on the same domain as the content page.
A Google Profile is Google’s version of an author page. It’s how you present yourself to the web and to Google.
In confirming authorship, Google looks for:
Links from the content page to the author page (if the path of links continues to a Google Profile, we can also show Profile information in search results)
A path of links back from your Google Profile to your content.
These reciprocal links are important: without them, anyone could attribute content to you, or you could take credit for any content on the web.
The rel=”author” link indicates the author of an article [so for example: <a rel=”author” href=”https://profiles.google.com/tony.hirst/”>Google Profile: Tony Hirst</a>]
Here’s why you might be tempted to do this…:
Many of you create great content on the web, and we work hard to make that content discoverable on Google. Today, we will start highlighting the people creating this content in Google.com search results.
As you can see …, certain results will display an author’s picture and name — derived from and linked to their Google Profile — next to their content on the Google Search results page.
Source: Highlighting content creators in search results; [my emphasis]
So… if you want to assert authorship and be recognised as the author in the Google search results, you need to start linking all your content back to your Google Profile Page…
…and so start feeding PageRank juice to your Google profile page…
…so that when folk search for you on the web, they’re more likely to see that page…
This is a harsh reading, of course: authorship can also be asserted by linking within a domain to a page that you have asserted to Google that represents you: The rel=”author” link indicates the author of an article, and can point to .. an author page on the same domain as the content page: Written by <a rel="author" href="../authors/mattcutts">Matt Cutts</a>. The author page should link to your Google Profile using rel=”me”.
(I wonder why <link rel=”author” href=”../authors/mattcutts”/> isn’t supported? Or maybe it is?)
Algorithmically, the assertion of authorship might also help in Google’s fight against spamblogs, which republish content blindly from original sources. That is, by asserting authorship of a page, if someone reposts your content, google will be able to identify you as the original author and return a link back to your page in the search results listing, rather than the republished page.
I imagine there might also be personal reputation benefits – for example, if people +1 a page you have claimed authorship of, it might give you a “Reputation Rank” boost for the subject area associated with that page?
One thing that particularly interested me then, as it still does now, was the way that certain search trends they reveal rhythmic behaviour over the course of weeks, months or years.
At the start of this year, I revisited the topic with a post on Identifying Periodic Google Trends, Part 1: Autocorrelation (followd by Improving Autocorrelation Calculations on Google Trends Data).
Anyway today it seems that Google has cracked the scaling issues with discovering correlations between search trends (using North American search trend data), as well as opening up a service that will identify what search trends correlate most closely with your own uploaded time series data: Correlate (announcement: Mining patterns in search data with Google Correlate)
For the quick overview, check out the Google Correlate Comic.
So what’s on offer? First, enter a search term and see what it’s correlated with:
As well as the line chart, correlations can also be plotted as a scatterplot:
You can also run “spatial correlations”, though at the moment this appears to be limited to US states. (I *think* this works by looking for search terms that are popular in the requested areas and not popular in the other listed areas. To generalise this, I guess you need three things: the total list of areas that work for the spatial correlation query; the areas you want the search volume for the “to be discovered correlated phrase” to be high; the areas you want to the search volume for the “to be discovered correlated phrase” to be low?)
At this point it’s maybe worth remembering that correlation does not imply causation…
A couple of other interesting things to note: firstly, you can offset the data (so shift it a few weeks forwards or backwards in time, as you might do if you were looking for lead/lag behaviour); secondly, you can export/download the data.
You can also upload your own data to see what terms correlate with it:
(I wonder if they’ll start offering time series analysis features on uploaded, as well as other trend data, too? For example, frequency analysis or trend analysis? This is presumably going on in the background (though I haven’t read the white paper [PDF] yet…)
As if that’s not enough, you can also draw a curve/trendline and then see what correlates with it (so this a weak alternative to uploading your own data, right? Just draw something that looks like it… (h/t to Mike Ellis for first point this out to me).
I’m not convinced that search trends map literally onto the well known “hype cycle” curve, but I thought I’d try out a hype cycle reminiscent curve where the hype was a couple of years ago, and we’re now maybe seeing start to reach mainstream maturity, with maybe the first inklings of a plateau…
Hmmm… the pr0n industry is often identified as a predictor of certain sorts of technology adoption… maybe the 5ex searchers are too?! (Note that correlated hand-drawn charts are linkable).
So – that’s Google Correlate; nifty, eh?
PS Here’s another reason why I blog… my blog history helps me work out how far i the future I live;-) So currently between about three years in the future.. how about you?!;-)
PPS I can imagine Google’s ThinkInsights (insight marketing) loving the thought that folk are going to check out their time series data against Google Trends so the Goog can weave that into it’s offerings… A few additional thoughts leading on from that: 1) when will correlations start to appear in Google AdWords support tools to help you pick adwords based on your typical web traffic patterns or even sales patterns? 2) how far are we off seeing a Google Insights box to complement the Google Search Appliances, that will let you run correlations – as well as Google Prediction type services – onsite without feeling as if you have to upload your data to Google’s servers, and instead, becoming part of Google’s out-kit-in-your-racks offering; 3) when is Google going to start buying up companies like Prism and will it then maybe go after the likes of Experian and Dunnhumby to become a company that organises information about the world of people, as well as just the world’s information…?!)
PPPS Seems like as well as “traditional” link sharing offerings, you can share the link via your Google Reader account…