Archive for the ‘Search’ Category
If you live by pop tech feed or Twitter, you’ve probably heard that Google is rolling out a new style of socially powered search results. If not, or if you’re still not clear about what it entails, read Phil Bradley’s post on the matter: Why Google Search Plus is a disaster for search.
Done that? If not, why not? This post isn’t likely to make much sense at all if you don’t know the context. Here’s the link again: Why Google Search Plus is a disaster for search
So the starting point for this post is this: Google is in the process of rolling out a new web search service that (optionally) offers very personal search results that contains content from folk that Google thinks you’re associated with, and that Google is willing to show you based on license agreements and corporate politics.
Think about this for a minute…. in e the totally personalised view, folk will only see content that their friends have published or otherwise shared…
In Could Librarians Be Influential Friends?, I wondered aloud whether it made sense for librarians and other folk involved with providing support relating to resource discovery and recommendation to start a) creating social network profiles and encouraging their patrons to friend them, and b) start recommending resources using those profiles in order to start influencing the ordering/ranking of results in patrons’ search results based on those personal recommendations. The idea here was that you could start to make
invisible frictionless recommendations by influencing the search engine results returned to your patrons (the results aren’t invisible because your profile picture may appear by the result showing that you recommend it. They’re frictionless in the sense that having made the original recommendation, you no longer have to do any work in trying to bring it to the attention of your patron – the search engines take care of that for you (okay, I know that’s a simplistic view;-). [Hmm.. how about referring to it as recommendation mode support?]
(Note that there is an complementary form of support to the approach which I’ve previously referred to as Invisible Library Tech Support (responsive mode support?; which I guess is also frictionless, at least from the perspective of the patron) in which librarians friend their patrons or monitor generic search terms/tags on Q&A sites and then proactively respond to requests that users post into their social networks more generally.)
With the aggressive stance Google now seems to be taking towards pushing social circle powered results, I think we need to face up to the fact – as Phil Bradley pointed out – that if librarians want to make sure they’re heard by their patrons, they’re going to need to start setting up social profiles, getting their patrons to friend them, and start making content and resource recommendations just anyway in order to make them available as resources that are indexed by patrons’ personal search engines. The same goes for publishers of OERs, academic teaching staff, and “courses”.
If we think of Google social search as searching over custom search engines bound by resources created and recommended by members of a users social circle, if you want to make (invisible) recommendations to a user via their (personalised) web search results, you’re going to need to make sure that the resources/content you want to recommend is indexed by their personal search engines. Which means: a) you need to friend them; and b) you need to share that content/those resources in that social context.
(Hmmm…this makes me think there may be something in the course custom search engine approach after all… Specifically, if the course has a social profile, and recommends the links contained within the course via that profile, they become part of the personalised search index of student’s following that course profile?)
Just by the by, as another example of Google completely messing things up at the moment, I notice that when I share links to posts on this blog via Google+, they don’t appear as trackbacks to the post in question. Which means that if someone refers to a post on this blog on Google+, I don’t know about it… whereas if they blog the link, I do…
See also my chronologically ordered posts on the eroding notion of “Google Ground Truth”.
[Invisible vs frictionless (and various notions of that word) is all getting a bit garbled; see eg @briankelly's Should Higher Education Welcome Frictionless Sharing and my comments to it for a little more on this...]
PS I’ve been getting increasingly infuriated by the clutter around, and lack of variation within, Google search results lately, so I changed my default search engine to Bing. The results are a bit all over the place compared to the Google results I tend to get, but this may be down in part to personalisation/training. I am still making occasional forays to Google, but for now, Bing is it… (because Bing is not Google…)
PPS Hah – just noticed: Google Search Plus doesn’t mean plus in the sense of search more, it means search Google+, which is less, or minus the wider world view…;-)
PPPS I keep meaning to blog this, and keep forgetting: Turn[ing] off [Google] search history personalization, in particular: “If you’ve disabled signed-out search history personalization, you’ll need to disable it again after clearing your browser cookies. Clearing your Google cookie clears your search settings, thereby turning history-based customizations back on.” WHich is to say, when you disable personalisation, you don’t disable personalisation against your Google account, you disable it only insofar as it relates to your current cookie ID?
In Search Engine Powered Courses…, I took an initial, baby step to demonstrate one way in which a promoted link might be used be within a course specific custom search engine. In the next post in this series, I will describe how to influence the positioning of results within a Google custom search engine by boosting their ranking, as well as how results may be ‘faceted’ into different results sets through the use of labels.
In this post, I thought it would be worth taking a step back and reviewing the three configuration files we have access to when defining a Google custom search engine: the configuration file, the promotions file, and the annotations file. If you create a minimal Google custom search engine using the CSE management tools, and then go to the Advanced page, you will see options that allow you to upload the configuration and annotations file. The promotions file can be imported via the Promotions page.
So what do each of these file do?
- The configuration file defines the top level configuration of the search engine. The easiest way of obtaining a template for a CSE is to create a minimal search engine using the CSE management tools, and then export the configuration file from the Advanced page. The configuration file defines, among other things: whether the search engine will search over the whole web, prioritising (or ‘BOOSTing’) sites and pages indexed explicitly by the CSE, or whether it will just return resuts from the explicilty indexed pages (a FILTER style search engine); a definition of the labels, or facets, that allow different search refinements to be applied as different search strategy contexts within the CSE; some styling information; and information relating to Subscribed Links (more of them in another post, if they’re still supported by then..)..
- The promotions file allows you do define promoted links within a CSE; in Search Engine Powered Courses…, I give an example of how these might be used in a course search engine.
- The annotations file identifies the sites and pages that are specific members of the CSE index, as well as how they should be handled (eg the extent to which they should be positively or negatively boosted in the search engine results listing, whether they should appear in the top few results, and what labels or facets should apply to them).
It’s also possible to customise the styling/presentation of the search engine, but that’s a shiny, shiny feature, so probably not something I’ll be looking at…
PS I just noticed you can now manage Google Analytics settings for custom search engines (which allows you to log search queries) from within the CSE control panel… I’m still not sure how easy it is to track which results get clicked through, though?
How can we use customised search engines to support uncourses, or the course models used to support MOOC style offerings?
To set the scene, here’s what Stephen Downes wrote recently on the topic of How to partcipate in a MOOC:
You will notice quickly that there is far too much information being posted in the course for any one person to consume. We tried to start slowly with just a few resources, but it quickly turns into a deluge.
You will be provided with summaries and links to dozens, maybe hundreds, maybe even thousands of web posts, articles from journals and magazines, videos and lectures, audio recordings, live online sessions, discussion groups, and more. Very quickly, you may feel overwhelmed.
Don’t let it intimidate you. Think of it as being like a grocery store or marketplace. Nobody is expected to sample and try everything. Rather, the purpose is to provide a wide selection to allow you to pick and choose what’s of interest to you.
This is an important part of the connectivist model being used in this course. The idea is that there is no one central curriculum that every person follows. The learning takes place through the interaction with resources and course participants, not through memorizing content. By selecting your own materials, you create your own unique perspective on the subject matter.
It is the interaction between these unique perspectives that makes a connectivist course interesting. Each person brings something new to the conversation. So you learn by interacting rather than by mertely consuming.
When I put together the the OU course T151, the original vision revolved around a couple of principles:
1) the course would be built in part around materials produced in public as part of the Digital Worlds uncourse;
2) each week’s offering would follow a similar model: one or two topic explorations, plus an activity and forum discussion time.
In addition, the topic explorations would have a standard format: scene setting, and maybe a teaser question with answer reveal or call to action in the forums; a set of topic exploration questions to frame the topic exploration; a set of resources related to the topic at hand, organised by type (academic readings (via a libezproxy link for subscription content so no downstream logins are required to access the content), Digital Worlds resources, weblinks (industry or well informed blogs, news sites etc), audio and video resources); and a reflective essay by the instructor exploring some of the themes raised in the questions and referring to some of the resources. The aim of the reflective essay was to model the sort of exploration or investigation the student might engage in.
(I’d probably just have a mixed bag of resources listed now, along with a faceting option to focus in on readings, videos, etc.)
The idea behind designing the course in this way was that it would be componentised as much as possible, to allow flexibility in swapping resources or even topics in and out, as well as (though we never managed this), allowing the freedom to study the topics in an arbitrary order. Note: I realised today that to make the materials more easily maintainable, a set of ‘Recent links’ might be identified that weren’t referred to in the ‘My Reflections’ response. That is, they could be completely free standing, and would have no side effects if replaced.
As far as the provision of linked resources went, the original model was that the links should be fed into the course materials from an instructor maintained bookmark collection (for an early take on this, see Managing Bookmarks, with a proof of concept demo at CourseLinks Demo (Hmmm, everything except the dynamic link injection appears to have rotted:-().
The design of the questions/resources page was intended to have the scoping questions at the top of the page, and then the suggested resources presented in a style reminiscent of a search engine results listing, the idea being that we would present the students with too many resources for them to comfortably read in the allocated time, so that they would have to explore the resources from their own perspective (eg given their current level of understanding/knowledge, their personal interests, and so on). In one of my more radical moments, I suggested that the resources would actually be pulled in from a curated/custom search engine ‘live’, according to search terms specially selected around the current topic and framing questions, but I was overruled on that. However, the course does have a Google custom search engine associated with it which searches over materials that are linked to from the course.
So that’s the context…
Where I’m at now is pondering how we can use an enhanced custom search engine as a delivery platform for a resource based uncourse. So here’s my first thought: using a Google Custom Search Engine populated with curated resources in a particular area, can we use Google CSE Promotions to help scaffold a topic exploration?
Here’s my first promotions file:
<Promotions> <Promotion id="t151_1a" queries="topic 1a, Topic 1A, topic exploration 1a, topic exploration 1A, topic 1A, what is a game, game definition" title="T151 Topic Exploration 1A - So what is a game?" url="http://digitalworlds.wordpress.com/2008/03/05/so-what-is-a-game/" description="The aim of this topic is to think about what makes a game a game. Spend a minute or two to come up with your own definition. If you're stuck, read through the Digital Worlds post 'So what is a game?'" image_url="http://kmi.open.ac.uk/images/ou-logo.gif" /> </Promotions>
It’s running on the Digital Worlds Search Engine, so if you want to try it out, try entering the search phrase what is a game or game definition.
(This example suggests to me that it would also make sense to use result boosting to boost the key readings/suggested resources I proposed in the topic materials so that they appear nearer the top of the results (that’ll be the focus of a future post;-))
The promotion displays at the top of the results listing if the specified queries match the search terms the user enters. My initial feeling is that to bootstrap the process, we need to handle:
- queries that allow a user to call on a starting point for a topic exploration by specifically identifying that topic;
- “naive queries”: one reason for using the resource-search model is to try to help students develop effective information skills relating to search. Promotions (and result boosting) allow us to pick up on anticipated naive queries (or popular queries identified from search logs), and suggest a starting point for a sensible way in to the topic. Alternatively, they could be used to offer suggestions for improved or refined searches, or search strategy hints. (I’m reminded of Dave Pattern’s work with guided searches/keyword refinements in the University of Huddersfield Library catalogue in this context).
Here’s another example using the same promotion, but on a different search term:
Of course, we could also start to turn the search engine into something like an adventure game engine. So for example, if we type: start or about, we might get something like:
(The link I associated with start should really point to the course introduction page in the VLE…)
We can also use the search context to provide pastoral or study skills support:
These sort of promotions/enhancements might be produced centrally and rolled out across course search engines, leaving the course and discipline related customisations to the course team and associated subject librarians.
Just a final note: ignoring resource limitations on Google CSEs for a moment, we might imagine the following scenarios for their role out:
1) course wide: bespoke CSEs are commissioned for each course, although they may be supplemented by generic enhancements (eg relating to study skills);
2) qualification based: the CSE is defined at the qualification level, and students call on particular course enhancements by prefacing the search with the course code; it might be that students also see a personalised view of the qualification CSE that is tuned to their current year of study.
3) university wide: the CSE is defined at the university level, and students students call on particular course or qualification level enhancements by prefacing the search with the course or qualification code.
In passing, I noticed I had a broken link to a Google CSE documentation page:
Searching a little, I found the page had moved to
A cached version of the originally linked page is still available, so I did a side-by-side comparison:
For all their success in attracting universities to adopt Google Apps (Tradition meets technology: top universities using Apps for Education), it’s not obvious to me how – or even if – Google is actually doing much around search signal detection and innovation in an educational context?
I’ve floated this a couple of times before (eg Could Librarians Be Influential Friends? And Who Owns Your Search Persona? and Integrating Course Related Search and Bookmarking?), but with yet another announcement from Google about how they’re incorporating social signals into search rankings (Hide sites from anywhere in the world: “We’ve … started incorporating data about sites people have blocked into our general search ranking algorithms to help users find more high quality sites.”), I’m going to raise it again…
To what extent are course and subject librarians setting up course/subject personas that engage in recommending and sharing high quality links in an appropriate social content, and encouraging students to follow those accounts in order to benefit from personalisation of search results based on social signals?
Furthermore, to what extent might the development of search personas represent the creation of a “scholarly agent” that can be used to offer “search assist” to followers of that agent/persona?
I don’t find it that hard to imagine myself taking a course, following the course recommender on a social network (an account that might send out course related reminders as well as relevant links), with an icon depicting my university and the associated course, that on occasion appeared to “recommend” links to me when I was searching for topics relating to my course. (In the normal scheme of things, it wouldn’t actively be recommending links to me, of course. For that, I’d need to subscribe to something like Subscribed Links, as mentioned in Integrating Course Related Search and Bookmarking?.)
A long time ago, I started running with the idea that an organisation’s homepage on the web was the dominant search engine’s search results page for the most common search term associated with that organisation. So for example, the OU’s effective home page is not http://www.open.ac.uk, but the Google search results page for open university. (You can test the extent to which this claim is supported by checking out your website logs, and comparing how many direct visitors you get to the “official” homepage for your website compared to the amount of traffic referred to your site from google for common search terms used to find your site. You do know what those search terms are, right?!;-)
At one time, the results page might include several links to different institutional web pages, each listed as a separate individual search result item. Then five years or so ago, Google started introducing sitelinks into the search results page, a display technique that would include a list of several links to specific pages within the same domain within the context of a single result item, headed by the top level domain.
Or maybe how about the library?
[For an overview, see Anatomy Of A Google Snippet and maybe also Meta Description Mutiny! Take Control of Your Text Snippets; if you want to take some responsibility for what appears, see the Google Webmaster Tools posts on sitelinks, Changing a site title and description and Removing snippets and Instant Preview]
A recent (August 2011) update sees the Goog placing even more focus on the display of sitelinks (The evolution of sitelinks: expanded and improved).
Here are few things about this update that I think are worth noting, particularly in light of recommendations emerging from the JISC “Linking You” project, which offers best practice guidance on the design of top-level URI schemes for university websites:
Sitelinks will now be full-size links with a URL and one line of snippet text—similar to regular results—making it even easier to find the section of the site you want. We’re also increasing the maximum number of sitelinks per query from eight to 12. …
In addition, we’re making a significant improvement to our algorithms by combining sitelink ranking with regular result ranking to yield a higher-quality list of links. This reduces link duplication and creates a better organized search results page. Now, all results from the top-ranked site will be nested within the first result as sitelinks, and all results from other sites will appear below them. The number of sitelinks will also vary based on your query—for example, [museum of art nyc] shows more sitelinks than [the met] because we’re more certain you want results from http://www.metmuseum.org.
So what do we learn from this?
- Sitelinks will now be full-size links with a URL and one line of snippet text—similar: so check your results listing and see if the “one line of snippet text” for each displayed result makes sense. (For some ideas about how to influence snippet text, see e.g. the Google Webmaster Tools links above.)
- We’re … increasing the maximum number of sitelinks per query from eight to 12: what links would you like to appear in the sitelinks list, compared to what links actually do appear? Would consensus in how UK HEIs architect top-level URLs (as, for example, recommended by Linking You, provide uniformity of display of results in the Google SERPs space? Would consensus open Google up to discussion relating to the most effective way of displaying search results for UK HEI sitelink results?
- we’re making a significant improvement to our algorithms by combining sitelink ranking with regular result ranking to yield a higher-quality list of links: which is to say – SEO, and URI path design, may play a role in determining what links get displayed as sitelinks.
- This reduces link duplication and creates a better organized search results page. That is, better orgainised, as defined by Google. If you want to influence the way links to your site are displayed as sitelinks, you need to figure how – you don’t control how Google provides this top level navigation to your site as sitelinks, but you may be able to influence the display through good website design….
- The number of sitelinks will also vary based on your query
If you have pages that mainly contain lists of items, these may also be handled differently in the context of snippets: New snippets for list pages
- Rich snippets microdata – if Google handled edu microdata, what would it describe…?
- Google Webmaster tools: Rich Snippets testing tool – “enter a web page URL to see how it may appear in search results”
PS I wonder how sitelink displays interact with rich snippets…?
PPS There’s a great write up of the Linking You project that I’ve just come across here: Lend Me Your Ears Dear University Web Managers!. Go and read it…. now… Hmmm.. thinks… what would a similar exercise for local council websites look like?
As a long time fan of custom search engine offerings, I keep wondering why Google doesn’t seem to have much active interest in this area? Google Custom Search updates are few and far between, and typically go unreported by the tech blogs. Perhaps more surprisingly, Custom Search Engines don’t appear to have much, if any, recognition in the Google Apps for Education suite, although I think they are available with a Google Apps for education ID?
One of the things I’ve been mulling over for years is the role that automatically created course related search engines might have to play as part of a course’s VLE offering. The search engine would offer search results either over a set of web domains linked to from the actual course materials, or simply boost results from those domains in the context of a “normal” set of search results. I’ve recently started thinking that we could also make use “promoted” results to highlight specific required or recommended readings when a particular topic is searched for (for example, Integrating Course Related Search and Bookmarking?).
During an informal “technical” meeting around three JISC funded reseource discovery projects at Cambridge yesterday (Comet, Jerome, SALDA; disclaimer: I didn’t work on any of them, but I was in the area over the weekend…), there were a few brief mentions of how various university libraries were opening up their catalogues to the search engine crawlers. So for example, if you do a site: limited search on the following paths:
you can get (partial?) search results, with a greater or lesser degree of success, from the Sussex, Lincoln, Huddersfield and Cambridge catalogues respectively.
In a Google custom search engine context, we can tunnel in a little deeper in an attempt to returns results limited to actual records:
I’m not sure how useful or interesting this is at the moment, except to the library systems developers maybe, who can compare how informatively their library catalogue content is indexed and displayed in Google search results compared to other libraries… (so for example, I noticed that Google appears to be indexing the “related items” that Huddersfield publishes on a record page, meaning that if a search term appears in a related work, you might get a record that at first glance appears to have little to do with your search term, in effect providing a “reverse related work” search (that is, search on related works and return items that have the search term as the related work)).
But it’s a start… and with the addition of customised rankings, might provide a jumping off point for experimenting with novel ways of searching across UK HE catalogues using Google indexed content. (For example, a version of the CSE on the cam.ac.uk domain might boost the Cambridge results; within an institution, works related to a particular course through mention on a reading list might get a boost if a student on that course runs a search… and so on…
PS A couple of other things that may be worth pondering… could Google Apps for Education account holders be signed up to to Subscribed Links offering customised search results in the main Google domain relating to a particular course. (That is, define subscribed link profiles for a each course, and automatically add those subscriptions to an Apps for Edu user’s account based on the courses they’re taking?) Or I wonder if it would be possible to associate subscribed links to public access browsers in some way?
And how about finding some way of working with Google to open up “professional” search profiles, where for example students are provided with “read only” versions of the personalised search results of an expert in a particular area who has tuned, through personalisation, a search profile that is highly specialised in a particular subject area, e.g. as mentioned in Google Personal Custom Search Engines? (see also Could Librarians Be Influential Friends? And Who Owns Your Search Persona?).
If anyone out there is working on ways of using Google customised and personalised search as a way of delivering “improved” search results in an educational context, I’d love to hear more about what you’re getting up to…
This is a note-to-self as much as anything, relating to the Course Detective custom search engine that searches over UK HE course prospectus web pages about the extent to which we might be able to use data such as the student satisfaction survey results (as made available via Unistats) to boost search results around particular subjects in line with student satisfaction ratings, or employment prospects, for particular universities?
It’s possible to tweak rankings in Google CSEs in a variety of ways. On the one hand, we can BOOST (improve the ranking), FILTER (limit results to members of a given set) or ELIMINATE (exclude) sites appearing in the search results listing. In the simplest case, we assign a BOOST, FILTER or ELIMINATE weight to a label, and then apply labels to annotations so that they benefit from the corresponding customisation. We can further refine the effect of the modification by applying a score to each annotation. The product of score and weight values determines the overall ranking modification that is applied to each result for a label applied to an annotation.
So here’s what I’m thinking:
- define labels for things like achievement or satisfaction that apply a boost to a result;
- allow uses to apply a label to a search;
- for each university annotation (that is, the listing that identifies the path to the pages for a particular university’s online prospectus), add a label with a score modifier determined by the achievement or satisfaction value, for example, for that institution;
- for refinement labels that tweak search rankings within a particular subject area, define labels corresponding to those subject areas and apply score modifiers to each institution based on, for example, the satisfaction level with that subject area. (Note: I’m not sure if the same path can have several different annotations provided to it with different scores?
For example, an annotation file typically contains a fragment that looks like:
<Annotations> <Annotation about="webcast.berkeley.edu/*" score="1"> <Label name="university_boost_highest"/> <Label name="lectures"/> </Annotation> <Annotation about="www.youtube.com/ucberkeley/*" score="1"> <Label name="university_boost_highest"/> <Label name="videos_boost_mid"/> <Label name="lectures"/> </Annotation> </Annotations>
I don’t know if this would work:
<Annotations> <Annotation about="example.com/prospectus/*" score="1"> <Label name="chemistry"/> </Annotation> <Annotations> <Annotation about="example.com/prospectus/*" score="0.5"> <Label name="physics"/> </Annotation> </Annotations>
That said, if the URLs are nicely structured, we might be able to do something like:
<Annotations> <Annotation about="example.com/prospectus/chemistry/*" score="1"> <Label name="chemistry"/> </Annotation> <Annotations> <Annotation about="example.com/prospectus/physics/*" score="0.5"> <Label name="physics"/> </Annotation> </Annotations>
albeit at the cost of having to do a lot more work in terms of identifying appropriate URI paths.
I also need to start thinking a bit more about how to apply refinements and ranking adjustments in course based CSEs.
I spotted this for the first time last night:
I had actually read the post in the Google Reader context (so Google knew that), but I wonder: if I hadn’t read the post, would it still have shown up like that?
As far as personalised ranking signals go:
- does the fact that I subscribe to the feed in Google Reader affect the rank of items from that feed in my personalised search results?
- if I have read the post in Google reader, does that also affect the rank of that specific post in my personalised search results?
If I have shared a link – through Google+, or Twitter, for example – are the ranking of those links positively affected in my personalised search results. That is, might social search actually be most useful when the Goog picks up on things I have shared myself, and then “reminds” me of them via a ranking boost in my personalised search results when I’m searching on a related topic?
Maybe tweeting and sharing into the void is actually yet another way of invisibly building search refinements into your personalised search context?