Archive for the ‘Search’ Category
Integrating Course Related Search and Bookmarking?
Not surprisingly, I’m way behind on the two eSTEeM projects I put proposals in for – my creative juices don’t seem to have been flowing in those areas for a bit:-( – but as a marking avoidance strategy I thought I’d jot down some thoughts that have been coming to mind about how the custom search project at least might develop (eSTEeM Project: Custom Course Search Engines).
The original idea was to provide a custom search engine for a course by indexing the pages and domains that are referenced within it. The OU course T151 is structured as a series of topic explorations using the structure:
- topic overview
- framing questions
- suggested resources
- my reflections on the topic, guided by the questions, drawing on the suggested resources and a critique of them
One original idea for the course was that rather than give an explicit list of suggested resources, we provide a set of links pulled in live from a predefined search query. The list would look as if it was suggested by the course team but it would actually be created dynamically. As instructors, we wouldn’t be specifying particular readings, instead we would be trusting the search algorithm to return relevant resources. (You might argue this is a neglectful approach… a more realistic model might be to have specifically recommended items as well as a dynamically created list of “Possibly related resources”.)
At this point it’s maybe worth stepping back a moment to consider what goes into producing a set of search results. Essentially, there are three key elements:
- the index, the set of content that the search engine has “searched” and from which it can return a set of results;
- the search query; this is run against the index to identify a set of candidate search results;
- a presentation algorithm that determines how to order the search results as presented to the user.
If the search engine and the presentation algorithm are fixed, then for a given index and a given set of search terms we get a known set of results back. So in this case, we could use a fixed custom search engine, with known search terms, and return a known list of suggested readings. The search engine would provide some sort of “ground truth” – same answer for the same query, always.
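As a rough illustration of those three elements, here is a toy sketch in Python. The index contents, URLs and the term-counting ranking rule are all made up for the example (they are not how Google, or a Google CSE, actually works); the point is only that with a fixed index, a fixed query and a fixed ranking rule, the result list is repeatable.

```python
from collections import Counter

# Hypothetical mini-index: URL -> page text (illustrative data only).
INDEX = {
    "http://example.org/robots": "robots and robot ethics in the home",
    "http://example.org/energy": "home energy monitoring and smart meters",
    "http://example.org/ethics": "ethics of surveillance robots",
}

def search(index, query):
    """Rank URLs by how often the query terms occur; ties are broken by URL."""
    terms = query.lower().split()
    scored = []
    for url, text in index.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((-score, url))
    # Sorting on (-score, url) makes the ordering fully deterministic.
    return [url for _, url in sorted(scored)]

results = search(INDEX, "robots ethics")
print(results)
```

Change the index (add a page) or swap in a personalised ranking rule and the same query may start returning a different list, which is the situation the next paragraph describes.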
If we trust the sources and the presentation algorithm, and we trust that we have written an effective search query, then if the index is not fixed, or if a personalised ranking algorithm (that we trust) is used as part of the search engine, we would potentially be returning search results that the instructor has not seen before. For example, the resources may be more recent than the last time the instructor searched for resources to recommend, or they better fit the personalisation criteria for the user under the ranking algorithm used as part of the presentation algorithm.
In this case, the instructor is not saying: “I want you to read this particular resource”. They are saying something more along the lines of: “these are potentially the sorts of resource I might suggest you look at in order to study this topic”. (Lots of caveats in there… If you believe in content led instruction, with students referring to specifically referenced resources, I imagine that you would totally rail against this approach!)
At times, we might want to explicitly recommend one or two particular resources, but also open up some other recommendations to “the algorithm”. It struck me that it might be possible to do this within the context of a Google Custom Search approach using “special results” (e.g. Google CSEs: Creating Special Results/Promotions).
For example, Google CSEs support:
- promotions: “A promotion is simply an association between a pre-defined set of query terms and a link to a webpage. When a user types a search that exactly matches one of your query terms, the promotion appears at the top of the page.” So by using a specific search term, we can force the return of a specific result as the top result. In the context of a topic exploration, we could thus prepopulate the search form of an embedded search engine with a known search phrase, and use a promotion to force a “recommended reading” link to the top of the results listing.
Promotion links are stored in a separate config file and have the form:
<Promotions>
<Promotion id="1"
queries="wanderer, the wanderer"
title="Groo the Wanderer"
url="http://www.groo.com/"
description="Comedy. American series illustrated by Sergio Aragonés."
image_url="http://www.newsfromme.com/images5/groo11.jpg" />
</Promotions>
- subscribed links: subscribed links allow you to return results in a specific format (such as text, or text and a link, or other structured results) based on a perfect match with a specific search term. In a sense, subscribed links represent a generalised version of promotions. Subscribed links are also available to users outside the context of a CSE. If a user subscribes to a particular subscribed link file, then if there is an exact match between one of the search phrases in the subscribed link file and a search phrase used by a subscribing user on Google web search (i.e. on google.com or google.co.uk), the subscribed link will be returned in the results listing.
In the simplest case, subscribed links can be defined at the individual link level:
If your search term is an exact match for the term in the subscribed link definition, it will appear in the main search results page:
It’s also possible to define subscribed link definition files, either as simple tab separated docs or RSS/Atom feeds, or using a more formal XML document structure. One advantage of creating subscribed links files for use within a custom search engine is that users (i.e. students) can subscribe to them as a way of augmenting or enhancing their own Google search results. This has the joint effect of increasing the surface area of the course, so that course related recommendations can be pushed to the student for relevant queries made through the Google search engine, as well as providing a legacy offering: students can potentially take away a subscription when they finish the course to continue to receive “academically credible” results on relevant search topics. (By issuing subscription links on a per course presentation basis (or even on a personalised, unique feed per student basis), feeds to course alumni might be customised, for example by removing links to subscription content (or suggesting how such content might be obtained through a subscription to the university library), or occasionally adding in advertising related links (so if a student searches using a “course” keyword, make recommendations around that via a subscribed links feed; in the limit, this could even take on the form of a personalised, subscription based advertising channel).)
Another way in which “recommended” links can be boosted in a custom search result listing is through boosting search results via their ranking factors (Changing the Ranking of Your Search Results).
In the case of both subscribed links and boosted search results, it’s possible to create a configuration file dynamically. Where students are bookmarking search results relating to a course, it would therefore be possible to feed these into a course related custom search engine definition file, or a subscribed link file. If subscribed link files are maintained at a personal level, it would also be possible to integrate a student’s bookmarked links in to their subscribed links feed, at least for use on Google websearch (probably not in the custom search engine context?). This would support rediscovery of content bookmarked by the student through subscribed link recommendations.
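By way of a sketch of what “creating a configuration file dynamically” might look like, here’s some Python that turns a list of bookmarks into a promotions file using the attribute names from the Groo example quoted earlier. The bookmark record structure (a dict with queries, title, url and an optional description) and the function name are my own assumptions for illustration; I haven’t checked the output against a live CSE.

```python
import xml.etree.ElementTree as ET

def build_promotions(bookmarks):
    """Serialise bookmark records as a <Promotions> document (a sketch only)."""
    root = ET.Element("Promotions")
    for number, bookmark in enumerate(bookmarks, start=1):
        ET.SubElement(root, "Promotion", {
            "id": str(number),
            # Several trigger phrases are joined into one comma separated attribute.
            "queries": ", ".join(bookmark["queries"]),
            "title": bookmark["title"],
            "url": bookmark["url"],
            "description": bookmark.get("description", ""),
        })
    return ET.tostring(root, encoding="unicode")

# Hypothetical bookmark record, reusing the values from the example above.
bookmarks = [
    {
        "queries": ["wanderer", "the wanderer"],
        "title": "Groo the Wanderer",
        "url": "http://www.groo.com/",
    },
]
promotions_xml = build_promotions(bookmarks)
print(promotions_xml)
```

The same pattern would presumably work for a subscribed links file, with the element and attribute names swapped for whatever that format expects.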
Just by the by, a PR mailing in my inbox today threw up another example of how search and bookmarking might be brought more closely together: SearchTeam (screenshots [pdf]).
The model here is based around defining search contexts that one or more users can contribute to, and then saving out results from a search into a topic based bookmark area. The video suggests that particular results can also be blocked (and maybe boosted? The greyed plus on the left hand side?) – presumably this is a persistent feature, so if you, or another member of your “search team” runs the search, the blocked result doesn’t appear? (Is a list of blocked results and their corresponding search terms available anywhere I wonder?) In common with the clipping blog model used by sites such as posterous, it’s possible to post links and short blog posts into a topic area. Commenting is also supported.
To say that search was Google’s initial big idea, it’s surprising that it seems to play no significant role in Google’s offerings for education through Google Apps. Thinking back, search related topics were what got me into blogging and quick hacks; maybe it’s time to return to that area…
Google Playing the SEO Link Building Game to Drive Uptake Of Google Profiles?
As you’re probably aware by now, yesterday Google announced its Google+ social network. A key part of every social network is a user’s personal profile page, the “social object” that other people can actually connect to.
Google has offered personal profile pages for some time (here’s my rather basic Google Profile: Tony Hirst), but they’ve never really been a part of anything, and they’re not really linkable to – which means there’s little reason for PageRank based search algorithms such as Google’s to return Google Profile pages in the top results for you if anyone ever searches for you.
(PageRank is the algorithm that gave Google its early edge in the search engine wars; links from one page to another count as “votes” regarding the quality of the page that is linked to. Crudely put, if people link to you, those links contribute to your PageRank and you’re more likely to make it to the top of a search results page.)
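To make the “links as votes” idea a little more concrete, here’s a minimal sketch of the textbook PageRank iteration in Python. The four page link graph, the damping factor of 0.85 and the fixed iteration count are assumptions chosen for illustration; Google’s production ranking is, of course, far more involved than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power iteration PageRank over a dict of page -> outbound links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outbound link passes on an equal share of the page's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: three pages all "vote" for page c.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))
```

In this toy graph the page that collects the most inbound links ends up with the highest score, which is the effect a profile page would benefit from if lots of content started linking to it.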
Until now, that is (or at least, until a couple of weeks ago… I missed this announcement at the time it was made…): Authorship markup and web search, a technique for “supporting markup that enables websites to publicly link within their site from content to author pages”.
The method is described as follows:
To identify the author of an article, Google checks for a connection between the content page (such as an article), an author page, and a Google Profile.
A content page can be any piece of content with an author: a news article, blog post, short story …
An author page is a page about a specific author, on the same domain as the content page.
A Google Profile is Google’s version of an author page. It’s how you present yourself to the web and to Google.
…
In confirming authorship, Google looks for:
Links from the content page to the author page (if the path of links continues to a Google Profile, we can also show Profile information in search results)
A path of links back from your Google Profile to your content.
These reciprocal links are important: without them, anyone could attribute content to you, or you could take credit for any content on the web.
….
The rel="author" link indicates the author of an article [so for example: <a rel="author" href="https://profiles.google.com/tony.hirst/">Google Profile: Tony Hirst</a>]
Source: Authorship
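The reciprocal link check described in that quote can be sketched in a few lines of Python. This is my own simplification, not Google’s actual procedure: pages live in a hypothetical in-memory dict rather than being fetched, all the example.com URLs are invented, and I only test for a direct link back from the profile page rather than a longer path of links.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect every <a href> on a page, plus those carrying a rel value."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()   # all link targets on the page
        self.by_rel = {}     # rel value -> set of link targets

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        self.hrefs.add(href)
        for rel in (attributes.get("rel") or "").split():
            self.by_rel.setdefault(rel, set()).add(href)

def collect_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector

def authorship_confirmed(content_url, pages):
    """True if a rel="author" target on the content page links back to it."""
    content = collect_links(pages[content_url])
    for profile_url in content.by_rel.get("author", ()):
        profile_html = pages.get(profile_url)
        if profile_html and content_url in collect_links(profile_html).hrefs:
            return True
    return False

# Hypothetical pages: a genuine post, a profile, and a spam blog copy.
PAGES = {
    "http://blog.example.com/post":
        '<p>By <a rel="author" href="https://profiles.example.com/tony">Tony</a></p>',
    "https://profiles.example.com/tony":
        '<a rel="me" href="http://blog.example.com/post">My blog post</a>',
    "http://spam.example.com/copy":
        '<a rel="author" href="https://profiles.example.com/tony">Tony</a>',
}
print(authorship_confirmed("http://blog.example.com/post", PAGES))
print(authorship_confirmed("http://spam.example.com/copy", PAGES))
```

The spam copy claims the same author, but because the profile never links back to it the claim isn’t confirmed, which is the point of requiring reciprocal links.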
Here’s why you might be tempted to do this…:
Many of you create great content on the web, and we work hard to make that content discoverable on Google. Today, we will start highlighting the people creating this content in Google.com search results.
As you can see …, certain results will display an author’s picture and name — derived from and linked to their Google Profile — next to their content on the Google Search results page.
Source: Highlighting content creators in search results; [my emphasis]
So… if you want to assert authorship and be recognised as the author in the Google search results, you need to start linking all your content back to your Google Profile Page…
…and so start feeding PageRank juice to your Google profile page…
…so that when folk search for you on the web, they’re more likely to see that page…
This is a harsh reading, of course: authorship can also be asserted by linking within a domain to a page that you have asserted to Google represents you: The rel="author" link indicates the author of an article, and can point to .. an author page on the same domain as the content page: Written by <a rel="author" href="../authors/mattcutts">Matt Cutts</a>. The author page should link to your Google Profile using rel="me".
(I wonder why <link rel="author" href="../authors/mattcutts"/> isn’t supported? Or maybe it is?)
Algorithmically, the assertion of authorship might also help in Google’s fight against spamblogs, which republish content blindly from original sources. That is, by asserting authorship of a page, if someone reposts your content, Google will be able to identify you as the original author and return a link back to your page in the search results listing, rather than the republished page.
I imagine there might also be personal reputation benefits – for example, if people +1 a page you have claimed authorship of, it might give you a “Reputation Rank” boost for the subject area associated with that page?
Filter Bubbles, Google Ground Truth and Twitter EchoChambers
As the focus for this week’s episode [airs live Tues 21/6/11 at 19.32 UK time, or catch it via the podcast] in the OU co-produced season of programmes on openness with Click (radio), the BBC World Service radio programme formerly known as Digital Planet, we’re looking at one or two notions of diversity…
If you’re a follower of pop technology, over the last week or two you will probably have already come across Eli Pariser’s new book, The Filter Bubble: What The Internet Is Hiding From You, or his TED Talk on the subject:
Eli Pariser, “The Filter Bubble”, TED Talks
It could be argued that this is the Filter Bubble in action… how likely is it, for example, that a randomly selected person on the street would have heard of this book?
To support the programme, presenter Gareth Mitchell has been running an informal experiment on the programme’s Facebook page: Help us with our web personalisation experiment!! The idea? To see what effect changing personalisation settings on Google has on a Google search for the word “Platform”. (You can see results of the experiment from Click listeners around the world on the Facebook group wall… Maybe you’d like to contribute too?)
It might surprise you to learn that Google results pages – even for the same search word – do not necessarily always give the same results, something I’ve jokingly referred to previously as “the end of Google Ground Truth”, but is there maybe a benefit to having very specifically focussed web searches (that is, very specific filter bubbles)? I think in certain circumstances there may well be…
Take education, or research, for example. Sometimes, we want to get the right answer to a particular question. In times gone by, we might have asked a librarian for help, if not to suggest a particular book or reference source, at least to help us find one that might be appropriate for our needs. Nowadays, it’s often easier to turn to a web search engine than it is to find a librarian, but there are risks in doing that: after all, no-one really knows what secret sauce is used in the Google search ranking algorithm that determines which results get placed where in response to a particular search request. The results we get may be diverse in the sense that they are ranked in part by the behaviour of millions of other search engine users, but from that diversity do we just get – noise?
As part of the web personalisation/search experiment, we found that for many people, changing personalisation settings had no noticeable effect on the first page of results returned for a search on the word “platform”. But for some people, there were differences… From my own experience of making dozens of technology (and Formula One!) related searches a day, the results I get back for those topics when I’m logged in to Google are very different to when I have disabled the personalised results. As far as my job goes, I have a supercharged version of Google that is tuned to return particular sorts of results – code snippets, results from sources I trust, and so on. In certain respects, the filter bubble is akin to my own personal librarian. In this particular case, the filter bubble (I believe), works to my benefit.
Indeed, I’ve even wondered before whether a “trained” Google account might actually be a valuable commodity: Could Librarians Be Influential Friends? And Who Owns Your Search Persona?. Being able to be an effective searcher requires several skills, including the phrasing of the search query itself, the ability to skim results and look for signals that suggest a result is reliable, and the ability to refine queries. (For a quick – and free – mini-course on how to improve your searching, check out the OU Library’s Safari course.) But I think it will increasingly rely on personalisation features…which means you need to have some idea about how the personalisation works in order to make the most of its benefits and mitigate the risks.
To take a silly example: if Google search results are in part influenced by the links you or your friends share on Twitter, and you follow hundreds of spam accounts, you might rightly expect your Google results to be filled with spam (because your friends have recommended them, and you trust your friends, right? That’s one of the key principles of why social search is deemed to be attractive.)
As well as the content we discover through search engines, content discovered through social networks is becoming of increasing importance. Something I’ve been looking at for some time is the structure of social networks on Twitter, in part as a “self-reflection” tool to help us see where we might be situated in a professional social sense based on the people we follow and who follow us. Of course, this can sometimes lead to incestuous behaviour, where the only people talking about a subject are people who know each other.
For example, when I looked at the connections between people chatting on Twitter about Adam Curtis’ All Watched Over By Machines of Loving Grace documentary, I was surprised to see it defined a large part of the UK’s “technology scene” that I am familiar with from my own echochamber…
So what do I mean by echochamber? In the case of Twitter, I take it to refer to a group of people chatting around a topic (as for example, identified by a hashtag) who are tightly connected in a social sense because they all follow one another anyway… (To see an example of this, for a previous OU/Click episode, I posted a simple application (it’s still there), to show the extent to which people who had recently used the #bbcClickRadio hashtag on Twitter were connected.)
As far as diversity goes, if you follow people who only follow each other, then it might be that the only ideas you come across are ideas that keep getting recycled by the same few people… Or it might be the case that a highly connected group of people shows a well defined special interest group on a particular topic….
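One crude way of putting a number on how echochamber-y a group of hashtag users is would be to calculate the density of the follow graph between them: the fraction of all possible “A follows B” links that actually exist. The sketch below assumes we already have a dict mapping each user to the set of accounts they follow; the usernames are invented and nothing here calls the Twitter API.

```python
def follow_density(follows, members):
    """Fraction of possible directed follow links present among members."""
    members = set(members)
    n = len(members)
    if n < 2:
        return 0.0
    edges = sum(
        1
        for a in members
        for b in follows.get(a, ())
        if b in members and b != a
    )
    return edges / (n * (n - 1))

# Hypothetical follow data: ann, bob and cat all follow each other.
follows = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"eve"},
    "eve": set(),
}
print(follow_density(follows, ["ann", "bob", "cat"]))
print(follow_density(follows, ["ann", "dan", "eve"]))
```

A density close to 1 suggests everyone using the tag already follows everyone else; a value close to 0 suggests a looser crowd brought together by the topic rather than by prior connections.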
To get a feel for what we can learn about our own filter bubbles in Twitterspace, I had a quick look at Gareth Mitchell’s context (@garethm on Twitter). One of the dangers of using public apps is that anyone can do this sort of analysis of course, but the ethics around my using Gareth as a guinea pig in this example is maybe the topic of another programme…!
So, to start with, let’s see how tightly connected Gareth’s Twitter friends are (that is, to what extent do the people Gareth follows on Twitter follow each other?):
The social graph showing how @garethm’s friends follow each other
The nodes represent people Gareth follows, and they have been organised into coloured groups based on a social network analysis measure that tries to identify groups of tightly interconnected individuals. The nodes are sized according to a metric known as “Authority”, which reflects the extent to which people are followed by other members of the network.
A crude first glance at the graph suggests a technology (purple) and science (fluorine-y yellowy green) cluster to me, but Gareth might be able to label those groups differently.
Something else I’ve started to explore is the extent to which other people might see us on Twitter. One way of doing this is to look at who follows you; another is to have a peek at what lists you’ve been included on, along with who else is on those lists. Here’s a snapshot of some of the lists (that actually have subscribers!) that Gareth is listed on:
The flowers are separate lists. People who are on several lists are caught on the spiderweb threads connecting the list flowers… In a sense, the lists are filter bubbles defined by other people into which Gareth has been placed. To the left in the image above, we see there are a few lists that appear to share quite a few members: convergent filters?!
In order to try looking outside these filter bubbles, we can get an overview of the people that Gareth’s friends follow that Gareth doesn’t follow (these are the people Gareth is likely to encounter via retweets from his friends):

Who @garethm’s friends follow that @garethm doesn’t follow…
My original inspiration for this was to see whether or not this group of people would make sense as recommendations for who to follow, but if we look at the most highly followed people, we see this may not actually make sense (unless you want to follow celebrities!;-)

Popular friends of Gareth’s that he doesn’t follow…
By way of a passing observation, it’s also worth noting that the approach I have taken to constructing the “my friends’ friends who aren’t my friends” graph tends to place “me” at the centre of the universe, surrounded by folk who are just a friend of a friend away…
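For what it’s worth, the “friends of my friends who aren’t my friends” construction is easy enough to sketch in Python if you already have the follow lists to hand. The data structure (a dict of user to set of followed accounts) and the usernames are assumptions for the example; the real graphs were built from Twitter data that I’m not reproducing here.

```python
from collections import Counter

def friends_of_friends(follows, me):
    """Count how many of my friends follow each account that I don't follow."""
    mine = follows.get(me, set())
    counts = Counter()
    for friend in mine:
        for candidate in follows.get(friend, set()):
            if candidate != me and candidate not in mine:
                counts[candidate] += 1
    # Most widely followed (among my friends) first.
    return counts.most_common()

# Hypothetical follow lists.
follows = {
    "me": {"a", "b"},
    "a": {"b", "celeb", "x"},
    "b": {"celeb", "me"},
}
print(friends_of_friends(follows, "me"))
```

Ranking by raw counts is exactly what pushes celebrities to the top of the list, so a more useful recommender would probably need to discount accounts that everybody follows.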
For extended interviews and additional material relating to the OU/Click series on openness, make sure you visit Click (#bbcClickRadio) on OpenLearn.
Twitter Makes a Move Towards Social Search… Time for some Twitter Gardening?
From the number of tweets that are starting to appear in my Google search results, it’s maybe surprising that Twitter’s own search offering has never really been the subject of much attention. A recent update sees the introduction of personalisation into the Twitter search experience, as described on the Twitter Engineering blog: The Engineering Behind Twitter’s New Search Experience.
A couple of things that jumped out at me from that report:
To support relevance filtering and personalization, we needed three types of signals:
Static signals, added at indexing time
Resonance signals, dynamically updated over time
Information about the searcher, provided at search time
…
At query time, a Blender server parses the user’s query and passes it along with the user’s social graph to multiple Earlybird servers. These servers use a specialized ranking function that combines relevance signals and the social graph to compute a personalized relevance score for each Tweet. The highest-ranking, most-recent Tweets are returned to the Blender, which merges and re-ranks the results before returning them to the user.
…
Twitter is most powerful when you personalize it by choosing interesting accounts to follow, so why shouldn’t your search results be more personalized too? They are now! Our ranking function accesses the social graph and uses knowledge about the relationship between the searcher and the author of a Tweet during ranking. Although the social graph is very large, we compress the meaningful part for each user into a Bloom filter, which gives us space-efficient constant-time set membership operations. As Earlybird scans candidate search results, it uses the presence of the Tweet’s author in the user’s social graph as a relevance signal in its ranking function.
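To get a feel for the Bloom filter idea mentioned in that quote, here’s a minimal Python sketch. The filter size, the number of hash functions, the use of SHA-256 and the way the boost is added to a base score are all my own assumptions; the Twitter post doesn’t give those details and this is not their implementation.

```python
import hashlib

class BloomFilter:
    """Space efficient set membership with possible false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive several bit positions by salting the item with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for position in self._positions(item):
            self.bits |= 1 << position

    def __contains__(self, item):
        return all((self.bits >> position) & 1 for position in self._positions(item))

def personalised_score(base_score, author, social_graph, boost=1.0):
    """Add a boost when the Tweet's author appears in the searcher's graph."""
    return base_score + (boost if author in social_graph else 0.0)

# Hypothetical social graph for one searcher.
graph = BloomFilter()
for account in ["garethm", "bbcclick"]:
    graph.add(account)

print(personalised_score(0.4, "garethm", graph))
```

A Bloom filter never reports a false negative, so anyone who really is in the graph always gets the boost; the trade-off is that the occasional stranger might be boosted by mistake.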
I don’t know what the social graph includes, but if you’re an indiscriminate follower of folk on the one hand, and/or you don’t curate your followers to any significant extent (for example, blocking spambots, and not doing your twitter gardening), then your personalised search results may not be as highly tuned as they might be… (Although on the other hand, maybe the diversity of search results that might result from a very, err, diverse follower network is a Good Thing? The tension between diversity and relevance in search results was something we were chatting over yesterday as preparation for the next OU/BBC co-produced episode of Click (BBC World Service radio).)
See also: Brand Association and Your Twitter Followers, Could Librarians Be Influential Friends? And Who Owns Your Search Persona?
PS Here’s another handy tool in a search curation context that I don’t think I’ve blogged about before: trunk.ly (search over links you’ve tweeted, posted to delicious, shared on Facebook etc).
Google Correlate: What Search Terms Does Your Time Series Data Correlate With?
Just a few days over three years ago, I blogged about a site I’d put together to try to crowdsource observations about correlated searchtrends: TrendSpotting.
One thing that particularly interested me then, as it still does now, was the way that certain search trends reveal rhythmic behaviour over the course of weeks, months or years.
At the start of this year, I revisited the topic with a post on Identifying Periodic Google Trends, Part 1: Autocorrelation (followed by Improving Autocorrelation Calculations on Google Trends Data).
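As a quick reminder of what autocorrelation buys you when hunting for rhythmic behaviour, here’s a bare-bones Python sketch. The series is synthetic (a made-up seven day pattern repeated eight times), not real Google Trends data, and this is the plain sample autocorrelation rather than whatever refinements the posts linked above go on to describe.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("series is constant")
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance

# Synthetic "weekly" search volume: the same 7 day shape repeated 8 times.
series = [1, 2, 3, 4, 5, 9, 8] * 8

best_lag = max(range(1, 11), key=lambda lag: autocorrelation(series, lag))
print(best_lag, autocorrelation(series, best_lag))
```

A peak in the autocorrelation at a lag of 7 is the signature of a weekly rhythm; real trend data would be noisier and would probably need detrending first.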
Anyway, today it seems that Google has cracked the scaling issues with discovering correlations between search trends (using North American search trend data), as well as opening up a service that will identify what search trends correlate most closely with your own uploaded time series data: Correlate (announcement: Mining patterns in search data with Google Correlate)
For the quick overview, check out the Google Correlate Comic.
So what’s on offer? First, enter a search term and see what it’s correlated with:
As well as the line chart, correlations can also be plotted as a scatterplot:
You can also run “spatial correlations”, though at the moment this appears to be limited to US states. (I *think* this works by looking for search terms that are popular in the requested areas and not popular in the other listed areas. To generalise this, I guess you need three things: the total list of areas that work for the spatial correlation query; the areas where you want the search volume for the “to be discovered correlated phrase” to be high; the areas where you want the search volume for the “to be discovered correlated phrase” to be low?)
At this point it’s maybe worth remembering that correlation does not imply causation…
A couple of other interesting things to note: firstly, you can offset the data (so shift it a few weeks forwards or backwards in time, as you might do if you were looking for lead/lag behaviour); secondly, you can export/download the data.
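If you do export the data, the offset idea is easy to play with yourself. Here’s a sketch of a Pearson correlation with a lead/lag offset in plain Python; the two series are invented for the example (the second is just the first delayed by two steps) and I’m not claiming this is the calculation Google Correlate runs behind the scenes.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must be the same length, with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(xs, ys, offset):
    """Correlate xs[t] with ys[t + offset]; the offset may be negative."""
    if offset >= 0:
        a, b = xs[:len(xs) - offset], ys[offset:]
    else:
        a, b = xs[-offset:], ys[:len(ys) + offset]
    return pearson(a, b)

# Hypothetical weekly values; ys repeats xs two weeks later.
xs = [0, 1, 3, 2, 5, 4, 6, 8, 7, 9]
ys = [0, 0] + xs[:-2]

print(lagged_correlation(xs, ys, 0))
print(lagged_correlation(xs, ys, 2))
```

Scanning a range of offsets and picking the one with the strongest correlation is one way of spotting which series tends to lead the other, bearing in mind the correlation/causation caveat above.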
You can also upload your own data to see what terms correlate with it:
(I wonder if they’ll start offering time series analysis features on uploaded data, as well as other trend data, too? For example, frequency analysis or trend analysis? This is presumably going on in the background (though I haven’t read the white paper [PDF] yet…))
As if that’s not enough, you can also draw a curve/trendline and then see what correlates with it (so this is a weak alternative to uploading your own data, right? Just draw something that looks like it…) (h/t to Mike Ellis for first pointing this out to me).
I’m not convinced that search trends map literally onto the well known “hype cycle” curve, but I thought I’d try out a hype cycle reminiscent curve where the hype was a couple of years ago, and we’re now maybe seeing it start to reach mainstream maturity, with maybe the first inklings of a plateau…
Hmmm… the pr0n industry is often identified as a predictor of certain sorts of technology adoption… maybe the 5ex searchers are too?! (Note that correlated hand-drawn charts are linkable).
So – that’s Google Correlate; nifty, eh?
PS Here’s another reason why I blog… my blog history helps me work out how far in the future I live;-) So currently about three years in the future.. how about you?!;-)
PPS I can imagine Google’s ThinkInsights (insight marketing) loving the thought that folk are going to check out their time series data against Google Trends so the Goog can weave that into its offerings… A few additional thoughts leading on from that: 1) when will correlations start to appear in Google AdWords support tools to help you pick adwords based on your typical web traffic patterns or even sales patterns? 2) how far are we off seeing a Google Insights box to complement the Google Search Appliances, that will let you run correlations – as well as Google Prediction type services – onsite without feeling as if you have to upload your data to Google’s servers, and instead, becoming part of Google’s our-kit-in-your-racks offering; 3) when is Google going to start buying up companies like Prism and will it then maybe go after the likes of Experian and Dunnhumby to become a company that organises information about the world of people, as well as just the world’s information…?!
PPPS Seems like as well as “traditional” link sharing offerings, you can share the link via your Google Reader account…

Interesting…
Googling the Future – from the Present and the Past
An XKCD cartoon today described Googling the future using search terms such as “in <year>” and “by <year>”:
So I tried it:

Hmm – results from the future?
So I had a play in Google News… could this be a good way of searching forecasts?

By searching the past, we can search for old forecasts of the future…

I leave it as an exercise for the reader to search results from 2006, 2001, and 1991 for the 5, 10 and 20 years forecasts respectively for this year… let me know in the comments if anything interesting turns up;-)
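If you fancy automating the exercise, the query phrases and the year to restrict the results to can be generated with a few lines of Python. The function name and the output structure are my own invention, the target year of 2011 is inferred from the dates above, and how you then plug the date restriction into a Google News search is left open.

```python
def forecast_queries(target_year, horizons=(5, 10, 20)):
    """For each horizon, pair the year to search within with the phrases to use."""
    return [
        {
            "published_in": target_year - horizon,
            "queries": [f'"in {target_year}"', f'"by {target_year}"'],
        }
        for horizon in horizons
    ]

for item in forecast_queries(2011):
    print(item["published_in"], item["queries"])
```

So a 20 year forecast for 2011 means searching items published in 1991 for the phrases "in 2011" and "by 2011".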
See also: Google Impact…? The “Google Suggest” Factor
PS And this: Quantifying the Advantage of Looking Forward, which looks, for different countries, at the ratio of searches for year+1 and year-1 over the course of a year, then plots the resulting quotient against GDP. The results appear to suggest that there is a correlation between GDP and the forward looking tendency of the population. (But is this right? Do the search volumes get normalised (on Google Trends) by the volume of the first term at the start of the trend period? If the user numbers are growing over the course of the year, might we be skewing the future looking component because of loaded terms at the end of the year?)
OER Hack Day: UK Universities Degree Course Prospectus Search – Course Detective
If you want to search across all university prospectuses, what do you do? Suffer, that’s what…
Until now…

Course Detective: design concept
At the CETIS/DevCSI hackday, a group of undevelopers* came together to pull together a Google Custom Search Engine that would search over all the undergraduate prospectus pages on all the UK university websites.
If you want to try it out, there’s a basic version running at: CourseDetective.co.uk
The first thing we had to do was grab a list of HEIs – @dkernohan grabbed one off the HEFCE website (I think? Do you have a link to the one you grabbed, DK?) and popped it into a Google Spreadsheet so we could all work on it.
A quick first pass meant Googling each university (e.g. for Undergraduate course foo university) and then finding the prospectus home page. A bit of digging then took us to an example of an actual course page. The intention was to find as deep a path as possible into the website that would still return individual pages for each course.
The number of URL patterns is, unsurprisingly, as large as the number of institutions, with few commonalities. For example, here are the first five (alphabetical order):
http://www.anglia.ac.uk/ruskin/en/home/prospectus/
http://www1.aston.ac.uk/study/undergraduate/courses/
http://www.bathspa.ac.uk/courses/undergraduate/
http://www.bath.ac.uk/study/ug/courses/
http://www.beds.ac.uk/courses
Some URLs embed the year of entry:
http://www.bbk.ac.uk/study/ug2011/
http://www2.hull.ac.uk/ug/11
Sometimes course identifiers appear as variables:
http://www.northumbria.ac.uk/?view=CourseDetail&code=*
http://www.nottingham.ac.uk/ugstudy/course.php?inc=course&code=*
And so on…
Having got paths into the prospectus, we used them to define a custom search engine. The first pass was to paste the links directly into the search engine definition wizard. (We maybe need to check we’ve done this correctly: we actually have two link types – one where we do “URL contains”, such as http://www.beds.ac.uk/courses, the other where we should be checking against a pattern e.g. http://www.nottingham.ac.uk/ugstudy/course.php?inc=course&code=*. I wonder, how would we cope with capturing both ?inc=course&code=* and ?&code=*&inc=course?)
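On the question of query arguments turning up in a different order, here is a minimal sketch of an order-independent check done in our own code. It says nothing about how the CSE wizard itself matches patterns; the function name and the course code X123 are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

HOST = "www.nottingham.ac.uk"
PATH = "/ugstudy/course.php"


def matches_course_page(url, host, path, required_params):
    """True if url is on host/path and carries every required query
    parameter, whatever order the parameters appear in."""
    parts = urlsplit(url)
    if parts.netloc != host or parts.path != path:
        return False
    # parse_qs ignores empty pairs, so a leading "&" in the query is harmless.
    query = parse_qs(parts.query, keep_blank_values=True)
    for name, value in required_params.items():
        if name not in query:
            return False
        # A required value of None means "any value will do" (the * wildcard).
        if value is not None and value not in query[name]:
            return False
    return True


required = {"inc": "course", "code": None}
a = "http://www.nottingham.ac.uk/ugstudy/course.php?inc=course&code=X123"
b = "http://www.nottingham.ac.uk/ugstudy/course.php?&code=X123&inc=course"

print(matches_course_page(a, HOST, PATH, required))
print(matches_course_page(b, HOST, PATH, required))
```

A check like this could be used to tidy up or validate a list of collected URLs before they are turned into search engine patterns.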
Something we did notice that is a *huge* problem with Google Custom Search engines is that if you collaborate with other people in populating the same CSE, you can only get a view over the links added by one person at a time. So I could look at the links David had added, and he could look at the links I had added, but we couldn’t look at all the links at the same time:-(
The next step was to generate a CSE definition file from the imported links, so that we could (in theory) start to craft a machine generated CSE definition file (see Transitioning a CSE). At least one copy of the CSE file is available on the coursedetective github site (look for the annotations.xml file).
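As a rough idea of what machine generating the definition might look like, here is a minimal sketch that writes out an annotations file from a list of URL patterns. It assumes the annotations format is an Annotations element containing one Annotation per pattern, each carrying a Label; the label name used here is hypothetical and would need replacing with the one from your own CSE, and the annotations.xml file in the github repository is the thing to check the output against.

```python
import xml.etree.ElementTree as ET


def build_annotations(url_patterns, label):
    """Return an annotations XML string with one Annotation per URL pattern."""
    root = ET.Element("Annotations")
    for pattern in url_patterns:
        annotation = ET.SubElement(root, "Annotation", {"about": pattern})
        ET.SubElement(annotation, "Label", {"name": label})
    return ET.tostring(root, encoding="unicode")


# Patterns derived from two of the prospectus paths listed earlier.
patterns = [
    "www.beds.ac.uk/courses*",
    "www.bath.ac.uk/study/ug/courses/*",
]

# "_cse_coursedetective" is a made-up label name for illustration.
annotations_xml = build_annotations(patterns, "_cse_coursedetective")
print(annotations_xml)
```

Driving this from the shared spreadsheet of prospectus paths would also get round the problem of only being able to see one collaborator's links at a time.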
To host the site, my first thought was to use Blogger – but this was a bit limiting in terms of possible site design – and secondly to use Google sites. However, Google sites seems to strip out the embedding that the Google custom search engine wizard generates, so instead we opted for Google App Engine using this template. (It would be really helpful if Google Sites provided a trivial way of embedding a Google custom search engine in a sites page…?)
To make the hack relevant to the OERhackday, David added some course OER links and an OER category to the search engine that would allow users to (ideally) locate topic related OERs. The longer term vision is that users should be able to discover courses via OERs, and also check out OERs associated with a course as part of the “what course should I do?” research process.
To enrich the search results further, we also started to collate the URLs for the official institutional Youtube pages so we could search into those videos as well as course prospectus pages and OERs. I’m not sure if Youtube videos can be previewed in CSE results listings, but it’s something to explore…;-)
On the design side, we didn’t manage to get any CSS out, but James and Joel did come up with a stylish design, as you’ve seen above:-)
In terms of usage, the site is currently unstyled, but it is functional. The results can also be accessed via API calls (the current CSE ID is 006974165492396950327:xvnuayaygic). For universities wanting to compare “Google” searches against their online prospectuses and those of other HEIs, CourseDetective might be an appropriate tool for the SEO toolbox?
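For anyone wanting to try the API route, here is a minimal sketch that builds a request URL for Google's Custom Search JSON API, as I understand it (the key, cx and q parameters). The API key is a placeholder you would need to supply yourself, the search phrase is just an example, and the CSE ID is the one quoted above, which may not stay live. The sketch only builds the URL; it doesn't make the call.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"
CSE_ID = "006974165492396950327:xvnuayaygic"


def build_search_url(query, cse_id, api_key):
    """Build a Custom Search JSON API request URL for the given query."""
    params = {"key": api_key, "cx": cse_id, "q": query}
    return API_ENDPOINT + "?" + urlencode(params)


# "YOUR_API_KEY" is a placeholder, not a working key.
url = build_search_url("marine biology", CSE_ID, "YOUR_API_KEY")
print(url)
```

Fetching that URL with a valid key should return the results as JSON, which could then be compared against an ordinary web search for the same terms.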
It struck me just now that by driving the CSE from a linked file, we could actually define multiple linked definition files for different flavours of website, for example boosting or suppressing course results according to user preferences (for example, geography, or other properties we can associate with, or derive from, the course prospectus root URL or common search terms).
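Here is a minimal sketch of how one flavour of definition might be derived from a user preference. It assumes each prospectus pattern has been tagged with a region (the tags below are illustrative) and that ranking scores run from -1 to 1; how those scores get written into a definition file is left open, and the function name is my own.

```python
def score_patterns(pattern_regions, preferred_region, boost=1.0, demote=-0.5):
    """Map each URL pattern to a ranking score: boost patterns in the
    preferred region and demote everything else."""
    return {
        pattern: (boost if region == preferred_region else demote)
        for pattern, region in pattern_regions.items()
    }


# Illustrative region tags for three of the prospectus paths.
pattern_regions = {
    "www.bath.ac.uk/study/ug/courses/*": "South West",
    "www.beds.ac.uk/courses*": "East of England",
    "www.northumbria.ac.uk/*": "North East",
}

south_west_scores = score_patterns(pattern_regions, "South West")
for pattern, score in south_west_scores.items():
    print(pattern, score)
```

Running the same function once per region would give one set of scores, and so potentially one linked definition file, per flavour.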
A couple of other things I’d like to be able to do: search for foundation degrees; search for part-time degrees; search for distance education degrees; search for postgraduate taught courses.
I also managed to waste a bit of time (i.e. I still haven’t found a workaround) on the analytics side. What I wanted to do was use the AJAX version of the CSE and then use Google Analytics event tracking to track:
- queries;
- which results were clicked on.
Again, it would be really helpful if the Google Custom Search Engine and Google Analytics folk had a little sit down together to work out how to do at least the first of these, if not the second (they might be protective of folk knowing which links get clicked on in the CSE results?). Although that’s not to say that someone else might not come up with a solution… Please feel free to let me know in the comments if you have just such a fix;-)
In terms of time and effort, I reckon it took about 6 person hours to collate the links. If anyone fancies helping develop the site further, I think we’re up for that… :-)
* i.e. folk who aren’t developers but aspire to doing developery things howsoever they can;-) The team included: me, David Kernohan, designers James Roscoe and Joel Reed, Shelagh Finlay and Tracey Murray.
eSTEeM Project: Custom Course Search Engines
Preamble
If the desire for OU courses to make increased use of third party materials and open educational resources is realised, we are likely to see a shift in the pedagogy to one that is more resource based. This project seeks to explore the extent to which custom search engines tuned to particular courses may be used to support the discovery of appropriate resources published on the public web, and as indexed by Google, on any given course.
Many courses now include links to third party resources that have been published on the public web. Discovering appropriate resources in terms of relevance and quality can be a time consuming affair. The Google Custom Search Engine service allows users to define custom search engines (CSEs) that search over a limited set of domains or web pages, rather than the whole web.
(Topic based links can be discovered in a wide variety of places. For example, it is possible to create custom search engines based around the homepages of people added to a Twitter list, or the nominated blogs in annual award listings.)
The ranking of particular resources may also be boosted in the definition of the CSE via a custom ranking configuration. For example, open educational resources published in support of the course may be boosted in the search result rankings.
Alternatively, CSEs may be used to exclude results from particular domains, or return resources from the whole web with the ranking of results from specified pages or domains boosted as required. By opening up results to the whole of the web, if recent, relevant resources from an unspecified domain are identified in response to a particular search query, they stand a chance of being presented to the user in the results listing.
Synonyms for common terms may also be explicitly declared and refinement labels used to offer facet based search limits. This might be used to limit results to resources identified as particularly relevant for a particular unit, or block within a course, for example, or to particular topic areas spread across a course.
“Promoted” results may also be used to emphasise particular results in response to particular queries. A good example here might be to display promoted results relating to resources explicitly referenced in an exercise, assignment or activity.
If any of the indexed pages are marked up with structured data, it may be possible to expose this data using a rich snippet/enhanced search listing. Whilst there are few examples to date, enhanced listings that display document types or media types might be appropriate.
Examples of Google CSEs in action can be found here:
- Digital Worlds Custom Search Engine (created by hand; as used in T151).
- faceted “HE CSE” metasearch engine over UK Higher Education Library websites, UK Parliamentary pages, OERs, video protocols for science experiments. This example demonstrates how the search engine may be embedded in a web page.
The Project
The project proposes the automated generation of custom search engines on a per course basis based on the resources linked to from any given course.
The deliverables will be:
1) an automated way of generating Google CSE definition files through link scraping of Structured Authoring/XML versions of online course materials. If necessary, additional scraping of non-SA, VLE published resources may be required.
2) a resource template page and/or widget in the VLE providing access to the customised course search engine
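As a minimal sketch of the link scraping step in the first deliverable: pull every external link out of an XML document and reduce the links to a list of domains that could seed a CSE definition. The element names in the sample are invented for illustration and are not the actual Structured Authoring schema; the only assumption the code relies on is that links appear as href attributes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Made-up course unit; element names are hypothetical, not the real SA schema.
SAMPLE_UNIT = """
<Unit>
  <Paragraph>Read <a href="http://www.example.org/games/design">this overview</a>.</Paragraph>
  <Paragraph>Then try <a href="http://www.example.org/games/history">this history</a>
  and <a href="https://example.com/tutorial">this tutorial</a>.</Paragraph>
  <Paragraph>Internal link: <a href="/course/unit2">Unit 2</a></Paragraph>
</Unit>
"""


def extract_link_domains(xml_text):
    """Return the sorted set of domains linked to from an XML document."""
    root = ET.fromstring(xml_text.strip())
    domains = set()
    for element in root.iter():
        href = element.get("href")
        # Keep absolute web links only; relative (internal) links are skipped.
        if href and href.startswith(("http://", "https://")):
            domains.add(urlsplit(href).netloc)
    return sorted(domains)


domains = extract_link_domains(SAMPLE_UNIT)
print(domains)
```

Whether to index whole domains or just the specific linked pages is a design choice the project would need to explore; the same loop could just as easily collect the full URLs.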
Success will be based on the extent to which:
1) students on pilot courses use the search engine;
2) students on courses using the search engine report, in a survey, that they found it useful.
Search engine metrics will also form part of the reporting chain. If appropriate, we will also explore the extent to which search engine analytics can be used to enhance the performance of the search engine (for example, by tuning custom ranking configurations), as well as offering “recent searches” information to students.
The placement of the search box for the CSE will be an important factor and any evaluation should take this into account, e.g. through A/B testing on course web pages.
Another variable relating to the extent to which a CSE is used by students is whether the CSE performs a whole web search with declared resources prioritised, or whether it just searches over declared resources. Again, an A/B test may be appropriate.
For activities that include a resource discovery component, it would be interesting to explore what effect embedding the search engine with the activity description page might have?
If course team members on any OU courses presenting over the next 9 months are interested in trying out a course based custom search engine, please get in touch. If academics on courses outside the OU would like to discuss the creation and use of course search engines for use on their own courses, I’d love to hear from you too:-)
eSTEeM is a joint initiative between the Open University’s Faculty of Science and Faculty of Maths, Computing and Technology to develop new approaches to teaching and learning both within existing and new programmes.
Google Social search results – but only on Google.com for now?
Last week, Google announced a series of updates to their social search feature that highlights links that have in some sense been “shared” by members of your social circle (An update to Google Social Search):

I’ve been wondering why I couldn’t see the social results whenever I ran a search, but today found out why: they only appear in results pages from google.com…
… and I typically use google.co.uk….
Google Books Library Shelves
It’s been some time since I last had a look at the “My Library” service in Google Books, but with the announcement of Google eBooks store (currently US only, except for out-of-copyright free downloads) I popped over to my Google Books account to see whether anything else had changed…
One of the little known (I think?) features of Google Books is the “My Library” personalisation which allows you to create a collection of books and search over them. Searching your library finds all the books in your library collection that contain the search phrase; if a preview of the book is available, the search returns deep links into the book to the point(s) at which the search terms appear:
I’ve previously commented on the My Library aspect of Google Books in the context of its possible use by libraries for providing a full-text search option over books in their collection (e.g. Complementing the OPAC With a Full Text Search Book Catalogue where I describe the use of the service by Wiltshire Heritage Library (example) and the Penn State University Press booksearch (example)).
(At the moment I don’t think you can get statistics back on the searches carried out on a My Library profile, though Google books can do stats for publishers e.g. Google Books for Publishers).
Anyway – one of the problems I originally had with My Library was that you could only maintain a single collection. But it seems that it’s now possible to create separate collections by tagging books in your Library onto “shelves”:
(Shelves appeared at the start of 2010, it seems: Updated Books Home Page and My Library.)
So what immediately comes to mind is that if you’re running several courses, you could add the books used in the course to a My Library shelf, and then publish a link to a search context for that shelf to give a full text searchable version of the books on the list (assuming they’ve been scanned by the Goog, of course). Where previews are available, deep links into books will be available as part of the search results.
I haven’t really populated any shelves yet, but here’s the idea:
I haven’t explored the Book Search Data API yet, but it does seem to offer the ability to search over a particular user’s public library, as well as retrieve lists of books from the library. API options also exist for adding books to a library, though the API seems to only support adding labels, rather than updating shelves (or maybe legacy handlers map labelled books onto shelves?). With a bit of digging, it might be possible to find a route to automate the creation of a library shelf from a list of books. (Hmmm, maybe I should try this with the OU Set books list?!;-)
Google Books shelves thus seem to provide a way of creating different lists of books within a single user library, although I’m not sure if there is a limit on the number of books contained within a shelf, or in the library as a whole. Another nice feature is that it’s possible to select a shelf based filter to just display books from a particular shelf (click on the label in the left-hand sidebar to filter by shelf); this search facet also seems to be passed through to a bookmarkable URL for the filtered search via the as_coll argument (I think?). (Which is to say: you can share a link for a search within a particular shelf in a particular user’s library.)
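Here is a minimal sketch of how such a shareable link might be put together. Only the as_coll argument comes from what I saw in the URL; the base address, the uid and q parameter names, and the ID values below are all assumptions or placeholders, so check them against a URL copied from your own filtered search before relying on this.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed base address for a Google Books library search.
BOOKS_BASE = "http://books.google.com/books"


def shelf_search_url(user_id, shelf_id, query):
    """Build a link for a search within one shelf of one user's library.

    Parameter names other than as_coll are assumptions, not confirmed."""
    params = {"uid": user_id, "as_coll": shelf_id, "q": query}
    return BOOKS_BASE + "?" + urlencode(params)


# Placeholder user and shelf IDs, for illustration only.
link = shelf_search_url("1234567890", "1001", "search engine")
print(link)
```

If the URL structure holds up, a course page could carry one such link per reading list shelf.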
I’m not sure if Google Books is available through Google Apps for Education, but it could be a useful component of a full text book search context around books on a reading list?
PS As Google Scholar appears to be improving its coverage, it strikes me that the Goog still doesn’t offer a Google service for building searchable reference lists, although it does let you customise the addition of links that will bookmark a reference to a service for you:
Here’s how the links are displayed:
Given you can build weblink search contexts using Google custom search engines, full text book search contexts using the Books My Library service, search over content from bundled feeds in Google Reader and even run things like video search by user on Youtube*, the Goog must surely be looking to offer a collection building and searching service for Google Scholar? So I wonder… could Google end up taking over a service like CiteULike or Mendeley to complement and bootstrap personalisation of their Google Scholar offering? Or would they just build their own (cut down) version of these services?
* Hmm… I wonder if there’s a Youtube API switch that lets you search playlists? It’s definitely possible to get a playlist feed out…
PPS the Goog is also lacking a way of exposing all these personal search contexts to a logged in user through the same interface. If it were down to me, I’d start to expose them in the left hand sidebar of Google websearch, so I’m guessing this will be a labs/experimental service in the new year, if it isn’t already so…
…maybe…?;-)





















