Posts Tagged ‘google cse’
Search Engine Powered Courses…
How can we use customised search engines to support uncourses, or the course models used to support MOOC style offerings?
To set the scene, here’s what Stephen Downes wrote recently on the topic of How to participate in a MOOC:
You will notice quickly that there is far too much information being posted in the course for any one person to consume. We tried to start slowly with just a few resources, but it quickly turns into a deluge.
You will be provided with summaries and links to dozens, maybe hundreds, maybe even thousands of web posts, articles from journals and magazines, videos and lectures, audio recordings, live online sessions, discussion groups, and more. Very quickly, you may feel overwhelmed.
Don’t let it intimidate you. Think of it as being like a grocery store or marketplace. Nobody is expected to sample and try everything. Rather, the purpose is to provide a wide selection to allow you to pick and choose what’s of interest to you.
This is an important part of the connectivist model being used in this course. The idea is that there is no one central curriculum that every person follows. The learning takes place through the interaction with resources and course participants, not through memorizing content. By selecting your own materials, you create your own unique perspective on the subject matter.
It is the interaction between these unique perspectives that makes a connectivist course interesting. Each person brings something new to the conversation. So you learn by interacting rather than by merely consuming.
When I put together the OU course T151, the original vision revolved around a couple of principles:
1) the course would be built in part around materials produced in public as part of the Digital Worlds uncourse;
2) each week’s offering would follow a similar model: one or two topic explorations, plus an activity and forum discussion time.
In addition, the topic explorations would have a standard format: scene setting, and maybe a teaser question with answer reveal or call to action in the forums; a set of topic exploration questions to frame the topic exploration; a set of resources related to the topic at hand, organised by type (academic readings, Digital Worlds resources, weblinks such as industry or well informed blogs and news sites, and audio and video resources); and a reflective essay by the instructor exploring some of the themes raised in the questions and referring to some of the resources. Academic readings were linked via a libezproxy link for subscription content, so that no downstream logins would be required to access the content. The aim of the reflective essay was to model the sort of exploration or investigation the student might engage in.
(I’d probably just have a mixed bag of resources listed now, along with a faceting option to focus in on readings, videos, etc.)
The idea behind designing the course in this way was that it would be componentised as much as possible, to allow flexibility in swapping resources or even topics in and out, as well as (though we never managed this) allowing the freedom to study the topics in an arbitrary order. Note: I realised today that to make the materials more easily maintainable, a set of ‘Recent links’ might be identified that weren’t referred to in the ‘My Reflections’ response. That is, they could be completely free standing, and would have no side effects if replaced.
As far as the provision of linked resources went, the original model was that the links should be fed into the course materials from an instructor maintained bookmark collection (for an early take on this, see Managing Bookmarks, with a proof of concept demo at CourseLinks Demo). Hmmm, everything in that demo except the dynamic link injection appears to have rotted :-(
The design of the questions/resources page was intended to have the scoping questions at the top of the page, and then the suggested resources presented in a style reminiscent of a search engine results listing, the idea being that we would present the students with too many resources for them to comfortably read in the allocated time, so that they would have to explore the resources from their own perspective (eg given their current level of understanding/knowledge, their personal interests, and so on). In one of my more radical moments, I suggested that the resources would actually be pulled in from a curated/custom search engine ‘live’, according to search terms specially selected around the current topic and framing questions, but I was overruled on that. However, the course does have a Google custom search engine associated with it which searches over materials that are linked to from the course.
So that’s the context…
Where I’m at now is pondering how we can use an enhanced custom search engine as a delivery platform for a resource based uncourse. So here’s my first thought: using a Google Custom Search Engine populated with curated resources in a particular area, can we use Google CSE Promotions to help scaffold a topic exploration?
Here’s my first promotions file:
<Promotions>
<Promotion id="t151_1a"
queries="topic 1a, Topic 1A, topic exploration 1a, topic exploration 1A, topic 1A, what is a game, game definition"
title="T151 Topic Exploration 1A - So what is a game?"
url="http://digitalworlds.wordpress.com/2008/03/05/so-what-is-a-game/"
description="The aim of this topic is to think about what makes a game a game. Spend a minute or two to come up with your own definition. If you're stuck, read through the Digital Worlds post 'So what is a game?'"
image_url="http://kmi.open.ac.uk/images/ou-logo.gif" />
</Promotions>
It’s running on the Digital Worlds Search Engine, so if you want to try it out, try entering the search phrase what is a game or game definition.
This example also suggests to me that it would make sense to use result boosting to boost the key readings/suggested resources I proposed in the topic materials so that they appear nearer the top of the results. That’ll be the focus of a future post ;-)
The promotion displays at the top of the results listing if the specified queries match the search terms the user enters. My initial feeling is that to bootstrap the process, we need to handle:
- queries that allow a user to call on a starting point for a topic exploration by specifically identifying that topic;
- “naive queries”: one reason for using the resource-search model is to try to help students develop effective information skills relating to search. Promotions (and result boosting) allow us to pick up on anticipated naive queries (or popular queries identified from search logs), and suggest a starting point for a sensible way in to the topic. Alternatively, they could be used to offer suggestions for improved or refined searches, or search strategy hints. (I’m reminded of Dave Pattern’s work with guided searches/keyword refinements in the University of Huddersfield Library catalogue in this context).
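As a minimal sketch of how promotions for both kinds of query might be generated rather than written by hand, the following builds a promotions file from a list of topic records. The attribute names are the ones used in the promotions file above; the function name, the second "search hint" promotion and its example.com URL are hypothetical, included only for illustration.

```python
import xml.etree.ElementTree as ET

def build_promotions(topics):
    """Return a <Promotions> XML string with one <Promotion> per topic dict."""
    root = ET.Element("Promotions")
    for topic in topics:
        ET.SubElement(root, "Promotion", {
            "id": topic["id"],
            # As in the hand-written file, queries is one comma-separated attribute.
            "queries": ", ".join(topic["queries"]),
            "title": topic["title"],
            "url": topic["url"],
            "description": topic["description"],
        })
    return ET.tostring(root, encoding="unicode")

topics = [
    {
        # Queries that explicitly name the topic, plus anticipated topic phrases.
        "id": "t151_1a",
        "queries": ["topic 1a", "what is a game", "game definition"],
        "title": "T151 Topic Exploration 1A - So what is a game?",
        "url": "http://digitalworlds.wordpress.com/2008/03/05/so-what-is-a-game/",
        "description": "The aim of this topic is to think about what makes a game a game.",
    },
    {
        # Hypothetical "naive query" promotion offering a search strategy hint.
        "id": "hint_games",
        "queries": ["games", "game"],
        "title": "Search hint: try a more specific query",
        "url": "http://example.com/search-hints",
        "description": "Try adding a topic term, for example 'game definition'.",
    },
]

promotions_xml = build_promotions(topics)
print(promotions_xml)
```

Keeping the topics as plain records means the anticipated queries could, in principle, be topped up from search logs without touching the XML directly.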
Here’s another example using the same promotion, but on a different search term:
Of course, we could also start to turn the search engine into something like an adventure game engine. So for example, if we type: start or about, we might get something like:
(The link I associated with start should really point to the course introduction page in the VLE…)
We can also use the search context to provide pastoral or study skills support:
These sorts of promotions/enhancements might be produced centrally and rolled out across course search engines, leaving the course and discipline related customisations to the course team and associated subject librarians.
Just a final note: ignoring resource limitations on Google CSEs for a moment, we might imagine the following scenarios for their roll-out:
1) course wide: bespoke CSEs are commissioned for each course, although they may be supplemented by generic enhancements (eg relating to study skills);
2) qualification based: the CSE is defined at the qualification level, and students call on particular course enhancements by prefacing the search with the course code; it might be that students also see a personalised view of the qualification CSE that is tuned to their current year of study.
3) university wide: the CSE is defined at the university level, and students call on particular course or qualification level enhancements by prefacing the search with the course or qualification code.
Tweaking Ranking Factors in the Course Detective Custom Search Engine
This is a note-to-self as much as anything, relating to the Course Detective custom search engine that searches over UK HE course prospectus web pages. To what extent might we be able to use data such as the student satisfaction survey results (as made available via Unistats) to boost search results around particular subjects in line with student satisfaction ratings, or employment prospects, for particular universities?
It’s possible to tweak rankings in Google CSEs in a variety of ways. On the one hand, we can BOOST (improve the ranking), FILTER (limit results to members of a given set) or ELIMINATE (exclude) sites appearing in the search results listing. In the simplest case, we assign a BOOST, FILTER or ELIMINATE weight to a label, and then apply labels to annotations so that they benefit from the corresponding customisation. We can further refine the effect of the modification by applying a score to each annotation. The product of score and weight values determines the overall ranking modification that is applied to each result for a label applied to an annotation.
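To make the arithmetic concrete, here is a small worked sketch of the score and weight product described above. The weight, the scores and the URL patterns are made-up values chosen for illustration, and the function name is hypothetical; nothing here calls the CSE itself.

```python
def ranking_modification(label_weight, annotation_score):
    """Combine a label's weight with an annotation's score by multiplication."""
    return label_weight * annotation_score

# Hypothetical values: one boost label shared by two annotated sites.
boost_weight = 0.8
site_scores = {
    "example.ac.uk/prospectus/*": 1.0,
    "example.org/courses/*": 0.5,
}

modifications = {
    pattern: ranking_modification(boost_weight, score)
    for pattern, score in site_scores.items()
}

for pattern, value in modifications.items():
    print(pattern, value)
```

Under these assumed numbers the two sites share the same label, but the second receives only half the modification of the first because its annotation score is halved.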
So here’s what I’m thinking:
- define labels for things like achievement or satisfaction that apply a boost to a result;
- allow users to apply a label to a search;
- for each university annotation (that is, the listing that identifies the path to the pages for a particular university’s online prospectus), add a label with a score modifier determined by the achievement or satisfaction value, for example, for that institution;
- for refinement labels that tweak search rankings within a particular subject area, define labels corresponding to those subject areas and apply score modifiers to each institution based on, for example, the satisfaction level with that subject area. (Note: I’m not sure if the same path can have several different annotations provided to it with different scores?)
For example, an annotation file typically contains a fragment that looks like:
<Annotations>
<Annotation about="webcast.berkeley.edu/*" score="1">
<Label name="university_boost_highest"/>
<Label name="lectures"/>
</Annotation>
<Annotation about="www.youtube.com/ucberkeley/*" score="1">
<Label name="university_boost_highest"/>
<Label name="videos_boost_mid"/>
<Label name="lectures"/>
</Annotation>
</Annotations>
I don’t know if this would work:
<Annotations>
<Annotation about="example.com/prospectus/*" score="1">
<Label name="chemistry"/>
</Annotation>
<Annotation about="example.com/prospectus/*" score="0.5">
<Label name="physics"/>
</Annotation>
</Annotations>
That said, if the URLs are nicely structured, we might be able to do something like:
<Annotations>
<Annotation about="example.com/prospectus/chemistry/*" score="1">
<Label name="chemistry"/>
</Annotation>
<Annotation about="example.com/prospectus/physics/*" score="0.5">
<Label name="physics"/>
</Annotation>
</Annotations>
albeit at the cost of having to do a lot more work in terms of identifying appropriate URI paths.
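If the subject level paths can be identified, the annotation file could be generated from the ratings data rather than edited by hand. The sketch below assumes satisfaction is available as a 0 to 100 percentage per subject path and simply rescales it to a 0 to 1 score; both the rescaling rule and the ratings themselves are assumptions made for illustration, not Unistats data.

```python
import xml.etree.ElementTree as ET

def satisfaction_annotations(ratings):
    """Build an <Annotations> tree from (url_pattern, subject, percent) rows."""
    root = ET.Element("Annotations")
    for pattern, subject, percent in ratings:
        # Assumed rule: scale a 0-100 satisfaction percentage into a 0-1 score.
        score = round(percent / 100, 2)
        annotation = ET.SubElement(
            root, "Annotation", about=pattern, score=str(score)
        )
        ET.SubElement(annotation, "Label", name=subject)
    return root

# Made-up ratings for illustration only.
ratings = [
    ("example.com/prospectus/chemistry/*", "chemistry", 92),
    ("example.com/prospectus/physics/*", "physics", 78),
]

annotations = satisfaction_annotations(ratings)
annotations_xml = ET.tostring(annotations, encoding="unicode")
print(annotations_xml)
```

Whether a linear rescaling gives sensible rankings is an open question; a rank based or thresholded score might turn out to work better.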
I also need to start thinking a bit more about how to apply refinements and ranking adjustments in course based CSEs.
eSTEeM Project: Custom Course Search Engines
Preamble
If the desire for OU courses to make increased use of third party materials and open educational resources is realised, we are likely to see a shift in the pedagogy to one that is more resource based. This project seeks to explore the extent to which custom search engines tuned to particular courses may be used to support the discovery of appropriate resources published on the public web, and as indexed by Google, on any given course.
Many courses now include links to third party resources that have been published on the public web. Discovering appropriate resources in terms of relevance and quality can be a time consuming affair. The Google Custom Search Engine service allows users to define custom search engines (CSEs) that search over a limited set of domains or web pages, rather than the whole web.
(Topic based links can be discovered in a wide variety of places. For example, it is possible to create custom search engines based around the homepages of people added to a Twitter list, or the nominated blogs in annual award listings.)
The ranking of particular resources may also be boosted in the definition of the CSE via a custom ranking configuration. For example, open educational resources published in support of the course may be boosted in the search result rankings.
Alternatively, CSEs may be used to exclude results from particular domains, or return resources from the whole web with the ranking of results from specified pages or domains boosted as required. By opening up results to the whole of the web, if recent, relevant resources from an unspecified domain are identified in response to a particular search query, they stand a chance of being presented to the user in the results listing.
Synonyms for common terms may also be explicitly declared and refinement labels used to offer facet based search limits. This might be used to limit results to resources identified as particularly relevant for a particular unit, or block within a course, for example, or to particular topic areas spread across a course.
“Promoted” results may also be used to emphasise particular results in response to particular queries. A good example here might be to display promoted results relating to resources explicitly referenced in an exercise, assignment or activity.
If any of the indexed pages are marked up with structured data, it may be possible to expose this data using a rich snippet/enhanced search listing. Whilst there are few examples to date, enhanced listings that display document types or media types might be appropriate.
Examples of Google CSEs in action can be found here:
- Digital Worlds Custom Search Engine (created by hand; as used in T151).
- faceted “HE CSE” metasearch engine over UK Higher Education Library websites, UK Parliamentary pages, OERs, video protocols for science experiments. This example demonstrates how the search engine may be embedded in a web page.
The Project
The project proposes the automated generation of custom search engines on a per course basis based on the resources linked to from any given course. The deliverables will be:
1) an automated way of generating Google CSE definition files through link scraping of Structured Authoring/XML versions of online course materials. If necessary, additional scraping of non-SA, VLE published resources may be required.
2) a resource template page and/or widget in the VLE providing access to the customised course search engine
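A rough sketch of the first deliverable might look like the following. It assumes, hypothetically, that links in the course XML appear as href attributes on elements; the real Structured Authoring schema may well differ, and the sample document, function names and the course_resources label are all invented for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

def extract_link_patterns(course_xml):
    """Collect unique scheme-less URL patterns from every href attribute found."""
    root = ET.fromstring(course_xml)
    patterns = []
    for element in root.iter():
        href = element.get("href")
        if not href:
            continue
        parsed = urlparse(href)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto: links and relative/internal links
        pattern = parsed.netloc + parsed.path
        if pattern not in patterns:
            patterns.append(pattern)
    return patterns

def to_annotations(patterns, label):
    """Wrap each pattern in an <Annotation> carrying a single label."""
    root = ET.Element("Annotations")
    for pattern in patterns:
        annotation = ET.SubElement(root, "Annotation", about=pattern)
        ET.SubElement(annotation, "Label", name=label)
    return ET.tostring(root, encoding="unicode")

# Invented sample document; not a real Structured Authoring file.
sample = """<Unit>
  <Paragraph>See <a href="http://digitalworlds.wordpress.com/2008/03/05/so-what-is-a-game/">So what is a game?</a></Paragraph>
  <Paragraph><a href="mailto:tutor@example.com">Email</a> or <a href="http://example.com/reading">a reading</a>.</Paragraph>
  <Paragraph><a href="http://example.com/reading">The same reading again</a></Paragraph>
</Unit>"""

patterns = extract_link_patterns(sample)
cse_xml = to_annotations(patterns, "course_resources")
print(cse_xml)
```

Duplicate links are collapsed so that each resource is annotated once, and the order of first appearance in the course materials is preserved.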
Success will be based on the extent to which:
1) students on pilot courses use the search engine;
2) students on courses using the search engine report, via a survey, that they found it useful.
Search engine metrics will also form part of the reporting chain. If appropriate, we will also explore the extent to which search engine analytics can be used to enhance the performance of the search engine (for example, by tuning custom ranking configurations), as well as offering “recent searches” information to students.
The placement of the search box for the CSE will be an important factor and any evaluation should take this into account, e.g. through A/B testing on course web pages.
Another variable relating to the extent to which a CSE is used by students is whether the CSE performs a whole web search with declared resources prioritised, or whether it just searches over declared resources. Again, an A/B test may be appropriate.
For activities that include a resource discovery component, it would be interesting to explore what effect embedding the search engine within the activity description page might have.
If course team members on any OU courses presenting over the next 9 months are interested in trying out a course based custom search engine, please get in touch. If academics on courses outside the OU would like to discuss the creation and use of course search engines for use on their own courses, I’d love to hear from you too:-)
eSTEeM is a joint initiative between the Open University’s Faculty of Science and Faculty of Maths, Computing and Technology to develop new approaches to teaching and learning both within existing and new programmes.
Initial Thoughts on Profiling @dirdigeng’s Friends Network on Twitter
Last week, Andrew Stott, Director of Digital Engagement in the Cabinet Office, announced his retirement date over Twitter:
At the time of writing, @dirdigeng follows slightly over two thousand folk on Twitter, so I thought I’d have a quick look at who the “players” are…
The network described is constructed as follows:
- nodes represent the people followed by @dirdigeng on Twitter;
- a directed edge from A to B means that A is following B.
In the first view (randomly laid out, using Gephi), we plot node size as linearly proportional to the number of @dirdigeng’s friends who are following each of the other friends (that is, the in-degree of each node), and colour proportional to their total number of followers (including people not friended by @dirdigeng).
The colour mapping is non-linear – @Number10gov, @guardiantech and @mashable have significantly more followers than the other nodes – and is set via the spline control:
If we run the betweenness centrality statistic, and size nodes accordingly, we can see how the various parts of the network may be connected. (“Betweenness centrality is a measure based on the number of shortest paths between any two nodes that pass through a particular node. Nodes around the edge of the network would typically have a low betweenness centrality. A high betweenness centrality might suggest that the individual is connecting various different parts of the network together.”)
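For anyone wanting to see the shortest path counting spelled out, here is a standard library sketch of betweenness centrality using Brandes' algorithm on a small directed graph. The graph is invented, the values are unnormalised, and Gephi's own statistic may scale or treat edge direction differently, so treat this as an illustration of the idea rather than a reproduction of the Gephi output.

```python
from collections import deque

def betweenness(graph):
    """Unnormalised betweenness for a directed, unweighted graph.

    graph maps every node to a list of the nodes it points at.
    """
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search counting shortest paths from source.
        order = []
        predecessors = {node: [] for node in graph}
        path_count = {node: 0 for node in graph}
        distance = {node: -1 for node in graph}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, working back from the farthest nodes.
        dependency = {node: 0.0 for node in graph}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                share = path_count[v] / path_count[w]
                dependency[v] += share * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    return centrality

# Invented example: "hub" sits on every shortest path from a or b to c or d.
graph = {"a": ["hub"], "b": ["hub"], "hub": ["c", "d"], "c": [], "d": []}
centrality = betweenness(graph)
print(centrality)
```

In this toy network the nodes around the edge score zero, while the hub scores highly because it connects the two halves, which is the pattern the definition above describes.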
We can also run the modularity class statistic to try to partition the friends into small networks with a high degree of internal connections. Here’s what we get (click through on the image to see it in more detail):
Modularity groups help us understand the structure of the network in a bit more detail. I’ve started to think they might also be used to automatically generate a seeding set of people who form a highly interconnected community with an interest in a particular topic and from a particular stance.
As well as looking at the structure of the network, we can also create a search engine over the home pages declared in the Twitter bios of @dirdigeng’s friends. My thinking here is that this might provide a useful constrained search engine over sites engaged in social media and with an interest in “Digital Britain”.
The simplest custom search engine takes the URLs from the Twitter bios of folk followed by @dirdigeng and adds them to a “Digital Britain” Google Custom Search Engine. However, one attractive feature of the Google CSEs is that you can also tweak the rankings by weighting results from different domains differently to give a “weighted” custom search engine.
As a quick experiment, I produced one weighted search engine where I set the score for each domain to be the normalised number of followers amongst @dirdigeng’s friends community. (That is, the domain score equalled the in-degree of a node in the @dirdigeng friends network, divided by the total number of people in that network.)
As you can see from the above, the results differ… Whether there is any improvement in the ranking of results is another thing. (There is also the question of how best to score, or boost, rankings based on network statistics, and the extent to which rankings should be determined by friends network factors…)
It also strikes me that the modularity groups might be used to inform the setup of a CSE. For example, separate modularity groups/classes may be used to define refinement labels, allowing users to just search pages from members of a particular modularity class, or boost the results from those people.
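The scoring rule can be sketched in a few lines. The edge list, member names and home page patterns below are invented stand-ins for the real friends network, and the function name is hypothetical; only the rule itself (in-degree within the network divided by the number of people in it) comes from the description above.

```python
from collections import Counter

def normalised_indegree(edges, members):
    """Score each member as in-degree / network size, using in-network edges only."""
    member_set = set(members)
    indegree = Counter(
        followed
        for follower, followed in edges
        if follower in member_set and followed in member_set
    )
    return {member: indegree[member] / len(member_set) for member in members}

# Invented friends network: (A, B) means A follows B.
members = ["alice", "bob", "carol", "dave"]
edges = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("dave", "bob"),
    ("bob", "alice"),
]
scores = normalised_indegree(edges, members)

# Invented mapping from each person to the home page declared in their bio.
homepages = {"alice": "alice.example.org/*", "bob": "bob.example.com/*"}
domain_scores = {homepages[m]: scores[m] for m in members if m in homepages}
print(domain_scores)
```

People without a declared home page simply drop out of the domain scores, so the resulting search engine only covers the part of the network that publishes a URL.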
And finally, I wonder whether we can mine the tweets of @dirdigeng’s friends, as well as those of @dirdigeng, to provide raw material for additional advice for searchers?

















