Archive for the ‘Search’ Category
Towards the end of last week I attended a two-day symposium on Statistics in Journalism Practice and Education at the University of Sheffield. The programme was mixed, with several reviews of what data journalism is or could be, and the occasional consideration of what stats might go into a statistics curriculum for students, but it got me thinking again about the way that content gets created and shunted around the news world.
Take polls, for example. At one point a comment got me idly wondering about the percentage of news copy that is derived from polls or surveys, and how it might be possible to automate the counting of such things. (My default position in this case is usually to wonder what might be possible with the Guardian Open Platform content API.) But I also started to wonder about how we could map the fan-out from independent or commissioned polls or surveys as they get reported in the news media, then maybe start to find their way into other reports and documents by virtue of having been reported in the news.
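As a rough illustration of how that counting might be automated, here is a minimal Python sketch that builds a Guardian content API search URL and reads the hit count out of a response. The endpoint, the parameter names (`q`, `from-date`, `to-date`, `page-size`, `api-key`) and the `response.total` field are assumptions based on the public Open Platform documentation; the API key is a placeholder, and a keyword match is only a crude proxy for "derived from a poll".

```python
import json
from urllib.parse import urlencode

# Assumption: the Guardian content API search endpoint and parameter names.
SEARCH_ENDPOINT = "https://content.guardianapis.com/search"


def build_search_url(query, from_date, to_date, api_key="YOUR-API-KEY"):
    """Build a search URL for one query over one date range (dates as YYYY-MM-DD)."""
    params = {
        "q": query,
        "from-date": from_date,
        "to-date": to_date,
        "page-size": 1,  # only the total is needed, not the results themselves
        "api-key": api_key,  # placeholder, not a real key
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def total_hits(response_text):
    """Read the total hit count from a JSON response body (assumed shape)."""
    return json.loads(response_text)["response"]["total"]


def poll_share(poll_hits, all_hits):
    """Percentage of items in the period that matched the poll/survey query."""
    return 100.0 * poll_hits / all_hits if all_hits else 0.0
```

Fetching the URL for a query such as `poll OR survey` and again for the same date range with a broad baseline query would give the two totals needed by `poll_share`; the fetching itself is left out so the sketch makes no network calls.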
This sort of thing is a corollary to tracking the way in which news stories might make their way from the newswires and into the papers via a bit of cut-and-pasting, as Nick Davies wrote so damningly about several years ago now in Flat Earth News, his indictment of churnalism and all that goes with it; it also reminds me of this old, old piece of Yahoo Pipes pipework where I tried to support the discovery of Media Release Related News Stories by putting university press release feeds into the same timeline view as news stories about that university.
I don’t remember whether I also built a custom search engine at the time for searching over press releases and news sites for mentions of universities, but that was what came immediately to mind this time round.
So for starters, here’s a quick Google Custom Search Engine that searches over a variety of polling organisation and news media websites looking for polls and surveys – Churnalism Times (Polls & Surveys Edition).
Here’s part of the setup, showing the page URL patterns to be searched over.
I added additional refinements to the tab that searches over the news organisations so that it only pulls out pages where “poll” or “survey” is mentioned. Note that if these words are indexed in the chrome around the news story (eg in a banner or sidebar), then we can get a false positive hit on the page (i.e. pull back a page where an irrelevant story is mentioned because a poll is linked to in the sidebar).
From way back when, when I took more of an interest in search than I do now, I thought Google was trying to find ways of distinguishing content from furniture, but I’m not so sure any more…
Anyway, here’s an example of a search into polls and surveys published by some of the big pollsters:
And an example of results from the news orgs:
For what it’s worth I also put together a custom search engine for searching over press releases – Churnalism Times (PR wires edition):
The best way of using this is to just paste in a quote, or part of a quote, from a news story, in double quotes, to see which PR notice it came from…
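Building such a quoted query can be sketched in a few lines of Python. This is a minimal illustration only: the base URL and the `cx` engine ID are placeholders, not the actual address of the search engine described here.

```python
from urllib.parse import urlencode

# Hypothetical values: substitute the real search page and engine ID.
CSE_BASE_URL = "https://www.google.com/cse"
CSE_ID = "YOUR-ENGINE-ID"


def exact_phrase_url(selection, base_url=CSE_BASE_URL, cx=CSE_ID):
    """Wrap a pasted quote in double quotes and encode it as a search URL."""
    phrase = " ".join(selection.split())      # collapse newlines and runs of spaces
    phrase = phrase.strip('"\u201c\u201d')     # drop straight or curly quotes already present
    return base_url + "?" + urlencode({"cx": cx, "q": '"' + phrase + '"'})
```

Normalising the whitespace matters because text copied from a news page often carries line breaks, which would otherwise end up inside the quoted phrase.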
To make life easier, an old bookmarklet generator I produced way back when, on an Arcadia fellowship at the Cambridge University Library, can be used to knock up a simple bookmarklet that will let you highlight a chunk of text and then search for it – get-selection bookmarklet generator.
Give it a sensible title; then this is the URL chunk you need to add:
Sigh.. I used to have so much fun…
PS it actually makes more sense to enclose the selected quote in quotes. Here’s a tweaked version of the bookmarklet code I grabbed from my installation of it in Chrome:
PPS I’ve started to add additional search domains to the PR search engine to include political speeches.
If you live by pop tech feed or Twitter, you’ve probably heard that Google is rolling out a new style of socially powered search results. If not, or if you’re still not clear about what it entails, read Phil Bradley’s post on the matter: Why Google Search Plus is a disaster for search.
Done that? If not, why not? This post isn’t likely to make much sense at all if you don’t know the context. Here’s the link again: Why Google Search Plus is a disaster for search
So the starting point for this post is this: Google is in the process of rolling out a new web search service that (optionally) offers very personal search results that contain content from folk that Google thinks you’re associated with, and that Google is willing to show you based on license agreements and corporate politics.
Think about this for a minute…. in the totally personalised view, folk will only see content that their friends have published or otherwise shared…
In Could Librarians Be Influential Friends?, I wondered aloud whether it made sense for librarians and other folk involved with providing support relating to resource discovery and recommendation to a) start creating social network profiles and encouraging their patrons to friend them, and b) start recommending resources using those profiles in order to start influencing the ordering/ranking of results in patrons’ search results based on those personal recommendations. The idea here was that you could start to make invisible frictionless recommendations by influencing the search engine results returned to your patrons. (The results aren’t invisible, because your profile picture may appear by the result showing that you recommend it. They’re frictionless in the sense that, having made the original recommendation, you no longer have to do any work in trying to bring it to the attention of your patron: the search engines take care of that for you (okay, I know that’s a simplistic view;-).) [Hmm.. how about referring to it as recommendation mode support?]
(Note that there is a complementary form of support to the approach which I’ve previously referred to as Invisible Library Tech Support (responsive mode support?; which I guess is also frictionless, at least from the perspective of the patron) in which librarians friend their patrons or monitor generic search terms/tags on Q&A sites and then proactively respond to requests that users post into their social networks more generally.)
With the aggressive stance Google now seems to be taking towards pushing social circle powered results, I think we need to face up to the fact – as Phil Bradley pointed out – that if librarians want to make sure they’re heard by their patrons, they’re going to need to start setting up social profiles, getting their patrons to friend them, and start making content and resource recommendations just anyway in order to make them available as resources that are indexed by patrons’ personal search engines. The same goes for publishers of OERs, academic teaching staff, and “courses”.
If we think of Google social search as searching over custom search engines bound by resources created and recommended by members of a user’s social circle, if you want to make (invisible) recommendations to a user via their (personalised) web search results, you’re going to need to make sure that the resources/content you want to recommend are indexed by their personal search engines. Which means: a) you need to friend them; and b) you need to share that content/those resources in that social context.
(Hmmm…this makes me think there may be something in the course custom search engine approach after all… Specifically, if the course has a social profile, and recommends the links contained within the course via that profile, they become part of the personalised search index of students following that course profile?)
Just by the by, as another example of Google completely messing things up at the moment, I notice that when I share links to posts on this blog via Google+, they don’t appear as trackbacks to the post in question. Which means that if someone refers to a post on this blog on Google+, I don’t know about it… whereas if they blog the link, I do…
See also my chronologically ordered posts on the eroding notion of “Google Ground Truth”.
[Invisible vs frictionless (and various notions of that word) is all getting a bit garbled; see eg @briankelly's Should Higher Education Welcome Frictionless Sharing and my comments to it for a little more on this...]
PS I’ve been getting increasingly infuriated by the clutter around, and lack of variation within, Google search results lately, so I changed my default search engine to Bing. The results are a bit all over the place compared to the Google results I tend to get, but this may be down in part to personalisation/training. I am still making occasional forays to Google, but for now, Bing is it… (because Bing is not Google…)
PPS Hah – just noticed: Google Search Plus doesn’t mean plus in the sense of search more, it means search Google+, which is less, or minus the wider world view…;-)
PPPS I keep meaning to blog this, and keep forgetting: Turn[ing] off [Google] search history personalization, in particular: “If you’ve disabled signed-out search history personalization, you’ll need to disable it again after clearing your browser cookies. Clearing your Google cookie clears your search settings, thereby turning history-based customizations back on.” Which is to say, when you disable personalisation, you don’t disable personalisation against your Google account, you disable it only insofar as it relates to your current cookie ID?
In Search Engine Powered Courses…, I took an initial, baby step to demonstrate one way in which a promoted link might be used within a course specific custom search engine. In the next post in this series, I will describe how to influence the positioning of results within a Google custom search engine by boosting their ranking, as well as how results may be ‘faceted’ into different results sets through the use of labels.
In this post, I thought it would be worth taking a step back and reviewing the three configuration files we have access to when defining a Google custom search engine: the configuration file, the promotions file, and the annotations file. If you create a minimal Google custom search engine using the CSE management tools, and then go to the Advanced page, you will see options that allow you to upload the configuration and annotations files. The promotions file can be imported via the Promotions page.
So what does each of these files do?
- The configuration file defines the top level configuration of the search engine. The easiest way of obtaining a template for a CSE is to create a minimal search engine using the CSE management tools, and then export the configuration file from the Advanced page. The configuration file defines, among other things: whether the search engine will search over the whole web, prioritising (or ‘BOOSTing’) sites and pages indexed explicitly by the CSE, or whether it will just return results from the explicitly indexed pages (a FILTER style search engine); a definition of the labels, or facets, that allow different search refinements to be applied as different search strategy contexts within the CSE; some styling information; and information relating to Subscribed Links (more of them in another post, if they’re still supported by then..).
- The promotions file allows you to define promoted links within a CSE; in Search Engine Powered Courses…, I give an example of how these might be used in a course search engine.
- The annotations file identifies the sites and pages that are specific members of the CSE index, as well as how they should be handled (eg the extent to which they should be positively or negatively boosted in the search engine results listing, whether they should appear in the top few results, and what labels or facets should apply to them).
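To make the annotations file a little more concrete, here is a minimal Python sketch that generates one from a list of URL patterns. The element and attribute names (`Annotations`, `Annotation`, `about`, `score`, `Label`, `name`) are assumptions based on the CSE annotations format as commonly documented, and the label names in the usage example are hypothetical; check them against an exported file before relying on the output.

```python
import xml.etree.ElementTree as ET


def build_annotations(entries):
    """Build an annotations document from a list of dicts.

    Each entry needs an 'about' URL pattern and a list of 'labels';
    'score' (assumed range -1 to 1) is optional and boosts or demotes matches.
    """
    root = ET.Element("Annotations")
    for entry in entries:
        attrs = {"about": entry["about"]}
        if "score" in entry:
            attrs["score"] = str(entry["score"])
        annotation = ET.SubElement(root, "Annotation", attrs)
        for label in entry["labels"]:
            ET.SubElement(annotation, "Label", {"name": label})
    return ET.tostring(root, encoding="unicode")
```

For example, `build_annotations([{"about": "digitalworlds.wordpress.com/*", "score": 1, "labels": ["_cse_exampleid", "readings"]}])` would mark a whole site as a boosted member of the index and tag it with a facet label.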
It’s also possible to customise the styling/presentation of the search engine, but that’s a shiny, shiny feature, so probably not something I’ll be looking at…
PS I just noticed you can now manage Google Analytics settings for custom search engines (which allows you to log search queries) from within the CSE control panel… I’m still not sure how easy it is to track which results get clicked through, though?
How can we use customised search engines to support uncourses, or the course models used to support MOOC style offerings?
To set the scene, here’s what Stephen Downes wrote recently on the topic of How to participate in a MOOC:
You will notice quickly that there is far too much information being posted in the course for any one person to consume. We tried to start slowly with just a few resources, but it quickly turns into a deluge.
You will be provided with summaries and links to dozens, maybe hundreds, maybe even thousands of web posts, articles from journals and magazines, videos and lectures, audio recordings, live online sessions, discussion groups, and more. Very quickly, you may feel overwhelmed.
Don’t let it intimidate you. Think of it as being like a grocery store or marketplace. Nobody is expected to sample and try everything. Rather, the purpose is to provide a wide selection to allow you to pick and choose what’s of interest to you.
This is an important part of the connectivist model being used in this course. The idea is that there is no one central curriculum that every person follows. The learning takes place through the interaction with resources and course participants, not through memorizing content. By selecting your own materials, you create your own unique perspective on the subject matter.
It is the interaction between these unique perspectives that makes a connectivist course interesting. Each person brings something new to the conversation. So you learn by interacting rather than by merely consuming.
When I put together the OU course T151, the original vision revolved around a couple of principles:
1) the course would be built in part around materials produced in public as part of the Digital Worlds uncourse;
2) each week’s offering would follow a similar model: one or two topic explorations, plus an activity and forum discussion time.
In addition, the topic explorations would have a standard format: scene setting, and maybe a teaser question with answer reveal or call to action in the forums; a set of topic exploration questions to frame the topic exploration; a set of resources related to the topic at hand, organised by type (academic readings (via a libezproxy link for subscription content so no downstream logins are required to access the content), Digital Worlds resources, weblinks (industry or well informed blogs, news sites etc), audio and video resources); and a reflective essay by the instructor exploring some of the themes raised in the questions and referring to some of the resources. The aim of the reflective essay was to model the sort of exploration or investigation the student might engage in.
(I’d probably just have a mixed bag of resources listed now, along with a faceting option to focus in on readings, videos, etc.)
The idea behind designing the course in this way was that it would be componentised as much as possible, to allow flexibility in swapping resources or even topics in and out, as well as (though we never managed this), allowing the freedom to study the topics in an arbitrary order. Note: I realised today that to make the materials more easily maintainable, a set of ‘Recent links’ might be identified that weren’t referred to in the ‘My Reflections’ response. That is, they could be completely free standing, and would have no side effects if replaced.
As far as the provision of linked resources went, the original model was that the links should be fed into the course materials from an instructor maintained bookmark collection (for an early take on this, see Managing Bookmarks, with a proof of concept demo at CourseLinks Demo (Hmmm, everything except the dynamic link injection appears to have rotted :-( )).
The design of the questions/resources page was intended to have the scoping questions at the top of the page, and then the suggested resources presented in a style reminiscent of a search engine results listing, the idea being that we would present the students with too many resources for them to comfortably read in the allocated time, so that they would have to explore the resources from their own perspective (eg given their current level of understanding/knowledge, their personal interests, and so on). In one of my more radical moments, I suggested that the resources would actually be pulled in from a curated/custom search engine ‘live’, according to search terms specially selected around the current topic and framing questions, but I was overruled on that. However, the course does have a Google custom search engine associated with it which searches over materials that are linked to from the course.
So that’s the context…
Where I’m at now is pondering how we can use an enhanced custom search engine as a delivery platform for a resource based uncourse. So here’s my first thought: using a Google Custom Search Engine populated with curated resources in a particular area, can we use Google CSE Promotions to help scaffold a topic exploration?
Here’s my first promotions file:
<Promotions>
  <Promotion id="t151_1a"
    queries="topic 1a, Topic 1A, topic exploration 1a, topic exploration 1A, topic 1A, what is a game, game definition"
    title="T151 Topic Exploration 1A - So what is a game?"
    url="http://digitalworlds.wordpress.com/2008/03/05/so-what-is-a-game/"
    description="The aim of this topic is to think about what makes a game a game. Spend a minute or two to come up with your own definition. If you're stuck, read through the Digital Worlds post 'So what is a game?'"
    image_url="http://kmi.open.ac.uk/images/ou-logo.gif" />
</Promotions>
It’s running on the Digital Worlds Search Engine, so if you want to try it out, try entering the search phrase what is a game or game definition.
(This example suggests to me that it would also make sense to use result boosting to boost the key readings/suggested resources I proposed in the topic materials so that they appear nearer the top of the results (that’ll be the focus of a future post;-))
The promotion displays at the top of the results listing if the specified queries match the search terms the user enters. My initial feeling is that to bootstrap the process, we need to handle:
- queries that allow a user to call on a starting point for a topic exploration by specifically identifying that topic;
- “naive queries”: one reason for using the resource-search model is to try to help students develop effective information skills relating to search. Promotions (and result boosting) allow us to pick up on anticipated naive queries (or popular queries identified from search logs), and suggest a starting point for a sensible way in to the topic. Alternatively, they could be used to offer suggestions for improved or refined searches, or search strategy hints. (I’m reminded of Dave Pattern’s work with guided searches/keyword refinements in the University of Huddersfield Library catalogue in this context).
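The trigger behaviour can be mimicked locally to check which search phrases would call up a promotion. The Python sketch below parses an abbreviated copy of the promotions file and matches a query against each promotion's `queries` list. The matching rule used here (exact match on the whole query, ignoring case and extra spaces) is an assumption made for illustration, not a statement of how Google actually matches promotion queries.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the promotion (description and image omitted).
PROMOTIONS_XML = """<Promotions>
  <Promotion id="t151_1a"
    queries="topic 1a, topic exploration 1a, what is a game, game definition"
    title="T151 Topic Exploration 1A - So what is a game?"
    url="http://digitalworlds.wordpress.com/2008/03/05/so-what-is-a-game/" />
</Promotions>"""


def load_promotions(xml_text):
    """Parse a promotions document into a list of dicts with normalised queries."""
    promotions = []
    for node in ET.fromstring(xml_text).iter("Promotion"):
        raw = node.get("queries", "").split(",")
        queries = {q.strip().lower() for q in raw if q.strip()}
        promotions.append({
            "id": node.get("id"),
            "title": node.get("title"),
            "url": node.get("url"),
            "queries": queries,
        })
    return promotions


def match_promotions(search_terms, promotions):
    """Return promotions triggered by the search terms (assumed exact-match rule)."""
    wanted = " ".join(search_terms.lower().split())
    return [p for p in promotions if wanted in p["queries"]]
```

Running anticipated naive queries (or popular queries pulled from search logs) through `match_promotions` is one way to spot gaps in a promotions file before uploading it.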
Here’s another example using the same promotion, but on a different search term:
Of course, we could also start to turn the search engine into something like an adventure game engine. So for example, if we type: start or about, we might get something like:
(The link I associated with start should really point to the course introduction page in the VLE…)
We can also use the search context to provide pastoral or study skills support:
These sorts of promotions/enhancements might be produced centrally and rolled out across course search engines, leaving the course and discipline related customisations to the course team and associated subject librarians.
Just a final note: ignoring resource limitations on Google CSEs for a moment, we might imagine the following scenarios for their roll-out:
1) course wide: bespoke CSEs are commissioned for each course, although they may be supplemented by generic enhancements (eg relating to study skills);
2) qualification based: the CSE is defined at the qualification level, and students call on particular course enhancements by prefacing the search with the course code; it might be that students also see a personalised view of the qualification CSE that is tuned to their current year of study.
3) university wide: the CSE is defined at the university level, and students call on particular course or qualification level enhancements by prefacing the search with the course or qualification code.
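For the second and third scenarios, something has to separate the course code from the rest of the search. A minimal Python sketch of that step follows; the code pattern (one to three letters followed by three digits, as in T151) is a hypothetical convention chosen for illustration, not a real specification of course or qualification codes.

```python
import re

# Hypothetical code pattern: one to three letters then three digits, e.g. T151.
COURSE_CODE = re.compile(r"^([A-Za-z]{1,3}\d{3})\s+(.+)$")


def split_course_query(raw_query):
    """Split 'T151 what is a game' into ('T151', 'what is a game').

    Returns (None, query) when the search does not start with a course code.
    """
    query = raw_query.strip()
    match = COURSE_CODE.match(query)
    if match:
        return match.group(1).upper(), match.group(2).strip()
    return None, query
```

The returned code could then select which set of course level promotions or labels to apply, while searches without a code fall through to the generic qualification or university wide behaviour.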
In passing, I noticed I had a broken link to a Google CSE documentation page:
Searching a little, I found the page had moved to
A cached version of the originally linked page is still available, so I did a side-by-side comparison:
For all their success in attracting universities to adopt Google Apps (Tradition meets technology: top universities using Apps for Education), it’s not obvious to me how – or even if – Google is actually doing much around search signal detection and innovation in an educational context?
I’ve floated this a couple of times before (eg Could Librarians Be Influential Friends? And Who Owns Your Search Persona? and Integrating Course Related Search and Bookmarking?), but with yet another announcement from Google about how they’re incorporating social signals into search rankings (Hide sites from anywhere in the world: “We’ve … started incorporating data about sites people have blocked into our general search ranking algorithms to help users find more high quality sites.”), I’m going to raise it again…
To what extent are course and subject librarians setting up course/subject personas that engage in recommending and sharing high quality links in an appropriate social context, and encouraging students to follow those accounts in order to benefit from personalisation of search results based on social signals?
Furthermore, to what extent might the development of search personas represent the creation of a “scholarly agent” that can be used to offer “search assist” to followers of that agent/persona?
I don’t find it that hard to imagine myself taking a course, following the course recommender on a social network (an account that might send out course related reminders as well as relevant links), with an icon depicting my university and the associated course, that on occasion appeared to “recommend” links to me when I was searching for topics relating to my course. (In the normal scheme of things, it wouldn’t actively be recommending links to me, of course. For that, I’d need to subscribe to something like Subscribed Links, as mentioned in Integrating Course Related Search and Bookmarking?.)