Tweaking Ranking Factors in the Course Detective Custom Search Engine

This is a note-to-self as much as anything. It relates to the Course Detective custom search engine, which searches over UK HE course prospectus web pages, and concerns the extent to which we might be able to use data such as the student satisfaction survey results (as made available via Unistats) to boost search results around particular subjects at particular universities, in line with student satisfaction ratings or employment prospects.

It’s possible to tweak rankings in Google CSEs in a variety of ways. We can BOOST (improve the ranking of), FILTER (limit results to members of a given set) or ELIMINATE (exclude) sites appearing in the search results listing. In the simplest case, we give a label a BOOST, FILTER or ELIMINATE mode and a weight, and then apply that label to annotations so that the sites they describe benefit from the corresponding customisation. We can further refine the effect by applying a score to each annotation. The product of the score and the weight determines the overall ranking modification applied to a result matched by an annotation carrying that label.
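As a rough sketch of where the weights live: the labels and their modes are declared in the CSE’s context file, while the scores sit on the annotations. The fragment below assumes the context-file layout as I understand it (BackgroundLabels for the engine’s own always-on FILTER/ELIMINATE labels, Facet items for user-selectable refinements). The label names `_cse_coursedetective`, `_cse_exclude_coursedetective` and `satisfaction_boost` are hypothetical placeholders, and the weight is illustrative only.

```xml
<CustomSearchEngine>
  <Title>Course Detective</Title>
  <Context>
    <BackgroundLabels>
      <!-- FILTER: restrict results to annotated sites carrying this label (hypothetical name) -->
      <Label name="_cse_coursedetective" mode="FILTER"/>
      <!-- ELIMINATE: drop annotated sites carrying this label (hypothetical name) -->
      <Label name="_cse_exclude_coursedetective" mode="ELIMINATE"/>
    </BackgroundLabels>
    <Facet>
      <FacetItem>
        <!-- BOOST: raise annotated results; weight assumed to lie between -1 and 1 -->
        <!-- As a facet item, assumed to apply only when a user selects this refinement -->
        <Label name="satisfaction_boost" mode="BOOST" weight="0.8"/>
        <Title>Student satisfaction</Title>
      </FacetItem>
    </Facet>
  </Context>
</CustomSearchEngine>
```

With this weight, and taking the product rule at face value, an annotation carrying `satisfaction_boost` with `score="0.5"` would get an overall modification of 0.8 x 0.5 = 0.4, whereas one with `score="1"` would get the full 0.8.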

So here’s what I’m thinking:

– define labels for things like achievement or satisfaction that apply a boost to a result;
– allow users to apply a label to a search;
– for each university annotation (that is, the listing that identifies the path to the pages for a particular university’s online prospectus), add a label whose score is determined by, for example, that institution’s achievement or satisfaction value;
– for refinement labels that tweak search rankings within a particular subject area, define labels corresponding to those subject areas and apply score modifiers to each institution based on, for example, the satisfaction level within that subject area. (Note: I’m not sure whether the same path can carry several different annotations, each with a different score.)
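For the institution-level step, here’s a minimal sketch of what the annotations might look like, assuming a BOOST label named `satisfaction_boost` (a hypothetical name) has been declared in the context file. The domains are placeholders and the scores are not real Unistats figures: they just illustrate one simple, assumed mapping in which a satisfaction rating expressed as a percentage is divided by 100 to give a score between 0 and 1.

```xml
<Annotations>
  <!-- Placeholder institution; score assumes a 90% satisfaction rating mapped to 0.9 -->
  <Annotation about="www.example-a.ac.uk/prospectus/*" score="0.9">
    <Label name="satisfaction_boost"/>
  </Annotation>
  <!-- Placeholder institution; score assumes a 75% satisfaction rating mapped to 0.75 -->
  <Annotation about="www.example-b.ac.uk/prospectus/*" score="0.75">
    <Label name="satisfaction_boost"/>
  </Annotation>
</Annotations>
```

Since most institutions probably cluster at fairly high satisfaction levels, a straight divide-by-100 may not spread the scores out much; rescaling relative to the lowest and highest values in the dataset might separate them more. In a real annotation file, each entry would presumably also need to carry whatever label includes the site in the engine at all.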

For example, an annotation file typically contains a fragment that looks like:

<Annotations>
  <Annotation about="webcast.berkeley.edu/*" score="1">
    <Label name="university_boost_highest"/>
    <Label name="lectures"/>
  </Annotation>

  <Annotation about="www.youtube.com/ucberkeley/*" score="1">
    <Label name="university_boost_highest"/>
    <Label name="videos_boost_mid"/>
    <Label name="lectures"/>
  </Annotation>
</Annotations>

I don’t know if this would work:

<Annotations>
  <Annotation about="example.com/prospectus/*" score="1">
    <Label name="chemistry"/>
  </Annotation>
  <Annotation about="example.com/prospectus/*" score="0.5">
    <Label name="physics"/>
  </Annotation>
</Annotations>
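If duplicate `about` paths turn out not to be honoured, one fallback is to hang both subject labels off a single annotation, in the same way the Berkeley example attaches several labels to one path. The catch, assuming the score is an attribute of the Annotation element rather than of each Label, is that both subjects then share one score, which loses the per-subject differentiation the separate entries were after.

```xml
<Annotations>
  <!-- One annotation, two labels: chemistry and physics share the single score below -->
  <Annotation about="example.com/prospectus/*" score="1">
    <Label name="chemistry"/>
    <Label name="physics"/>
  </Annotation>
</Annotations>
```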

That said, if the URLs are nicely structured, we might be able to do something like:

<Annotations>
  <Annotation about="example.com/prospectus/chemistry/*" score="1">
    <Label name="chemistry"/>
  </Annotation>
  <Annotation about="example.com/prospectus/physics/*" score="0.5">
    <Label name="physics"/>
  </Annotation>
</Annotations>

albeit at the cost of having to do a lot more work in terms of identifying appropriate URI paths.
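For the `chemistry` and `physics` labels to work as subject refinements that a user can pick, they would also need declaring in the context file. A minimal sketch, assuming the Facet/FacetItem layout of the CSE context format; the BOOST mode and the weight values are illustrative choices rather than anything I’ve tested.

```xml
<CustomSearchEngine>
  <Title>Course Detective</Title>
  <Context>
    <Facet>
      <FacetItem>
        <!-- Selecting this refinement boosts results annotated with the chemistry label -->
        <Label name="chemistry" mode="BOOST" weight="1.0"/>
        <Title>Chemistry</Title>
      </FacetItem>
      <FacetItem>
        <!-- Selecting this refinement boosts results annotated with the physics label -->
        <Label name="physics" mode="BOOST" weight="1.0"/>
        <Title>Physics</Title>
      </FacetItem>
    </Facet>
  </Context>
</CustomSearchEngine>
```

With a weight of 1.0, the product rule would leave the annotation scores (1 for chemistry, 0.5 for physics) as the effective modifications. Setting the mode to FILTER instead would restrict results to the labelled paths rather than reordering them.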

I also need to start thinking a bit more about how to apply refinements and ranking adjustments in course-based CSEs.