OER Hack Day: UK Universities Degree Course Prospectus Search – Course Detective

If you want to search across all university prospectuses, what do you do? Suffer, that’s what…

Until now…


Course Detective: design concept

At the CETIS/DevCSI hackday, a group of undevelopers* got together to build a Google Custom Search Engine that would search over all the undergraduate prospectus pages on all the UK university websites.

If you want to try it out, there’s a basic version running at: CourseDetective.co.uk

The first thing we had to do was grab a list of HEIs – @dkernohan grabbed one off the HEFCE website (I think? Do you have a link to the one you grabbed, DK?) and popped it into a Google Spreadsheet so we could all work on it.

A quick first pass meant Googling each university (e.g. for Undergraduate course foo university) and then finding the prospectus home page. A bit of digging then took us to an example of an actual course page. The intention was to find as deep a path as possible into the website that would still return individual pages for each course.

The number of URL patterns is, unsurprisingly, as large as the number of institutions, with few commonalities. For example, here are the first five (alphabetical order):

http://www.anglia.ac.uk/ruskin/en/home/prospectus/
http://www1.aston.ac.uk/study/undergraduate/courses/
http://www.bathspa.ac.uk/courses/undergraduate/
http://www.bath.ac.uk/study/ug/courses/
http://www.beds.ac.uk/courses

Some URLs embed the year of entry:

http://www.bbk.ac.uk/study/ug2011/
http://www2.hull.ac.uk/ug/11

Sometimes course identifiers appear as variables:

http://www.northumbria.ac.uk/?view=CourseDetail&code=*
http://www.nottingham.ac.uk/ugstudy/course.php?inc=course&code=*

And so on…

Having got paths into the prospectuses, we used them to define a custom search engine. The first pass was to paste the links directly into the search engine definition wizard. (We probably need to check we’ve done this correctly: we actually have two link types. One is a “URL contains” match, such as http://www.beds.ac.uk/courses; the other should be checked against a pattern, e.g. http://www.nottingham.ac.uk/ugstudy/course.php?inc=course&code=*. I wonder how we would cope with capturing both ?inc=course&code=* and ?&code=*&inc=course?)
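One way to sanity-check our own link list against both parameter orderings is to compare parsed query strings rather than raw URL text. The following is a minimal Python sketch of that idea, not a description of how the CSE itself matches patterns. The function name `is_course_page` and the course code `G400` are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

HOST = "www.nottingham.ac.uk"
PATH = "/ugstudy/course.php"
# None means "any non-empty value"; "G400" below is a hypothetical course code.
REQUIRED = {"inc": "course", "code": None}


def is_course_page(url, host=HOST, path=PATH, required=REQUIRED):
    """Return True if url is on host/path and carries every required query
    parameter, whatever order the parameters appear in."""
    parts = urlsplit(url)
    if parts.netloc != host or parts.path != path:
        return False
    # parse_qs ignores empty segments (e.g. the stray "&" in "?&code=...")
    # and drops parameters whose value is blank.
    query = parse_qs(parts.query)
    for name, expected in required.items():
        values = query.get(name)
        if not values:
            return False
        if expected is not None and expected not in values:
            return False
    return True


for candidate in (
    "http://www.nottingham.ac.uk/ugstudy/course.php?inc=course&code=G400",
    "http://www.nottingham.ac.uk/ugstudy/course.php?&code=G400&inc=course",
    "http://www.nottingham.ac.uk/ugstudy/course.php?inc=course",
):
    print(candidate, is_course_page(candidate))
```

Because the comparison happens on the parsed parameters, the two orderings are treated as the same page, while a URL with no course code is rejected.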

Something we did notice that is a *huge* problem with Google Custom Search engines is that if you collaborate with other people in populating the same CSE, you can only get a view over the links added by one person at a time. So I could look at the links David had added, and he could look at the links I had added, but we couldn’t look at all the links at the same time:-(

The next step was to export a CSE definition file from the imported links, so that we could (in theory) start to craft a machine-generated version of it (see Transitioning a CSE). At least one copy of the CSE file is available on the coursedetective GitHub site (look for the annotations.xml file).
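As a rough idea of what machine-generating such a file might look like, here is a small Python sketch that turns a list of prospectus roots into an annotations document. It assumes the usual layout of an exported annotations.xml (an `Annotations` root holding `Annotation` elements with an `about` pattern and a `Label` child); the label name `_cse_xvnuayaygic` and the rule of dropping the scheme and appending `*` are assumptions that should be checked against the file actually exported from the CSE control panel. The helper names are invented for this example.

```python
import xml.etree.ElementTree as ET


def to_pattern(url):
    """Turn a prospectus root URL into a URL pattern.

    Assumption: patterns omit the scheme and end in "*" so that every page
    beneath the prospectus root is included.
    """
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            url = url[len(scheme):]
            break
    return url if url.endswith("*") else url + "*"


def build_annotations(urls, label):
    """Return an annotations XML string with one Annotation per URL."""
    root = ET.Element("Annotations")
    for url in urls:
        annotation = ET.SubElement(root, "Annotation", about=to_pattern(url))
        ET.SubElement(annotation, "Label", name=label)
    # ElementTree escapes "&" in attribute values for us.
    return ET.tostring(root, encoding="unicode")


links = [
    "http://www.beds.ac.uk/courses",
    "http://www.nottingham.ac.uk/ugstudy/course.php?inc=course&code=*",
]
# Assumed label name; compare with the Label elements in the exported file.
print(build_annotations(links, "_cse_xvnuayaygic"))
```

Driving this from the shared spreadsheet of links would also give everyone a single view of all the links, whoever added them.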

To host the site, my first thought was to use Blogger, but that was a bit limiting in terms of possible site design; my second was to use Google Sites. However, Google Sites seems to strip out the embed code that the Google Custom Search Engine wizard generates, so instead we opted for Google App Engine using this template. (It would be really helpful if Google Sites provided a trivial way of embedding a Google custom search engine in a Sites page…?)

To make the hack relevant to the OERhackday, David added some course OER links and an OER category to the search engine that would allow users to (ideally) locate topic-related OERs. The longer-term vision is that users should be able to discover courses via OERs, and also check out OERs associated with a course as part of the “what course should I do?” research process.

To enrich the search results further, we also started to collate the URLs for the official institutional YouTube pages so we could search into those videos as well as course prospectus pages and OERs. I’m not sure if YouTube videos can be previewed in CSE results listings, but it’s something to explore…;-)

On the design side, we didn’t manage to get any CSS out, but James and Joel did come up with a stylish design, as you’ve seen above:-)

In terms of usage, the site is currently unstyled, but it is functional. The results can also be accessed via API calls (the current CSE ID is 006974165492396950327:xvnuayaygic). For universities wanting to compare “Google” searches against their online prospectuses and those of other HEIs, CourseDetective might be an appropriate tool for the SEO toolbox?
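For anyone wanting to try the API route, here is a minimal Python sketch that builds a request against the Google Custom Search JSON API using the CSE ID above. It assumes you have your own API key (`YOUR_API_KEY` is a placeholder) and that the engine is still live; the function names and the example query are just for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"
CSE_ID = "006974165492396950327:xvnuayaygic"  # the CSE ID quoted in the post


def build_search_url(query, api_key, cx=CSE_ID):
    """Build the request URL for a single query against the CSE."""
    params = {"key": api_key, "cx": cx, "q": query}
    return API_ENDPOINT + "?" + urlencode(params)


def search(query, api_key):
    """Fetch one page of results as (title, link) pairs.

    Needs network access and a valid API key, so it is not called here.
    """
    with urlopen(build_search_url(query, api_key)) as response:
        data = json.load(response)
    return [(item["title"], item["link"]) for item in data.get("items", [])]


# Placeholder key: replace with a real one before calling search().
print(build_search_url("marine biology", "YOUR_API_KEY"))
```

Running the same query through `search()` and through a plain Google search would be one way of doing the SEO comparison suggested above.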

It struck me just now that by driving the CSE from a linked file, we could actually define multiple linked definition files for different flavours of website, for example boosting or suppressing course results according to user preferences (for example, geography, or other properties we can associate with, or derive from, the course prospectus root URL or common search terms).
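A first step towards geography-flavoured definition files would be to tag each prospectus root with a region and split the master list accordingly. The Python sketch below shows only that grouping step; the region column is an assumption (it is not in our spreadsheet yet), the region labels are illustrative, and how each group then feeds a definition file is left open.

```python
from collections import defaultdict

# Hypothetical tagging: (prospectus root, region) pairs as they might appear
# if a region column were added to the shared spreadsheet.
TAGGED_ROOTS = [
    ("http://www.anglia.ac.uk/ruskin/en/home/prospectus/", "East of England"),
    ("http://www1.aston.ac.uk/study/undergraduate/courses/", "West Midlands"),
    ("http://www.bathspa.ac.uk/courses/undergraduate/", "South West"),
    ("http://www.bath.ac.uk/study/ug/courses/", "South West"),
]


def group_by_region(tagged_roots):
    """Group prospectus root URLs by region, preserving input order."""
    groups = defaultdict(list)
    for url, region in tagged_roots:
        groups[region].append(url)
    return dict(groups)


for region, urls in group_by_region(TAGGED_ROOTS).items():
    print(region, len(urls))
```

Each group could then be written out as its own linked definition file, one per flavour of site.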

A couple of other things I’d like to be able to do: search for foundation degrees; search for part-time degrees; search for distance education degrees; search for postgraduate taught courses.

I also managed to waste a bit of time (i.e. I still haven’t found a workaround) on the analytics side. What I wanted to do was use the AJAX version of the CSE and then use Google Analytics event tracking to track:
– queries;
– which results were clicked on.

Again, it would be really helpful if the Google Custom Search Engine and Google Analytics folk had a little sit down together to work out how to do at least the first of these, if not the second (they might be protective of folk knowing which links get clicked on in the CSE results?). Although that’s not to say that someone else might not come up with a solution… Please feel free to let me know in the comments if you have just such a fix;-)

In terms of time and effort, I reckon it took about 6 person hours to collate the links. If anyone fancies helping develop the site further, I think we’re up for that… :-)

* i.e. folk who aren’t developers but aspire to doing developery things howsoever they can;-) The team included: me, David Kernohan, designers James Roscoe and Joel Reed, Shelagh Finlay and Tracey Murray.

6 comments

  1. Martin Hawksey

    I’m not sure if this muddies the water, but when I was looking at implementing an instant-style Google CSE, part of the solution required using jQuery to manipulate the results. So if you look at the source of http://mashe.hawksey.info/search/?cx=006974165492396950327:xvnuayaygic (which is an instant-style version of CourseDetective), around line 75 of the code it manipulates each line of the search results. I think you could add click tracking if you added to the a href onclick="javascript: pageTracker._trackPageview('#!/what_you_want_to_identify_the_click_with');" – I use #!/ to make it easier to track in the content drill down

    [but you’re right, Larry needs to get his teams talking to each other]

    Martin

    • Tony Hirst

      @martin Thanks for that suggestion – I think I tried that but I’m not sure it worked? I seem to remember that, in the way I approached it, jQuery and the Google libraries weirded each other out and the tracking didn’t work? But as likely as anything, it was probably a mistake on my part. (I was also looking to capture the search term somewhere…)

      If anyone has a working proof of concept/demo/utility library, I’ll happily give it a go:-)
