Using Google Fusion Tables for a Quick Look at GCSE/A’Level Certificate Awards Market Share by Examination Board
On my to do list for some time has been a quick peek at market share in the school exams market – does any one awarding body dominate at a particular level, for example, or within a particular subject area? Or how about dominating a particular subject at a particular level? (If you thing this might have anything to do with my idle thought around plurality, you wouldn’t be far wrong…;-) For additional context, see eg House of Commons Education Committee – The administration of examinations for 15–19 year olds in England.)
And so, a couple of months ago, I posted an FOI request to OfQual asking for a copy of relevant data. The request was politely declined at the time on the grounds that the data would seen be available in public anyway. The following question did come to mind though: if the data is public but in PDF (rather than machine readable dataset) form, and I specifically requested machine readable form, presumably an “It’s already publicly available” response wouldn’t wash, given the Protection of Freedoms amendment to FOI that enshrines the right to data in data form? The FOI response also gave a link to the Joint Council for Qualifications (JCQ), although the URL provided doesn’t seem to work now/any more – I’m guessing this: http://www.jcq.org.uk/examination-results/a-levels is the sort of thing they were trying to refer me to? Which is PDF doc with a load of data tables… Hmm… Looking through that data, I started to wonder about the existence of a more refined dataset, specific one that for each qualification body shows the number of people who took a particular qualification at a particular level and the break down of grades awards. In this way, we could look to see whether one board appeared to be “easier” than another in terms of the distribution of grades awarded within a particular qualification by a particular board.
Anyway… it now being after October 19th, the date by which the data was due to be released, I went to the OfQual site to find the data. The site appears to have had a redesign and I couldn’t find the data anywhere… Using the URL I’d discovered on the old site and included in my FOI request – http://www.ofqual.gov.uk/standards/statistics/general-data/ – I found the following “no but, yes but”, blink and you’ll take it for a 404, page, which has the page title (that appears in a browser tab) of Page not found - OfQual:
If you click through and visit the page on the “old” site – http://www2.ofqual.gov.uk/standards/statistics/general-data/ – you can get to a copy of the data… I also popped a copy onto Google Fusion Tables.
(If you can work out where on the new site this data is, along with a protocol/strategy for finding it from: 1) the OfQual homepage, and 2) Google, I’d appreciate it if you’d post a hint or too in the comments;-)
Using the “new look” interface to Google Fusion Tables, here are a few examples of different views we might generate over the data to get a feel for relative market shares.
To start with, the data looks something like this:
The first thing I did was to duplicate the data, and then collapse the results columns relating to years other than 2012 (for now, I’m just interested in 2012 numbers to get a feel for current market shares):
Here’s the result:
We’re not really interested in rows where no certificates were awarded in 2012, so we can filter those rows out:
This is just like selecting a numeric facet for a column in OpenRefine. The Fusion Tables panel that pops up, though, is perhaps not quite as, erm “refined” as the panel in OpenRefine (which shows a range slider). Instead, in Fusion Tables, we are presented with two limit boxes – the one on the left sets an inclusive lower boundary on the value displayed (a “greater than or equal to” limit) and the one on the right an inclusive upper bound (a “less than or equal to” limit). To view rows where the certs2012 value is greater than 0, we put a 1 in the lower bound box and leave the upper bound empty (which is to say, we filter to allow through values >=1).
So where are we at? We now have a set of data that describes for 2012 how many awards were made by each board for each GCE/GCSE certificate they offer. So what? What I was interested in was the market share for each board. The Summary view provides us with a straightforward way of doing this:
What we want to do is sum up the certificates awarded by each board:
This gives a report of the form:
So for example, we can see from the summary table that is generated that AQA awarded (?) almost three and a half million certificates, and OCR just over 1.5 million.
Knowing the total number of certificates awarded by each board is one thing, but the resolution isn’t great because it mixes levels – if one board dominated A’levels and another GCSEs, the order(s) of magnitude more people taking GCSEs would mask the dominant share at A’level, where far fewer certificates are awarded.
To summarise the certificates awarded by each board at each qualification level, we can refine the Summary view:
Here’s what this particular summary view looks like – note that we can sort the rows according to the values in a particular column:
If you are more interested in looking at market share across a particular subject area, we can use a Filter to limit the search results:
The filter panel contains the different factor levels (R), or text facets (OpenRefine) of the elements contained within the selected column.
As well as filtering by a particular facet value, we can also filter results based on a full text string match (I don’t thing Boolean search is possible – just a literal string match on whatever appears in the search box):
To look at data for a particular qualification, we can bring in another filter from the Filter menu, or we can “find” particular values:
This is very much like the text facet view in Google Refine – here’s what the filtered, summarised and found view looks like:
As far as working out dominant market share, we haven’t really got very far – the above data conversation suggests that there is a whole load of context we need to be clear about when we count up the number of certificates awarded by each body and then compare them (are we making subject based comparisons, level based comparisons, subject and level based comparisons, etc.) What we do have, though, is a conversational strategy for starting to ask particular questions of the data. For example, how do the boards compare in the award of Math (string fragment that should match both Maths and Mathematics in a certificate title) across the levels:
And at A’level?
This is fine insofar as it goes, but it would be a bit laborious trying to get a macroscopic view for market shares over all subject areas and levels separately… To do that, I’d probably opt for another tool with powerful support for grouping and visualisation. R maybe…?;-) But that’ll have to wait for another post…
PS You’ll notice that I haven’t actually made any comments about which boards have what market share…And that’s part of the point of OUseful.info as a “howto” resource – it’s a place for coming up with questions and strategies for starting to answer them, as well as sharing process ideas rather than any particular outcomes from applying those processes…