Whenever a new open data dataset is released, the #opendata wires hum a little more. More open data is a Good Thing, right? Why? Haven’t we got enough already?
In a blog post a few weeks ago, Alan Levine, aka @cogdog, set about Stalking the Mythical OER Reuse: Seeking Non-Blurry Videos. OERs are open educational resources, openly licensed materials produced by educators and released to the world so others could make use of them. Funding was put into developing and releasing them and then, … what?
OERs. People build them. People house them in repositories. People do journal articles, conference presentations, research on them. I doubt never their existence.
But the ultimate thing they are supposed to support, maybe their raison d’être – the re use by other educators, what do we have to show for that except whispered stories, innuendo, and blurry photos in the forest?
Alan went in search of the OER reuse in his own inimitable way…
… but came back without much success. He then used the rest of the post to put out a call for stories about how OERs have actually been used in the world… Not just mythical stories, not coulds and mights: real examples.
So what about opendata – is there much use, or reuse, going on there?
It seems as if more datasets get opened every day, but is there more use every day: first day use of newly released datasets, incremental reuse of the datasets that are already out, linkage between the new datasets and the previously released ones?
Yesterday, I spotted via @owenboswarva the release of a dataset that aggregated and normalised data relating to charitable grant awards: A big day for charity data. Interesting… The supporting website – 360 Giving – (self-admittedly in its early days) allows you to search by funder, recipient or key word. You have to search using the right keywords, though, and the right capitalisation of keywords…
And you may have to add in white space.. so *University of Oxford * as well as *University of Oxford*.
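By way of illustration, here is a minimal sketch of the sort of matching that would make the capitalisation and white space quirks go away. It is not how the 360 Giving site works; the function names and the sample recipient strings are my own assumptions.

```python
def normalise(text: str) -> str:
    """Trim the ends, collapse runs of whitespace and ignore case."""
    return " ".join(text.split()).casefold()


def search(records, query):
    """Return every record that contains the query once both are normalised."""
    needle = normalise(query)
    return [record for record in records if needle in normalise(record)]


# Hypothetical recipient names, including a trailing space and odd capitalisation
recipients = [
    "University of Oxford",
    "University of Oxford ",
    "UNIVERSITY OF OXFORD",
    "Oxford Brookes University",
]

matches = search(recipients, "university  of oxford")
print(matches)
```

With something like this in place, all three spellings of the same recipient come back from a single query, whatever case or spacing the searcher happens to type.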
I don’t want to knock the site, but I am really interested to know how this data might be used. Really. Genuinely. I am properly interested. How would someone working in the charitable sector use that website to help them do something? What thing? How would it support them? My imagination may be able to go off on crazy flights of fancy in certain areas, but my lack of sector knowledge or a current headful of summer cold leaves me struggling to work out what this website would tangibly help someone to do. (I tried to ask a similar question around charities data before, giving the example of Charities Commission data grabbed from OpenCharities, but drew a blank then.) Like @cogdog in his search for real OER use case stories, I’d love to hear examples of real questions – no matter how trivial – that the 360 Giving site could help answer.
As well as the website, 360 Giving folk provide a data download as a CSV file containing getting on for a quarter of a million records. The date stamp on the file I grabbed is 5th June 2014. Skimming through the data quickly – my own opening conversation with it can be found here: 360 Giving Grant Navigator – Initial Data Conversation – I noticed through comparison with the data on the website some gaps…
- this item doesn’t seem to appear in the CSV download, perhaps because it doesn’t appear to have a funder?
- this item on the website has an address for the recipient organisation, but the CSV document doesn’t have any address fields. In fact, on close inspection, the record relates to a grant by the Northern Rock Foundation, and I see no records from that body in the CSV file?
- Although there is a project title field in the CSV document, no project titles are supplied. Looking through a sample of grants on the website, are any titles provided?
- The website lists the following funders:
Arts Council England
Arts Council Wales
Heritage Lottery Fund
Northern Rock Foundation
Paul Hamlyn Foundation
Sport Northern Ireland
The CSV file has data from these funders:
Arts Council England
Arts Council Wales
Sport Northern Ireland
That is, the CSV contains a subset of the data on the website; data from Heritage Lottery Fund, Indigo Trust, Northern Rock Foundation, Paul Hamlyn Foundation doesn’t seem to have made it into the data download? I also note that data from the Research Councils’ Gateway to Research (aside from the TSB data) doesn’t seem to have made it into either dataset. For anyone researching grants to universities, this could be useful information. (Could?! Why?!;-)
- No company numbers or Charity Numbers are given. Using opendata from Companies House, a quick join on recipient names and company names from the Companies House register (without any attempts at normalising out things like LTD and LIMITED – that is, purely looking for an exact match) gives me just over 15,000 matched company names (which means I now have their address, company number, etc. too). And presumably if I try to match on names from the OpenCharities data, I’ll be able to match some charity numbers. Now both these annotations will be far from complete, but they’d be more than we have at the moment. A question to then ask is – is this better or worse? Does the dataset only have value if it is in some way complete? One of the clarion calls for open data initiatives has been to ‘just get the data out there’ so that work can start on it, and it can be improved. So presumably having some company numbers or charity numbers matched is a plus?
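The name matching described in the last item can be sketched as follows. This is a toy version, assuming made-up organisation names, made-up company numbers and hypothetical field names (`recipient`, `name`, `number`); the real grants CSV and the Companies House register will have their own column names. The first pass is the exact match I used; the second shows what a little normalisation of things like LTD and LIMITED might add.

```python
import re

# Made-up grant records and register entries, for illustration only
grants = [
    {"recipient": "Example Arts Trust Limited", "amount": 5000},
    {"recipient": "Sample Youth Theatre LTD", "amount": 1200},
    {"recipient": "Made Up Sports Club", "amount": 800},
]
companies = [
    {"name": "Example Arts Trust Limited", "number": "00000001"},
    {"name": "SAMPLE YOUTH THEATRE LIMITED", "number": "00000002"},
]


def normalise(name):
    """Upper-case, drop punctuation, expand LTD and tidy whitespace."""
    name = re.sub(r"[^\w\s]", "", name.upper())
    name = re.sub(r"\bLTD\b", "LIMITED", name)
    return " ".join(name.split())


def annotate(grants, companies, key=lambda name: name):
    """Add a company_number to each grant where the keyed names match."""
    lookup = {key(c["name"]): c["number"] for c in companies}
    return [
        {**g, "company_number": lookup.get(key(g["recipient"]))}
        for g in grants
    ]


exact = annotate(grants, companies)                 # exact string match only
loose = annotate(grants, companies, key=normalise)  # match after normalising

exact_hits = sum(1 for g in exact if g["company_number"])
loose_hits = sum(1 for g in loose if g["company_number"])
print(exact_hits, loose_hits)
```

Either way, unmatched recipients simply carry `None` as their company number, so the annotation is partial rather than pretending to be complete.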
Now I know there is a risk to this. Funders may want to not release details about the addresses of the charities they are funding because that data may be used to plot maps to say “this is where the money’s going” when it isn’t. The charity may have a Kensington address and have received funding for an initiative in Oswaldtwistle, but the map might see all the money sinking into Kensington; which would be wrong. But that’s where you have to start educating the data users. Or releasing data fields like “address of charity” and “postcode area of point of use”, or whatever, even if the latter is empty. As it is, if you give me a charity or company name, I can look up its address. And its company or charity number if it has one.
As I mentioned, I don’t want to knock the work 360 Giving have done, but I’m keen to understand what it is they have done, what they haven’t done, and what the opendata they have aggregated and re-presented could – practically, tractably, tangibly – be used for. Really used for.
Time to pack my bags and head out into the wood, maybe…
8 thoughts on “More OpenData Published – So What?”
I take your point, but …
I think the publication of grants data, and the work 360 Giving have done to get that data released and to present it on their website, is mainly about transparency and accountability. It supports the principle that donors and the general public should have ready access to this information.
That doesn’t necessarily mean there is any great potential for building applications or analytic products from the data. The data might be consulted as a resource, by interested people within the charity sector and members of the public, or used by businesses for internal purposes — and it might not. It’s inherently difficult to track the full extent of use and reuse of any open dataset.
Of course it’s fair to ask how much use, or reuse, is going on. But for this type of data, how much does that matter? If we agree in principle that the data should be available, and as long as the costs of release are not too onerous, that can be sufficient unto itself. It’s not as if the internet is running out of space. I don’t think we need a notional minimum level of use as a metric for the “success” of an open data release.
You say the release is “mainly about transparency and accountability. It supports the principle that donors and the general public should have ready access to this information.”
What does that mean? Like @cogdog’s search for reuse of OERs, what does it actually mean to use such data to support accountability and transparency? And what does the released data have to contain/what level of quality or linkage/linkability does it need to attain to support that use for accountability and transparency?
I want to know why it’s useful – not least so I can learn from that and try similar things myself…
I really think I’m missing something – everyone seems to know what “using data for accountability and transparency” means, but I don’t. What can I practically do to make use of this data in a way that operationalises the use of the data for ‘holding to account’ purposes? Even if (most of the time) it shows that there is nothing nefarious going on and organisations are acting responsibly/appropriately, or even well?
(I am similarly confused when folk talk about demonstrating “impact”!;-)
Well, I’m probably the wrong person to make that argument. There are some in the community who think open data is a huge boon to transparency and accountability. That has been a major emphasis of the current Government’s open data policy. But personally I think it’s a minor effect. (My own focus is on release of information infrastructure and technical datasets, which I think have more potential for practical reuse.)
So I agree it’s all a bit murky and amorphous. I think the idea, as with public spending data, is that “armchair auditors” (or journalists perhaps) will sift through the data, raise questions about anything that looks funny, and thereby keep the data publishers “honest”. Of course that model is unrealistic; there’s no real evidence that the necessary pool of enthusiasts even exists. However the model is amenable to those who support cuts to proper regulatory oversight.
I’m struggling to find people who will properly make that argument… ;-)
Hi Tony it’s William here from 360giving – thanks for having a poke at the data, I’ve been on holiday so couldn’t reply sooner.
First off you are right to point out a load of inconsistencies in the data itself and the way it is presented. This is a very early publication of data and presentation in a web product. That’s why we slap the words ‘prototype’, ‘demonstrator’, ‘rudimentary’ etc all over it and are very tentative in our presentation and label it as a beta (Toby Blume in his article you cite also alludes to our low key approach). Our blog post tries to put it in context http://threesixtygiving.com/2014/08/04/grantnav/ . Some of your observations we knew about already and others are helpful and will be tackled in the next sprint. We ran out of development time with Aptivate before we could knock all the bugs on the head and I was keen to publish first and tidy up issues later otherwise you can end up never publishing.
You query the ‘just get it out there’ approach but my long experience of bureaucracies (and most grant makers are such) is that without this you never get to publication – see Tim Berners-Lee’s excellent article that covers ‘just do it’ among other things http://www.w3.org/DesignIssues/GovData.html
You also observe that there is some weirdness in the data itself – particularly in the lack of grantee addresses in many cases and simple things like company or charity numbers. This lack of addresses is very frustrating – I cover it in the FAQ on the site http://lin-360giving.aptivate.org/faq/ . Owen Boswarva and I have corresponded about the weird knock on effect of the ICO judgement on pig farmers (see the grantnav FAQ) and about my FOI requests to Sport England in particular who have dug in deep on this (see What do They Know https://www.whatdotheyknow.com/request/grants_for_last_five_financial_y#incoming-498022) and on company and charity numbers. Some grant makers though were able to furnish addresses (Northern Rock and Heritage Lottery Fund in particular on a large scale) so we could at least map these. We found that charity and company numbers were especially hard to secure from grant makers – in many cases they seem to harvest them in their application forms and then never do anything with them. I have worked with Chris Taggart over many years and can see the obvious potential of being able to link up in this way. Yes of course you can do the matching you have demonstrated with OpenCorporates and OpenCharities, but it isn’t ideal – it’s like that old Irish asking directions joke ‘I wouldn’t start from here’.
Overall though it is helpful to have critical articles such as yours, as it gives me something to take back to the data providers.
On the ‘what is the data for?’ question. A bit of background – in 2006/7 when I was working as a civil servant I commissioned a thing called the Power of Information Review which gave new impetus to open data in government in the UK. My judgement based on experience of working with decision makers is that being open in its own right (data, FOI, answering questions etc) conditions people in power to expect scrutiny and thus modifies their behaviours. And analysis of published data supports better accountability and decision making. Like anything in the public sphere, this is not always a direct effect but is an important part of a wider system of accountability. I strongly believe that open data works best when done with a sense of purpose. I wrote on this a couple of years ago http://talkaboutlocal.org.uk/open-data-forward-strategy/
360giving aims to improve the quality of grant making. The grant-making sector is astonishingly opaque. If you run through the top fifty or so biggest grant makers (listed in the Pears Foundation/Cass Business School work) only about ten or so publish comprehensive information about who they are making grants to. Where data is published it’s very hard to compare, or is in a narrative form or in a PDF. Grant makers have to expend a lot of time and energy working out who is active in making grants and to whom. At a local and smaller grant maker level the situation is worse. For grant makers who want to work in new areas or new grant makers there is a very high information cost when seeking to make decisions.
Our judgement, based on extensive discussion with grant makers large and small is that better information about who is making grants for what to who and how successful they are will support better grant making. Indeed when we demonstrated pre-release versions of grantnav to grant makers their first step was always to look up who else was funding organisations they fund (subject to the data cleansing issues you note).
Some people will seek to hold grant makers to account for decisions they have made using this information, especially given the prevalence of statutory grant makers from government and the lottery in this sector. And the academics who work in the sector and trade bodies such as NCVO will use the data to examine trends etc.
There is a flip side to this of course for people who receive grants (charities etc) in that it isn’t easy to find out who is granting in your area (of activity or geography). The same data that allows grant makers to understand the grant making picture will also inform grantees as they seek new funding.
The grant making sector is not the most technically advanced, by its own admission. At a basic level 360giving wants to provide both a simple technical means to publish, support for people to do so and a small campaign to drive publication of more, better data. It’s early days yet – we see this as a five year journey (see the blog) – as we work with publishers to improve the data we hope that others will produce better tools than Grantnav to aid analysis. But you have to start somewhere.
Thanks for the comment – I appreciate the prototype nature of the release but also know how this can be problematic – social media being such as it is, novelty often rules. So when sites are released they get one chance to interest folk enough to make them want to return… which is hard when trying to get started with a minimum viable thing (what makes for the minimum viability?)
Things like case sensitive search and non-normalised capitalisations make the tool hard to use – and if folk don’t realise that case is important, it can lead to confusion/misapprehension of results by the user?
Lots of my comments (eg wrt addresses) weren’t so much about the data, they were more addressed (doh!) at the simple observation that data on the website is not the same as the data in the CSV. Again, this is common on many websites, but again, when not clarified it leads to confusion – I don’t know what is/isn’t in the downloaded dataset compared to the website, and I guess I don’t actually know what is/isn’t in the data on the site, so there’s a lot to not know about the data which might influence how I interpret it?
On the question of mapping addresses – I’d be really wary of doing that myself – registered address of charity, or its registered company address may be the address I’m given, which may not be the premises of the offices (and there may be multiple offices) which may have nothing to do with where the project or activity being funded takes place etc etc. I know I mistrust addresses enough to be really wary of trying to do anything with them.. (Though that doesn’t stop me doing some counting around postcodes in the linked to notebook;-) FWIW, I got addresses for some organisations from two sources – registered company address from an exact string match of recipient organisation name with company name in the Companies House register; and registered charity address from an exact string match on recipient organisation name with charity name from an (old) dataset downloaded from OpenCharities.
On the ‘what is such data good for?’ question I just remain thoroughly confused about how useful it is and how it is actually used. For all the “Armchair Auditor” talk around local government spending data, I don’t think I’ve seen a simple instruction manual for someone who wants to get started doing Armchair Auditing things (other than searching for the word “biscuits” or looking for amounts associated with the word “expenses”). Where are the notes from a one hour or half day workshop on ‘training for Armchair Auditors’?
Similarly, I’m really intrigued as to what and how data is actually used to influence decisions. As I think you comment, the transparency comes from better understanding the decision making process and how data plays a role in that. Typically this might be consideration of a single data point, but I’d really like to see some worked examples. (As an example, my fave book atm is http://www.cb-racing.com/book_Making-Sense-of-Squiggly-Lines.html which is a book length description of how to read about 5 types of chart, with page long commentaries interpreting pixel size squiggles in a single data chart.) Given that as one extreme, it’d be nice to see some folk giving occasional paragraph length descriptions about how they interpret one or more pieces of data and how it informs their decision making – the “so what?” of all this data around us…
I’m really keen to see how your project develops – how the data comes together, and how it starts to be used. Particularly how it starts to be used…. ;-)
Some of the background work on developing a data standard (which is only partially applied in that prototype…) might be useful as well on thinking about use cases.
In particular there are links at http://threesixtygiving.github.io/standard/research/index.html to a design workshop that was used to explore some potential use-cases for this data. These influenced the shape of the data standard – (designed to be easy-to-flatten so that people can gather together data without needing any complex tools when dealing with reasonably small datasets) and I hope will be useful in shaping thoughts about tool-building that might be needed to serve some of the more complex use cases.
It still needs another iteration before it’s ready for widespread use… but so far I’ve been exploring the available data with a local community foundation who are interested in what they can learn about funding patterns in the local area in order to better target their work, and to address problems of double-funding (the same organisation getting the same project funded by many different donors) – as well as looking at how the underlying standard could help support local collaboration on filling in data gaps.
One of the big challenges I think many projects like this face is the gap between usefulness of aggregated data (only really becomes useful when there is lots available) and supply (hard to secure unless there is a shiny thing that people see their data going into), so, as I understand, the website recently launched was really intended as that advocacy tool to support growth of data supply.
(Disclosure/context: I worked on the prototype of the data standard. I wasn’t involved in the current 360 Website prototype. I’m hoping to do some more work soon though on iterations of the data standard, and particularly interested in ways of getting people good quality spreadsheet/CSV dumps of the data that they can work with using basic tools / with basic data literacy.)
Hope this is useful,
Really useful thanks. I suspect that the importance of the shape of the data in the data standard is one of those things that novice data users won’t fully appreciate until they try to use the data in a particular way or with a particular tool.
The step before that, cleaning rather than reshaping – is something that an aggregated source should do anyway (and which 360giving currently misses a trick on, e.g. with the lethal mix of arbitrary capitalisation and case sensitive search).
Re: using the 360giving site to support advocacy in respect of encouraging data supply, what does it need to offer to achieve what message? In the current guise, making it hard to reliably look up information about the lowerCamelCase Charity may actually be a Good Thing, because it makes various points, e.g. that there is: a variety of sources of data about giving to that charity; a need for standardisation of naming somewhere in the process. Putting too much cleaning/controlled vocabulary work onto data suppliers would possibly make them less willing to contribute (although there may be a reverse benefit to them in that it may encourage them to start looking at processes further up their own data pipeline to help them improve their own data quality). OTOH, if the aggregator does a lot of cleaning or normalisation there are opportunities for mistakes, you need to manage provenance/change tracking etc.
Thanks for the links to the design workshop; I liked the personas in the demand side use case doc [ https://docs.google.com/document/d/1bof0cK4fMsqJqN43EBcve70OreuH7XlkgluULzuUHAU/edit# ]
So for example, I pulled out the following and riffed around them a little:
– *who funds local organisations* [in a particular geographical area, perhaps of a particular type, e.g. sports related, arts related, age or demographic related]. Why/so what? e.g. so those organisations can be approached to fund a new project in that area, or a neighbouring area, with those aims etc. Alternatively, for a project in another area, to identify: possible funders at a wider geographical scale, the “sorts of” funder that might be relevant at a very local scale (e.g. local trusts).
– *profiling activities of (other) funders* – identifying how they scope their giving, how they manage awards (size, number and distribution of awards; progress tracking; repeat giving profiles perhaps?) So what e.g. look for opportunities to match fund smaller funders from a bigger charity.
– *funding trends* – looking for opportunities to bid into trending areas…
– *charity healthcheck* – e.g. from a recruitment company perspective. Good place to place, good place to poach?
On the one hand, what I’m trying to get an idea of are the *questions* that folk might ask of data phrased in a way that maps onto questions that could be turned into database queries, for example, i.e. askable more or less directly of the data. The above questions are all quite big picture things that give an initial query into the data; but then what are the follow on questions (i.e. how do you then start to deep read the answers to that initial query)?
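As a rough sketch of what “askable more or less directly of the data” might look like, here is the *who funds local organisations* question posed as a query against a toy SQLite table. Everything here is an assumption for illustration: the table layout, the `area` and `theme` columns (the current CSV has no address fields at all), and the placeholder funders, recipients and amounts.

```python
import sqlite3

# In-memory toy database with a hypothetical grants table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grants "
    "(funder TEXT, recipient TEXT, area TEXT, theme TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO grants VALUES (?, ?, ?, ?, ?)",
    [
        ("Funder A", "Org 1", "Area X", "arts", 5000),
        ("Funder A", "Org 2", "Area X", "sport", 2000),
        ("Funder B", "Org 1", "Area X", "arts", 3000),
        ("Funder B", "Org 3", "Area Y", "arts", 9000),
    ],
)

# "Who funds arts related organisations in Area X, and how much?"
rows = conn.execute(
    """
    SELECT funder, COUNT(*) AS awards, SUM(amount) AS total
    FROM grants
    WHERE area = ? AND theme = ?
    GROUP BY funder
    ORDER BY total DESC
    """,
    ("Area X", "arts"),
).fetchall()

for funder, awards, total in rows:
    print(funder, awards, total)
```

The query gives a ranked list of funders; the follow on questions are about what to do with that list once you have it.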
Another sort of question/use case is more detailed and relates to how to then read, interpret, use the answers to that original question. So I can show you a bar chart or a trend line or a ranked list, but so what? Application ideas often get this far (e.g. representing “answers” to a data question in a graphical way), but still the novice user is left with the question of how to read and interpret the charts. How do you read the data, what do you see in it, what questions do you apply to that data in order to learn what from it, and then so what? What action does it make you take or how exactly does it influence a decision you are trying to make (and how do you define the decision in such a way that having access to data that can be read in a particular way can contribute to the decision making process)?
It’s getting easier to create all manner of charts in all manner of ways, but I’m not sure that most of us know how to read and interpret them, and extract meaningful information from them? (That is, identify the differences in them that are the differences that make the difference…)