OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for the ‘Open Data’ Category

Confused Fragments About Open Data Economics…

with 4 comments

Some fragments…

- the public paid for it, so the public has a right to it: the public presumably paid for it through their taxes. But companies that use open public data without fully and fairly participating in the tax regime of the country that produced the data haven’t paid their fair share for access to it.

- data quality will improve: with open license conditions that allow users to take open (public) data and do what they want with it, without any requirement to make derived data available in bulk under an open data license, how does the closed part of the feedback loop work? I’ve looked at a lot of open public data releases on council and government websites and seen some companies making use of that data, presumably in a cleaned form (if it hasn’t been cleaned, they’re working with a lot of noise…). But if they have cleaned and normalised the data, have they provided it back in an open form to the public body that gifted them access to it? Is there an open data quality improvement cycle working there? Erm… no… I suspect that, if anything, the open data users would try to sell the improved quality data back to the publisher. That may be their sole business model, or it may be a spin-off from using the (cleaned and normalised) data for some other commercial purpose.

Written by Tony Hirst

September 9, 2014 at 1:32 pm

Posted in oh_ffs, Open Data, Policy

Corporate Groupings in Care Provision – Finding the Data for GP Practices, Prequel…

For some time I’ve been pondering the best way of trying to map the growth in the corporate GP care provision – the number of GP practices owned by Virgin Care, Care UK and so on. Listings about GP practices from the various HSCIC datasets don’t appear to identify corporate owners, so the stop gap solution I’d identified was to scrape lists of practices from the various corporate websites and then try to reconcile them against GP practice codes from the HSCIC as some sort of check.
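As a sketch of the scraping half of that stop gap, here’s a minimal example using only the Python standard library; the HTML fragment below is invented for illustration – a real corporate “our practices” page would need its own parsing rules:

```python
from html.parser import HTMLParser

# Hypothetical fragment of the kind of listing a corporate site might publish.
html = """
<ul class="practices">
  <li>Park Surgery, Anytown</li>
  <li>High Street Practice, Otherville</li>
</ul>
"""

class PracticeLister(HTMLParser):
    """Collect the text content of <li> elements."""
    def __init__(self):
        super().__init__()
        self.in_li = False
        self.practices = []
    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_li = True
    def handle_endtag(self, tag):
        if tag == "li":
            self.in_li = False
    def handle_data(self, data):
        if self.in_li and data.strip():
            self.practices.append(data.strip())

p = PracticeLister()
p.feed(html)
print(p.practices)
```

The scraped names would then still need reconciling against HSCIC practice codes, as described above.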

However, today I stumbled across a dataset released by the Care Quality Commission (CQC) that provides a “complete directory of places where CQC regulated care is provided in England” [CQC information and data]. Two data files are provided – a simple register of locations, and “a second file … which contains details of registered managers and care home bed numbers. It also allows you to easily filter by the regulated activities, service types or service user bands.”

Both files contain fields that allow you to identify GP practices, but the second one also provides information about the actual provider (parent company owner) and any brand name associated with the service. Useful…:-)

What this means is it should be easy enough to pull the data into a report that identifies the practices associated with a particular brand or corporate group… (I’ll have a go at that as soon as I get a chance…)
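For what it’s worth, a sketch of that sort of report using pandas; the column names and rows here are invented stand-ins rather than the actual CQC file schema:

```python
import pandas as pd

# Hypothetical sample of the CQC locations file; real column names will differ.
cqc = pd.DataFrame([
    {"location_id": "1-101", "name": "Park Surgery", "brand": "Virgin Care", "service_type": "GP Practice"},
    {"location_id": "1-102", "name": "High St Practice", "brand": "Care UK", "service_type": "GP Practice"},
    {"location_id": "1-103", "name": "Oak House", "brand": "", "service_type": "Care home"},
])

# Filter down to GP practices, then count practices per brand / corporate group.
gp = cqc[cqc["service_type"] == "GP Practice"]
by_brand = gp.groupby("brand")["location_id"].count()
print(by_brand)
```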

Another thing that could be useful to do would be to match (that is, link) the location identifiers used by the CQC with the practice codes used by the HSCIC. [First attempt here…. Looks like work needs to be done…:-(] Then we could easily start to aggregate and analyse quality stats, referring and prescribing behaviour data, and so on, for the different corporate groupings and look to see if we can spot any meaningful differences between them (for example, signals that there might be corporate group level policies or behaviours being applied). We could probably also start to link in drug trial data, at least for trials that are registered, and that we can associate with a particular practice (eg Sketching Sponsor Partners Running UK Clinical Trials).
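A crude first pass at that kind of matching might join on a normalised name plus postcode; again, the dataframes and column names below are hypothetical stand-ins for the real CQC and HSCIC files:

```python
import re
import pandas as pd

# Hypothetical extracts; column names are illustrative, not the real schemas.
cqc = pd.DataFrame([
    {"location_id": "1-101", "name": "Park Surgery", "postcode": "AB1 2CD"},
    {"location_id": "1-102", "name": "High St. Practice", "postcode": "EF3 4GH"},
])
hscic = pd.DataFrame([
    {"practice_code": "A81001", "name": "PARK SURGERY", "postcode": "AB1 2CD"},
    {"practice_code": "A81002", "name": "RIVERSIDE SURGERY", "postcode": "IJ5 6KL"},
])

def norm(s):
    # Crude normalisation: uppercase, strip punctuation and surrounding spaces.
    return re.sub(r"[^A-Z0-9 ]", "", s.upper()).strip()

# Build a shared join key from normalised name + space-free postcode.
for df in (cqc, hscic):
    df["key"] = df["name"].map(norm) + "|" + df["postcode"].str.replace(" ", "")

linked = cqc.merge(hscic, on="key", suffixes=("_cqc", "_hscic"))
print(linked[["location_id", "practice_code"]])
```

Anything left unmatched after a pass like this would need manual checking – which is roughly where the “work needs to be done” comes in.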

Finally, it’d possibly also be useful to reconcile companies against company registrations on Companies House, and perhaps charity registrations with the Charities Commission (cf. this quick data conversation with the 360 Giving Grant Navigator data).

PS more possible linkage:
- company names to company IDs on OpenCorporates (and from that we can look for additional linkage around registered company addresses, common directors etc)
- payments from local gov and NHS to the companies (from open spending data/transactions data)
- food hygiene inspection ratings (eg for care homes)
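For the OpenCorporates step, a minimal sketch of building a query against their company search API (v0.4 at the time of writing); an API token would normally be added for anything beyond casual use:

```python
from urllib.parse import urlencode

def opencorporates_search_url(company_name, jurisdiction="gb"):
    # Build a query URL for the OpenCorporates company search API (v0.4).
    base = "https://api.opencorporates.com/v0.4/companies/search"
    return base + "?" + urlencode({"q": company_name, "jurisdiction_code": jurisdiction})

url = opencorporates_search_url("Virgin Care Ltd")
print(url)
```

The JSON response includes company numbers and registered addresses, which opens up the further linkage (common directors, shared addresses) mentioned above.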

Written by Tony Hirst

September 9, 2014 at 12:12 pm

Posted in Open Data, Policy

More OpenData Published – So What?

Whenever a new open data dataset is released, the #opendata wires hum a little more. More open data is a Good Thing, right? Why? Haven’t we got enough already?

In a blog post a few weeks ago, Alan Levine, aka @cogdog, set about Stalking the Mythical OER Reuse: Seeking Non-Blurry Videos. OERs are open educational resources: openly licensed materials produced by educators and released to the world so that others can make use of them. Funding was put into developing and releasing them and then, … what?

OERs. People build them. People house them in repositories. People do journal articles, conference presentations, research on them. I doubt never their existence.

But the ultimate thing they are supposed to support, maybe their raison d’être – the re use by other educators, what do we have to show for that except whispered stories, innuendo, and blurry photos in the forest?
Alan Levine

Alan went in search of the OER reuse in his own inimitable way…

… but came back without much success. He then used the rest of the post to put out a call for stories about how OERs have actually been used in the world… Not just mythical stories, not coulds and mights: real examples.

So what about opendata – is there much use, or reuse, going on there?

It seems as if more datasets get opened every day, but is there more use every day? First day use of newly released datasets, incremental reuse of the datasets that are already out there, linkage between the new datasets and the previously released ones?

Yesterday, I spotted via @owenboswarva the release of a dataset that aggregated and normalised data relating to charitable grant awards: A big day for charity data. Interesting… The supporting website – 360 Giving – (self-admittedly in its early days) allows you to search by funder, recipient or keyword. You have to search using the right keywords, though, and with the right capitalisation…

[Screenshot: 360 Giving search results for “University of Oxford”]

And you may have to add in white space.. so *University of Oxford * as well as *University of Oxford*.
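A more forgiving search would normalise case and whitespace before matching; a toy pandas sketch, with invented grant records, of how that might work:

```python
import pandas as pd

# Hypothetical grant records; the point is the search behaviour, not the data.
grants = pd.DataFrame({
    "recipient": ["University of Oxford ", "UNIVERSITY OF OXFORD", "Oxfam"],
    "amount": [10000, 25000, 5000],
})

def keyword_search(df, column, term):
    # Case-insensitive substring match on whitespace-stripped values.
    needle = term.strip().lower()
    return df[df[column].str.strip().str.lower().str.contains(needle, regex=False)]

hits = keyword_search(grants, "recipient", "university of oxford")
print(hits)
```

Here both Oxford variants match, regardless of capitalisation or the trailing space.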

I don’t want to knock the site, but I am really interested to know how this data might be used. Really. Genuinely. I am properly interested. How would someone working in the charitable sector use that website to help them do something? What thing? How would it support them? My imagination may be able to go off on crazy flights of fancy in certain areas, but my lack of sector knowledge or a current headful of summer cold leaves me struggling to work out what this website would tangibly help someone to do. (I tried to ask a similar question around charities data before, giving the example of Charities Commission data grabbed from OpenCharities, but drew a blank then.) Like @cogdog in his search for real OER use case stories, I’d love to hear examples of real questions – no matter how trivial – that the 360 Giving site could help answer.

As well as the website, 360 Giving folk provide a data download as a CSV file containing getting on for a quarter of a million records. The date stamp on the file I grabbed is 5th June 2014. Skimming through the data quickly – my own opening conversation with it can be found here: 360 Giving Grant Navigator – Initial Data Conversation – I noticed through comparison with the data on the website some gaps…

  • this item doesn’t seem to appear in the CSV download, perhaps because it doesn’t appear to have a funder?
  • this item on the website has an address for the recipient organisation, but the CSV document doesn’t have any address fields. In fact, on close inspection, the record relates to a grant by the Northern Rock Foundation, and I see no records from that body in the CSV file?
  • Although there is a project title field in the CSV document, no project titles are supplied. Looking through a sample of grants on the website, it’s not obvious that any titles are provided there either?
  • The website lists the following funders:

    Arts Council England
    Arts Council Wales
    Big Lottery
    Creative Scotland
    DSDNI
    Heritage Lottery Fund
    Indigo Trust
    Nesta
    Nominet Trust
    Northern Rock Foundation
    Paul Hamlyn Foundation
    Sport England
    Sport Northern Ireland
    Sport Wales
    TSB
    Wellcome Trust

    The CSV file has data from these funders:

    Arts Council England
    Arts Council Wales
    Big Lottery
    Creative Scotland
    DSDNI
    Nesta
    Nominet Trust
    Sport England
    Sport Northern Ireland
    Sport Wales
    TSB
    Wellcome Trust

    That is, the CSV contains a subset of the data on the website; data from Heritage Lottery Fund, Indigo Trust, Northern Rock Foundation, Paul Hamlyn Foundation doesn’t seem to have made it into the data download? I also note that data from the Research Councils’ Gateway to Research (aside from the TSB data) doesn’t seem to have made it into either dataset. For anyone researching grants to universities, this could be useful information. (Could?! Why?!;-)

  • No company numbers or Charity Numbers are given. Using open data from Companies House, a quick join on recipient names and company names from the Companies House register (without any attempt at normalising out things like LTD and LIMITED – that is, purely looking for an exact match) gives me just over 15,000 matched company names (which means I now have their address, company number, etc. too). And presumably if I try to match on names from the OpenCharities data, I’ll be able to match some charity numbers. Now both these annotations will be far from complete, but they’d be more than we have at the moment. A question to then ask is: is this better or worse? Does the dataset only have value if it is in some way complete? One of the clarion calls for open data initiatives has been to ‘just get the data out there’ so that people can start working on it and improving it. So presumably having some company numbers or charity numbers matched is a plus?

    Now I know there is a risk to this. Funders may not want to release details of the addresses of the charities they are funding, because that data may be used to plot maps that say “this is where the money’s going” when it isn’t. The charity may have a Kensington address and have received funding for an initiative in Oswaldtwistle, but the map might see all the money sinking into Kensington; which would be wrong. But that’s where you have to start educating the data users. Or releasing data fields like “address of charity” and “postcode area of point of use”, or whatever, even if the latter is empty. As it is, if you give me a charity or company name, I can look up its address. And its company or charity number, if it has one.
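Incidentally, the website/CSV funder comparison above amounts to a simple set difference over the two funder lists transcribed in the post:

```python
# Funder lists as transcribed above; the set difference shows the gap.
website_funders = {
    "Arts Council England", "Arts Council Wales", "Big Lottery",
    "Creative Scotland", "DSDNI", "Heritage Lottery Fund", "Indigo Trust",
    "Nesta", "Nominet Trust", "Northern Rock Foundation",
    "Paul Hamlyn Foundation", "Sport England", "Sport Northern Ireland",
    "Sport Wales", "TSB", "Wellcome Trust",
}
csv_funders = {
    "Arts Council England", "Arts Council Wales", "Big Lottery",
    "Creative Scotland", "DSDNI", "Nesta", "Nominet Trust",
    "Sport England", "Sport Northern Ireland", "Sport Wales",
    "TSB", "Wellcome Trust",
}

missing = sorted(website_funders - csv_funders)
print(missing)
```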
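And the exact-match join against Companies House names can be sketched like this; the recipient names and company numbers below are invented for illustration:

```python
import pandas as pd

# Hypothetical snippets of the 360 Giving recipients and the Companies House register.
grants = pd.DataFrame({"recipient": ["ACME COMMUNITY LTD", "St Mary's Trust", "WIDGETS CHARITABLE CO"]})
companies = pd.DataFrame({
    "company_name": ["ACME COMMUNITY LTD", "WIDGETS CHARITABLE CO"],
    "company_number": ["01234567", "09876543"],  # made-up numbers
})

# Purely exact-match join, as in the post: no normalising of LTD vs LIMITED etc.
matched = grants.merge(companies, left_on="recipient",
                       right_on="company_name", how="left")
print(matched)
```

Unmatched recipients simply come through with a blank company number, which is the “far from complete, but more than we have” situation described above.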

As I mentioned, I don’t want to knock the work 360 Giving have done, but I’m keen to understand what it is they have done, what they haven’t done, and what the opendata they have aggregated and re-presented could – practically, tractably, tangibly – be used for. Really used for.

Time to pack my bags and head out into the wood, maybe…

Written by Tony Hirst

August 15, 2014 at 9:56 am

Posted in Open Data, Policy

