Archive for the ‘Anything you want’ Category
By chance, a couple of days ago I stumbled across a spreadsheet summarising awarded PFI contracts as of 2013 (private finance initiative projects, 2013 summary data).
The spreadsheet has 800 or so rows, and a multitude of columns, some of which are essentially grouped (although multi-level, hierarchical headings are not explicitly used) – columns relating to (estimated) spend in different tax years, for example, or the equity partners involved in a particular project.
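Grouped columns like the per-tax-year spend ones can be reshaped into a long format, so that the tax year becomes a value rather than part of a column heading. Here is a minimal sketch using pandas `melt`. The column names (`Project Name`, `2013-14`-style year headings) and the values are hypothetical placeholders, not the real spreadsheet's headers.

```python
import pandas as pd

# Hypothetical miniature of the PFI summary sheet: one row per project,
# one column per tax year of (estimated) spend. Names and values are
# illustrative placeholders, not real data.
wide = pd.DataFrame({
    "Project Name": ["Project A", "Project B"],
    "2013-14": [10.0, 5.0],
    "2014-15": [11.0, 6.0],
})

# Pick out the tax-year columns by their shape (they start with a year)
year_cols = [c for c in wide.columns if c[:4].isdigit()]

# Reshape so there is one row per (project, tax year) pair
long = wide.melt(
    id_vars=["Project Name"],
    value_vars=year_cols,
    var_name="tax_year",
    value_name="spend",
)
print(long)
```

The long form is often easier to filter, group and chart by year than the wide, spreadsheet-style layout.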
As part of the OU course we’re currently developing on “data”, we’re exploring an assessment framework based on students running small data investigations, documented using IPython notebooks, in which we will expect students to demonstrate a range of technical and analytical skills, as well as some understanding of data structures and shapes, data management technology and policy choices, and so on. In support of this, I’ve been trying to put together one or two notebooks a week, each over the course of a few hours, to see what sorts of thing might be possible, tractable and appropriate for inclusion in such a data investigation report within the time constraints allowed.
To this end, the Quick Look at UK PFI Contracts Data notebook demonstrates some of the ways in which we might learn something about the data contained within the PFI summary data spreadsheet. The first thing to note is that it’s not really a data investigation: there is no specific question I set out to ask, and no specific theme I set out to explore. It’s more exploratory (that is, rambling!) than that. It has more of the form of what I’ve started referring to as a conversation with data.
In a conversation with data, we develop an understanding of what the data may have to say, along with a set of tools (that is, recipes, or even functions) that allow us to talk to it more easily. These tools and recipes are reusable within the context of the data set or datasets selected for the conversation, and may be transferable to other data conversations. For example, one recipe might be how to filter and summarise a dataset in a particular way, or generate a particular view or reshaping of it that makes it easy to ask a particular sort of question, or generate a particular sort of chart (creating a chart is like asking a question where the surface answer is returned in a graphical form).
If we are going to use IPython notebook documented conversations with data as part of an assessment process, we need to tighten up a little more how we might expect to see them structured and what we might expect to see contained within them.
Quickly skimming through the PFI conversation, elements of the following skills are demonstrated:
- downloading a dataset from a remote location;
- loading it into an appropriate data representation;
- cleaning (and type casting) the data;
- spotting possibly erroneous data;
- subsetting/filtering the data by row and column, including the use of partial string matching;
- generating multiple views over the data;
- reshaping the data;
- using grouping as the basis for the generation of summary data;
- sorting the data;
- joining different data views;
- generating graphical charts from derived data using “native” matplotlib charts and a separate charting library (ggplot);
- generating text-based interpretations of the data (“data2text” or “data textualisation”).
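Several of these skills can be chained together into a small reusable recipe. The sketch below is illustrative only and is not the notebook's actual code: the column names and values are hypothetical. It shows type casting with `pd.to_numeric`, partial string matching with `str.contains`, a grouped and sorted summary, and a one-line "data2text" sentence generated from the result.

```python
import pandas as pd

# Hypothetical extract; column names and values are illustrative only
df = pd.DataFrame({
    "Project Name": ["Hospital A", "School B", "Hospital C", "Road D"],
    "Department": ["Health", "Education", "Health", "Transport"],
    "Capital Value": ["120.5", "30", "n/a", "75.25"],
})

# Cleaning and type casting: entries that aren't numbers become NaN
df["Capital Value"] = pd.to_numeric(df["Capital Value"], errors="coerce")

# Subsetting rows using partial, case-insensitive string matching
hospitals = df[df["Project Name"].str.contains("hospital", case=False)]

# Grouping as the basis for summary data, then sorting (NaNs are skipped)
summary = (
    df.groupby("Department")["Capital Value"]
    .sum()
    .sort_values(ascending=False)
)

# A simple "data2text" sentence generated from the summary
top_dept, top_value = summary.index[0], summary.iloc[0]
sentence = f"The {top_dept} department has the largest total capital value ({top_value:.1f})."
print(sentence)
```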
The notebook also demonstrates some elements of reflection about how the data might be used. For example, it might be used in association with other datasets to help people keep track of the performance of facilities funded under PFI contracts.
Note: the notebook I link to above does not include any database management system elements. As such, it represents elements that might be appropriate for inclusion in a report from the first third of our data course – which covers basic principles of data wrangling using pandas as the dominant toolkit. Conversations held later in the course should also demonstrate how to get data into and out of an appropriately chosen database management system, for example.
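As a hint of what that later step might look like, here is a minimal sketch of moving a pandas DataFrame into a database and querying it back out. An in-memory SQLite database stands in for whatever DBMS is actually chosen (an assumption for illustration). The table and its values are placeholders.

```python
import sqlite3
import pandas as pd

# Hypothetical table of projects; values are placeholders
projects = pd.DataFrame({
    "project": ["Hospital A", "School B", "Hospital C"],
    "department": ["Health", "Education", "Health"],
})

# An in-memory SQLite database stands in for the chosen DBMS
conn = sqlite3.connect(":memory:")

# Get the data into the database...
projects.to_sql("projects", conn, index=False)

# ...and back out again, summarised by a SQL query
counts = pd.read_sql_query(
    "SELECT department, COUNT(*) AS n FROM projects "
    "GROUP BY department ORDER BY n DESC",
    conn,
)
conn.close()
print(counts)
```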
An article in the Observer declares Academy chain accused of ‘privatisation by stealth’ over plan to outsource jobs, a report on how the Academies Enterprise Trust (AET) has “selected” PricewaterhouseCoopers (PwC) to partner with it in establishing an LLP “to take over responsibility for providing school business managers, IT staff, secretarial staff, and financial expertise, along with speech and language therapy provision, education psychology, education welfare, curriculum development and professional development”.
Here’s a quick chart of academy groupings with at least 15 academies in the group:
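The chart itself isn't reproduced here, but the count underneath it is a simple group-size filter. This is a sketch under an assumption: a hypothetical table with one row per academy and a `group` column naming its academy chain. The group names and counts below are placeholders.

```python
import pandas as pd

# Hypothetical list of academies and the group each belongs to;
# group names and sizes are placeholders, not real figures
academies = pd.DataFrame({
    "academy": [f"Academy {i}" for i in range(40)],
    "group": ["Group X"] * 20 + ["Group Y"] * 15 + ["Group Z"] * 5,
})

MIN_SIZE = 15  # threshold used for the chart

# Number of academies in each group, keeping only the larger groups
group_sizes = academies["group"].value_counts()
large_groups = group_sizes[group_sizes >= MIN_SIZE]

# A horizontal bar chart could then be drawn with, for example:
# large_groups.sort_values().plot(kind="barh")
print(large_groups)
```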
As you might expect, when talking about their performance, the group in part assumes you must be referring to their financial rather than just educational performance:
(By the by, they really need to talk to their webteam about the domain mapping…)
As well as outsourcing everything, it looks like the AET is of a size where they can start raking it in as a provider of teacher training courses:
(I don’t know if this means they can use their own academies for placements, getting cheap labour and perhaps a training grant bung, as well as the more general student fees, for each placement?)
School Direct places are offered by the AET National Teaching School Alliance in partnership with a good or outstanding Ofsted accredited teacher training provider.
You will spend the majority of your training in the classroom, building your confidence in an academy environment whilst receiving excellent mentoring support.
You will complete placements in academies who are experienced in initial teacher training and who excel at training and developing their staff.
All School Direct places will lead to Qualified Teacher Status (QTS). In addition, some providers also accredit a Postgraduate Certificate in Education (PGCE).
Participating AET academies recruit and select their own trainees with the expectation that they will then go on to work within the Academies Enterprise Trust, although there is no absolute guarantee of employment.
You may apply directly to the Academies Enterprise Trust National Teaching School Alliance or directly to the AET academy that you wish to be trained in or contact the AET National Teaching School Alliance.
Scale-up, plug-in, live-off public money and private student debt, partner with “trusted auditors” to disburse funds to private providers (because they previously got rumbled when making disbursements to themselves…?) – is that how it works?
By the by, the following schools are part of the AET group:
- primary: Westerings Primary Academy (Rayleigh and Wickford), Plumberow Primary School (Rayleigh and Wickford), Ashingdon School (Rayleigh and Wickford), Weston Academy (Isle of Wight), Hamford Primary Academy (Clacton), Oaks Academy (Faversham and Mid Kent), St James the Great Academy (Tonbridge and Malling), Tree Tops Academy (Faversham and Mid Kent), Langer Primary Academy (Suffolk Coastal), Molehill Copse Primary School (Faversham and Mid Kent), Percy Shurmer Academy (Birmingham, Hall Green), Brockworth Primary Academy (Tewkesbury), Offa’s Mead Academy (Forest of Dean), Severn View Academy (Stroud), Noel Park Primary School (Hornsey and Wood Green), Trinity Primary Academy (Hornsey and Wood Green), Hall Road Academy (Kingston upon Hull North), Newington Academy (Kingston upon Hull West and Hessle), The Green Way Academy (Kingston upon Hull North), Charles Warren Academy (Milton Keynes South), Barton Hill Academy (Torbay), North Ormesby Primary Academy (Middlesbrough), Montgomery Primary Academy (Birmingham, Hall Green), Feversham Primary Academy (Bradford East), Shafton Primary Academy (Barnsley East), St Helen’s Primary School (Barnsley Central), Lea Forest Academy (Birmingham, Hodge Hill), Cottingley Primary Academy (Leeds Central), Beacon Academy (Loughborough), Anglesey Primary Academy (Burton), Four Dwellings Primary Academy (Birmingham, Edgbaston), Caldicotes Primary Academy (Middlesbrough), Meadstead Primary Academy (Barnsley Central), Hazelwood Academy (South Swindon), North Thoresby Primary School (Louth and Horncastle), The Utterby Primary School (Louth and Horncastle);
- secondary: Unity City Academy (Middlesbrough), New Rickstones Academy (Witham), Greensward Academy (Rayleigh and Wickford), Maltings Academy (Witham), Clacton Coastal Academy (Clacton), Aylward Academy (Edmonton), Nightingale Academy (Edmonton), Richmond Park Academy (Richmond Park), Tendring Technology College (Clacton), Bexleyheath Academy (Bexleyheath and Crayford), Everest Community Academy (Basingstoke), Ryde Academy (Isle of Wight), Sandown Bay Academy (Isle of Wight), East Point Academy (Waveney), Felixstowe Academy (Suffolk Coastal), Millbrook Academy (Tewkesbury), The Duston School (South Northamptonshire), The Rawlett School (An AET Academy) (Tamworth), The New Forest Academy (New Forest East), Childwall Sports & Science Academy (Liverpool, Wavertree), Sir Herbert Leon Academy (Milton Keynes South), Tamworth Enterprise College and AET Academy (Tamworth), Winton Community Academy (North West Hampshire), Broadlands Academy (North East Somerset), Greenwood Academy (Birmingham, Erdington), Cordeaux Academy (Louth and Horncastle), Four Dwellings Academy (Birmingham, Edgbaston), Kingsley Academy (Brentford and Isleworth), Kingswood Academy (Kingston upon Hull North), Swallow Hill Community College (Leeds West), Firth Park Academy (Sheffield, Brightside and Hillsborough), Hillsview Academy (Redcar);
- special: Columbus School and College (Chelmsford), The Pioneer School (Basildon and Billericay), Wishmore Cross Academy (Surrey Heath), Greenfield Academy (Stroud), Peak Academy (Stroud), The Ridge Academy (Cheltenham), Newlands School (Camberwell and Peckham).
Via one of my feeds from TheyWorkForYou, I noticed this written answer to a question my MP, Andrew Turner, asked of the Secretary of State for International Development:
I’ve been playing around with development data lately, trying to sketch together some pieces for an OpenLearn course on data visualisation for development (hopefully!), so I thought this would be a good test of how quickly I could find the data and confirm the results.
Working backwards, GDP data (in various adjusted forms) is available from the World Bank API, which I’ve been accessing via the remote data interface calls in pandas (for example, Easy Access to World Bank and UN Development Data from IPython Notebooks).
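The World Bank functions have since moved out of pandas itself and into the separate pandas-datareader package. As a dependency-free alternative, here is a minimal sketch that queries the World Bank's v2 indicators API directly using only the Python standard library. The URL pattern and the `[metadata, records]` shape of the JSON response reflect my understanding of the public API, so treat them as assumptions to check. `NY.GDP.PCAP.CD` is the GDP per capita (current US$) indicator code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://api.worldbank.org/v2"  # assumed v2 API base URL

def indicator_url(countries, indicator, year):
    """Build a query URL; country codes are joined with ';'."""
    params = urlencode({"format": "json", "date": year, "per_page": 1000})
    return f"{BASE}/country/{';'.join(countries)}/indicator/{indicator}?{params}"

def parse_records(payload):
    """Turn an assumed [metadata, records] response into {country: value}."""
    _meta, records = payload
    return {r["country"]["value"]: r["value"] for r in records or []}

def fetch(countries, indicator, year):
    # Network call; not exercised in the example below
    with urlopen(indicator_url(countries, indicator, year)) as resp:
        return parse_records(json.load(resp))

# GDP per capita (current US$) for 2012, for two example countries
url = indicator_url(["GBR", "IND"], "NY.GDP.PCAP.CD", 2012)
print(url)
```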
So where do we get the aid ranking from?
There are two ways of doing this – one to look for local UK sources (eg from DFID perhaps), the other to look for international sources of data. The advantage of the former is that these are presumably the sources that whoever answered the question went to. The advantage of the latter is that we should be able to generalise the question to query similar rankings for aid distributed by other countries.
“Official Development Assistance” seems to be a key phrase, with a quick websearch for that phrase and the term “data” turning up this Aid statistics – charts, tables and databases resource page, which in turn points to a whole raft of datatables as Excel files detailing statistics on resource flows to developing countries; the International Development Statistics (IDS) online databases page links to several more general online databases. (There’s also a beta data.oecd.org site.)
Forsaking the raw data files for a minute, the site claims that “the Query Wizard for International Development Statistics [QWIDS] is the easiest way to search our database as it automatically extracts the most appropriate dataset from OECD.Stat to match your search” – so let’s try that… QWIDS.
Nice and simple then…?!
A bit of tinkering (setting the donor, and unticking recipients so that only countries, rather than countries and groupings, are included) gives what I think is the data for aid disbursements from the UK to other countries, which I could export as a CSV file; but there are no tools onsite to help me look at the top 10.
Poking around, it looks like the data’s also there to allow us to look at disbursements (or perhaps just allocations) by donor country and sector into a particular country? Maybe?! This would then let us see how aid was being allocated from the UK to the top 10 recipients, broken down by sector, which might be more illuminating? I also wonder if there are any relationships between aid paid by donors into a particular sector, and imports into the recipient country from the donor country within the same sectors? For this, we need trade data breakdowns. (We can get total flows between countries (I think?!) but I’m not sure how to find the data broken down by sector?)
The stats.oecd.org site does let us sort, but I couldn’t find an easy or clean way to limit results to countries, and exclude groupings:
The ranking (of aid disbursements from the UK in 2012) matches the order given in the response to my MP’s question.
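With an exported CSV in hand, the top 10 is a short step in pandas, provided the groupings can be filtered out. Here is a sketch. Its column names (`Recipient`, `Value`) are hypothetical. So is the heuristic that grouping rows carry labels ending ", Total" or ", regional", which is an assumption to check against the actual export. The amounts are placeholders.

```python
import pandas as pd

# Hypothetical export: one row per recipient with a 2012 disbursement.
# Names and amounts are placeholders, not the real figures.
aid = pd.DataFrame({
    "Recipient": ["Country A", "Africa, Total", "Country B",
                  "Asia, regional", "Country C"],
    "Value": [50.0, 900.0, 80.0, 40.0, 65.0],
})

# Heuristic (an assumption): grouping rows end in ", Total" or ", regional"
is_grouping = aid["Recipient"].str.contains(r", (?:Total|regional)$", regex=True)
countries = aid[~is_grouping]

# Largest ten recipients by value, in descending order
top10 = countries.nlargest(10, "Value")
print(top10)
```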
For the GDP and GDP per capita data, we can go to the World Bank:
Note a couple of things – units tend to be given in US dollars rather than Sterling; there are all sorts of US dollars… (see for example Accounting for Inflation – Deflators, or “What Does ‘Prices in Real Terms’ Actually Mean?”).
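On the "prices in real terms" point, the usual conversion rescales a current-price (nominal) value by the ratio of two price deflators. Here is a minimal sketch with illustrative numbers only. Converting from US dollars to Sterling would also need an exchange rate, which isn't shown here.

```python
def to_constant_prices(nominal, deflator_year, deflator_base):
    """Re-express a current-price value in base-year prices.

    Deflators are index numbers (e.g. base year = 100), so dividing by the
    year's deflator and multiplying by the base year's deflator strips out
    the effect of price changes between the two years.
    """
    return nominal * deflator_base / deflator_year

# Illustrative numbers only: if prices have risen 10% since the base year,
# 110 current dollars are worth 100 base-year dollars
print(to_constant_prices(110.0, deflator_year=110.0, deflator_base=100.0))
```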
Hmm… maybe it would have been easier to find the data on the DFID site instead…
PS Indeed it was – Statistics on International Development 2013 – Tables has a link to a dataset that contains the league table: “Table 4: Top Twenty Recipients UK Net Bilateral ODA 2010 – 2012”.
The front page of this week’s Isle of Wight County Press describes a tragic incident relating to a particular care home on the Island earlier this year:
(Unfortunately, the story doesn’t seem to appear on the County Press’ website? Must be part of a “divide the news into print and online, and never the twain shall meet” strategy?!)
As I recently started pottering around the CQC website and various datasets they publish, I thought I’d jot down a few notes about what I could find. The clues from the IWCP article were the name of the care home – Waxham House, High Park Road, Ryde – and the proprietor – Sanjay Ramdany.
Using the “CQC Care Directory – With Filters” from the CQC data and information page, I found a couple of homes registered to that provider.
1-120578256, 19/01/2011, Waxham House, 1 High Park Road, Ryde, Isle of Wight, PO33 1BP
1-120578313, 19/01/2011, Cornelia Heights, 93 George Street, Ryde, Isle of Wight, PO33 2JE
1-101701588, Mr Sanjay Prakashsingh Ramdany & Mrs Sandhya Kumari Ramdany
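The same lookup can be scripted against a downloaded copy of the directory. The sketch below rests on two assumptions. First, the column names `Location Name` and `Provider Name` are hypothetical; the real file's headings may differ. Second, the "Another Home" / "Another Provider Ltd" row is a made-up placeholder.

```python
import pandas as pd

# Hypothetical miniature of the care directory download; column names are
# assumptions, and the last row is a made-up placeholder
provider = "Mr Sanjay Prakashsingh Ramdany & Mrs Sandhya Kumari Ramdany"
directory = pd.DataFrame({
    "Location Name": ["Waxham House", "Cornelia Heights", "Another Home"],
    "Provider Name": [provider, provider, "Another Provider Ltd"],
})

# Partial, case-insensitive match on the proprietor's surname
by_provider = directory[
    directory["Provider Name"].str.contains("ramdany", case=False)
]
print(by_provider["Location Name"].tolist())
```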
Looking up “Waxham House” on the CQC website gives us a copy of the latest report outcome:
Looking at the breadcrumb navigation, it seems we can directly get a list of other homes operated by the same proprietors:
I wonder if we can search the site by proprietor name too?
Looks like it…
So how did their other home fare?
By the by, according to the Food Standards Agency, how’s the food?
And how much money is the local council paying these homes?
Why the refunds?
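Questions like this one can be prompted by a simple per-supplier summary of council spending data. This sketch assumes hypothetical `supplier` and `amount` columns, and further assumes that refunds appear as negative amounts. The values are placeholders.

```python
import pandas as pd

# Hypothetical council spending records; negative amounts are assumed
# here to represent refunds or credit notes
spend = pd.DataFrame({
    "supplier": ["Home A", "Home A", "Home B", "Home A", "Home B"],
    "amount": [1200.0, 1200.0, 950.0, -300.0, 950.0],
})

# Net total, number of payments and number of refunds per supplier
summary = spend.groupby("supplier")["amount"].agg(
    total="sum",
    payments=lambda s: (s > 0).sum(),
    refunds=lambda s: (s < 0).sum(),
)

# The individual refund rows, for a closer look
refund_rows = spend[spend["amount"] < 0]
print(summary)
```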
A check on OpenCorporates for director names turned up nothing.
I’m not trying to offer any story here about the actual case reported by the County Press; this is more a partial account of how we can start to look for data around a story, to see whether open data sources can tell us anything more about it.
I’ve no idea… Because there aren’t any, apparently: Poor data quality hindering government open data programme. And as I try to make sense of that article, it seems there aren’t any because of UTF-8, I think? Erm…
For my own council, the local hyperlocal, OnTheWight, publish a version of Adrian Short’s Armchair Auditor app at armchairauditor.onthewight.com. OnTheWight have turned a few stories from this data, I think, so they obviously have a strategy for making use of the data.
My own quirky skillset, such as it is, meant that it wasn’t too hard for me to start working with the original council-published data to build an app showing spend in different areas, by company, etc. – Local Council Spending Data – Time Series Charts – although the actual application appears to have rotted: pound signs are not liked by the new shiny library, and I can’t remember how to log in to the glimmer site :-(
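Pound signs and thousands separators are a common snag when reading spending data. Here is a minimal pandas sketch that strips them before casting the amounts to numbers. The example strings are placeholders. Whether a given file needs an explicit encoding (for example `encoding="cp1252"` passed to `pd.read_csv`) is an assumption that depends on how the council saved it.

```python
import pandas as pd

# Amount strings as they might appear in a council spending CSV
# (placeholders). If the file isn't UTF-8, the £ sign may only read
# correctly with an explicit encoding, e.g.
# pd.read_csv(path, encoding="cp1252")
raw = pd.Series(["£1,234.56", "£-300.00", "75", " £2,000 "])

# Strip whitespace, the pound sign and commas, then cast to numbers;
# anything still unparseable becomes NaN rather than raising an error
amounts = pd.to_numeric(
    raw.str.strip()
    .str.replace("£", "", regex=False)
    .str.replace(",", "", regex=False),
    errors="coerce",
)
print(amounts.tolist())
```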
I also tried to make sense of the data by trying to match it up to council budget areas, but that wasn’t too successful: What Role, If Any, Does Spending Data Have to Play in Local Council Budget Consultations?
But I still don’t know what questions to ask, what scripts to run? Some time ago, Ben Worthy asked Where are the Armchair Auditors? but I’m more interested in the question: what would they actually do? and what sort of question or series of questions might they usefully ask, and why?
Just having access to data is not really that interesting. It’s the questions you ask of it, and the sorts of stories you look for in it, that count. So what stories might Armchair Auditors go looking for, what odd things might they seek out, what questions might they ask of the data?
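As one hypothetical example of such a question: does the same supplier appear to have been paid the same amount more than once within a short period? The sketch below uses hypothetical column names and placeholder values. The seven-day window is an arbitrary choice. A match is only a prompt for a follow-up question, not evidence that anything is wrong.

```python
import pandas as pd

# Hypothetical spending records (placeholders)
spend = pd.DataFrame({
    "date": pd.to_datetime(["2013-04-02", "2013-04-05",
                            "2013-05-01", "2013-04-03"]),
    "supplier": ["Acme Ltd", "Acme Ltd", "Acme Ltd", "Beta plc"],
    "amount": [560.0, 560.0, 560.0, 90.0],
})

WINDOW = pd.Timedelta(days=7)  # arbitrary definition of "close together"

# Time since the previous payment of the same amount to the same supplier
spend = spend.sort_values(["supplier", "amount", "date"])
gap = spend.groupby(["supplier", "amount"])["date"].diff()

# First payments in each group have no gap (NaT), which compares as False
possible_dupes = spend[gap <= WINDOW]
print(possible_dupes)
```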
Using browser-based data analysis toolkits such as pandas in IPython notebooks, or R in RStudio, means you need access to python or R and the corresponding application server, either on your own computer or running on a remote server that you have access to.
When running occasional training sessions or workshops, this can cause several headaches. Either a remote service needs to be set up that can support the expected number of participants, with security put in place, accounts configured (or account management tools supported), network connections guaranteed so that participants can reach the server, and so on; or participants need to install software on their own computers. Ideally, this would be done in advance of the session, otherwise training time is spent installing, configuring and debugging software. Some computers may also have security policies that prevent users from installing software, or that require an IT person with admin privileges to install it.
That’s why the coLaboratory Chrome extension looks like an interesting innovation – it runs a fork of the IPython notebook, with pandas and matplotlib, as a Chrome Native Client application. I posted a quick walkthrough of the extension over on the School of Data blog: Working With Data in the Browser Using python – coLaboratory.
Via a Twitter exchange with @nativeclient, it seems that there’s also the possibility that R could run as a dependency-free Chrome extension. Native Client seems to like things written in C/C++, which underpins R, although I think R also has some Fortran dependencies. (One of the coLaboratory talks mentioned the to-do list item of getting scipy (I think?) running in the coLaboratory extension, the major challenge there (or whatever the package was) being the Fortran source; so there may be synergies in working on the Fortran components there?)
Within a couple of hours of the Twitter exchange starting, Brad Nelson/@flagxor posted a first attempt at an R port to Native Client. I don’t pretend to understand what’s involved in moving from this to an extension with some sort of usable UI, even if only a command line, but it represents an interesting possibility: being able to run R in the browser (or at least, in Chrome). Package availability would, of course, be limited to packages compiled to run using PNaCl.
For training events, there is still the requirement that users install a Chrome browser on their computer and then install the extension into that. However, I think it is possible to run Chrome as a portable app – that is, from a flash drive such as a USB memory stick: Google Chrome Portable (Windows).
I’m not sure how fast it would be able to run, but it suggests there may be a way of carrying a portable, dependency-free pandas environment around that you can run on a Windows computer from a USB key?! And maybe R too…?
Killer post title, eh?
Some time ago I put in an FOI request to the Isle of Wight Council for the transaction logs from a couple of ticket machines in the car park at Yarmouth. Since then, the Council made some unpopular decisions about car parking charges, got a recall and then in passing made the local BBC news (along with other councils) in respect of the extent of parking charge overpayments…
Here’s how hyperlocal news outlet OnTheWight reported the unfolding story…
- 11 new ways the council propose to make car parking more expensive
- Look again at parking and leisure centre charges, say Island Conservatives
- Increased car parking charges revealed
- Council could face legal action over car parking increases
- Council gives their view on the legal uses of car parking income
- Council claim they don’t yet know how many people wrote to them about parking changes
- Executive vote: Free parking in 24 car parks goes, including Appley and Puckpool and parking charges up
- Councillors ‘call-in’ decision on parking changes
- Date set for scrutiny of changes to parking charges
- Follow live coverage of parking changes being scrutinised (Updated) (includes a copy of the call-in notice)
- Isle of Wight car parkers overpaid £186,706.35 between 2011-13
I really missed a trick not getting involved in this process – because there is, or could be, a significant data element to it. And I had a sample of data that I could have doodled with, and then gone for the whole data set.
Anyway, I finally made a start on looking at the data I did have with a view to seeing what stories or insight we might be able to pull from it – the first sketch of my conversation with the data is here: A Conversation With Data – Car Parking Meter Data.
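For a flavour of the sort of recipe such a conversation involves, here is a sketch that counts the tickets issued by each machine in each hour of the day, along with running totals through the day. It assumes a hypothetical log with one row per ticket and `machine` and `issued` (timestamp) columns; the real transaction log's format may differ. The values are placeholders.

```python
import pandas as pd

# Hypothetical transaction log, one row per ticket issued; machine IDs
# and timestamps are placeholders
log = pd.DataFrame({
    "machine": ["M1", "M1", "M2", "M1", "M2"],
    "issued": pd.to_datetime([
        "2013-08-03 09:05", "2013-08-03 09:40", "2013-08-03 10:10",
        "2013-08-03 10:15", "2013-08-03 10:50",
    ]),
})

# Tickets issued by each machine in each hour of the day
log["hour"] = log["issued"].dt.hour
per_hour = pd.crosstab(log["hour"], log["machine"])

# Running totals through the day: one way of seeing whether one machine
# "lags" another as the car park fills up
cumulative = per_hour.cumsum()
print(cumulative)
```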
It’s not just the parking meter data that can be brought to bear in this case – there’s another set of relevant data too, and I also had a sample of that: traffic penalty charge notices (i.e. traffic warden ticket issuances…)
With a bit of luck, I’ll have a go at a quick conversation with that data over the next week or so… Then maybe put in a full set of FOI requests for data from all the Council operated ticket machines, and all the penalty notices issued, for a couple of financial years.
Several things I think might be interesting to look at Island-wide:
- in much the same way as Tube platforms suffer from loading problems, where folk surge around one entrance or another, do car parks “fill up” in some sort of order, eg within a car park (one meter lags the other in terms of tickets issued) or one car park lags another overall?
- do different car parks have a different balance of ticket types issued (are some used for long stay, others for short stay?) and does this change according to what day of the week it is?
- how does the issuance of traffic penalty charge notices compare with the sorts of parking meter tickets issued?
- from the timestamps of when traffic penalty charge notices are issued, can we work out the rounds of different traffic warden patrols?
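A naive first pass at that last question is to sort the penalty notices by time and start a new "round" whenever there is a long enough gap between tickets. The sketch below makes several assumptions. It assumes a single warden and hypothetical `issued`/`location` columns. Its 45-minute gap threshold is an arbitrary choice. The values are placeholders. Separating several wardens working at once would need more information, such as a warden or device identifier.

```python
import pandas as pd

# Hypothetical penalty charge notice records (placeholders)
pcns = pd.DataFrame({
    "issued": pd.to_datetime([
        "2013-08-03 09:02", "2013-08-03 09:10", "2013-08-03 09:25",
        "2013-08-03 11:30", "2013-08-03 11:41",
    ]),
    "location": ["Car park A", "Car park A", "High Street",
                 "Car park B", "Car park B"],
})

MAX_GAP = pd.Timedelta(minutes=45)  # arbitrary threshold for a "break"

# Start a new round whenever the gap since the previous ticket exceeds MAX_GAP
pcns = pcns.sort_values("issued")
new_round = pcns["issued"].diff() > MAX_GAP
pcns["round"] = new_round.cumsum()

# The sequence of locations visited in each inferred round
rounds = pcns.groupby("round")["location"].agg(list)
print(rounds)
```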
The last one might be a little bit cheeky – just like you aren’t supposed to share information about the mobile speed traps, perhaps you also aren’t supposed to share information that there’s a traffic warden doing the rounds…?!