Archive for the ‘Anything you want’ Category
Via one of my feeds from TheyWorkForYou, I noticed this written answer to a question my MP, Andrew Turner, asked of the Secretary of State for International Development:
I’ve been playing around with development data lately, trying to sketch together some pieces for an OpenLearn course on data visualisation for development (hopefully!), so I thought this would be a good test of how quickly I could find the data and confirm the results.
Working backwards, GDP data (in various adjusted forms) is available from the World Bank API, which I’ve been accessing via the remote data interface calls in pandas (for example, Easy Access to World Bank and UN Development Data from IPython Notebooks).
So where do we get the aid ranking from?
There are two ways of doing this – one to look for local UK sources (eg from DFID perhaps), the other to look for international sources of data. The advantage of the former is that these are presumably the sources that whoever answered the question went to. The advantage of the latter is that we should be able to generalise the question to query similar rankings for aid distributed by other countries.
“Official Development Assistance” seems to be a key phrase, with a quick websearch for that phrase and the term “data” turning up this Aid statistics – charts, tables and databases resource page, which in turn points to a whole raft of datatables as Excel files detailing statistics on resource flows to developing countries; the International Development Statistics (IDS) online databases page links to several more general online databases. (There’s also a beta data.oecd.org site.)
Forsaking the raw data files for a minute, the site claims that “the Query Wizard for International Development Statistics [QWIDS] is the easiest way to search our database as it automatically extracts the most appropriate dataset from OECD.Stat to match your search” – so let’s try that… QWIDS.
Nice and simple then…?!
A bit of tinkering (setting the donor, unticking recipients so only countries – rather than countries and groupings – are included) gives what I think is the data for the aid disbursements from the UK to other countries, data I could export as a CSV file; but there are no tools onsite to help me look at the top 10.
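With the CSV export saved, picking out a top 10 offline only takes a few lines. What follows is a minimal sketch rather than a recipe for the real file: the column names (`Recipient`, `Value`) and the sample rows are made up for illustration, and would need changing to match whatever the QWIDS export actually contains.

```python
import csv
import io


def top_recipients(csv_text, n=10, name_col="Recipient", value_col="Value"):
    """Return the n largest (recipient, amount) pairs found in CSV text.

    The column names are assumptions, not the real QWIDS headers.
    """
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row[value_col] or "").strip()
        if not raw:
            continue  # skip recipients with no reported amount
        totals[row[name_col]] = totals.get(row[name_col], 0.0) + float(raw)
    # Sort by amount, largest first, and keep the first n
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up rows, for illustration only
sample = """Recipient,Value
Country A,120.5
Country B,310.0
Country C,
Country D,45.2
"""

for name, amount in top_recipients(sample, n=2):
    print(name, amount)
```

Summing per recipient before sorting means the same function copes with an export that has one row per recipient per year, as long as the year filter has been applied first.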
Poking around, it looks like the data’s also there to allow us to look at disbursements (or perhaps just allocations) by donor country and sector into a particular country? Maybe?! This would then let us see how aid was being allocated from the UK to the top 10 recipients, broken down by sector, which might be more illuminating? I also wonder if there are any relationships between aid paid by donors into a particular sector, and imports into the recipient country from the donor country within the same sectors? For this, we need trade data breakdowns. (We can get total flows between countries (I think?!) but I’m not sure how to find the data broken down by sector?)
The stats.oecd.org site does let us sort, but I couldn’t find an easy or clean way to limit results to countries, and exclude groupings:
The order of aid disbursements from the UK in 2012 matches the rank order given in the response to my MP’s question.
For the GDP and GDP per capita data, we can go to the World Bank:
Note a couple of things – units tend to be given in US dollars rather than Sterling; there are all sorts of US dollars… (see for example Accounting for Inflation – Deflators, or “What Does ‘Prices in Real Terms’ Actually Mean?”).
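For anyone who would rather call the World Bank API directly than go through pandas, here is a rough sketch of building a request and reading the reply. It assumes the v2 REST endpoint, a JSON reply shaped as a two element list (metadata first, then records), and the indicator code `NY.GDP.PCAP.CD` for GDP per capita in current US dollars; check each of these against the API documentation before relying on them. The stub reply uses placeholder numbers, not real World Bank figures.

```python
import json
from urllib.parse import urlencode

WB_BASE = "https://api.worldbank.org/v2"  # assumed base URL of the v2 API


def indicator_url(countries, indicator, year):
    """Build a request URL for one indicator, one year, several countries."""
    codes = ";".join(countries)
    query = urlencode({"format": "json", "date": str(year), "per_page": 500})
    return f"{WB_BASE}/country/{codes}/indicator/{indicator}?{query}"


def parse_response(payload):
    """Map country name -> value from a decoded [metadata, records] reply."""
    records = payload[1] or []
    return {
        r["country"]["value"]: r["value"]
        for r in records
        if r["value"] is not None  # drop countries with no figure that year
    }


print(indicator_url(["ETH", "IND"], "NY.GDP.PCAP.CD", 2012))

# Stub reply with placeholder values, NOT real data
stub = json.loads(
    '[{"page": 1}, ['
    '{"country": {"id": "XX", "value": "Country A"}, "date": "2012", "value": 123.4},'
    '{"country": {"id": "YY", "value": "Country B"}, "date": "2012", "value": null}'
    ']]'
)
print(parse_response(stub))
```

Swapping the indicator code is how you would move between the different flavours of dollar mentioned above, which is also why it is worth recording the code alongside any number you quote.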
Hmm… maybe it would have been easier to find the data on the DFID site instead…
PS Indeed it was – Statistics on International Development 2013 – Tables has a link to a dataset that contains the league table: “Table 4: Top Twenty Recipients UK Net Bilateral ODA 2010 – 2012″.
The front page of this week’s Isle of Wight County Press describes a tragic incident relating to a particular care home on the Island earlier this year:
(Unfortunately, the story doesn’t seem to appear on the County Press’ website? Must be part of a “divide the news into print and online, and never the twain shall meet” strategy?!)
As I recently started pottering around the CQC website and various datasets they publish, I thought I’d jot down a few notes about what I could find. The clues from the IWCP article were the name of the care home – Waxham House, High Park Road, Ryde – and the proprietor – Sanjay Ramdany.
Using the “CQC Care Directory – With Filters” from the CQC data and information page, I found a couple of homes registered to that provider.
1-120578256, 19/01/2011, Waxham House, 1 High Park Road, Ryde, Isle of Wight, PO33 1BP
1-120578313, 19/01/2011, Cornelia Heights, 93 George Street, Ryde, Isle of Wight, PO33 2JE
1-101701588, Mr Sanjay Prakashsingh Ramdany & Mrs Sandhya Kumari Ramdany
Looking up “Waxham House” on the CQC website gives us a copy of the latest report outcome:
Looking at the breadcrumb navigation, it seems we can directly get a list of other homes operated by the same proprietors:
I wonder if we can search the site by proprietor name too?
Looks like it…
So how did their other home fare?
By the by, according to the Food Standards Agency, how’s the food?
And how much money is the local council paying these homes?
[Click through on the image to see the app - hit Search to remove the error message and load the data!]
Why the refunds?
A check on OpenCorporates for director names turned up nothing.
I’m not trying to offer any story here about the actual case reported by the County Press, more a partial story about how we can start to look for data around a story to see if there may be more to the story we can find from open data sources.
I’ve no idea… Because there aren’t any, apparently: Poor data quality hindering government open data programme. And as I try to make sense of that article, it seems there aren’t any because of UTF-8, I think? Erm…
For my own council, the local hyperlocal, OnTheWight, publish a version of Adrian Short’s Armchair Auditor app at armchairauditor.onthewight.com. OnTheWight have turned a few stories from this data, I think, so they obviously have a strategy for making use of the data.
My own quirky skillset, such as it is, meant that it wasn’t too hard for me to start working with the original council published data to build an app showing spend in different areas, by company etc – Local Council Spending Data – Time Series Charts – although the actual application appears to have rotted: pound signs are not liked by the new shiny library and I can’t remember how to log in to the glimmer site :-(
I also tried to make sense of the data by trying to match it up to council budget areas, but that wasn’t too successful: What Role, If Any, Does Spending Data Have to Play in Local Council Budget Consultations?
But I still don’t know what questions to ask, what scripts to run? Some time ago, Ben Worthy asked Where are the Armchair Auditors? but I’m more interested in the question: what would they actually do? and what sort of question or series of questions might they usefully ask, and why?
Just having access to data is not really that interesting. It’s the questions you ask of it, and the sorts of stories you look for in it, that count. So what stories might Armchair Auditors go looking for, what odd things might they seek out, what questions might they ask of the data?
Using browser based data analysis toolkits such as pandas in IPython notebooks, or R in RStudio, means you need to have access to python or R and the corresponding application server either on your own computer, or running on a remote server that you have access to.
When running occasional training sessions or workshops, this can cause several headaches: either a remote service needs to be set up that is capable of supporting the expected number of participants, security may need putting in place, accounts configured (or account management tools supported), network connections need guaranteeing so that participants can access the server, and so on; or participants need to install software on their own computers: ideally this would be done in advance of a training session, otherwise training time is spent installing, configuring and debugging software installs; some computers may have security policies that prevent users installing software, or require an IT person with admin privileges to install the software, and so on.
That’s why the coLaboratory Chrome extension looks like an interesting innovation – it runs an IPython notebook fork, with pandas and matplotlib as a Chrome Native Client application. I posted a quick walkthrough of the extension over on the School of Data blog: Working With Data in the Browser Using python – coLaboratory.
Via a Twitter exchange with @nativeclient, it seems that there’s also the possibility that R could run as a dependency free Chrome extension. Native Client seems to like things written in C/C++, which underpins R, although I think R also has some Fortran dependencies. (One of the coLaboratory talks mentioned the to do list item of getting scipy (I think?) running in the coLaboratory extension, the major challenge there (or whatever the package was) being the Fortran src; so there may be synergies in working the Fortran components there?)
Within a couple of hours of the twitter exchange starting, Brad Nelson/@flagxor posted a first attempt at an R port to the Native Client. I don’t pretend to understand what’s involved in moving from this to an extension with some sort of useable UI, even if only a command line, but it represents an interesting possibility: of being able to run R in the browser (or at least, in Chrome). Package availability would be limited of course to packages compiled to run using PNaCl.
For training events, there is still the requirement that users install a Chrome browser on their computer and then install the extension into that. However, I think it is possible to run Chrome as a portable app – that is, from a flash drive such as a USB memory stick: Google Chrome Portable (Windows).
I’m not sure how fast it would be able to run, but it suggests there may be a way of carrying a portable, dependency free pandas environment around that you can run on a Windows computer from a USB key?! And maybe R too…?
Killer post title, eh?
Some time ago I put in an FOI request to the Isle of Wight Council for the transaction logs from a couple of ticket machines in the car park at Yarmouth. Since then, the Council made some unpopular decisions about car parking charges, got a recall and then in passing made the local BBC news (along with other councils) in respect of the extent of parking charge overpayments…
Here’s how hyperlocal news outlet OnTheWight reported the unfolding story…
- 11 new ways the council propose to make car parking more expensive
- Look again at parking and leisure centre charges, say Island Conservatives
- Increased car parking charges revealed
- Council could face legal action over car parking increases
- Council gives their view on the legal uses of car parking income
- Council claim they don’t yet know how many people wrote to them about parking changes
- Executive vote: Free parking in 24 car parks goes, including Appley and Puckpool and parking charges up
- Councillors ‘call-in’ decision on parking changes
- Date set for scrutiny of changes to parking charges
- Follow live coverage of parking changes being scrutinised (Updated) (includes a copy of the call-in notice)
- Isle of Wight car parkers overpaid £186,706.35 between 2011-13
I really missed a trick not getting involved in this process – because there is, or could be, a significant data element to it. And I had a sample of data that I could have doodled with, and then gone for the whole data set.
Anyway, I finally made a start on looking at the data I did have with a view to seeing what stories or insight we might be able to pull from it – the first sketch of my conversation with the data is here: A Conversation With Data – Car Parking Meter Data.
It’s not just the parking meter data that can be brought to bear in this case – there’s another set of relevant data too, and I also had a sample of that: traffic penalty charge notices (i.e. traffic warden ticket issuances…)
With a bit of luck, I’ll have a go at a quick conversation with that data over the next week or so… Then maybe put in a full set of FOI requests for data from all the Council operated ticket machines, and all the penalty notices issued, for a couple of financial years.
Several things I think might be interesting to look at Island-wide:
- in much the same way as Tube platforms suffer from loading problems, where folk surge around one entrance or another, do car parks “fill up” in some sort of order, eg within a car park (one meter lags the other in terms of tickets issued) or one car park lags another overall;
- do different car parks have a different balance of ticket types issued (are some used for long stay, others for short stay?) and does this change according to what day of the week it is?
- how does the issuance of traffic penalty charge notices compare with the sorts of parking meter tickets issued?
- from the timestamps of when traffic penalty charge notices are issued, can we work out the rounds of different traffic warden patrols?
The last one might be a little bit cheeky – just like you aren’t supposed to share information about the mobile speed traps, perhaps you also aren’t supposed to share information that there’s a traffic warden doing the rounds…?!
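As a taster of the ticket type question, here is a small sketch that counts tickets by car park, day of the week and ticket type. The tuple layout, the timestamp format and the sample rows are all assumptions made up for illustration; the real transaction logs will have their own columns and will need their own parsing.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def ticket_mix_by_weekday(transactions):
    """Count tickets per (car park, weekday, ticket type).

    Each transaction is assumed to be a (car_park, timestamp, ticket_type)
    tuple with the timestamp formatted as 'YYYY-MM-DD HH:MM'.
    """
    counts = Counter()
    for car_park, timestamp, ticket_type in transactions:
        issued = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
        counts[(car_park, WEEKDAYS[issued.weekday()], ticket_type)] += 1
    return counts


# Made-up rows, for illustration only
sample = [
    ("Car park 1", "2013-07-06 09:15", "short stay"),
    ("Car park 1", "2013-07-06 10:40", "long stay"),
    ("Car park 1", "2013-07-08 09:05", "short stay"),
    ("Car park 2", "2013-07-06 11:20", "short stay"),
    ("Car park 1", "2013-07-06 12:00", "short stay"),
]

for key, n in sorted(ticket_mix_by_weekday(sample).items()):
    print(key, n)
```

The same grouping idea should carry over to the penalty charge notices: swap the ticket type for the contravention type and the two sets of counts can be laid side by side.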
First up, Downes suggests that:
The traditional course is designed like a book – it is intended to run in a sequence, the latter bits build on the first bits, and if you start a book and abandon it part way through there is a real sense in which you can say the book has failed, because the whole point of a book is to read it from beginning to end.
But our MOOCs are not designed like that. Though they have a beginning and an end and a range of topics in between, they’re not designed to be consumed in a linear fashion the way a book is. Rather, they’re much more like a magazine or a newspaper (or an atlas or a city map or a phone book). The idea is that there’s probably more content than you want, and that you’re supposed to pick and choose from the items, selecting those that are useful and relevant to your present purpose.
And so here’s the response to completion rates: nobody ever complained that newspapers have low completion rates. And yet no doubt they do. Probably far below the ‘abysmal’ MOOC completion rates (especially if you include real estate listings and classified ads). People don’t read a newspaper to complete it, they read a newspaper to find out what’s important.
Martin (Weller) responds:
Stephen Downes has a nice analogy, (which he blogged at my request, thankyou Stephen) in that it’s like a newspaper, no-one drops out of a newspaper, they just take what they want. This has become repeated rather like a statement of fact now. I think Stephen’s analogy is very powerful, but it is really a statement of intent. If you design MOOCs in a certain way, then the MOOC experience could be like reading a newspaper. The problem is 95% of MOOCs aren’t designed that way. And even for the ones that are, completion rates are still an issue.
Here’s why they’re an issue. MOOCs are nearly always designed on a week by week basis (which would be like designing a newspaper where you had to read a certain section by a certain time). I’ve blogged about this before, but from Katy Jordan’s data we reckon 45% of those who sign up, never turn up or do anything. It’s hard to argue that they’ve had a meaningful learning experience in any way. If we register those who have done anything at all, eg just opened a page, then by the end of week 2 we’re down to about 35% of initial registrations. And by week 3 or 4 it’s plateauing near 10%. The data suggests that people are definitely not treating it like a newspaper. In Japan some research was done on what sections of newspapers people read.
He goes on:
… Most MOOCs are about 6-7 weeks long, so 90% of your registered learners are never even looking at 50% of your content. That must raise the question of why are you including it in the first place? If a subject requires a longer take at it, beyond 3 weeks say, then MOOCs really may not be a very good approach to it. There is a hard, economic perspective here, it costs money to make and run MOOCs, and people will have to ask if the small completion rates are the most effective way to get people to learn that subject. You might be better off creating more stand alone OERs, or putting money into better supported outreach programmes where you can really help people stay with the course. Or maybe you will actually design your MOOC to be like a newspaper.
I buy three newspapers a week – the Isle of Wight County Press (to get a feel for what’s happened and is about to happen locally, as well as seeing who’s currently recruiting), the Guardian on a Saturday (see what news stories made it as far as Saturday comment, do the Japanese number puzzles, check out the book reviews, maybe read the odd long form interview and check a recipe or two), and the Observer on a Sunday (read colleagues’ columns, longer form articles by journalists I know or have met, check out any F1 sports news that made it into that paper, book reviews, columns, and Killer again…).
So I skim bits, have old faithfuls I read religiously, and occasionally follow through on a long form article that was maybe advertised on the cover and I might have missed otherwise.
Newspapers are organised in a particular way, and that lets me quickly access the bits I know I want to access, and throw the rest straight onto the animal bedding pile, often unread and unopened.
So MOOCs are not really like that, at least, not for me.
For me MOOCs are freebie papers I’ve picked up and then thrown, unread, onto the animal bedding pile. For me.
What I can see, though, is MOOCs as partworks. Partworks are those titles you see week on week in the local newsagent with a new bit on the cover that, if collected over weeks and months and assembled in the right way, result in a flimsy plastic model you’ve assembled yourself with an effective cost price running into hundreds of pounds.
[Retro: seems I floated the MOOC as partwork idea before - Online Courses or Long Form Journalism? Communicating How the World Works… - and no-one really bit then either...]
In the UK, there are several notable publishers of partwork titles, including for example Hachette, De Agostini, Eaglemoss. Check out their homepages – then check out the homepages of a few MOOC providers. (Note to self – see if any folk working in marketing of MOOC platform providers came from a partwork publishing background.)
Here’s a riff reworking the Wikipedia partwork page:
A [partwork → MOOC] is [a written publication → an online course] released as a series of planned magazine-like [issues → lessons] over a period of time. [Issues → Lessons] are typically released on a weekly, fortnightly or monthly basis, and often a completed set is designed to form a [reference work on → complete course in] a particular topic. [Partwork series → MOOCs] run for a determined length and have a finite life. Generally, [partworks → MOOCs] cover specific areas of interest, such as [sports, hobbies, or children’s interest and stories such as PC Ace and the successful The Ancestral Trail series by Marshall Cavendish Ltd → random university module subjects, particularly ones that tie in to the telly or hyped areas of pseudo-academic interest]. They are generally [sold at newsagents and are mostly supported by massive television advertising campaigns for the launch → hosted on MOOC platforms because exploiting user data and optimising user journeys through learning content is something universities don't really understand and avoid trying to do]. In the United Kingdom, [partworks → MOOCs] are usually [launched by heavy television advertising each January → mentioned occasionally in the press, often following a PR campaign by the UK MOOC platform, FutureLearn]. [Partworks → MOOCs] often include cover-mounted items with each issue that build into a complete set over time. For example, a [partwork about art → MOOC] might include [a small number of paints or pencils that build into a complete art-set → so-called "badges" that can be put into an online "backpack" to show off to your friends, family, and LinkedIn trawlers]; a partwork about dinosaurs might include a few replica bones that build a complete model skeleton at the end of the series; a partwork about films may include a DVD with each issue. In Europe, partworks with collectable models are extremely popular; there are a number of different publications that come with character figurines or diecast model vehicles, for example: The James Bond Car Collection.
In addition, completed [partworks → MOOCs] have sometimes been used as the basis for receiving a non-academic credit bearing course completion certificate, or to create [case-bound reference works and encyclopedias → a basis for a piece of semi-formal assessment and recognition]. An example is [the multi-volume Illustrated Science and Invention Encyclopedia which was created with material first published in the How It Works partwork → NEED TO FIND A GOOD EXAMPLE].
In the UK, [partworks → MOOCs] are [the fourth-best selling magazine sector, after TV listing guides, women’s weeklies and women’s monthlies → NEED SOME NUMBERS HERE*].... A common inducement is [a heavy discount for the first one or two issues → ??HOW DO MOOCs SELL GET SOLD?]. The same [series → MOOC] can be sold worldwide in different languages and even in different variations.
* Possibly useful starting point? BBC News Magazine: Let’s get this partwork started
The Wikipedia page goes on to talk about serialisation (ah, the good old days when I still had hoped for feeds and syndication… eg OpenLearn Daily Learning Chunks via RSS and then Serialised OpenLearn Daily RSS Feeds via WordPress) and the Pecia System (new to me), which looks like it could provide an interesting starting point on a model of peer-co-created learning, or somesuch. There’s probably a section on it in this year’s Innovating Pedagogy report. Or maybe there isn’t?!;-)
Sort of related but also not, this article from icrossing on ‘Subscribe is the new shop.’ – Are subscription business models taking over? and John Naughton’s column last week on the (as then, just leaked) Kindle subscription model – Kindle Unlimited: it’s the end of losing yourself in a good book, I’m reminded of Subscription Models for Lifelong Students and Graduate With Who (Whom?!;-), Exactly…?, which several people argued against and which I never really tried to defend, though I can’t remember what the arguments were, and I never really tried to build a case with numbers in it to see whether or not it might make sense. (Because sometimes you think the numbers should work out in your favour, but then they don’t… as in this example: Restaurant Performance Sunk by Selfies [via RBloggers].)
Erm, oh yes – back to the MOOCs.. and the partworks models. Martin mentioned the economics – just thinking about the partwork model (pun intended, or maybe not) here, how are parts costed? Maybe an expensive loss leader part in week 1, then cheap parts for months, then the expensive parts at the end when only two people still want them? How will print on demand affect partworks (newsagent has a partwork printer round the back to print off the bits that are needed for whatever magazines are sold that week?) And how do the partwork costing models then translate to MOOC production and presentation models?
Big expensively produced materials in front loaded weeks, then maybe move to smaller presentation methods, get the forums working a little better with smaller, more engaged groups? How about the cMOOC ideas – up front in early weeks, or pushed back to later weeks, where different motivations, skills, interest and engagement models play out.
MOOCs are newspapers? Nah… MOOCs as partwork – that works better as a model for me. (You can always buy a partwork mid-way through because you are interested in that week’s content, or the content covered by the magazine generally, not because you are interested in the plastic model or badge.)
Thinks: hmm, partworks come in at least two forms, don’t they – one to get pieces to build a big model of a boat or a steam train or whatever. The other where you get a different superhero figurine each week and the aim is to attract the completionist. Which isn’t to say that part 37 might not be stupidly popular because it has a figure that is just generally of interest, ex- of being part of a set?
One of the comment themes I’ve noticed around the first Challenge in the Tata F1 Connectivity Innovation Prize, a challenge to rethink what’s possible around the timing screen given only the data in the real time timing feed, is that the non-programmers don’t get to play. I don’t think that’s true – the challenge seems to be open to ideas as well as practical demonstrations – but it got me thinking about what technical ways in there might be for non-programmers who wouldn’t know where to start when it came to working with the timing stream messages.
The answer is surely the timing screen itself… One of the issues I still haven’t fully resolved is a proven way of getting useful information events from the timing feed – it updates the timing screen on a cell by cell basis, so we have to finesse the way we associate new laptimes or sector times with a particular driver, bearing in mind cells update one at a time, in a potentially arbitrary order, and with potentially different timestamps.
So how about if we work with a “live information model” by creating a copy of an example timing screen in a spreadsheet. If we know how, we might be able to parse the real data stream to directly update the appropriate cells, but that’s largely by the by. At least we have something we can work with to start playing with the timing screen in terms of a literal reimagining of it. So what can we do if we put the data from an example timing screen into a spreadsheet?
If we create a new worksheet, we can reference the cells in the “original” timing sheet and pull values over. The timing feed updates cells on a cell by cell basis, but spreadsheets are really good at rippling through changes from one or more cells which are themselves referenced by one or more others.
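For the programmers reading along, the same idea can be sketched in a few lines of Python: keep the latest value of every cell, accept updates one cell at a time in any order, and derive rows from the cells on demand, much as a second worksheet would. The class, the column names and the sample updates below are hypothetical; this is not the format of the real timing feed.

```python
class TimingScreen:
    """Hold the latest value of each cell and derive per-row views from them."""

    def __init__(self, columns):
        self.columns = columns  # hypothetical names, e.g. ["driver", "s1", "lap"]
        self.cells = {}         # (row, column name) -> latest value seen

    def update(self, row, column, value):
        """Apply a single cell update, in whatever order updates arrive."""
        self.cells[(row, column)] = value

    def row(self, row):
        """Assemble the current view of one row; unseen cells are None."""
        return {c: self.cells.get((row, c)) for c in self.columns}

    def complete_rows(self):
        """Return the rows in which every column has been filled at least once."""
        rows = sorted({r for r, _ in self.cells})
        return [
            self.row(r)
            for r in rows
            if all(self.cells.get((r, c)) is not None for c in self.columns)
        ]


screen = TimingScreen(["driver", "s1", "lap"])
# Made-up updates arriving cell by cell, in no particular order
updates = [
    (2, "s1", "28.9"),
    (1, "driver", "Driver A"),
    (1, "lap", "1:35.2"),
    (2, "driver", "Driver B"),
    (1, "s1", "28.3"),
]
for row, column, value in updates:
    screen.update(row, column, value)

print(screen.complete_rows())
```

Note that this only tracks the latest value per cell. It does nothing about the harder problem of deciding whether a sector time and a lap time sitting in the same row actually belong to the same lap.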
The first thing we might do is just transform the shape of the timing screen. For example, we can take the cells in a column relating to sector 1 times and put them into a row.
The second thing we might do is start to think about some sums. For example, we might find the difference between each of those sector times and (for practice and qualifying sessions at least) the best sector time recorded in that session.
The third thing we might do is to use a calculated value as the basis for a custom cell format that colours the cell according to the delta from the best session time.
Simple, but a start.
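The three steps translate fairly directly into code too. In this sketch the sector times are placeholder numbers for made-up drivers, and the thresholds and colour names are arbitrary choices rather than anything taken from the real timing screen.

```python
def sector_deltas(times):
    """Return driver -> gap in seconds to the best time in `times`."""
    best = min(times.values())
    return {driver: round(t - best, 3) for driver, t in times.items()}


def colour_for(delta, near=0.2, far=1.0):
    """Pick a cell colour from the gap to the best time (arbitrary bands)."""
    if delta == 0:
        return "purple"  # the best time itself
    if delta <= near:
        return "green"
    if delta <= far:
        return "yellow"
    return "red"


# Step 1: the sector 1 column laid out as a row (placeholder times)
sector1 = {"Driver A": 28.301, "Driver B": 28.412, "Driver C": 29.650}

# Steps 2 and 3: gap to the session best, then a colour for each cell
for driver, delta in sector_deltas(sector1).items():
    print(driver, delta, colour_for(delta))
```

In a spreadsheet the first function is a subtraction against a MIN over the row, and the second is a set of conditional formatting rules keyed on the result.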
I’ve not really tried to push this idea very far – I’m not much of a spreadsheet jockey – but I’d be interested to know how folk who are might be able to push this idea…
PS FWIW, my entry to the competition is here: #f1datajunkie challenge 1 entry. It’s perhaps a little off-brief, but I’ve been meaning to do this sort of summary for some time, and this was a good starting point. If I get a chance, I’ll have a go at getting the parsers to work properly properly!