Confusing Chart? Seaborn Jointplot

I’ve just been doodling with some data and the seaborn graphics library and managed to confuse myself a couple of times when quickly glancing at some “jointplots” that add marginal histograms to a scatter plot.

The code is easy enough:

sns.jointplot(x='Indoors Sub-domain Rank (where 1 is most deprived)',
              y='Outdoors Sub-domain Rank (where 1 is most deprived)',
              data=df)  # assuming df is a dataframe of the deprivation rank data

which gives charts of the form:


but how do you intuitively see the histograms to compare them?


My at-a-glance (tired, past midnight…) reaction keeps “seeing” the top comparison, representing a 90 degree counterclockwise rotation about the top left corner of the y-axis chart. But if you think about it for more than a glance, that obviously puts the large-y values at low-x, and low y-values at large-x.

The correct way is the bottom one; to make the comparison you need to flip the y-axis chart, folding its bottom right corner towards the top-left corner of the x-axis chart. This is a rotation and a reflection. Which is hard.

But if we do the flip to try to help (can we do this using seaborn???):
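(As to the seaborn question: it seems plausible, at least at the matplotlib level, because jointplot returns a JointGrid object that exposes the marginal axes directly. A minimal sketch, assuming the same df as above:)

g = sns.jointplot(x='Indoors Sub-domain Rank (where 1 is most deprived)',
                  y='Outdoors Sub-domain Rank (where 1 is most deprived)',
                  data=df)
# Flip the right-hand marginal histogram horizontally, so its bars
# grow towards the main panel rather than away from it
g.ax_marg_y.invert_xaxis()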


my eye now feels as if it wants to help out, keeping the near corners of the various charts (the top right of the y-axis chart, the bottom left of the x-axis one, and the top left of the main panel) close together, and flipping the bottom left corner of the y-axis chart up to the top right corner of the x-axis chart (i.e. in this configuration it wants to do the flip?).

The distribution of bars in the marginal charts may also be complicating matters, encouraging the eye to match large with large (which in this case is wrong…).


Link Sharing in Classrooms, Workshops and Conferences

Every so often, I get reminded of old OUseful experiments by recent news, some of which still seem to work, and some of which don’t (the latter usually as a result of link rot or third party service closures).

So for example, a few weeks ago, the Google Education blog announced a new Chrome extension – Share to Classroom (“Get your students on the same (web)page, instantly”).

Firstly, it seems as if the extension lets someone at the front of the room share pages to class members’ screens inside the room. Secondly, folk in the class are also free to browse to other pages of their own finding and push suggestions of new pages to the person at the front, who can review them and then share them with everyone.

Anyway, in part it reminded me of one of my old hacks – the FeedShow Link Presenter. This was a much cruder affair: it created a frame-based page around an RSS feed of links referred to by the main presenter, which audience members could click forwards and back through, and it also seems to have started exploring a “back with me” feature to sync everyone’s pages.

Presenters could go off piste, splashing up page links not contained in the feed, but it looks as if these couldn’t be synced. (I’m not sure if I addressed that in a later revision.) Nor could audience members suggest links back to the main presenter.

The FeedShow Link Presenter had a Yahoo Pipes backend, and still seems to work; but with Pipes due to close on September 30th, it looks as if this experiment’s time will have come to an end…


Ho hum… (I guess I’ve stopped doing link based presentations anyway, but it was nice to be reminded of them:-)

Poking Around the VW Thing (Algorithmic Cheating in the Automobile Industry)

The news in recent days that VW had installed a software device into some of its diesel engines that identifies when the engine is being tested for emissions on a dynamometer (“dyno”) provides a nice demonstration of how “intelligent” software control systems can be used to identify particular operating environments and switch into an appropriate – or inappropriate – operational mode.
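By way of a toy illustration only (this bears no relation to VW’s actual code, and the thresholds are invented), the basic trick needs nothing more exotic than a few condition tests over sensor inputs the controller already receives:

def select_calibration(steering_angle_deg, speed_kmh, runtime_s, baro_hpa):
    # On a dyno running a fixed regulatory drive cycle, the steering wheel
    # barely moves and the run duration is predictable; a fuller version
    # would also track the speed profile over time.
    looks_like_dyno = (abs(steering_angle_deg) < 0.5
                       and runtime_s < 1900
                       and 900 < baro_hpa < 1100)
    return "dyno calibration" if looks_like_dyno else "road calibration"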

As I keep finding, press coverage of the events seems to offer less explanation and context than the original document that kicked off the recent publicity, specifically a letter from the US Environmental Protection Agency to the Volkswagen Group of America:


(The document is a PDF of a scanned document; I had hoped to extract the text using a variant of this recipe, running Tika on my own computer via Kitematic, Getting Text Out Of Anything (docs, PDFs, Images) Using Apache Tika, but it doesn’t seem to extract text from the inlined image(s). Instead, I uploaded it to Google Drive, then opened it in Google Docs – you see pages from the original doc as an image, and below it, the extracted text.)

Here are some of the bits that jumped out at me:

A defeat device is an AECD [Auxiliary Emission Control Device] “that reduces the effectiveness of the emission control system under conditions which may reasonably be expected to be encountered in normal vehicle operation and procedure use, unless: (1) Such conditions are substantially included in the Federal emission test procedure; (2) The need for the AECD is justified in terms of protecting the vehicle against damage or accident; (3) The AECD does not go beyond the requirements of engine starting; or (4) The AECD applies only for emergency vehicles …” 40 C.F.R. § 86.1803-01.

Motor vehicles equipped with defeat devices, … cannot be certified.

The CAA makes it a violation “for any person to manufacture or sell, or offer to sell, or install, any part or component intended for use with, or as part of any motor vehicle or motor vehicle engine, where a principal effect of the part or component is to bypass, defeat, or render inoperative any device or element of design installed on or in a motor vehicle or motor vehicle engine in compliance with regulations under this subchapter, and where the person knows or should know that such part or component is being offered for sale or installed for such use or put to such use.” CAA § 203(a)(3)(B), 42 U.S.C. § 7522(a)(3)(B); 40 C.F.R. § 86.1854-12(a)(3)(ii).

Each VW vehicle identified … has AECDs that were not described in the application for the COC that purportedly covers the vehicle. Specifically, VW manufactured and installed software in the electronic control module (ECM) of these vehicles that sensed when the vehicle was being tested for compliance with EPA emission standards. For ease of reference, the EPA is calling this the “switch.” The “switch” senses whether the vehicle is being tested or not based on various inputs including the position of the steering wheel, vehicle speed, the duration of the engine’s operation, and barometric pressure. These inputs precisely track the parameters of the federal test procedure used for emission testing for EPA certification purposes. During EPA emission testing, the vehicles ECM ran software which produced compliant emission results under an ECM calibration that VW referred to as the “dyno calibration” (referring to the equipment used in emissions testing, called a dynamometer). At all other times during normal vehicle operation, the “switch” was activated and the vehicle ECM software ran a separate “road calibration” which reduced the effectiveness of the emission control system (specifically the selective catalytic reduction or the lean NOx trap). As a result, emissions of NOx increased by a factor of 10 to 40 times above the EPA compliant levels, depending on the type of drive cycle (e.g., city, highway). …

Over the course of the year following the publication of the WVU study [TH: see link below], VW continued to assert to CARB and the EPA that the increased emissions from these vehicles could be attributed to various technical issues and unexpected in-use conditions. VW issued a voluntary recall in December 2014 to address the issue. …

When the testing showed only a limited benefit to the recall, CARB broadened the testing to pinpoint the exact technical nature of the vehicles’ poor performance, and to investigate why the vehicles’ onboard diagnostic system was not detecting the increased emissions. None of the potential technical issues suggested by VW explained the higher test results consistently confirmed during CARB’s testing. It became clear that CARB and the EPA would not approve certificates of conformity for VW’s 2016 model year diesel vehicles until VW could adequately explain the anomalous emissions and ensure the agencies that the 2016 model year vehicles would not have similar issues. Only then did VW admit it had designed and installed a defeat device in these vehicles in the form of a sophisticated software algorithm that detected when a vehicle was undergoing emissions testing.

VW knew or should have known that its “road calibration” and “switch” together bypass, defeat, or render inoperative elements of the vehicle design related to compliance with the CAA emission standards. This is apparent given the design of these defeat devices. As described above, the software was designed to track the parameters of the federal test procedure and cause emission control systems to underperform when the software determined that the vehicle was not undergoing the federal test procedure.

VW’s “road calibration” and “switch” are AECDs that were neither described nor justified in the applicable COC applications, and are illegal defeat devices.

The news also reminded me of another tech journalism brouhaha from earlier this year around tractor manufacturer John Deere arguing that farmers don’t own their tractors, but instead purchase “an implied license for the life of the vehicle to operate the vehicle” (Wired, We Can’t Let John Deere Destroy the Very Idea of Ownership).

I didn’t really follow that story properly at the time, but it seems the news arose out of a response to a consultation by the US Copyright Office around the Digital Millennium Copyright Act (DMCA), and in particular a “Proposed Class 21: Vehicle software – diagnosis, repair, or modification” category (first round comments, second round comments) for the DMCA Section 1201: Exemptions to Prohibition Against Circumvention of Technological Measures Protecting Copyrighted Works.

Here’s how the class was defined:

21. Proposed Class 21: Vehicle software – diagnosis, repair, or modification
This proposed class would allow circumvention of TPMs [technological protection measures] protecting computer programs that control the functioning of a motorized land vehicle, including personal automobiles, commercial motor vehicles, and agricultural machinery, for purposes of lawful diagnosis and repair, or aftermarket personalization, modification, or other improvement. Under the exemption as proposed, circumvention would be allowed when undertaken by or on behalf of the lawful owner of the vehicle.

Note the phrase “for purposes of lawful diagnosis and repair”…

I also note a related class:

22. Proposed Class 22: Vehicle software – security and safety research
This proposed class would allow circumvention of TPMs protecting computer programs that control the functioning of a motorized land vehicle for the purpose of researching the security or safety of such vehicles. Under the exemption as proposed, circumvention would be allowed when undertaken by or on behalf of the lawful owner of the vehicle.

(and in passing note Proposed Class 27: Software – networked medical devices…).

Looking at some of the supporting documents, it’s interesting to see how the lobby moved. For example, from the Senior Director of Environmental Affairs for the Alliance of Automobile Manufacturers:

The proponents state that an exemption is needed for three activities related to vehicles – diagnosis, repair, and modification. In my limited time, I will explain why, for the first two activities – diagnosis and repair – there is no need to circumvent access controls on Electronic Control Units (ECUs). Then, I will address why tampering with ECUs to “modify” vehicle performance undermines national regulatory goals for clean air, fuel efficiency, and auto safety, and why the Copyright Office should care about that.

1. Diagnosis/repair
The arguments put forward by the proponents of this exemption are unfounded. State and federal regulations, combined with the Right to Repair MOU and the 2002 “Dorgan letter,” guarantee all independent repair shops and individual consumers access to all the information and tools needed to diagnose and repair Model Year 1996 or newer cars. This information and these tools are already accessible online, through a thriving and competitive aftermarket. Every piece of information and every tool used to diagnose and repair vehicles at franchised dealers is available to every consumer and every independent repair shop in America. This has been the case for the past 12 years. Moreover, all of these regulations and agreements require automakers to provide the information and tools at a “fair and reasonable price.” No one in the last 12 years has disputed this fact, in any of the various avenues for review provided, including U.S. EPA, the California Air Resources Board, and joint manufacturer-aftermarket organizations.

There is absolutely no need to hack through technological protection measures and copy ECU software to diagnose and repair vehicles.

2. Modification
The regulations and agreements discussed above do not apply to information needed to “modify” engine and vehicle software. We strongly support a competitive marketplace in the tools and information people need so their cars continue to perform as designed, in compliance with all regulatory requirements. But helping people take their cars out of compliance with those requirements is something we certainly do not want to encourage. That, in essence, is what proponents of exemption #21 are calling for, in asserting a right to hack into vehicle software for purposes of “modification.” In the design and operation of ECUs in today’s automobiles, manufacturers must achieve a delicate balance among many competing regulatory demands, notably emissions (air pollution); fuel economy; and of course, vehicle safety. If the calibrations are out of balance, the car may be taken out of compliance. This is so likely to occur with many of the modifications that the proponents want to make that you could almost say that noncompliance is their goal, or at least an inevitable side effect.

Manufacturer John Deere suggested that:

1. The purpose and character of the use frustrate compliance with federal public safety and environmental regulations

The first fair use factor weighs against a finding of fair use because the purpose and character of the use will encourage non-compliance with environmental regulations and will interfere with the ability of manufacturers to identify and resolve software problems, conduct recalls, review warranty claims, and provide software upgrade versions.

And General Motors seem to take a similar line:

TPMs also ensure that vehicles meet federally mandated safety and emissions standards. For example, circumvention of certain emissions-oriented TPMs, such as seed/key access control mechanisms, could be a violation of federal law. Notably, the Clean Air Act (“CAA”) prohibits “tampering” with vehicles or vehicle engines once they have been certified in a certain configuration by the Environmental Protection Agency (“EPA”) for introduction into U.S. commerce. “Tampering” includes “rendering inoperative” integrated design elements to modify vehicle and/or engine performance without complying with emissions regulations. In addition, the Motor Vehicle Safety Act (“MVSA”) prohibits the introduction into U.S. commerce of vehicles that do not comply with the Federal Motor Vehicle Safety Standards, and prohibits manufacturers, dealers, distributors, or motor vehicle repair businesses from knowingly making inoperative any part of a device or element of design installed on or in a motor vehicle in compliance with an applicable motor vehicle standard.

Further, tampering with these systems would not be obvious to a subsequent owner or driver of a vehicle that has been tampered with. If a vehicle’s airbag systems, including any malfunction indicator lights, have been disabled (whether deliberately or inadvertently), a subsequent vehicle owner’s safety will be in jeopardy without warning. Further, if a vehicle’s emissions systems have been tampered with, a subsequent owner would have no way of knowing this has occurred. For tampering that the subsequent owner eventually discovers, manufacturer warranties do not cover the repair of damage caused by the tampering, placing the repair cost on the subsequent owner. For good cause, federal environmental and safety regulations regarding motor vehicles establish a well-recognized overall policy against allowing tampering with in-vehicle electronic systems designed for safety and emissions control.

While so-called “tinkerers” and enthusiasts may wish to modify their vehicle software for personal needs, granting greater access to vehicle software for purposes of modification fails to consider the overall concerns surrounding regulatory compliance and safety and the overall impact on safety and the environment. … Thus, the current prohibition ensures the distribution of safe and secure vehicle software within an overall vehicle security strategy implemented by car manufacturers that does not restrict vehicle owners’ ability to diagnose, modify or repair their cars.

The arguments from the auto lobby therefore go along the lines of “folk can’t mess with the code because they’ll try to break the law”, as opposed to the manufacturers systematically breaking the law, or folk trying to find out why a car performs nothing like the apparently declared figures. And I’m sure there are no elements of the industry wanting to prevent folk from looking at the code lest they find that it has “test circumvention” code baked into it by the actual manufacturers…

What the VW case throws up, perhaps, is the need for a clear route by which investigators are allowed to check on the compliance behaviour of various algorithms, not just in formal tests but also in everyday road tests that are unannounced to the engine management system.

And that doesn’t necessarily require on-the-road tests in a real vehicle. If the controller is a piece of software acting on digitised sensor inputs to produce a particular set of control algorithm outputs, the controller can be tested on a digital testbench or test harness against various test inputs covering a variety of input conditions captured from real-world data logging. This is something I think I need to read up more about… this could be a quick way in to the very basics: National Instruments: Building Flexible, Cost-Effective ECU Test Systems White Paper. Something like this could also be relevant: Gehring, J. and Schütte, H., “A Hardware-in-the-Loop Test Bench for the Validation of Complex ECU Networks”, SAE Technical Paper 2002-01-0801, 2002, doi:10.4271/2002-01-0801 – though the OU Library fails to get me immediate access to this resource… :-(
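For what it’s worth, here’s a minimal sketch of that replay idea in Python. Everything in it is an assumption for illustration – the controller stub, the CSV column names and the file names; a real rig would exercise the actual ECU code in a software- or hardware-in-the-loop setup:

import csv

def controller(steering_angle, speed, runtime_s, baro_pressure):
    # Stand-in for the control algorithm under test; a real harness
    # would run the actual ECU code here and capture its outputs.
    return {"egr_command": 0.0}

def replay(trace_file):
    # Feed a logged drive cycle, sample by sample, to the controller,
    # collecting its outputs for offline comparison.
    outputs = []
    with open(trace_file) as f:
        for row in csv.DictReader(f):
            outputs.append(controller(float(row["steering_angle"]),
                                      float(row["speed"]),
                                      float(row["runtime_s"]),
                                      float(row["baro_pressure"])))
    return outputs

# Systematic divergence between the two runs would flag a possible
# mode switch of the kind described in the EPA letter.
dyno_outputs = replay("ftp75_cycle.csv")   # hypothetical logged test cycle
road_outputs = replay("road_logging.csv")  # hypothetical road logging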

PS In passing, I just spotted this: Auto Parts Distributor Pleads Guilty to Manufacturing and Selling Pirated Mercedes-Benz Software – it seems that Mercedes-Benz distribute “a portable tablet-type computer that contains proprietary software created by [Mercedes-Benz] to diagnose and repair its automobiles and that requires a code or ‘license key’ to access [it]”, and that a company had admitted to obtaining “without authorization, … [the] Mercedes-Benz SDS software and updates, modified and duplicated the software, and installed the software on laptop computers (which served as the SDS units)”. So a simple act of software copyright/license infringement, perhaps, relating to offboard testing and diagnostic tools. But another piece in the jigsaw, for example, when it comes to engineering software that can perform diagnostics.

PPS via @mhawksey, a link to the relevant West Virginia University test report – In-use emissions testing of light-duty diesel vehicles in the U.S. – and noting Martin’s observation that there are several references to Volkswagen’s new 2.0 l TDI engine for the most stringent emission standards — Part 2 (reference [31] in the paper: Hadler, J., Rudolph, F., Dorenkamp, R., Kosters, M., Mannigel, D., and Veldten, B., “Volkswagen’s New 2.0l TDI Engine for the Most Stringent Emission Standards – Part 2,” MTZ Worldwide, Vol. 69, June 2008, which the OU Library at least doesn’t subscribe to :-(…)

“Interestingly” the report concluded:

In summary, real-world NOx emissions were found to exceed the US-EPA Tier2-Bin5 standard (at full useful life) by a factor of 15 to 35 for the LNT equipped Vehicle A, by a factor of 5 to 20 for the urea-SCR fitted Vehicle B (same engine as Vehicle A) and at or below the standard for Vehicle C with exception of rural-up/downhill driving conditions, over five predefined test routes. Generally, distance-specific NOx emissions were observed to be highest for rural-up/downhill and lowest for high-speed highway driving conditions with relatively flat terrain. Interestingly, NOx emissions factors for Vehicles A and B were below the US-EPA Tier2-Bin5 standard for the weighted average over the FTP-75 cycle during chassis dynamometer testing at CARB’s El Monte facility, with 0.022g/km ±0.006g/km (±1σ, 2 repeats) and 0.016g/km ±0.002g/km (±1σ, 3 repeats), respectively.

It also seems that the researchers spotted what might be happening to explain the apparently anomalous results they were getting with help from reference 31: “The probability of this explanation is additionally supported by a detailed description of the after-treatment control strategy for Vehicle A presented elsewhere [31]”.

PPPS I guess one way of following the case might be to track the lawsuits, such as this class action complaint against VW filed in California (/via @funnymonkey) or this one, also in San Francisco, or Tennessee, or Georgia, or another district in Georgia, or Santa Barbara, or Illinois, or Virginia, and presumably the list goes on… (I wonder if any of those complaints are actually informative/well-researched in terms of their content? And whether they are all just variations on a similar theme? In which case, they could be an interesting basis for a comparative text analysis?)

In passing, I also note that suits have previously – but still recently – been filed against VW, amongst others, regarding misleading fuel consumption claims, for example in the EU.

Tinkering With Data Consuming WordPress Plugins

Now that I’ve got my own webspace on Reclaim Hosting and set up a handful of my own self-hosted WordPress blogs, I started having a tinker with some custom plugins. I’ve started off with something simple, a couple of shortcode plugins that I can use to embed a simple Leaflet map into a post or a page.


So far, I’ve tried a couple of different approaches…

The first one has been pared down to just a default-accepting shortcode:


The shortcode calls a bit of code that makes a cached (at most, three times a day) request to a scraper that returns a list of planning applications to the Isle of Wight Council that are currently open for consultation. (That said, I can pass in a few shortcode parameters relating to the size of the map (which is a fixed size, unfortunately – I haven’t worked out how to make it flexible…) and the default location and zoom.) The scraper itself runs once daily.
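The plugin itself is PHP, but the caching logic is simple enough to sketch in Python for illustration (the cache path and scraper URL here are made up):

import json, os, time
import requests

CACHE = "/tmp/planning_apps.json"   # hypothetical cache location
TTL = 8 * 60 * 60                   # 8 hours, i.e. at most ~3 fetches a day
SCRAPER_URL = "https://example.com/iow-planning.json"  # hypothetical scraper

def open_applications():
    # Serve the cached copy if it is fresh enough...
    if os.path.exists(CACHE) and time.time() - os.path.getmtime(CACHE) < TTL:
        with open(CACHE) as f:
            return json.load(f)
    # ...otherwise hit the scraper and refresh the cache
    data = requests.get(SCRAPER_URL).json()
    with open(CACHE, "w") as f:
        json.dump(data, f)
    return data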


My feeling is that this sort of code would work best in the context of a WordPress page, acting as a destination that allows folk to just check on currently open applications.

The second plugin embeds a map that displays markers for recent house sales (as recorded by the Land Registry prices paid dataset). This dataset is published as a monthly set of data a month or two after the fact and is downloaded to my local desktop. A python script then reads in the data, creates a new WordPress post containing the shortcode with the data baked in, and uploads the post to WordPress (where it appears in the draft queue).


In this shortcode, the marker data is currently encoded using the PHP serialise format (via the python phpserialize dumps method) and embedded in the post as the value of a shortcode attribute.

[MultiMarkerLeafletMap zoom=11 lat=50.675 lon=-1.32 width=800 height=500 markers='a:112:{i:0;a:5:{s:3:"lat";d:50.699382843;s:4:"date";s:10:"2015-07-10";s:3:"lon";d:-1.29297620442;s:8:"location";s:22:"SAVOY COURT, TOWN...]
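For what it’s worth, here’s a sketch of how the generating script might hang together. The marker values, WordPress endpoint and credentials are all placeholders, and the upload step uses the python-wordpress-xmlrpc package as one possible route in:

import phpserialize
from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.methods.posts import NewPost

# Assume the markers have already been assembled from the Land Registry
# data (the price paid file itself only carries postcodes, so a geocoding
# step is implied somewhere upstream).
markers = [{"lat": 50.699, "lon": -1.293,
            "date": "2015-07-10", "location": "SAVOY COURT, ..."}]

# Encode the marker list in PHP serialize format, as used by the shortcode
payload = phpserialize.dumps(markers).decode("utf-8")

post = WordPressPost()
post.title = "Recent house sales"
post.content = ("[MultiMarkerLeafletMap zoom=11 lat=50.675 lon=-1.32 "
                "width=800 height=500 markers='{}']".format(payload))
post.post_status = "draft"  # leave it in the draft queue for review

client = Client("https://example.com/xmlrpc.php", "username", "password")
client.call(NewPost(post))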

In this case, with the marker data baked into the shortcode, there’s a good argument for rendering the map within a timestamped post as a fixed map (at least, ‘fixed’ in the sense that the data is unchanging).

The PHP un/serialize route isn’t ideal because I think it raises security issues? I originally tried to pass the data as serialised JSON, but the data is in the form of a list and it seems the ] characters break the shortcode attribute parsing. I guess what I should really do is see if I can pass the data in as serialised JSON between the shortcode tags rather than passing it in as an attribute?

Another approach might be to define just a simple map embed shortcode, and then support additional shortcodes that add markers to the map?

My PHP coding is also a bit scrappy (and I keep forgetting the ;’s… :-(). I think one thing I do need to do is pop the simple plugin code functions into a class to keep them safe, and also patch in a hack or two (as seems to be required?) so that the Leaflet map libraries are only loaded into post and page headers for posts/pages that actually contain one of the map shortcodes.

I’m also thinking now I need to find a way to support boundary lines and shape colouring?



DON’T PANIC – The Latest OU/Hans Rosling Co-Pro Has (Almost) Arrived…

To tie in with the UN’s Sustainable Development Goals summit later this week, the OU teamed up with production company Wingspan Productions and data raconteur Professor Hans Rosling to produce a second “Don’t Panic” lecture performance that airs on BBC2 at 8pm tonight: Don’t Panic – How to End Poverty in 15 Years.

Here’s a trailer…

…and here’s the complementary OpenLearn site: Don’t Panic – How to End Poverty in 15 Years (OpenLearn).

If you saw the previous outing – DON’T PANIC — The Facts About Population – it takes a similar format, once again using the Musion projection system to render “holographic” data visualisations that Hans tells his stories around.

(I did try to suggest that a spinning 3d chart could be quite compelling up on the big screen to illustrate different country trajectories over time, but was told it’d probably be too complicated a graphic for the audience to understand!;-)

Off the back of the previous co-production, the OU commissioned a series of video shorts featuring Hans Rosling that review several ways in which we can make sense of global development using data and statistics:

One idea for making use of these videos was to incorporate them into a full open course on FutureLearn, but for a variety of (internal?) reasons, that idea was canned. However, some of the material I’d started sketching for that possibility has finally seen the light of day. It appears as a series of OpenLearn lessons relating to the first three short films listed above, with the videos cut into bite size fragments and interspersed throughout a narrative text and embedded interactive data charts:

You might also pick up on some of the activity possibilities that are included too…

Note that those lessons are not quite presented as originally handed over… I was half hoping OpenLearn might have a go at displaying them as “scrollytelling” immersive stories as something of an experiment, but that appears not to be the case (maybe I should have actually published the stories first?! Hmmm…!). Anyway, here’s what I originally drafted, using Storybuilder:

If you have any comments on the charts, or feedback on the immersive story presentation (did it work for you, or did you find it irritating?), please let me know via the comments below.

PS if you are interested in doing a FutureLearn course with a development data feel, at least in part, check out OUr forthcoming FutureLearn MOOC, Learn to Code for Data Analysis. Using interactive IPython notebooks, you’ll learn how to start wrangling and visualising open datasets (including weather data, World Bank indicators data, and UN Comtrade import and export data) using the Python programming language and the pandas data wrangling package.

Even Though RSS Never Went Away, Could It Be Coming Back as a Facebook Sinker?

Long time readers will know I was – am – a huge fan of RSS and Atom, simple feed based protocols for syndicating content and attachment links, even going so far as to write a manifesto of a sort at one point (We Ignore RSS at OUr Peril).

This blog, and the earlier archived version of it, are full of reports and recipes around various RSS experiments and doodles, although in more recent years I haven’t really been using RSS as a creative medium that much, if at all.

But today I noticed this on the official Facebook developer blog: Publishing Instant Articles Directly From Your Content Management System [Instant Article docs]. Or more specifically, this:

When publishers get started with Instant Articles, they provide an RSS feed of their articles to Facebook, a format that most Content Management Systems already support. Once this RSS feed is set up, Instant Articles automatically loads new stories as soon as they are published to the publisher’s website and apps. Updates and corrections are also automatically captured via the RSS feed so that breaking news remains up to date.

So… Facebook will use RSS to sync content into Facebook from publishers’ CMSs.

Depending on the agreement Facebook has with the publishers, it may require that those feeds are private, rather than public, feeds that sink the content directly into Facebook.

But I wonder, will it also start sinking content from other independent publishers into the Facebook platform via those open feeds, providing even less reason for Facebook users to go elsewhere as it drops bits of content from the open web into closed, personal Facebook News Feeds? Hmmm…

There seems to be another sort of a grab for attention going on too:

Each Instant Article is associated with the URL where the web version is hosted on the publisher’s website. This means that Instant Articles are open and compatible with all of the ways that people share links around the web today:

  • When a friend or page you follow shares a link in your News Feed, we check to see if there is an Instant Article associated with that URL. If so, you will see it as an Instant Article. If not, it will open on the web browser.
  • When you share an Instant Article on Facebook or using email, SMS, or Twitter, you are sharing the link to the publisher website so anyone can open the article no matter what platform they use.

Associating each Instant Article with a URL makes it easy for publishers to adopt Instant Articles without changing their publishing workflows and means that people can read and share articles without thinking about the platform or technology behind the scenes.

Something like this maybe?


Which is to say, this?


Or maybe not. Maybe there is some enlightened self interest in this, and perhaps Facebook will see a reason to start letting its content out via open syndication formats, like RSS.

Or maybe RSS will end up sinking the Facebook platform, by allowing Facebook users to go off the platform but still accept content from it?

Whatever the case, as Facebook becomes a set of social platform companies rather than a single platform company, I wonder: will it have an open standard, feed based syndication bus to help content flow within and around those companies? Even if that content is locked inside the confines of a Facebook-parent-company-as-web attention wall?

PS So the ‘related content’ feature on my WordPress blog associates this post with an earlier one: Is Facebook Stifling the Free Flow of Information?, which it seems was lamenting an earlier decision by Facebook to disable the import of content into Facebook using RSS…?! What goes around, comes around, it seems?!

First Steps in a Conversational Slackbot Interface to CQC Inspection Data

A few months ago, I started to have a play with ratings data from the CQC – the Care Quality Commission. I don’t seem to have posted the scraper tools I ended up with anywhere, but I have started playing with them again over the last couple of weeks in the context of my slackbot explorations.

In particular, I’ve started working on a conversational workflow for keeping track of inspections in a particular local area. At the current time, the CQC website only seems to support alerts relating to new reports at the level of a particular location, although it is possible to get faceted search results relating to reports published over the course of the last week or last month. For the local journalist wanting to keep tabs on reports associated with local providers, this means setting up a data beat that includes checking the CQC website on a regular basis, firstly to see whether there are any new reports, and secondly to see what sort of reports they are.

And as a report from OnTheWight today shows (Beacon Health Centre rated ‘Good’ by CQC), this can include good news stories as well as bad news ones.


So what’s the thing I’ve been exploring in the slack context? A time saver, hopefully. In the first case, it provides a quick way of checking up on reports from the local area released over the previous week or previous month:

To begin with, we can ask for a summary report of recent inspections:


The report does a bit of counting – to provide some element of context – and provides a highlights statement regarding the overall result from each of the reports. (At the moment, I don’t sort the order in which reports are presented. There are opportunities here for prioritising which results to show. I also need to think about whether results should be provided as a single slackbot response, as is currently the case, or using a separate (and hence, “star-able”) response for each report.)
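To give a flavour of the glue involved, here’s a minimal sketch of a Slack slash command handler along these lines. Flask, the scraper endpoint and the fields pulled from its response are all assumptions:

from flask import Flask, request, jsonify
import requests

app = Flask(__name__)

SCRAPER_API = "https://example.com/cqc/reports"  # hypothetical scraper endpoint

@app.route("/cqc", methods=["POST"])
def recent_inspections():
    # Slack posts the text typed after the slash command in the 'text' field
    args = request.form.get("text", "").split()
    period = args[0] if args else "week"   # e.g. "/cqc week" or "/cqc month"
    reports = requests.get(SCRAPER_API, params={"period": period}).json()
    lines = ["{} inspection reports published in the last {}:".format(
        len(reports), period)]
    for r in reports:
        lines.append("• {} ({}): {}".format(
            r["name"], r["location_id"], r["overall_rating"]))
    return jsonify({"response_type": "in_channel", "text": "\n".join(lines)})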

After briefly skimming over the recent results, we can tunnel down into a slightly more detailed summary of the report by passing in a location ID:


As part of the report, this returns a link to the CQC website so we can inspect the full result at source. I’ve also got a scraper that pulls the full reports from the CQC site, but at the moment it’s not returning the result to Slack (I think there’s a message size limit which I’m overflowing, so I need to find what that limit is and split up the response to get round it). That said, Slack is perhaps not the best place to return long reports? Maybe a better way would be to submit quoted report components into a draft WordPress blog post?
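Pending checking what the actual limit is, a naive line-based chunker would probably do; the 4000 character figure below is just a guess to be replaced once I know the real number:

def chunk_message(text, limit=4000):
    # Split a long report into pieces below the (assumed) size limit,
    # breaking on line boundaries so each chunk stays readable.
    chunks, current = [], ""
    for line in text.split("\n"):
        if current and len(current) + len(line) + 1 > limit:
            chunks.append(current)
            current = line
        else:
            current = current + "\n" + line if current else line
    if current:
        chunks.append(current)
    return chunks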

We can also pull back administrative information regarding a location from its location ID.


This report also includes information about other locations operated by the same provider. (I think I do have a report somewhere that summarises the report ratings over all the locations associated with a given provider, so we can look to see how well other establishments operated by the same provider are rated, but I haven’t wired that into the Slack bot yet.)

There are several other ways we can develop this conversation…

Company number and charity number information is available for some providers, which means it should be trivial to pull back company registration information and company director information from Companies House or OpenCorporates, and perhaps even data from the Charities Commission.
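For example, via the OpenCorporates API (the v0.4 URL pattern below follows their published docs, as I understand them; heavier use needs an API token):

import requests

def company_info(company_number, jurisdiction="gb"):
    # Look up basic registration details for a given company number
    url = "https://api.opencorporates.com/v0.4/companies/{}/{}".format(
        jurisdiction, company_number)
    return requests.get(url).json()["results"]["company"]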

Rather more scruffily, perhaps, we could use location name and postcode to try a search on the Food Standards Agency website to see if we can find food ratings data for establishments of interest.
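In fact, the FSA publishes a food hygiene ratings API, so the search needn’t even be that scruffy. A sketch, with the endpoint, header and field names as I understand the FHRS API docs (so treat them as assumptions to check):

import requests

def food_ratings(name, postcode):
    # Search the FSA Food Hygiene Rating Scheme API by name and address
    r = requests.get("http://api.ratings.food.gov.uk/Establishments",
                     params={"name": name, "address": postcode},
                     headers={"x-api-version": "2"})
    return [(e["BusinessName"], e["RatingValue"])
            for e in r.json()["establishments"]]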

There might also be opportunities for linking in items from local spending data to try to capture local authority spend with a particular location or provider. This would be simplified if council payments to CQC rated establishments or providers included the CQC location or provider ID, but that’s perhaps too much to ask for.

If nothing else, however, this demonstrates a casual conversational way in which a local journalist might be able to use slack as part of a local data beat to run regular, periodic checks over recent reports published by the CQC relating to local care establishments.