Poking Around the VW Thing (Algorithmic Cheating in the Automobile Industry)

The news in recent days that VW had installed a software device into some of its diesel engines that identifies when the engine is being tested for emissions on a dynamometer (“dyno”) provides a nice demonstration of how “intelligent” software control systems can be used to identify particular operating environments and switch into an appropriate – or inappropriate – operational mode.

As I keep finding, press coverage of the events seems to offer less explanation and context than the original document that kicked off the recent publicity, specifically a letter from the US Environmental Protection Agency to the Volkswagen Group of America:


(The document is a PDF of a scanned document; I had hoped to extract the text using a variant of this recipe, running Tika on my own computer via Kitematic, Getting Text Out Of Anything (docs, PDFs, Images) Using Apache Tika, but it doesn’t seem to extract text from the inlined image(s). Instead, I uploaded it to Google Drive, then opened it in Google Docs – you see pages from the original doc as an image, with the extracted text below.)

Here are some of the bits that jumped out at me:

A defeat device is an AECD [Auxiliary Emission Control Device] “that reduces the effectiveness of the emission control system under conditions which may reasonably be expected to be encountered in normal vehicle operation and procedure use, unless: (1) Such conditions are substantially included in the Federal emission test procedure; (2) The need for the AECD is justified in terms of protecting the vehicle against damage or accident; (3) The AECD does not go beyond the requirements of engine starting; or (4) The AECD applies only for emergency vehicles …” 40 C.F.R. § 86.1803-01.

Motor vehicles equipped with defeat devices, … cannot be certified.

The CAA makes it a violation “for any person to manufacture or sell, or offer to sell, or install, any part or component intended for use with, or as part of any motor vehicle or motor vehicle engine, where a principal effect of the part or component is to bypass, defeat, or render inoperative any device or element of design installed on or in a motor vehicle or motor vehicle engine in compliance with regulations under this subchapter, and where the person knows or should know that such part or component is being offered for sale or installed for such use or put to such use.” CAA § 203(a)(3)(B), 42 U.S.C. § 7522(a)(3)(B); 40 C.F.R. § 86.1854-12(a)(3)(ii).

Each VW vehicle identified … has AECDs that were not described in the application for the COC that purportedly covers the vehicle. Specifically, VW manufactured and installed software in the electronic control module (ECM) of these vehicles that sensed when the vehicle was being tested for compliance with EPA emission standards. For ease of reference, the EPA is calling this the “switch.” The “switch” senses whether the vehicle is being tested or not based on various inputs including the position of the steering wheel, vehicle speed, the duration of the engine’s operation, and barometric pressure. These inputs precisely track the parameters of the federal test procedure used for emission testing for EPA certification purposes. During EPA emission testing, the vehicles ECM ran software which produced compliant emission results under an ECM calibration that VW referred to as the “dyno calibration” (referring to the equipment used in emissions testing, called a dynamometer). At all other times during normal vehicle operation, the “switch” was activated and the vehicle ECM software ran a separate “road calibration” which reduced the effectiveness of the emission control system (specifically the selective catalytic reduction or the lean NOx trap). As a result, emissions of NOx increased by a factor of 10 to 40 times above the EPA compliant levels, depending on the type of drive cycle (e.g., city, highway). … Over the course of the year following the publication of the WVU study [TH: see link below], VW continued to assert to CARB and the EPA that the increased emissions from these vehicles could be attributed to various technical issues and unexpected in-use conditions. VW issued a voluntary recall in December 2014 to address the issue. … When the testing showed only a limited benefit to the recall, CARB broadened the testing to pinpoint the exact technical nature of the vehicles’ poor performance, and to investigate why the vehicles’ onboard diagnostic system was not detecting the increased emissions. None of the potential technical issues suggested by VW explained the higher test results consistently confirmed during CARB’s testing. It became clear that CARB and the EPA would not approve certificates of conformity for VW’s 2016 model year diesel vehicles until VW could adequately explain the anomalous emissions and ensure the agencies that the 2016 model year vehicles would not have similar issues. Only then did VW admit it had designed and installed a defeat device in these vehicles in the form of a sophisticated software algorithm that detected when a vehicle was undergoing emissions testing.

VW knew or should have known that its “road calibration” and “switch” together bypass, defeat, or render inoperative elements of the vehicle design related to compliance with the CAA emission standards. This is apparent given the design of these defeat devices. As described above, the software was designed to track the parameters of the federal test procedure and cause emission control systems to underperform when the software determined that the vehicle was not undergoing the federal test procedure.

VW’s “road calibration” and “switch” are AECDs that were neither described nor justified in the applicable COC applications, and are illegal defeat devices.
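Reading the EPA’s description of the inputs the “switch” monitors, it’s easy to see the shape of the thing. Purely as an illustrative sketch – with invented inputs and thresholds, since the actual calibration logic has not been made public – the kind of condition such a “switch” might test for could look something like this:

# A crude, hypothetical sketch of the kind of "switch" condition the EPA
# letter describes; all inputs and thresholds here are invented for
# illustration, not taken from any actual ECU code.
def dyno_like(speed_kmh, steering_angle_deg, runtime_s):
    # wheels turning, steering wheel dead straight, engine running a while:
    # characteristic of a dynamometer test rather than real road driving
    return speed_kmh > 10 and abs(steering_angle_deg) < 1.0 and runtime_s > 60

def select_calibration(sensors):
    # sensors: a dict of values sampled from the vehicle's data bus
    if dyno_like(sensors['speed'], sensors['steering'], sensors['runtime']):
        return 'dyno calibration'   # emissions controls fully active
    return 'road calibration'       # reduced SCR / lean NOx trap effectiveness

sensors = {'speed': 50, 'steering': 0.2, 'runtime': 300}  # made-up sample frame
print(select_calibration(sensors))  # 'dyno calibration'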

The news also reminded me of another tech journalism brouhaha from earlier this year around tractor manufacturer John Deere arguing that farmers don’t own their tractors, but instead purchase “an implied license for the life of the vehicle to operate the vehicle” (Wired, We Can’t Let John Deere Destroy the Very Idea of Ownership).

I didn’t really follow that story properly at the time, but it seems the news arose out of a response to a consultation by the US Copyright Office around the Digital Millennium Copyright Act (DMCA), and in particular a “Proposed Class 21: Vehicle software – diagnosis, repair, or modification” category (first round comments, second round comments) for the DMCA Section 1201: Exemptions to Prohibition Against Circumvention of Technological Measures Protecting Copyrighted Works.

Here’s how the class was defined:

21. Proposed Class 21: Vehicle software – diagnosis, repair, or modification
This proposed class would allow circumvention of TPMs [technological protection measures] protecting computer programs that control the functioning of a motorized land vehicle, including personal automobiles, commercial motor vehicles, and agricultural machinery, for purposes of lawful diagnosis and repair, or aftermarket personalization, modification, or other improvement. Under the exemption as proposed, circumvention would be allowed when undertaken by or on behalf of the lawful owner of the vehicle.

Note the phrase “for purposes of lawful diagnosis and repair”…

I also note a related class:

22. Proposed Class 22: Vehicle software – security and safety research
This proposed class would allow circumvention of TPMs protecting computer programs that control the functioning of a motorized land vehicle for the purpose of researching the security or safety of such vehicles. Under the exemption as proposed, circumvention would be allowed when undertaken by or on behalf of the lawful owner of the vehicle.

(and in passing note Proposed Class 27: Software – networked medical devices…).

Looking at some of the supporting documents, it’s interesting to see how the lobby moved. For example, from the Senior Director of Environmental Affairs for the Alliance of Automobile Manufacturers:

The proponents state that an exemption is needed for three activities related to vehicles – diagnosis, repair, and modification. In my limited time, I will explain why, for the first two activities – diagnosis and repair – there is no need to circumvent access controls on Electronic Control Units (ECUs). Then, I will address why tampering with ECUs to “modify” vehicle performance undermines national regulatory goals for clean air, fuel efficiency, and auto safety, and why the Copyright Office should care about that.

1. Diagnosis/repair
The arguments put forward by the proponents of this exemption are unfounded. State and federal regulations, combined with the Right to Repair MOU and the 2002 “Dorgan letter,” guarantee all independent repair shops and individual consumers access to all the information and tools needed to diagnose and repair Model Year 1996 or newer cars. This information and these tools are already accessible online, through a thriving and competitive aftermarket. Every piece of information and every tool used to diagnose and repair vehicles at franchised dealers is available to every consumer and every independent repair shop in America. This has been the case for the past 12 years. Moreover, all of these regulations and agreements require automakers to provide the information and tools at a “fair and reasonable price.” No one in the last 12 years has disputed this fact, in any of the various avenues for review provided, including U.S. EPA, the California Air Resources Board, and joint manufacturer-aftermarket organizations.

There is absolutely no need to hack through technological protection measures and copy ECU software to diagnose and repair vehicles.

2. Modification
The regulations and agreements discussed above do not apply to information needed to “modify” engine and vehicle software. We strongly support a competitive marketplace in the tools and information people need so their cars continue to perform as designed, in compliance with all regulatory requirements. But helping people take their cars out of compliance with those requirements is something we certainly do not want to encourage. That, in essence, is what proponents of exemption #21 are calling for, in asserting a right to hack into vehicle software for purposes of “modification.” In the design and operation of ECUs in today’s automobiles, manufacturers must achieve a delicate balance among many competing regulatory demands, notably emissions (air pollution); fuel economy; and of course, vehicle safety. If the calibrations are out of balance, the car may be taken out of compliance. This is so likely to occur with many of the modifications that the proponents want to make that you could almost say that noncompliance is their goal, or at least an inevitable side effect.

Manufacturer John Deere suggested that:

1. The purpose and character of the use frustrate compliance with federal public safety and environmental regulations

The first fair use factor weighs against a finding of fair use because the purpose and character of the use will encourage non-compliance with environmental regulations and will interfere with the ability of manufacturers to identify and resolve software problems, conduct recalls, review warranty claims, and provide software upgrade versions.

And General Motors seem to take a similar line:

TPMs also ensure that vehicles meet federally mandated safety and emissions standards. For example, circumvention of certain emissions-oriented TPMs, such as seed/key access control mechanisms, could be a violation of federal law. Notably, the Clean Air Act (“CAA”) prohibits “tampering” with vehicles or vehicle engines once they have been certified in a certain configuration by the Environmental Protection Agency (“EPA”) for introduction into U.S. commerce. “Tampering” includes “rendering inoperative” integrated design elements to modify vehicle and/or engine performance without complying with emissions regulations. In addition, the Motor Vehicle Safety Act (“MVSA”) prohibits the introduction into U.S. commerce of vehicles that do not comply with the Federal Motor Vehicle Safety Standards, and prohibits manufacturers, dealers, distributors, or motor vehicle repair businesses from knowingly making inoperative any part of a device or element of design installed on or in a motor vehicle in compliance with an applicable motor vehicle standard.

Further, tampering with these systems would not be obvious to a subsequent owner or driver of a vehicle that has been tampered with. If a vehicle’s airbag systems, including any malfunction indicator lights, have been disabled (whether deliberately or inadvertently), a subsequent vehicle owner’s safety will be in jeopardy without warning. Further, if a vehicle’s emissions systems have been tampered with, a subsequent owner would have no way of knowing this has occurred. For tampering that the subsequent owner eventually discovers, manufacturer warranties do not cover the repair of damage caused by the tampering, placing the repair cost on the subsequent owner. For good cause, federal environmental and safety regulations regarding motor vehicles establish a well-recognized overall policy against allowing tampering with in-vehicle electronic systems designed for safety and emissions control.

While so-called “tinkerers” and enthusiasts may wish to modify their vehicle software for personal needs, granting greater access to vehicle software for purposes of modification fails to consider the overall concerns surrounding regulatory compliance and safety and the overall impact on safety and the environment. … Thus, the current prohibition ensures the distribution of safe and secure vehicle software within an overall vehicle security strategy implemented by car manufacturers that does not restrict vehicle owners’ ability to diagnose, modify or repair their cars.

The arguments from the auto lobby therefore go along the lines of “folk can’t mess with the code because they’ll try to break the law”, as opposed to the manufacturers systematically breaking the law, or folk trying to find out why a car performs nothing like the apparently declared figures. And I’m sure there are no elements of the industry wanting to prevent folk from looking at the code lest they find that it has “test circumvention” code baked in to it by the actual manufacturers…

What the VW case throws up, perhaps, is the need for a clear route by which investigators can check on the compliance behaviour of various algorithms, not just in formal tests but also in everyday road tests that are unannounced as far as the engine management system is concerned.

And that doesn’t necessarily require on-the-road tests in a real vehicle. If the controller is a piece of software acting on digitised sensor inputs to produce a particular set of control algorithm outputs, the controller can be tested on a digital testbench or test harness against various test inputs covering a variety of input conditions captured from real world data logging. This is something I think I need to read up more about… this could be a quick way in to the very basics: National Instruments: Building Flexible, Cost-Effective ECU Test Systems White Paper. Something like this could also be relevant: Gehring, J. and Schütte, H., “A Hardware-in-the-Loop Test Bench for the Validation of Complex ECU Networks”, SAE Technical Paper 2002-01-0801, 2002, doi:10.4271/2002-01-0801 (though the OU Library fails to get me immediate access to this resource…:-(.
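By way of a sketch of that idea – assuming the controller logic is available as a callable (the ecu_step function below is hypothetical), and that we have sensor traces logged from both the federal test procedure and real-world driving – a software-in-the-loop comparison might look something like this:

# A minimal software-in-the-loop sketch: ecu_step is a hypothetical stand-in
# for the controller under test, and each trace is a list of logged sensor
# frames (dicts) captured from real-world data logging or the test cycle.
def replay(ecu_step, trace):
    # feed each logged frame through the controller, collecting its outputs
    return [ecu_step(frame) for frame in trace]

def compare_calibrations(ecu_step, test_cycle_trace, road_trace):
    test_out = replay(ecu_step, test_cycle_trace)
    road_out = replay(ecu_step, road_trace)
    # If the commanded emissions-control settings differ systematically
    # between traces that ought to be treated the same, that's a flag for
    # further inspection.
    return test_out, road_out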

PS In passing, I just spotted this: Auto Parts Distributor Pleads Guilty to Manufacturing and Selling Pirated Mercedes-Benz Software – it seems that Mercedes-Benz distribute “a portable tablet-type computer that contains proprietary software created by [Mercedes-Benz] to diagnose and repair its automobiles and that requires a code or ‘license key’ to access [it]”, and that a company had admitted to obtaining “without authorization, … [the] Mercedes-Benz SDS software and updates, modified and duplicated the software, and installed the software on laptop computers (which served as the SDS units)”. So a simple act of software copyright/license infringement, perhaps, relating to offboard testing and diagnostic tools. But another piece in the jigsaw, for example, when it comes to engineering software that can perform diagnostics.

PPS via @mhawksey, a link to the relevant West Virginia University test report – In-use emissions testing of light-duty diesel vehicles in the U.S. – and noting Martin’s observation that there are several references to Volkswagen’s new 2.0 l TDI engine for the most stringent emission standards — Part 2 (reference [31] in the paper: Hadler, J., Rudolph, F., Dorenkamp, R., Kosters, M., Mannigel, D., and Veldten, B., “Volkswagen’s New 2.0l TDI Engine for the Most Stringent Emission Standards – Part 2,” MTZ Worldwide, Vol. 69, June 2008 – which the OU Library at least doesn’t subscribe to:-(…)

“Interestingly” the report concluded:

In summary, real-world NOx emissions were found to exceed the US-EPA Tier2-Bin5 standard (at full useful life) by a factor of 15 to 35 for the LNT equipped Vehicle A, by a factor of 5 to 20 for the urea-SCR fitted Vehicle B (same engine as Vehicle A) and at or below the standard for Vehicle C with exception of rural-up/downhill driving conditions, over five predefined test routes. Generally, distance-specific NOx emissions were observed to be highest for rural-up/downhill and lowest for high-speed highway driving conditions with relatively flat terrain. Interestingly, NOx emissions factors for Vehicles A and B were below the US-EPA Tier2-Bin5 standard for the weighted average over the FTP-75 cycle during chassis dynamometer testing at CARB’s El Monte facility, with 0.022g/km ±0.006g/km (±1σ, 2 repeats) and 0.016g/km ±0.002g/km (±1σ, 3 repeats), respectively.

It also seems that the researchers spotted what might be happening to explain the apparently anomalous results they were getting with help from reference 31: “The probability of this explanation is additionally supported by a detailed description of the after-treatment control strategy for Vehicle A presented elsewhere [31]“.

PPPS I guess one way of following the case might be track the lawsuits, such as this class action complaint against VW filed in California (/via @funnymonkey) or this one, also in San Francisco, or Tennessee, or Georgia, or another district in Georgia, or Santa Barbara, or Illinois, or Virginia, and presumably the list goes on… (I wonder if any of those complaints are actually informative/well-researched in terms of their content? And whether they are all just variations on a similar theme? In which case, they could be an interesting basis for a comparative text analysis?)

In passing, I also note that suits have previously – but still recently – been filed against VW, amongst others, regarding misleading fuel consumption claims, for example in the EU.

Tinkering With Data Consuming WordPress Plugins

Now that I’ve got my own webspace on Reclaim Hosting and set up a handful of my own self-hosted WordPress blogs, I’ve started having a tinker with some custom plugins, beginning with something simple: a couple of shortcode plugins that I can use to embed a simple leaflet map into a post or a page.


So far, I’ve tried a couple of different approaches…

The first one has been pared down to just a default-accepting shortcode:


The shortcode calls a bit of code that makes a cached request (at most, three times a day) to a morph.io scraper that returns a list of planning applications currently open for consultation with the Isle of Wight Council. (That said, I can pass in a few shortcode parameters relating to the size of the map (which is a fixed size, unfortunately – I haven’t worked out how to make it flexible…) and the default location and zoom.) The scraper itself runs once daily.
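(The plugin itself is PHP, but the caching logic is easy enough to sketch in Python; the scraper name and API key below are placeholders, morph.io exposing scraper data via api.morph.io/<owner>/<scraper>/data.json.)

# Sketch of a cached fetch from a morph.io scraper; the scraper name and
# API key are placeholders.
import json, os, time
from urllib.parse import quote
from urllib.request import urlopen

CACHE = '/tmp/planning_cache.json'
MAX_AGE = 8 * 60 * 60  # seconds - i.e. refresh at most three times a day

def open_applications():
    # serve the cached copy if it's still fresh enough
    if os.path.exists(CACHE) and time.time() - os.path.getmtime(CACHE) < MAX_AGE:
        with open(CACHE) as f:
            return json.load(f)
    url = ('https://api.morph.io/EXAMPLE_USER/iw_planning/data.json'
           '?key=API_KEY&query=' + quote('select * from data'))
    data = json.loads(urlopen(url).read().decode('utf-8'))
    with open(CACHE, 'w') as f:
        json.dump(data, f)
    return data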


My feeling is that this sort of code would work best in the context of a WordPress page, acting as a destination that allows folk to just check on currently open applications.

The second plugin embeds a map that displays markers for recent house sales (as recorded by the Land Registry prices paid dataset). This dataset is published as a monthly set of data a month or two after the fact and is downloaded to my local desktop. A python script then reads in the data, creates a new WordPress post containing the shortcode with the data baked in, and uploads the post to WordPress (where it appears in the draft queue).


In this shortcode, the marker data is currently encoded using the PHP serialise format (via the python phpserialize dumps method) and embedded in the post as the value of a shortcode attribute.
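Something along these lines generates the post, assuming hypothetical column names (the raw prices paid file is keyed by postcode rather than lat/lon, so a geocoding step is taken as read) and reusing the wordpress_xmlrpc pattern shown in a later post:

# Sketch of the desktop-side generator; column names, filenames and
# credentials are all hypothetical.
import csv
from phpserialize import dumps
from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.methods.posts import NewPost

markers = []
with open('pp-monthly-geocoded.csv') as f:
    for row in csv.DictReader(f):
        markers.append({'lat': float(row['lat']), 'lon': float(row['lon']),
                        'date': row['date'], 'location': row['address']})

# phpserialize.dumps() emits the PHP serialize format (as bytes)
shortcode = "[MultiMarkerLeafletMap zoom=11 lat=50.675 lon=-1.32 width=800 height=500 markers='{}']".format(
    dumps(markers).decode('utf-8'))

post = WordPressPost()
post.title = 'House sales: latest Land Registry release'
post.content = shortcode
wp = Client('http://blog.example.org/xmlrpc.php', 'autoposter', 'PASSWORD')
wp.call(NewPost(post))  # lands in the draft queue by default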

[MultiMarkerLeafletMap zoom=11 lat=50.675 lon=-1.32 width=800 height=500 markers='a:112:{i:0;a:5:{s:3:"lat";d:50.699382843;s:4:"date";s:10:"2015-07-10";s:3:"lon";d:-1.29297620442;s:8:"location";s:22:"SAVOY COURT, TOWN...]

In this case, with the marker data baked into the shortcode, there’s a good argument for rendering the map within a timestamped post as a fixed map (at least, ‘fixed’ in the sense that the data is unchanging).

The PHP un/serialize route isn’t ideal because I think it raises security issues? I originally tried to pass the data as serialised JSON, but the data is in the form of a list and it seems the ] breaks things. I guess what I should really do is see if I can pass the data in as serialised JSON between the shortcode tags rather than pass it in as an attribute?
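For what it’s worth, a sketch of the enclosing-tags idea, with invented marker data (whether WordPress’s shortcode parser leaves the JSON content untouched would need testing):

import json

# invented sample data for illustration
markers = [{'lat': 50.699, 'lon': -1.293, 'date': '2015-07-10', 'location': 'EXAMPLE COURT'}]

# enclosing-tags form: the JSON payload sits between the opening and closing
# tags rather than inside a quoted attribute, so the square brackets of the
# JSON list shouldn't terminate the shortcode early
shortcode = '[MultiMarkerLeafletMap zoom=11 lat=50.675 lon=-1.32]{}[/MultiMarkerLeafletMap]'.format(
    json.dumps(markers))
print(shortcode)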

Another approach might be to define just a simple map embed shortcode, and then support additional shortcodes that add markers to the map?

My PHP coding is also a bit scrappy (and I keep forgetting the ;’s… :-(). I think one thing I do need to do is pop the simple plugin code functions into a class to keep them safe, and also patch in a hack or two (as seems to be required?) so that the leaflet map libraries are only loaded into post and page headers for posts/pages that actually contain one of the map shortcodes.

I’m also thinking now I need to find a way to support boundary lines and shape colouring?



DON’T PANIC – The Latest OU/Hans Rosling Co-Pro Has (Almost) Arrived…

To tie in with the UN’s Sustainable Development Goals summit later this week, the OU teamed up with production company Wingspan Productions and data raconteur Professor Hans Rosling to produce a second “Don’t Panic” lecture performance that airs on BBC2 at 8pm tonight: Don’t Panic – How to End Poverty in 15 Years.

Here’s a trailer…

…and here’s the complementary OpenLearn site: Don’t Panic – How to End Poverty in 15 Years (OpenLearn).

If you saw the previous outing – DON’T PANIC — The Facts About Population – it takes a similar format, once again using the Musion projection system to render “holographic” data visualisations that Hans tells his stories around.

(I did try to suggest that a spinning 3d chart could be quite compelling up on the big screen to illustrate different country trajectories over time, but was told it’d probably be too complicated a graphic for the audience to understand!;-)

Off the back of the previous co-production, the OU commissioned a series of video shorts featuring Hans Rosling that review several ways in which we can make sense of global development using data and statistics:

One idea for making use of these videos was to incorporate them into a full open course on FutureLearn, but for a variety of (internal?) reasons, that idea was canned. However, some of the material I’d started sketching for that possibility has finally seen the light of day, appearing as a series of OpenLearn lessons relating to the first three short films listed above, with the videos cut into bite-size fragments and interspersed throughout a narrative text and embedded interactive data charts:

You might also pick up on some of the activity possibilities that are included too…

Note that those lessons are not quite presented as originally handed over… I was half hoping OpenLearn might have a go at displaying them as “scrollytelling” immersive stories as something of an experiment, but that appears not to be the case (maybe I should have actually published the stories first?! Hmmm…!). Anyway, here’s what I originally drafted, using Storybuilder:

If you have any comments on the charts, or feedback on the immersive story presentation (did it work for you, or did you find it irritating?), please let me know via the comments below.

PS if you are interested in doing a FutureLearn course with a development data feel, at least in part, check out OUr forthcoming FutureLearn MOOC, Learn to Code for Data Analysis. Using interactive IPython notebooks, you’ll learn how to start wrangling and visualising open datasets (including weather data, World Bank indicators data, and UN Comtrade import and export data) using the Python programming language and the pandas data wrangling python package.

Even Though RSS Never Went Away, Could It Be Coming Back as a Facebook Sinker?

Long time readers will know I was – am – a huge fan of RSS and Atom, simple feed based protocols for syndicating content and attachment links, even going so far as to write a manifesto of a sort at one point (We Ignore RSS at OUr Peril).

This blog, and the earlier archived version of it, are full of reports and recipes around various RSS experiments and doodles, although in more recent years I haven’t really been using RSS as a creative medium that much, if at all.

But today I noticed this on the official Facebook developer blog: Publishing Instant Articles Directly From Your Content Management System [Instant Article docs]. Or more specifically, this:

When publishers get started with Instant Articles, they provide an RSS feed of their articles to Facebook, a format that most Content Management Systems already support. Once this RSS feed is set up, Instant Articles automatically loads new stories as soon as they are published to the publisher’s website and apps. Updates and corrections are also automatically captured via the RSS feed so that breaking news remains up to date.

So… Facebook will use RSS to sync content into Facebook from publishers’ CMSs.

Depending on the agreement Facebook has with the publishers, it may require that those feeds are private, rather than public, feeds that sink the content directly into Facebook.

But I wonder, will it also start sinking content from other independent publishers into the Facebook platform via those open feeds, providing even less reason for Facebook users to go elsewhere as it drops bits of content from the open web into closed, personal Facebook News Feeds? Hmmm…

There seems to be another sort of a grab for attention going on too:

Each Instant Article is associated with the URL where the web version is hosted on the publisher’s website. This means that Instant Articles are open and compatible with all of the ways that people share links around the web today:

  • When a friend or page you follow shares a link in your News Feed, we check to see if there is an Instant Article associated with that URL. If so, you will see it as an Instant Article. If not, it will open on the web browser.
  • When you share an Instant Article on Facebook or using email, SMS, or Twitter, you are sharing the link to the publisher website so anyone can open the article no matter what platform they use.

Associating each Instant Article with a URL makes it easy for publishers to adopt Instant Articles without changing their publishing workflows and means that people can read and share articles without thinking about the platform or technology behind the scenes.

Something like this maybe?


Which is to say, this?


Or maybe not. Maybe there is some enlightened self interest in this, and perhaps Facebook will see a reason to start letting its content out via open syndication formats, like RSS.

Or maybe RSS will end up sinking the Facebook platform, by allowing Facebook users to go off the platform but still accept content from it?

Whatever the case, as Facebook becomes a set of social platform companies rather than a single platform company, I wonder: will it have an open standard, feed based syndication bus to help content flow within and around those companies? Even if that content is locked inside the confines of a Facebook-parent-company-as-web attention wall?

PS So the ‘related content’ feature on my WordPress blog associates this post with an earlier one: Is Facebook Stifling the Free Flow of Information?, which it seems was lamenting an earlier decision by Facebook to disable the import of content into Facebook using RSS…?! What goes around, comes around, it seems?!

First Steps in a Conversational Slackbot interface to CQC Inspection Data

A few months ago, I started to have a play with ratings data from the CQC – the Care Quality Commission. I don’t seem to have posted the scraper tools I ended up with anywhere, but I have started playing with them again over the last couple of weeks in the context of my slackbot explorations.

In particular, I’ve started working on a conversational workflow for keeping track of inspections in a particular local area. At the current time, the CQC website only seems to support alerts relating to new reports at the level of a particular location, although it is possible to get faceted search results relating to reports published over the course of the last week or last month. For the local journalist wanting to keep tabs on reports associated with local providers, this means setting up a data beat that includes checking the CQC website on a regular basis, firstly to see whether there are any new reports, and secondly to see what sort of reports they are.

And as a report from OnTheWight today shows (Beacon Health Centre rated ‘Good’ by CQC), this can include good news stories as well as bad news ones.


So what’s the thing I’ve been exploring in the slack context? A time saver, hopefully. In the first case, it provides a quick way of checking up on reports from the local area released over the previous week or previous month:

To begin with, we can ask for a summary report of recent inspections:


The report does a bit of counting – to provide some element of context – and provides a highlights statement regarding the overall result from each of the reports. (At the moment, I don’t sort the order in which reports are presented. There are opportunities here for prioritising which results to show. I also need to think about whether results should be provided as a single slackbot response, as is currently the case, or using a separate (and hence, “star-able”) response for each report.)
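The counting-and-highlights step itself is simple enough – a sketch over a hypothetical list of scraped report records:

# Sketch of the summary step, over a hypothetical list of report dicts
# scraped from the CQC website.
def summarise(reports):
    counts = {}
    for r in reports:
        counts[r['rating']] = counts.get(r['rating'], 0) + 1
    lines = ['{} inspection report(s) published locally this month:'.format(len(reports))]
    lines += ['- {} ({}): rated "{}"'.format(r['name'], r['date'], r['rating'])
              for r in reports]
    lines.append('Overall: ' + ', '.join('{} {}'.format(n, rating)
                                         for rating, n in counts.items()))
    return '\n'.join(lines)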

After briefly skimming over the recent results, we can tunnel down into a slightly more detailed summary of the report by passing in a location ID:


As part of the report, this returns a link to the CQC website so we can inspect the full result at source. I’ve also got a scraper that pulls the full reports from the CQC site, but at the moment it’s not returning the result to slack (I think there’s a message size limit which I’m overflowing, so I need to find out what that limit is and split up the response to get round it). That said, slack is perhaps not the best place to return long reports? Maybe a better way would be to submit quoted report components into a draft WordPress blog post?
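One way round the overflow would be to split long responses at line boundaries and post each chunk as a separate message; the limit value below is a guess that would need checking against the Slack API documentation:

# Sketch of splitting a long report into Slack-sized chunks at line
# boundaries; the 3500-character limit is an assumption, not a documented
# figure.
def chunk_message(text, limit=3500):
    chunks, current = [], ''
    for line in text.splitlines(keepends=True):
        if len(current) + len(line) > limit and current:
            chunks.append(current)
            current = ''
        current += line
    if current:
        chunks.append(current)
    return chunks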

We can also pull back administrative information regarding a location from its location ID.


This report also includes information about other locations operated by the same provider. (I think I do have a report somewhere that summarises the report ratings over all the locations associated with a given provider, so we can look to see how well other establishments operated by the same provider are rated, but I haven’t wired that into the Slack bot yet.)

There are several other ways we can develop this conversation…

Company number and charity number information is available for some providers, which means it should be trivial to pull back company registration information and company directors information from Companies House or OpenCorporates, and perhaps even data from the Charity Commission.

Rather more scruffily, perhaps, we could use location name and postcode to try a search on the Food Standards Agency website to see if we can find food ratings data for establishments of interest.
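Sketches of both of those lookups – with the caveat that the endpoints and parameters are from memory and want checking (OpenCorporates’ REST API, and the Food Standards Agency ratings API, which expects an x-api-version header):

import json
from urllib.parse import quote
from urllib.request import Request, urlopen

def company_info(company_number, jurisdiction='gb'):
    # OpenCorporates company record lookup, e.g. company_info('01234567')
    url = 'https://api.opencorporates.com/v0.4/companies/{}/{}'.format(
        jurisdiction, company_number)
    return json.loads(urlopen(url).read().decode('utf-8'))

def food_ratings(name, postcode):
    # Food Standards Agency food hygiene ratings search by name and address
    req = Request('https://api.ratings.food.gov.uk/Establishments?name={}&address={}'.format(
        quote(name), quote(postcode)), headers={'x-api-version': '2'})
    return json.loads(urlopen(req).read().decode('utf-8'))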

There might also be opportunities for linking in items from local spending data to try to capture local authority spend with a particular location or provider. This would be simplified if council payments to CQC rated establishments or providers included the CQC location or provider ID, but that’s perhaps too much to ask for.

If nothing else, however, this demonstrates a casual conversational way in which a local journalist might be able to use slack as part of a local data beat to run regular, periodic checks over recent reports published by the CQC relating to local care establishments.

Some Idle Thoughts on Managing Temporal Posts in WordPress

Now that I’ve got a couple of my own WordPress blogs running off the back of my Reclaim Hosting account, I’ve started to look again at possible ways of tinkering with WordPress.

The first thing I had a look at was posting a draft WordPress post from a script.

Using a WordPress role editor plugin (e.g. along the lines of this User Role Editor) it’s easy enough to create a new role with edit and upload permissions only [WordPress roles and capabilities], and create a new ‘autoposter’ user with that role. Code like the following then makes it easy enough to upload an image to WordPress, grab the URL, insert it into a post, and then submit the post – where it will, by default, appear as a draft post:

#Ish Via: http://python-wordpress-xmlrpc.readthedocs.org/en/latest/examples/media.html
from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.compat import xmlrpc_client
from wordpress_xmlrpc.methods import media, posts
from wordpress_xmlrpc.methods.posts import NewPost

wp = Client('http://blog.example.org/xmlrpc.php', ACCOUNTNAME, ACCOUNT_PASSWORD)

def wp_simplePost(client, title='ping', content='pong, <em>pong</em>'):
    # create a new post and submit it - by default it lands in the drafts queue
    post = WordPressPost()
    post.title = title
    post.content = content
    response = client.call(NewPost(post))
    return response

def wp_uploadImageFile(client, filename):
    # map the file suffix onto a mimetype
    mimes = {'png': 'image/png', 'jpg': 'image/jpeg'}
    mimetype = mimes[filename.split('.')[-1]]

    # prepare metadata
    data = {
        'name': filename,
        'type': mimetype,
    }

    # read the binary file and let the XMLRPC library encode it into base64
    with open(filename, 'rb') as img:
        data['bits'] = xmlrpc_client.Binary(img.read())

    response = client.call(media.UploadFile(data))
    return response

def quickTest():
    txt = "Hello World"
    # upload the image, then embed it in the post body via its returned URL
    txt = txt + '<img src="{}"/><br/>'.format(wp_uploadImageFile(wp, 'hello2world.png')['url'])
    return txt

# post the test content - it should appear as a draft post
wp_simplePost(wp, title='Quick test', content=quickTest())


Dabbling with this then got me thinking about the different sorts of things that WordPress allows you to publish in general. It seems to me that there are essentially three main types of thing you can publish:

  1. posts: the timestamped elements that appear in a reverse chronological order in a WordPress blog. Posts can also be tagged and categorised and viewed via a tag or category page. Posts can be ‘persisted’ at the top of the posts page by setting them as a “sticky” post.
  2. pages: static content pages typically used to contain persistent, unchanging content. For example, an “About” page. Pages can also be organised hierarchically, with child subpages defined relative to a specified ‘parent’ page.
  3. sidebar elements and widgets: these can contain static or dynamic content.

(By the by, a range of third party plugins appear to support the conversion of posts to pages, for example Post Type Switcher [untested] or the bulk converter Convert Post Types [untested].)

Within a page or a post, we can also include a shortcode element that can be used to include a small piece of templated text, or content generated from the execution of some custom code (which it seems could be python: running a python script from a WordPress shortcode). Shortcodes run each time a page is loaded, although you can use the WordPress Transients database API to implement a simple cache for them to improve performance (eg as described here and here).

Within a post, page or widget, we can also embed dynamic content. For example, we could embed a map that displays dynamically created markers that are essentially out of the control of the page or post publisher. Note that by default WordPress strips iframes from content (and it also seems reluctant to allow the upload of html files to the media gallery, at least by default). The preferred way to include custom embedded content seems to be to define a shortcode to embed the required content, although there are plugins around that allow you to embed iframes. (I didn’t spot one that let you inline the content of the iframe using srcdoc though?)

When we put together the Isle of Wight planning applications : Mapped page, one of the issues related to how updates to the map should be posted over time.


That is, should the map be uploaded to a fixed page and show only the most recent data, should it be posted as a timestamped post, to provide archival copies of the page, or should it be posted to a page and support a timeslider/history function?

Thinking about this again, the distinction seems to rely on what sort of (re)discovery we want to encourage or support. For example, if the page is a destination page, then we should probably use a page with a fixed URL for the most recent map. Older maps could be accessed via archive links, or perhaps subpages, if a time-filter wasn’t available on a single map view. Alternatively, we might want to alert readers to the map, in which case it might make more sense to use a timestamped post. (We could of course use a post to announce an update to the page, perhaps including a screenshot of the latest map in the post.)

It also strikes me that we need to consider publication schedules by a news outlet compared to the publication schedules associated with a particular dataset.

For example, Land Registry House Prices Paid data is published on a monthly basis a few weeks after each month the data has been collected for. In this case, it probably makes sense to publish on a monthly basis.

But what about care home or food outlet inspection data? The CQC publish data as it becomes available, although searches support the retrieval of data for a particular area published over the last week or last month relative to the time the search is made. The Food Standards Agency produce updates to data download files on a daily basis, but the file for any particular area is only updated when it contains new data. (So on any given day, you don’t know which, if any, area files will be updated.)

In this case, it may well be that a news outlet may want to do a couple of things:

  • publish summaries of reports over the last week or last month, on a weekly or monthly schedule – “The CQC published reports for N care homes in the region over the last month, of which X were positive and Y were negative”, etc.
  • engage in a more immediate or responsive publication of stories around particular reports as they are published by the responsible agency. In this case, the journalist needs to find a way of discovering stories in a timely fashion, either through signing up to alerts or inspecting the agency site on a regular basis.

Again, it might be that we can use posts and pages in a complementary way: pages that act as fixed destination sites with a fixed URL, and perhaps links off to archived historical sub-pages, as well as related news stories, that contain the latest summary; and posts that announce timely reports as well as ‘page updated’ announcements when the slower-changing page is updated.

More abstractly, it probably makes sense to consider the relative frequencies with which data is originally published (also considering whether the data is published according to a fixed schedule, or in a more responsive way as and when data becomes available), the frequency with which journalists check the data site, and the frequency with which journalists actually publish data related stories.

Tweetable Bullet Points

Reading through a Pew Research blog post announcing the highlights of a new report just now (Libraries at the Crossroads), I spotted various Twitter icons around the document, not unlike the sort of speech bubble comment icons used in paragraph-level commenting systems.


Nice – built in opportunities for social amplification of particular bullet points:-)

PS thinks: hmm, maybe I could build something similar into auto-generated copy for contexts like this?

PPS via @ned_potter, the suggestion that maybe they’re making use of this WordPress plugin: ClickToTweet. Hmmm… And via @mhawksey, this plugin: Inline Tweet Sharer.