I doubt there are many readers of this blog who aren’t familiar with science fiction guru Arthur C. Clarke’s adage that “[a]ny sufficiently advanced technology is indistinguishable from magic”. And there may even be a playful few who invoke Rowlingesque spells on the command line using Harry Potter bash aliases. So I was wondering again today about what other magical or folkloric ideas could be used to help engage folks’ curiosity in how the world of tech works, and maybe teach computing-related ideas through stories.
For example, last week I noticed that a reasonable number of links on Wikipedia point to the Internet Archive.
I also picked up from a recent Recode/Decode podcast interview between Kara Swisher (the person you may know as the awesomest tech interviewer ever) and Internet Archive champion Brewster Kahle that bots do the repair work: bots like User:InternetArchiveBot and CyberBot II, perhaps? Broken links are identified, and link references are updated to point to archival copies. (For more info, see: More than 1 million formerly broken links in English Wikipedia updated to archived versions from the Wayback Machine and Fixing broken links in Wikipedia (especially the comments).)
Hmm… helpful bots… like helpful spirits, or Brownies in the folkloric sense. Things that come out at night and help invisibly around the home…
And if there are helpful spirits, there are probably malicious ones too: the code equivalent of boggarts and bogles that cause mischief or mayhem – robot phone callers, or scripts that raise pop-ups when you’re trying to read a post online, for example. Maybe if we start to reframe online tech inconveniences as malevolent spirits we’ll find better ways to ignore or dispel them?! Or at least find a way to engage people in thinking about them, and from that work out how best to banish them from our lives?
PS the problem of Link Rot is an issue for maintaining OU course materials too. As materials are presented year on year, link targets move away and/or die. Sometimes the materials are patched with a corrected link to wherever the resource moved to, other times we refresh materials and find a new resource to link to. But generally, I wonder, why don’t we make like Wikipedia and get a Brownie to help? Are there Moodle bots to do helpful work like this around the VLE?
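As a toy sketch of what such a brownie might look like (in Python, with a hypothetical function name; real bots like InternetArchiveBot are rather more sophisticated), the Wayback Machine exposes an availability API that returns the closest archived snapshot of a URL, which a link-fixing bot could swap in for a dead link:

```python
# Minimal "link-rot brownie" sketch: given a (possibly dead) URL, ask the
# Wayback Machine's availability API for the closest archived snapshot.
# The function name and structure are my own invention for illustration.
import json
import urllib.request

WAYBACK_API = "https://archive.org/wayback/available?url={}"

def wayback_snapshot(url, fetch=None):
    """Return the URL of the closest archived snapshot, or None if there isn't one.

    `fetch` can be swapped out (e.g. for testing); by default it calls the live API.
    """
    if fetch is None:
        def fetch(api_url):
            with urllib.request.urlopen(api_url) as resp:
                return json.load(resp)
    data = fetch(WAYBACK_API.format(url))
    snapshot = data.get("archived_snapshots", {}).get("closest", {})
    return snapshot.get("url") if snapshot.get("available") else None
```

A bot would then just crawl a page’s links, test each one for a 404, and rewrite any dead ones to the snapshot URL this returns.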
A couple of weeks ago I posted a demo of how to automate the production of a templated report (catchment for GP practices by LSOA on the Isle of Wight) using
knitr (Reporting in a Repeatable, Parameterised, Transparent Way).
Today, I noticed another report, with data, from the House of Commons Library on Superfast Broadband Coverage in the UK. This reports at the ward level rather than the LSOA level the GP report was based on, so I wondered how easy it would be to reuse the GP/LSOA code for a broadband/ward map…
After fighting with the Excel data file (metadata rows before the header and at the end of the table, cruft rows between the header and the data table proper) and the R library I was using to read the file (it turned the data into a tibble, with spacey column names I couldn’t get to work with ggplot, rather than a dataframe – I ended up saving to CSV then loading it back in again…), not many changes were required to the code at all… What I really should have done was abstract the code into an R file (and maybe some importable Rmd chunks) and try to get the script down to as few lines of bespoke code as possible for handling the new dataset – maybe next time…
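(The spacey-column-names fight is what something like janitor::clean_names() in R is for, applied at load time; the cleaning rule itself is language-agnostic, so here’s the same idea as a minimal, hypothetical Python sketch:)

```python
# Sketch of janitor::clean_names()-style header tidying: turn "spacey"
# spreadsheet column headers into snake_case identifiers that plotting
# and modelling code can refer to without quoting gymnastics.
import re

def clean_names(names):
    """Normalise messy column headers into snake_case."""
    cleaned = []
    for name in names:
        name = name.strip().lower()
        name = re.sub(r"[^\w]+", "_", name)  # spaces/punctuation -> underscore
        name = re.sub(r"_+", "_", name).strip("_")
        cleaned.append(name)
    return cleaned
```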
I also had a quick play at generating a Shiny app from the code (again, cutting and pasting rather than abstracting into a separate file and importing… I guess at least now I have three files to look at when trying to abstract the code, and to test against…!)
So this has got me thinking – what are the commonly produced “types” of report or report section, and what bits of common/reusable code would make it easy to generate new automation scripts, at least as a first pass, for a new dataset?
A few weeks ago, I popped together a post listing a few Data Journalism Units on Github. These repos (that is, repositories) are being used to share code (for particular interactives, for example), data, and analysis scripts. They’re starting to hint at ways in which support for public, reproducible local data journalism might emerge from developing (standardised) data repositories and reproducible workflows built around them.
Here are a handful of other signals that I think support this trend that I’ve come across in the last few weeks (if they haven’t appeared in your own feeds, a great shortcut to many of them is via @digidickinson’s weekly Media Mill Gazette):
- the BBC Local Democracy Reporters Scheme (incl. a Data Journalism Hub) (NUJ briefing [PDF]) – a consultation on this closed last week, but I haven’t seen a response yet?
- the launch of the Bureau for Investigative Journalism Local Data Lab;
- nice demonstrations of putting local and locally collected data to work from associates of Data Mill North and Bath Hacked;
- local/national investigative data journalism from the Bristol Cable;
- locally segmented national datasets from GetTheData (note that this is not the deprecated GetTheData QnA site that emerged after a manic conversation with Rufus Pollock several years ago!;-)
- local profiling tools from Public Health England (eg alcohol), and locally partitioned data from DCLG OpenDataCommunities, the Consumer Data Research Centre and the House of Commons Library.
And here’s another one, from today – the Associated Press putting together a pilot with data publishing platform data.world “to help newsrooms find local stories within large datasets” (Localizing data, quantifying stories, and showing your work at The Associated Press). I’m not sure what the pilot will involve, but the rationale sounds interesting:
Transparency is important. It’s a standard we hold the government to, and it’s a standard we should hold the press to. The more journalists can show their work, whether it’s a copy of a crucial document or the data underlying an analysis, the more reason their audience has to accept their findings (or take issue with them in an informed way). When we share our data and methodology with our members, those journalists give us close scrutiny, which is good for everyone. And when we can release the data more broadly and invite our readers to check our work, we create a more secure grounding for the relationship with the reader.
:-) S’what we need… Show your working…
At the risk of coming across as a bit snobbish, this ad for a Data Journalist for The Penny Hoarder riled me somewhat…
Do you have a passion for telling stories with data? We’re looking for a data journalist who can crunch statistics about jobs, budgeting, spending and saving — and produce compelling digital content that resonates with our readers. You should have expertise in data mining and analysis, and the ability to present the results in conversational, fun articles and/or telling graphics.
As our data journalist, you will produce revealing, clickable, data-driven articles and/or graphics, plus serve as a resource for our growing team of writers and editors. We envision using data sources such as the Bureau of Labor Statistics and U.S. Census Bureau to report on personal finance issues of interest to our national readership of young professionals, coupon fans and financially striving people of all ages. We want to infuse our blog with seriously interesting data while staying true to our vibe: fun, weird, useful.
Our ideal candidate…
– Can write in a bloggy, conversational voice that emphasizes what the data means to real people
– Has a knack for identifying clicky topics and story angles that are highly shareable
– Gets excited when a blog post goes viral
According to Wikipedia (who else?!;-), tabloid journalism is “a style of journalism that emphasizes sensational crime stories, gossip columns about celebrities and sports stars, junk food news and astrology”.
(Yes, yes, I know, I know, tabloid papers can also do proper, hard hitting investigative journalism… But I’m thinking about that sense of the term…)
So what might tabloid data journalism be? See above?
PS (ish) Prompted by @SophieWarnes, it’s probably worth mentioning the aborted Ampp3d project in this context – e.g. Ampp3d launches as ‘socially-shareable data journalism’ site, Martin Belam talks about Trinity Mirror’s data journalism at Ampp3d, and The Mirror Is Making Widespread Cuts To Its Online Journalism.
PPS …and a write-up of that by Sophie: Is there room for ‘tabloid data journalism’?
After starting to reread my 6th edition copy of How Parliament Works (now notably dated) over the weekend, I had a quick poke around Amazon to see whether there’s a more recent edition (there is…). In doing so, I saw various mentions of historical “Standing Orders of the House of Commons“. A quick search of the Parliament website turned up an appropriate page, and a link to a PDF of the 2016 orders.
Having a print copy of such a document to leave lying around means I’ll be able to start to pick up stuff from it using osmotic reading(?!;-), but I couldn’t find anywhere to buy such a copy. And printing it out on looseleaf A4 is way too much like faffing around.
However, it seems that on the one hand Parliamentary licensing is quite liberal (the Open Parliament License), and on the other, no-one would know anyway if I uploaded the PDF and got it printed on demand, as a bound copy, private-access style, from Lulu:
With two or three quid for postage, that comes in at less than the “cover price” of £10 too.
Which got me thinking… maybe I should try to find some other reference material to bundle into the “book” too? The additional page charge for another couple of hundred pages makes no difference to the marginal cost of the postage etc…
(Unfortunately, Parliament doesn’t distribute an electronic copy of Erskine May. Instead, you need a library, or several hundred quid to give to Lexis Nexis.)
It’s a shame Lulu closed their API down, too… that could have been a useful way of eg auto-generating some POD/book printed copies of report and consultation document readings that I typically open into tabs and then never read. (Osmotic reading of long form content through a screen is something I still struggle to do…)
PS If you’ve never tried a Lulu book before, here’s one I prepared earlier… ;-)
We will, I think, be seeing increasing use of the surveillance devices we carry with us and have installed in our homes as sources of “tech witness” evidence in the courts…
For example, at the end of last year were reports of the prosecution of a 2015 crime in which the police requested copies of records (court papers, 08/26/2016 01:36 PM SEARCH WARRANT FILED) from Amazon’s audio surveillance device, the Amazon Echo (BBC, Guardian, Independent; the article that broke the story from The Information is subscription only).
From the justification of the request for the search warrant:
On ??, the Honorable Judge ?? reviewed and approved a search warrant for ??’s residence once again, located at ??, specifically for the search and seizure of electronic devices capable of storing and transmitting any form of data that could be related to this investigation. Officers executed this search warrant on this same date and during the course of the search, I located an Amazon Echo device in the kitchen, lying on the kitchen counter next to the refrigerator, plugged into the wall outlet. I had previously observed this device in the same position and state during the previous search warrant on ??.
While searching ??’ residence, we discovered numerous devices that were used for “smart home” services, to include a “Nest” thermometer that is Wi-Fi connected and remotely controlled, a Honeywell alarm system that included door monitoring alarms and motion sensor in the living room, a wireless weather monitoring system outside on the back patio, and WeMo devices in the garage area for remote-activated lighting purposes that had not been opened yet. All of these devices, to include the Amazon Echo device, can be controlled remotely using a cell phone, computer, or other device capable of communicating through a network and are capable of interacting with one another through the use of one or more applications or programs. Through investigation, it was learned that during the time period of ??’s time at the residence, music was being wirelessly streamed throughout the home and onto the back patio of the residence, which could have been activated and controlled utilizing the Amazon Echo device or an application for the device installed on ??’s cell Apple iPhone.
The Amazon Echo device is constantly listening for the “wake” command of “Alexa” or “Amazon,” and records any command, inquiry, or verbal gesture given after that point, or possibly at all times without the “wake word” being issued, which is uploaded to Amazon.com’s servers at a remote location. It is believed that these records are retained by Amazon.com and that they are evidence related to the case under investigation.
On ??, Amazon.com was served with a search warrant that was reviewed and approved by Circuit Court Judge ?? on the same date. The search warrant was sent through Amazon’s law enforcement email service and was also sent through United States Postal Service Certified Mail to their corporate headquarters in Tumwater, Washington. The search warrant was received by Amazon through the mail on ??, and representatives with Amazon have been in contact with this agency since receiving the search warrant. In speaking with their law enforcement liaison, Greg Haney, I was informed on two separate occasions that Amazon was in possession of the requested data in the search warrant but needed to consult with their counsel prior to complying with the search warrant. As of ??, Amazon has not provided our agency with the requested data and an extension for the originally ordered search warrant was sought.
After being served with the second search warrant, Amazon did not comply with providing all of the requested information listed in the search warrant, specifically any information that the Echo device could have transmitted to their servers. This agency maintains custody of the Echo device and it has since been learned that the device contains hardware capable of storing data, to potentially include time stamps, audio files, or other data. It is believed that the device may contain evidence related to this investigation and a search of the device itself will yield additional data pertinent to this case.
Our agency has also maintained custody of ??’s cell phone, an LG Model LG-E980, and ??’s cell phone, a Huawei Nexus cell phone, that was seized from ?? as a result of his arrest on ??, and we have been unable to access the data stored on the devices due to a passcode lock on them. Despite efforts to obtain the passcode, the devices could not be accessed. Our agency now has the ability to utilize data extraction methods that negate the need for passcodes and efforts to search ?? and ??’s devices will continue upon issuance of this warrant.
Today, via @charlesarthur (and also Schneier), I noticed a story describing how Cops use pacemaker data to charge homeowner with arson, insurance fraud. (I found some court records (Middletown, Butler County, 16CRA04386) but couldn’t find the filing for the warrant.) It seems that “[p]olice set out to disprove ??’s story … by obtaining a search warrant to collect data from [his] pacemaker”. WLWT5 reported that the cops wanted to know “??’s heart rate, pacer demand and cardiac rhythms before, during and after the fire.”
This builds on previous examples of Fitbit data being called on as evidence in at least a couple of US court cases, challenging claims made by individuals that they were engaged in one sort of behaviour when their logged physiological data suggested they were not.
And of course, many cars now have their own black box, which is likely to include ever more detailed data logs. For example, a recent report by the US Department of Transportation National Highway Traffic Safety Administration (NHTSA) included reference to “data logs, image files, and records related to the crashes … provided by Tesla in response to NHTSA subpoenas.”
It’ll be interesting to see the extent to which contemporary data/video/audio collecting devices will be viewed as reliable (or unreliable) witnesses, and, further down the line, the extent to which algorithmic classifications are trusted. For example, when using OCR to extract the text from the scanned PDF of the court filing shown above (which, for some reason, I had to convert to a JPG image before Apache Tika running on Docker Cloud would extract text from it), I noticed that on one page it had mis-recognised Amazon servers as Amazon sewers.
PS in passing, I’m quite amazed at how much personal information is made available via public documents associated with the justice system in the US.
Via Andy Dickinson’s Media Mill Gazette open data / data journalism newsletter (issue 92), I notice that Croydon Clinical Commissioning Group appears to have taken a decision to stop prescribing specialist baby formula.
Although hospital prescription data is not typically released as public data (though I wonder, is it FOIable?), which ruled out a quick Sunday morning data dive chasing the weekend newspaper story that Drugs firms are accused of putting cancer patients at risk over price hikes, prescribing data is available for GPs, both as an open data download and via the openprescribing API.
So a wondering for a possible data dive… For GPs in a particular CCG (easy enough to find), could we find prescriptions relating to the baby milk formulas mentioned in the Croydon story (Nutramigen and Neocate) and then see how related prescribing – and costs of prescribing – have changed over the last 12 months?
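As a starting point, a hedged sketch of how such a query might be built against the openprescribing API (endpoint and parameter names are as I understand them from the API docs, so treat them as assumptions; the BNF code below is a placeholder, not the real code for Nutramigen or Neocate, which would need looking up separately):

```python
# Sketch of building an openprescribing API query: monthly spend/items for
# a given BNF code, optionally restricted to one CCG. The actual network
# call is left to the caller; this just constructs the URL.
from urllib.parse import urlencode

API_BASE = "https://openprescribing.net/api/1.0"

def spending_url(bnf_code, ccg_code=None):
    """Build a spending query URL for a BNF code, optionally for one CCG."""
    params = {"code": bnf_code, "format": "json"}
    if ccg_code:
        endpoint = "spending_by_ccg"
        params["org"] = ccg_code
    else:
        endpoint = "spending"
    return f"{API_BASE}/{endpoint}/?{urlencode(params)}"
```

Fetching that URL (e.g. with urllib or requests) should return monthly rows of items, quantity and actual cost, which is exactly what’s needed to chart how prescribing has changed over the last 12 months.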
Yet another thing to add to the “could do this if my time was my own” list…