At the risk of coming across as a bit snobbish, this ad for a Data Journalist for The Penny Hoarder riled me somewhat…
Do you have a passion for telling stories with data? We’re looking for a data journalist who can crunch statistics about jobs, budgeting, spending and saving — and produce compelling digital content that resonates with our readers. You should have expertise in data mining and analysis, and the ability to present the results in conversational, fun articles and/or telling graphics.
As our data journalist, you will produce revealing, clickable, data-driven articles and/or graphics, plus serve as a resource for our growing team of writers and editors. We envision using data sources such as the Bureau of Labor Statistics and U.S. Census Bureau to report on personal finance issues of interest to our national readership of young professionals, coupon fans and financially striving people of all ages. We want to infuse our blog with seriously interesting data while staying true to our vibe: fun, weird, useful.
Our ideal candidate…
– Can write in a bloggy, conversational voice that emphasizes what the data means to real people
– Has a knack for identifying clicky topics and story angles that are highly shareable
– Gets excited when a blog post goes viral
According to Wikipedia (who else?!;-), Tabloid journalism is a style of journalism that emphasizes sensational crime stories, gossip columns about celebrities and sports stars, junk food news and astrology.
(Yes, yes, I know, I know, tabloid papers can also do proper, hard hitting investigative journalism… But I’m thinking about that sense of the term…)
So what might tabloid data journalism be? See above?
PS ish prompted by @SophieWarnes, it’s probably worth mentioning the aborted Ampp3d project in this context… eg Ampp3d launches as ‘socially-shareable data journalism’ site, Martin Belam talks about Trinity Mirror’s data journalism at Ampp3d and The Mirror Is Making Widespread Cuts To Its Online Journalism.
After starting to reread my 6th edition copy of How Parliament Works over the weekend, which is now notably dated, I had a quick poke around Amazon looking to see whether there’s a more recent edition (there is…). In doing so, I saw various mentions of historical “Standing Orders of the House of Commons”. A quick search of the Parliament website turned up an appropriate page, and a link to a PDF of the 2016 orders.
Having a print copy of such a document to leave lying around means I’ll be able to start picking stuff up from it using osmotic reading(?!;-), but I couldn’t find anywhere to buy such a copy. And printing it out on looseleaf A4 is way too much like faffing around.
However, it seems that on the one hand Parliamentary licensing is quite liberal (the Open Parliament License), and on the other, no-one would know anyway if I uploaded the PDF and got it printed on-demand, in bound copy, private access style, from Lulu:
With two or three quid for postage, that comes in at less than the “cover price” of £10 too.
Which got me thinking… maybe I should try to find some other reference material to bundle into the “book” too? The additional page charge for another couple of hundred pages makes no difference to the marginal cost of the postage etc…
(Unfortunately, Parliament doesn’t distribute an electronic copy of Erskine May. Instead, you need a library, or several hundred quid to give to Lexis Nexis.)
It’s a shame Lulu closed their API down, too… that could have been a useful way of eg auto-generating some POD/book printed copies of report and consultation document readings that I typically open into tabs and then never read. (Osmotic reading of long form content through a screen is something I still struggle to do…)
PS If you’ve never tried a Lulu book before, here’s one I prepared earlier… ;-)
We will, I think, be seeing increasing use of the surveillance devices we carry with us and have installed in our homes as sources of “tech witness” evidence in the courts…
For example, at the end of last year there were reports of the prosecution of a 2015 crime in which the police requested copies of records (court papers, 08/26/2016 01:36 PM SEARCH WARRANT FILED) from Amazon’s audio surveillance device, the Amazon Echo (BBC, Guardian, Independent; the article that broke the story, from The Information, is subscription only).
From the justification of the request for the search warrant:
On ??, the Honorable Judge ?? reviewed and approved a search warrant for ??’s residence once again, located at ??, specifically for the search and seizure of electronic devices capable of storing and transmitting any form of data that could be related to this investigation. Officers executed this search warrant on this same date and during the course of the search, I located an Amazon Echo device in the kitchen, lying on the kitchen counter next to the refrigerator, plugged into the wall outlet. I had previously observed this device in the same position and state during the previous search warrant on ??.
While searching ??’ residence, we discovered numerous devices that were used for “smart home” services, to include a “Nest” thermometer that is Wi-Fi connected and remotely controlled, a Honeywell alarm system that included door monitoring alarms and motion sensor in the living room, a wireless weather monitoring system outside on the back patio, and WeMo devices in the garage area for remote-activated lighting purposes that had not been opened yet. All of these devices, to include the Amazon Echo device, can be controlled remotely using a cell phone, computer, or other device capable of communicating through a network and are capable of interacting with one another through the use of one or more applications or programs. Through investigation, it was learned that during the time period of ??’s time at the residence, music was being wirelessly streamed throughout the home and onto the back patio of the residence, which could have been activated and controlled utilizing the Amazon Echo device or an application for the device installed on ??’s cell Apple iPhone.
The Amazon Echo device is constantly listening for the “wake” command of “Alexa” or “Amazon,” and records any command, inquiry, or verbal gesture given after that point, or possibly at all times without the “wake word” being issued, which is uploaded to Amazon.com’s servers at a remote location. It is believed that these records are retained by Amazon.com and that they are evidence related to the case under investigation.
On ??, Amazon.com was served with a search warrant that was reviewed and approved by Circuit Court Judge ?? on the same date. The search warrant was sent through Amazon’s law enforcement email service and was also sent through United States Postal Service Certified Mail to their corporate headquarters in Tumwater, Washington. The search warrant was received by Amazon through the mail on ??, and representatives with Amazon have been in contact with this agency since receiving the search warrant. In speaking with their law enforcement liaison, Greg Haney, I was informed on two separate occasions that Amazon was in possession of the requested data in the search warrant but needed to consult with their counsel prior to complying with the search warrant. As of ??, Amazon has not provided our agency with the requested data and an extension for the originally ordered search warrant was sought.
After being served with the second search warrant, Amazon did not comply with providing all of the requested information listed in the search warrant, specifically any information that the Echo device could have transmitted to their servers. This agency maintains custody of the Echo device and it has since been learned that the device contains hardware capable of storing data, to potentially include time stamps, audio files, or other data. It is believed that the device may contain evidence related to this investigation and a search of the device itself will yield additional data pertinent to this case.
Our agency has also maintained custody of ??’s cell phone, an LG Model LG—E980, and ??’s cell phone, a Huawei Nexus cell phone, that was seized from ?? as a result of his arrest on ??, and we have been unable to access the data stored on the devices due to a passcode lock on them. Despite efforts to obtain the passcode, the devices could not be accessed. Our agency now has the ability to utilize data extraction methods that negate the need for passcodes and efforts to search ?? and ??’s devices will continue upon issuance of this warrant.
Today, via @charlesarthur (& also Schneier), I notice a story describing how Cops use pacemaker data to charge homeowner with arson, insurance fraud. (I found some court records (Middletown, Butler County, 16CRA04386) but couldn’t find/see the filing for the warrant?) It seems that “[p]olice set out to disprove ??’s story … by obtaining a search warrant to collect data from [his] pacemaker. WLWT5 reported that the cops wanted to know “??’s heart rate, pacer demand and cardiac rhythms before, during and after the fire.”
This builds on previous examples of Fitbit data being called on as evidence in at least a couple of US court cases, challenging claims made by individuals that they were engaged in one sort of behaviour when their logged physiological data suggested they were not.
And of course, many cars now have their own black box, which is likely to include ever more detailed data logs. For example, a recent report by the US Department of Transportation National Highway Traffic Safety Administration (NHTSA) included reference to “data logs, image files, and records related to the crashes … provided by Tesla in response to NHTSA subpoenas.”
It’ll be interesting to see the extent to which contemporary data/video/audio collecting devices will be viewed as reliable (or unreliable) witnesses, and further down the line, the extent to which algorithmic classifications are trusted. For example, in using OCR to extract the text from the scanned PDF of the court filing shown above (which for some reason I had to convert to a JPG image before Apache Tika running on Docker Cloud would extract text from it), I noticed that on one page it had mis-recognised Amazon servers as Amazon sewers.
PS in passing, I’m quite amazed at how much personal information is made available via public documents associated with the justice system in the US.
Via Andy Dickinson’s Media Mill Gazette open data / data journalism newsletter (issue 92), I notice that Croydon Clinical Commissioning Group appears to have taken a decision to stop prescribing specialist baby formula.
Hospital prescription data is not typically released as public data (though I wonder, is it FOIable?), which ruled out a quick Sunday morning data dive chasing the weekend newspaper story that Drugs firms are accused of putting cancer patients at risk over price hikes. However, prescribing data is available for GPs, both as an open data download and via the openprescribing API.
So a wondering for a possible data dive… For GPs in a particular CCG (easy enough to find), could we find prescriptions relating to the baby milk formulas mentioned in the Croydon story (Nutramigen and Neocate) and then see how related prescribing – and costs of prescribing – have changed over the last 12 months?
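As a starter for ten, hitting the openprescribing API doesn’t take much code. The sketch below just builds the query URL; the BNF codes for Nutramigen and Neocate are left as placeholders (I haven’t looked them up in the BNF product listings), and the example organisation code is, I believe, the one for Croydon CCG:

```python
# Sketch of querying the openprescribing API for prescribing spend.
# The BNF code below is a placeholder - real codes for Nutramigen and
# Neocate would need looking up in the BNF product listings.
from urllib.parse import urlencode

API_BASE = "https://openprescribing.net/api/1.0"

def spending_url(bnf_code, org_type="ccg", org_code=None):
    """Build a spending query URL for a BNF code, optionally scoped to one organisation."""
    params = {"code": bnf_code, "format": "json"}
    if org_code:
        params["org"] = org_code
    return f"{API_BASE}/spending_by_{org_type}/?{urlencode(params)}"

url = spending_url("PLACEHOLDER_BNF_CODE", org_code="07V")  # 07V: Croydon CCG, I think
# import requests; rows = requests.get(url).json()
# each row should include date, actual_cost, items, quantity
```

Pulling twelve months of rows for each code and plotting items/cost over time would then be a five minute job in pandas.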
Yet another thing to add to the “could do this if my time was my own” list…
(According to Collins English Dictionary, pareidolia (noun): the imagined perception of a pattern or meaning where it does not actually exist, as in considering the moon to have human features.)
Whilst reviewing / scoping* possible programming editor environments for the new level 1 courses, one of the things I was encouraged to look at was Philip Guo’s interactive Python Tutor.
According to the original writeup (Philip J. Guo. Online Python Tutor: Embeddable Web-Based Program Visualization for CS Education. In Proceedings of the ACM Technical Symposium on Computer Science Education (SIGCSE), March 2013), the application has an HTML front end that calls on a backend debugger: “the Online Python Tutor backend takes the source code of a Python program as input and produces an execution trace as output. The backend executes the input program under supervision of the standard Python debugger module (bdb), which stops execution after every executed line and records the program’s run-time state.”
The tutor itself allows you to step through code snippets a line at a time, displaying a trace of the current variable values.
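To get a feel for what the backend is doing, here’s a minimal sketch of the same line-by-line tracing idea using Python’s sys.settrace hook (a simpler relative of the bdb machinery the Online Python Tutor actually uses), recording the in-scope variables just before each line runs:

```python
import sys

def trace_snippet(source):
    """Run a code snippet, recording (line number, variables) before each line executes."""
    steps = []

    def tracer(frame, event, arg):
        # Only record line events from the snippet itself, skipping dunder entries
        if event == "line" and frame.f_code.co_filename == "<snippet>":
            state = {k: v for k, v in frame.f_locals.items()
                     if not k.startswith("__")}
            steps.append((frame.f_lineno, state))
        return tracer

    code = compile(source, "<snippet>", "exec")
    sys.settrace(tracer)
    try:
        exec(code, {})
    finally:
        sys.settrace(None)  # always switch tracing back off
    return steps

steps = trace_snippet("x = 1\ny = x + 2\n")
# The state is captured *before* each line runs, so the entry for the
# second line shows x already set but y not yet assigned
```

A real implementation (as the paper describes) also has to worry about sandboxing, heap objects and stack frames, but the core “stop after every line and snapshot the state” loop really is this small.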
Another nice feature of the Online Python Tutor, though it was a bit ropey when I first tried it out a few months ago, is the shared session support, whereby a learner and a tutor can see the same session via a shared link, with an additional chat box allowing them to chat over the shared experience in realtime.
The Online Python Tutor also allows URLs to saved programs (“tutorials”) to be generated and shared: link to the demo shown in the movie above. (The code is actually passed via the URL.)
One of the problems with the Online Python Tutor is that it requires a network connection so that the code can be passed to the interpreter back end, executed to generate the code trace, and then passed back to the browser. It didn’t take long for folk to start embedding the tutor in an iframe to give a pseudo-traceability experience in the notebook context, but now the Online Python Tutor inspired nbtutor extension makes cell based tracing against the local python kernel possible**.
The nbtutor extension provides cell by cell tracing (when running a cell, all the code in the cell is executed, the trace returned, and then made available for visualising). Note that all variables in scope are displayed in the trace, even if they have been set in other cells outside of the nbtutor magic. (I’m not sure if there’s a setting that allows you to display just the variables that are referenced within the cell?) It is also possible to clear all variables in the global scope via a magic parameter, with a prompt to confirm that you really do want to clear out all those variable values.
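For what it’s worth, filtering the display down to just the names a cell actually references looks doable in principle – Python’s ast module can pull the names out of the cell source. (This helper is purely hypothetical, my own sketch rather than anything in nbtutor:)

```python
import ast

def referenced_names(cell_source, namespace):
    """Return the subset of a namespace whose names are referenced in a code cell."""
    # Walk the parsed cell and collect every bare name that appears
    names = {node.id for node in ast.walk(ast.parse(cell_source))
             if isinstance(node, ast.Name)}
    return {k: v for k, v in namespace.items() if k in names}

ns = {"a": 1, "b": 2, "c": 3}
print(referenced_names("total = a + b", ns))  # only a and b, not c
```

Something along those lines could presumably be used to prune the variables passed to the visualiser.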
I’m not sure what the best way of framing nbtutor exercises in a Jupyter notebook context would be, but I note that the notebooks used to support the MPR213 (Programming and Information Technology) course from the Department of Mechanical and Aeronautical Engineering in the Faculty of Engineering, Built Environment and Information Technology at the University of Pretoria now include nbtutor examples.
* A cynic might say scoping in the sense of not seriously considering anything other than the environments that had already been decided on before the course production process had really started… ;-) I also preferred BlockPy over Scratch, for example. My feeling was that if the OU was going to put developer effort in (the original claim was we wouldn’t have to put effort into Scratch, though of course we are because Scratch wasn’t quite right…) we could add more value to the OU and the community by getting involved with BlockPy, rather than a programming environment developed for primary school kids. Looking again at the “friendly” error messages that the BlockPy environment offers, I’m starting to wonder if elements of that could be reused for some IPython notebook magic…
** Again, I’m of the mind that were it 20 years ago, porting the Online Python Tutor to the Jupyter notebook context might have been something we’d have considered doing in the OU…
One of the challenges of working with Jupyter notebooks to date has been the question of diffing, spotting the differences between two versions of the same notebook. This made collaborative authoring and reviewing of notebooks a bit tricky. It also acted as a brake on using notebooks for student assessment. It’s easy enough to set an exercise using a templated notebook and then get students to work through it, but marking the completed notebook in return can be a bit fiddly. (The nbgrader system addresses this in part, but at the expense of the overhead of having to use additional nbgrader formatting and markup.)
However, there’s ongoing effort now around nbdime (docs). Given past success in getting Jupyter notebook previews displayed in Github, it wouldn’t be unreasonable to think that the diff view might make it into that environment at some point too…
At the moment, nbdime works from the command line. It can produce a text diff in the console, or launch a notebook viewer in the browser that shows differences between two notebooks.
The differ works on a cell by cell basis and highlights changes and additions. (Extra emphasis on the changed text in a markdown cell doesn’t seem to work at the moment?)
If you change the contents of a code cell, or the outputs of a code cell have changed, those differences are identified too. (Note the extra emphasis in the code cell on the changed text, but not in the output.)
To improve readability, you can collapse the display of changed code cell output.
Where cell outputs include graphical objects, differences to these are highlighted too.
(Whilst I note that Github has various tools for exploring the differences between two versions of the same image, I suspect that sort of comparison will be difficult to achieve inline in the notebook differencer.)
I suspect one common way of using nbdime will be to compare the current state of a notebook with a checkpointed version. (Jupyter notebooks autosave the current state of the notebook quite regularly. If you force a save, the current state is saved but a “checkpoint” version of the notebook is also saved to a hidden folder. If things go really wrong with your current notebook, you can restore it to the checkpointed version.)
If you’ve saved a checkpoint of a notebook, and want to compare the current (autosaved) version with it, you need to point to the checkpointed file in the checkpoint folder: nbdiff-web .ipynb_checkpoints/MY_FILE-checkpoint.ipynb MY_FILE.ipynb. It’d be nice if a switch could handle this automatically, eg nbdiff-web --compare-checkpoint MY_FILE.ipynb. (It would also be nice if the nbdiff command could force the notebook to autosave before a diff is run, but I’m not sure how that could be achieved?)
It also strikes me that when restoring from a checkpoint, it might be possible to combine the restoration action with the differencer view so that you can decide which bits of the current notebook you might want to keep (i.e. essentially treat the differences between the current and checkpointed version as conflicts that need to be resolved?)
This is probably pushing things a bit far, but I also wonder if lightweight, inline, cell level differencing would be possible, given that each cell in a running notebook has an undo feature that goes back multiple steps?
Finally, a note about using the differencer to support marking. The differencer view is an HTML file, so whilst you can compare a student’s notebook with the original, you can’t edit their notebook directly in the differencer to add marks or feedback. (I really do need to have another play with nbgrader, I think…)
PS It’s also worth noting that SageMathCloud has a history slider that lets you run over different autosaved versions of a notebook, although differences are not highlighted.
PPS Thinks: what I’d like is a differencer that generates a new notebook with addition/deletion cells highlighted and colour styled so that I could retain – or delete – the cell and add cells of my own… Something akin to track changes, for example. That way I could run different cells, add annotations, etc etc (related issue).