Fragment – “Autonomous Vehicles are Safer” – Who Says, How?

Over the weekend, I got a bit ranty on the Twitterz about autonomous vehicles, taking issue with a variety of things, including some of the numbers used to promote “autonomous vehicles save lives” claims, and the infallibility of autonomous vehicles because of, you know, “AI”.

Noting a link from the Overspill blog today from the New York Times – America Is Now an Outlier on Driving Deaths – showing how the US is one of the laggards in the “Deaths per billion vehicle miles traveled” stakes, I thought I’d grab together some of the links I shared previously just in case I ever get round to starting to dig into this a bit more.

As weak support for my belief that “convenient” numbers are used to make “justifiable” claims about the significantly better safety record of autonomous vehicles, I’d shared the following links without digging into them (they’re starting points…):

  • on fatality and accident rates differing in different countries and on different road types:
    • WHO /Global Health Observatory (GHO) data on Road Traffic Deaths
    • UK Department for Transport National Statistics: Transport Statistics Great Britain: 2016, for example TSGB0801 (Reported road accidents and casualties, population, vehicle population, index of vehicle mileage, by road user type), TSGB0802 (Reported road accident casualties by road user type and severity), TSGB0803 (Reported accidents and accident rates by road class and severity), TSGB0809 (International comparisons of road deaths: number and rates for different road users, by selected countries), TSGB0811 (Motor vehicle offences: findings of guilt at all courts fixed penalty notices and written warnings, by type of offence)
  • on possible side effects, weak signals on some of the things the lobbyists might go after to simplify the problems faced by autonomous vehicles, perhaps to help provide better accident stats in the early days of any roll-out to provide further “evidence” for why autonomous vehicles are better:
    • a Wired article Maybe It’s Time to Cede US Freeways to Driverless Cars on a white paper from a VC firm proposing an ““autonomous vehicle corridor” replacing the I-5 freeway between Seattle and Vancouver”; I wonder if access to things like bus lanes or AV lanes on motorways is also being lobbied for in the UK?
    • a proposal from the US National Highway Traffic Safety Administration on Federal Motor Vehicle Safety Standards; V2V Communications. “V2V” in this case being ‘vehicle to vehicle’, and relating to such things as the transmission of Basic Safety Messages (BSMs) “which contain data elements such as speed, heading, trajectory, and other information”. These are presumably easy to build into autonomous vehicles and are likely to also benefit other new vehicles with sensors already in place for lane tracking, cruise control etc. I’m saying nothing about the privacy implications that spring up around the local broadcast of this data… It also sets up the model for arguments where technologies used to support autonomous vehicle operation become mandated as required on human driven machines. (Presumably, because machines don’t need seatbelts, US drivers will continue to assume they don’t need to wear seatbelts either…)

I’m not saying AVs don’t / won’t have better safety records than human driven cars under particular circumstances, I’m just wary about the basis of any comparisons made and how the numbers are chosen to make the AV case more attractive…

I haven’t had time to dig into the numbers, but it’d be interesting to see how different breakdowns of the actual numbers compare to headline figures (selected how?) proffered by the AV lobbyists. I suspect they go for a convenient average, where Anscombe’s Quartet is more the reality…
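As a small illustration of why a convenient average can hide the reality, here is a sketch that computes summary figures for the four datasets in Anscombe's quartet (the standard published values, not road safety data). The variable names are just for illustration.

```python
from statistics import mean

# Anscombe's quartet: four small datasets with near-identical summary statistics
# but very different shapes when plotted.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# The "headline" figures: the mean of x and the mean of y for each dataset.
x_means = {name: mean(xs) for name, (xs, ys) in quartet.items()}
y_means = {name: mean(ys) for name, (xs, ys) in quartet.items()}

# One simple "breakdown" figure that the means hide: the largest y value.
y_maxima = {name: max(ys) for name, (xs, ys) in quartet.items()}

for name in quartet:
    print(f"{name}: mean x = {x_means[name]}, "
          f"mean y = {y_means[name]:.2f}, max y = {y_maxima[name]:.2f}")
```

The means agree to two decimal places across all four datasets, while the maxima (and the shapes of the data) do not, which is the sort of thing a single headline rate could easily gloss over.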

One thing I suspect the lobbyists will claim is improvements in safety projected from the implementation of a range of technologies, not all of which are necessarily limited to autonomous vehicles, such as BSMs that could be used by human driven cars in support of things like lane following and advanced cruise control. I also suspect AV lobbyists will compare road deaths per billion km in the US now with projected road deaths from AVs with BSM implemented everywhere at some point in the future. Which would be an unfair comparison (assuming that human driver rates could also be improved with the adoption of some of the same yet to be adopted wider vehicle safety technologies).

I also haven’t seen any simulations of road accidents on motorways and dual carriageways for different road and traffic conditions and different mixes of “human” and “autonomous” drivers?

On the matter of “AI, therefore good”, I am circumspect:

  • the trained models used to generate control signals for AVs will almost definitely include biases we haven’t identified yet; I also wonder about the range of different models required for different national driving styles and road types (English country lanes are not the same as wide US boulevards). I also wonder about how the models will need to evolve over time, as folk change their own behaviour with respect to AVs and as more AVs hit the road and more “machine assist” technologies are built in human driven cars. And what happens when you go from one regime (high density AVs/machine assist) to another (high density human drivers);
  • tech gets hacked, for profit, fun or out of sheer maliciousness. For example, hacking the environment to fool machines, as in recent demonstrations of Slight Street Sign Modifications Can Completely Fool Machine Learning Algorithms; and folk are already working on proof of concept attacks based on recreated models (for example, Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN). Will particular paint jobs be made illegal because they’re confusing to computers? I’ve also been reading The War Magician: The True Story of Jasper Maskelyne, which makes me think it’d be interesting to see what a magician turned hacker could come up with…!
  • people are also playful and will likely try to game or taunt AVs, as in James Bridle’s Autonomous Trap 001 ;-)

I also wonder what protocols will exist when meeting an AV on a single lane country road, for example, or situations of four cars meeting at a mini-roundabout (an occurrence oft-encountered, and that can lead to frequent delays, on the Isle of Wight…!). Who will give precedence to whom, and how will it be signalled? If autonomous vehicles are only allowed on particular categories of road, then does that make it more likely that we will also end up with roads/lanes where only AVs are allowed (to “make it fair”?!).

I imagine there are lots of other concerns – but also lobbyist claims – out there for the reading of in the evidence of the recent Lords inquiry into Connected and Autonomous Vehicles.

Recent Releases: Plotly Falcon SQL Client and the Remarkable Datasette SQLite2API Generator

A few years ago, Plot.ly launched an online chart hosting service that allowed users to create – and host – charts based on their own datasets. This was followed by the release of open source charting libraries and a Python dashboard framework (dash). Now, they’ve joined the desktop query engine’n’charting party with the Falcon SQL Client, a free Electron desktop app for Mac and Windows (code).

The app (once started – my Mac complained it was unsigned when I tried to run it the first time) appears to allow you to connect to a range of databases, query engines (such as Apache Drill) and SQLite.

Unfortunately, I couldn’t find a file suffix that would work when looking for a SQLite file to try it out quickly – and whilst trying to find any docs at all for connecting to SQLite (there are none that I can find at the moment), I got the impression that SQLite is not really a first class endpoint for Plotly:

I did manage to get it running against my ergast MySQL container though:

Providing the client with a database name, it loads in the database tables and allows you to query against them. Expanding a table reveals the column names and data type:

After running a query, there’s an option to generate various charts against the result, albeit in a limited way. (I couldn’t label or size elements in the scatter plot, for example.)

The chart types on offer are… a bit meh…:

The result can be viewed as a table, but there are no sort options on the column headers – you have to do that yourself in the query, I guess?

The export options are as you might expect. CSV is there, but not a CSV data package that is bundled with metadata, for example:

All in all, if this is an entry level competitor to Tableau, it’s a very entry level… I’d probably be more tempted to use the browser based Franchise query engine, not least because that also lets you query over CSV files. (That said, from a quick check of the repo, it doesn’t look like folk are working on it much :-( )

Far more compelling in quick-query land is a beautiful component from Simon Willison (who since he’s started blogging again has, as ever before, just blown me away with the stuff he turns up and tinkers with): datasette.

This python package lets you post a SQLite file as a queryable API. (Along the way Simon also produced a handy command line routine for loading CSV files into a SQLite3 database: simonw/csvs-to-sqlite.) Out of the box, datasette lets you fire up the API as a local service, or as a remotely hosted one on Zeit Now (which I’ve yet to play with).
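To make the CSV to SQLite step concrete, here is a minimal sketch using only the Python standard library rather than csvs-to-sqlite itself. The CSV content, table name and file name are all made up for illustration.

```python
import csv
import io
import os
import sqlite3
import tempfile

# Hypothetical CSV content, invented purely for this example.
CSV_TEXT = """driver,team,points
Alice,Red,25
Bob,Blue,18
Carol,Red,15
"""


def csv_to_sqlite(csv_text, db_path, table):
    """Load CSV text into a new, untyped SQLite table named `table`."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn = sqlite3.connect(db_path)
    with conn:  # commits the transaction on success
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    conn.close()


# Write the database to a temporary directory (hypothetical file name).
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
csv_to_sqlite(CSV_TEXT, db_path, "results")

# Query the file the way any SQLite client (or an API layer on top of it) might.
conn = sqlite3.connect(db_path)
team_points = conn.execute(
    "SELECT team, SUM(CAST(points AS INTEGER)) FROM results "
    "GROUP BY team ORDER BY team"
).fetchall()
conn.close()
print(team_points)
```

With datasette installed, running `datasette serve` against a database file like this one should then give you a local, queryable API over the same table.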

In part, this also reminded me of creating simple JSON APIs from a Jupyter notebook, and the appmode Jupyter extension that allows you to run a widgetised notebook as an app. In short, it got me wondering about how services/apps created that way could be packaged and distributed more easily, perhaps using something like Binderhub?

Idly Wondering… Python Packages From Jupyter Notebooks

How can we go about using Jupyter notebooks to create Python packages?

One of the ways of saving a Jupyter notebook is as a python file, which could be handy…

One of the ways of using a Jupyter notebook is to run it inside another notebook by calling it using the %run line magic – which provides a crude way of importing the contents of one notebook into another.

Another way of using a Jupyter notebook is to treat it as a Python module using a recipe described in Importing Jupyter Notebooks as Modules that hooks into the python import machinery. (It even looks to work with importing notebooks containing code cells that include IPython commands?)
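The underlying idea can be sketched with the standard library alone: a notebook file is JSON, so the code cells can be pulled out and executed into a module object. This is a simplified stand-in, not the recipe from the Jupyter docs; it does not hook into the import machinery and will not cope with IPython magics. The toy notebook and the names `notebook_to_source` and `load_notebook_module` are hypothetical.

```python
import json
import types

# A tiny notebook in nbformat 4 style, built in memory for illustration.
NOTEBOOK_JSON = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# A tiny demo notebook"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [],
         "source": ["def double(x):\n", "    return 2 * x\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [],
         "source": ["ANSWER = double(21)"]},
    ],
})


def notebook_to_source(nb_text):
    """Concatenate the source of every code cell, in document order."""
    notebook = json.loads(nb_text)
    chunks = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        # The source may be stored as one string or as a list of lines.
        chunks.append(source if isinstance(source, str) else "".join(source))
    return "\n\n".join(chunks)


def load_notebook_module(nb_text, name):
    """Execute the notebook's code cells inside a fresh module object."""
    module = types.ModuleType(name)
    exec(notebook_to_source(nb_text), module.__dict__)
    return module


demo = load_notebook_module(NOTEBOOK_JSON, "demo_nb")
print(demo.double(4), demo.ANSWER)
```

The string returned by `notebook_to_source` is also roughly what you would get from saving the notebook as a python file, minus the markdown cells.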

But could we mark up one or more linked notebooks in some way that would build a Python package and zip it up for distribution via pip?

I’ve no idea how it would work, but here’s something related-ish (via @simonw) that creates a command line interface from a Python file: click:

Click is a Python package for creating beautiful command line interfaces in a composable way with as little code as necessary. It’s the “Command Line Interface Creation Kit”. It’s highly configurable but comes with sensible defaults out of the box.

It aims to make the process of writing command line tools quick and fun while also preventing any frustration caused by the inability to implement an intended CLI API.

I guess it could work on Python exported from a Jupyter notebook too?
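As a rough sketch of the general idea, here is a function of the sort that might be developed in a notebook cell, wrapped as a command line interface. Note that this uses argparse from the standard library as a stand-in rather than click itself (click would do the same job with decorators); the function and option names are made up.

```python
import argparse


def greet(name, count=1):
    """The kind of function that might have been developed in a notebook cell."""
    return [f"Hello, {name}!" for _ in range(count)]


def build_parser():
    """Describe the command line interface for greet()."""
    parser = argparse.ArgumentParser(description="Greet someone from the command line.")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("--count", type=int, default=1, help="number of greetings")
    return parser


def main(argv=None):
    """Parse the arguments, call the function, print and return the result."""
    args = build_parser().parse_args(argv)
    lines = greet(args.name, args.count)
    for line in lines:
        print(line)
    return lines


# Simulate calling the tool as: <script> World --count 2
main(["World", "--count", "2"])
```

Passing the argument list explicitly to `main` keeps the sketch runnable from inside a notebook as well as from an exported script.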

TO DO: see if I can write a Jupyter notebook that can be used to generate a CLI, (perhaps creating a Jupyter notebook extension to create a CLI from a notebook?)

See also: oschuett/appmode Jupyter extension for creating and launching a simple web app from a Jupyter notebook.

Hmm… also via @simonw, could Zeit Now be used to launch appmode apps, as in Datasette: instantly create and publish an API for your SQLite databases?

Programming, meh… Let’s Teach How to Write Computational Essays Instead

From Stephen Wolfram, a nice phrase to describe the sorts of thing you can create using tools like Jupyter notebooks, Rmd and Mathematica notebooks: computational essays that complements the “computational narrative” phrase that is also used to describe such documents.

Wolfram’s recent blog post What Is a Computational Essay?, part essay, part computational essay, is primarily a pitch for using Mathematica notebooks and the Wolfram Language. (The Wolfram Language provides computational support plus access to a “fact engine” database that can be used to pull factual information into the coding environment.)

But it also describes nicely some of the generic features of other “generative document” media (Jupyter notebooks, Rmd/knitr) and how to start using them.

There are basically three kinds of things [in a computational essay]. First, ordinary text (here in English). Second, computer input. And third, computer output. And the crucial point is that these three kinds of things all work together to express what’s being communicated.

In Mathematica, the view is something like this:


In Jupyter notebooks:

In its raw form, an RStudio Rmd document source looks something like this:

A computational essay is in effect an intellectual story told through a collaboration between a human author and a computer. …

The ordinary text gives context and motivation. The computer input gives a precise specification of what’s being talked about. And then the computer output delivers facts and results, often in graphical form. It’s a powerful form of exposition that combines computational thinking on the part of the human author with computational knowledge and computational processing from the computer.

When we originally drafted the OU/FutureLearn course Learn to Code for Data Analysis (also available on OpenLearn), we wrote the explanatory text – delivered as HTML but including static code fragments and code outputs – as a notebook, and then “ran” the notebook to generate static HTML (or markdown) that provided the static course content. These notebooks were complemented by actual notebooks that students could work with interactively themselves.

(Actually, we prototyped authoring both the static text, and the elements to be used in the student notebooks, in a single document, from which the static HTML and “live” notebook documents could be generated: Authoring Multiple Docs from a Single IPython Notebook. )

Whilst the notion of the computational essay as a form is really powerful, I think the added distinction between generative and generated documents is also useful. For example, a raw Rmd document or Jupyter notebook is a generative document that can be used to create a document containing text, code, and the output generated from executing the code. A generated document is an HTML, Word, or PDF export from an executed generative document.

Note that the generating code can be omitted from the generated output document, leaving just the text and code generated outputs. Code cells can also be collapsed so the code itself is hidden from view but still available for inspection at any time:

Notebooks also allow “reverse closing” of cells—allowing an output cell to be immediately visible, even though the input cell that generated it is initially closed. This kind of hiding of code should generally be avoided in the body of a computational essay, but it’s sometimes useful at the beginning or end of an essay, either to give an indication of what’s coming, or to include something more advanced where you don’t want to go through in detail how it’s made.

Even if notebooks are not used interactively, they can be used to create correct static texts where outputs that are supposed to relate to some fragment of code in the main text actually do so because they are created by the code, rather than being cut and pasted from some other environment.
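The generative/generated distinction can be sketched in a few lines: a toy "generative document" made of text and code cells is executed to produce a generated text document, with or without the code shown. The cell format and the `render` function are invented for this sketch and are not how Jupyter or knitr actually work internally.

```python
import contextlib
import io

# A toy generative document: text and code cells in order (hypothetical format).
GENERATIVE_DOC = [
    ("text", "We start with a short list of values."),
    ("code", "values = [3, 1, 4, 1, 5]\nprint(values)"),
    ("text", "Their total is computed, not pasted in."),
    ("code", "print(sum(values))"),
]


def render(cells, show_code=True):
    """Run each code cell in a shared namespace and build the generated text."""
    namespace = {}
    parts = []
    for kind, body in cells:
        if kind == "text":
            parts.append(body)
            continue
        # Capture whatever the code cell prints as its output.
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(body, namespace)
        if show_code:
            parts.append("\n".join("    >>> " + line for line in body.splitlines()))
        output = buffer.getvalue().rstrip()
        if output:
            parts.append("\n".join("    " + line for line in output.splitlines()))
    return "\n\n".join(parts)


with_code = render(GENERATIVE_DOC)
without_code = render(GENERATIVE_DOC, show_code=False)
print(without_code)
```

Because the outputs come from running the code, changing the list of values in the generative document changes the total in both generated versions, whether or not the code itself is displayed.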

However, making the generative – as well as generated – documents available means readers can learn by doing, as well as reading:

One feature of the Wolfram Language is that—like with human languages—it’s typically easier to read than to write. And that means that a good way for people to learn what they need to be able to write computational essays is for them first to read a bunch of essays. Perhaps then they can start to modify those essays. Or they can start creating “notes essays”, based on code generated in livecoding or other classroom sessions.

In terms of our own learnings to date about how to use notebooks most effectively as part of a teaching communication (i.e. as learning materials), Wolfram seems to have come to many similar conclusions. For example, try to limit the amount of code in any particular code cell:

In a typical computational essay, each piece of input will usually be quite short (often not more than a line or two). But the point is that such input can communicate a high-level computational thought, in a form that can readily be understood both by the computer and by a human reading the essay.

...

So what can go wrong? Well, like English prose, [code] can be unnecessarily complicated, and hard to understand. In a good computational essay, both the ordinary text, and the code, should be as simple and clean as possible. I try to enforce this for myself by saying that each piece of input should be at most one or perhaps two lines long—and that the caption for the input should always be just one line long. If I’m trying to do something where the core of it (perhaps excluding things like display options) takes more than a line of code, then I break it up, explaining each line separately.

It can also be useful to "preview" the output of a particular operation that populates a variable for use in the following expression to help the reader understand what sort of thing that expression is evaluating:

Another important principle as far as I’m concerned is: be explicit. Don’t have some variable that, say, implicitly stores a list of words. Actually show at least part of the list, so people can explicitly see what it’s like.
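In notebook terms, that might look something like the following sketch, where each short step would sit in its own cell and the intermediate value is shown before it is used. The example text and variable names are made up.

```python
# Step 1: create the thing we want to talk about.
text = "the quick brown fox jumps over the lazy dog"
words = text.split()

# Step 2: preview part of it, so the reader can see what sort of thing it is.
preview = words[:3]
print(preview)

# Step 3: only then use it in the next expression.
lengths = [len(word) for word in words]
print(lengths[:3])
```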

In many respects, the computational narrative format forces you to construct an argument in a particular way: if a piece of code operates on a particular thing, you need to access, or create, the thing before you can operate on it.

[A]nother thing that helps is that the nature of a computational essay is that it must have a “computational narrative”—a sequence of pieces of code that the computer can execute to do what’s being discussed in the essay. And while one might be able to write an ordinary essay that doesn’t make much sense but still sounds good, one can’t ultimately do something like that in a computational essay. Because in the end the code is the code, and actually has to run and do things.

One of the arguments I've been trying to develop in an attempt to persuade some of my colleagues to consider the use of notebooks to support teaching is the notebook nature of them. Several years ago, one of the en vogue ideas being pushed in our learning design discussions was to try to find ways of supporting and encouraging the use of "learning diaries", where students could reflect on their learning, recording not only things they'd learned but also ways they'd come to learn them. Slightly later, portfolio style assessment became "a thing" to consider.

Wolfram notes something similar from way back when...

The idea of students producing computational essays is something new for modern times, made possible by a whole stack of current technology. But there’s a curious resonance with something from the distant past. You see, if you’d learned a subject like math in the US a couple of hundred years ago, a big thing you’d have done is to create a so-called ciphering book—in which over the course of several years you carefully wrote out the solutions to a range of problems, mixing explanations with calculations. And the idea then was that you kept your ciphering book for the rest of your life, referring to it whenever you needed to solve problems like the ones it included.

Well, now, with computational essays you can do very much the same thing. The problems you can address are vastly more sophisticated and wide-ranging than you could reach with hand calculation. But like with ciphering books, you can write computational essays so they’ll be useful to you in the future—though now you won’t have to imitate calculations by hand; instead you’ll just edit your computational essay notebook and immediately rerun the Wolfram Language inputs in it.

One of the advantages that notebooks have over some other environments in which students learn to code is that structure of the notebook can encourage you to develop a solution to a problem whilst retaining your earlier working.

The earlier working is where you can engage in the minutiae of trying to figure out how to apply particular programming concepts, creating small, playful, test examples of the sort of thing you need to use in the task you have actually been set. (I think of this as a "trial driven" software approach rather than a "test driven" one; in a trial, you play with a bit of code in the margins to check that it does the sort of thing you want, or expect, it to do before using it in the main flow of a coding task.)

One of the advantages for students using notebooks is that they can doodle with code fragments to try things out, and keep a record of the history of their own learning, as well as producing working bits of code that might be used for formative or summative assessment, for example.

Another advantage of creating notebooks, which may include recorded fragments of dead ends from trying to solve a particular problem, is that you can refer back to them. And reuse what you learned, or discovered how to do, in them.

And this is one of the great general features of computational essays. When students write them, they’re in effect creating a custom library of computational tools for themselves—that they’ll be in a position to immediately use at any time in the future. It’s far too common for students to write notes in a class, then never refer to them again. Yes, they might run across some situation where the notes would be helpful. But it’s often hard to motivate going back and reading the notes—not least because that’s only the beginning; there’s still the matter of implementing whatever’s in the notes.

Looking at many of the notebooks students have created from scratch to support assessment activities in TM351, it's evident that many of them are not using them other than as an interactive code editor with history. The documents contain code cells and outputs, with little if any commentary (what comments there are are often just simple inline code comments in a code cell). They are barely computational narratives, let alone computational essays; they're more of a computational scratchpad containing small code fragments, without context.

This possibly reflects the prior history in terms of code education that students have received, working "out of context" in an interactive Python command line editor, or a traditional IDE, where the idea is to produce standalone files containing complete programmes or applications. Not pieces of code, written a line at a time, in a narrative form, with example output to show the development of a computational argument.

(One argument I've heard made against notebooks is that they aren't appropriate as an environment for writing "real programmes" or "applications". But that's not strictly true: Jupyter notebooks can be used to define and run microservices/APIs as well as GUI driven applications.)

However, if you start to see computational narratives as a form of narrative documentation that can be used to support a form of literate programming, then once again the notebook format can come in to its own, and draw on styling more common in a text document editor than a programming environment.

(By default, Jupyter notebooks expect you to write text content in markdown or markdown+HTML, but WYSIWYG editors can be added as an extension.)

Use the structured nature of notebooks. Break up computational essays with section headings, again helping to make them easy to skim. I follow the style of having a “caption line” before each input. Don’t worry if this somewhat repeats what a paragraph of text has said; consider the caption something that someone who’s just “looking at the pictures” might read to understand what a picture is of, before they actually dive into the full textual narrative.

As well as allowing you to create documents in which the content is generated interactively - code cells can be changed and re-run, for example - it is also possible to embed interactive components in both generative and generated documents.

On the one hand, it's quite possible to generate and embed an interactive map or interactive chart that supports popups or zooming in a generated HTML output document.

On the other, Mathematica and Jupyter both support the dynamic creation of interactive widget controls in generative documents that give you control over code elements in the document, such as sliders to change numerical parameters or list boxes to select categorical text items. (In the R world, there is support for embedded shiny apps in Rmd documents.)

These can be useful when creating narratives that encourage exploration (for example, in the sense of explorable explanations), though I seem to recall Michael Blastland expressing concern several years ago about how ineffective interactives could be in data journalism stories.

The technology of Wolfram Notebooks makes it straightforward to put in interactive elements, like Manipulate, [interact/interactive in Jupyter notebooks] into computational essays. And sometimes this is very helpful, and perhaps even essential. But interactive elements shouldn’t be overused. Because whenever there’s an element that requires interaction, this reduces the ability to skim the essay.

I've also thought previously that interactive functions are a useful way of motivating the use of functions in general when teaching introductory programming. For example, An Alternative Way of Motivating the Use of Functions?.
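A minimal sketch of what that motivation might look like: once a calculation is wrapped up as a function of its parameters, the same function can be called by hand or handed to a widget library. The function here is invented for illustration, and the ipywidgets call is left as a comment because it only does anything useful inside a running notebook with ipywidgets installed.

```python
def line_values(slope=1, intercept=0):
    """Return y = slope * x + intercept for x = 0..4."""
    return [slope * x + intercept for x in range(5)]


# Called by hand, changing the parameters means editing and re-running the call:
print(line_values(2, 1))
print(line_values(3, 0))

# In a Jupyter notebook, the same function could be driven by sliders instead:
#   from ipywidgets import interact
#   interact(line_values, slope=(0, 5), intercept=(0, 10))
```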

One of the issues in trying to set up student notebooks is how to handle boilerplate code that is required before the student can create, or run, the code you actually want them to explore. In TM351, we preload notebooks with various packages and bits of magic; in my own tinkerings, I'm starting to try to package stuff up so that it can be imported into a notebook in a single line.

Sometimes there’s a fair amount of data—or code—that’s needed to set up a particular computational essay. The cloud is very useful for handling this. Just deploy the data (or code) to the Wolfram Cloud, and set appropriate permissions so it can automatically be read whenever the code in your essay is executed.

As far as opportunities for making increasing use of notebooks as a kind of technology goes, I came to a similar conclusion some time ago to Stephen Wolfram when he writes:

[I]t’s only very recently that I’ve realized just how central computational essays can be to both the way people learn, and the way they communicate facts and ideas. Professionals of the future will routinely deliver results and reports as computational essays. Educators will routinely explain concepts using computational essays. Students will routinely produce computational essays as homework for their classes.

Regarding his final conclusion, I'm a little bit more circumspect:

The modern world of the web has brought us a few new formats for communication—like blogs, and social media, and things like Wikipedia. But all of these still follow the basic concept of text + pictures that’s existed since the beginning of the age of literacy. With computational essays we finally have something new.

In many respects, HTML+Javascript pages have been capable of delivering, and actually delivering, computationally generated documents for some time. Do computational notebooks offer some sort of step-change away from that, or do they actually represent a return to the original read/write imaginings of the web, with portable and computed facts accessed using Linked Data?

Some Recent Noticings From the Jupyter Ecosystem

Over the last couple of weeks, I’ve got back into the speaking thing, firstly at an OU TEL show’n’tell event, then at a Parliamentary Digital Service show’n’tell.

In each case, the presentation was based around some of the things you can do with notebooks, one of which was using the RISE extension to run a notebook as an interactive slideshow: cells map on to slides or slide elements, and code cells can be executed live within the presentation, with any generated cell outputs being displayed in the slide.

RISE has just been updated to include an autostart mode that can be demo’ed if you run the RISE example on Binderhub.

Which brings me to Binderhub. Originally known as MyBinder, Binderhub takes the MyBinder idea of building a Docker image based on the build specification and content files contained in a public Github repository, and launching a Docker container from that image. Binderhub has recently moved into the Jupyter ecosystem, with the result that there are several handy spin-off command line components; for example, jupyter-repo2docker lets you build, and optionally push and/or launch, a local image from a Github repository or a local repository.

To follow on from my OU show’n’tell, I started putting together a set of branches on a single repository (psychemedia/showntell) that will eventually(?!) contain working demos of how to use Jupyter notebooks as part of a “generative document” workflow in particular topic areas. For example, for authoring texts containing rich media assets in a maths subject area, or music. (The environment I used for the show’n’tell was my own build (checks to make sure I turned that cloud machine off so I’m not still paying for it!), and I haven’t got working Binderhub environments for all the subject demos yet. If anyone would like to contribute to setting up the builds, or adding to subject specific demos, please get in touch…)

I also prepped for the PDS event by putting together a Binderhub build file in my psychemedia/parlihacks repo so (most of) the demo code would work on Binderhub. I think the only thing that doesn’t work at the moment is the Shiny app demo? This includes an RStudio environment, launched from the Jupyter notebook New menu. (For an example, see the binder-examples/dockerfile-rstudio demo.)

So – long and short of that – you can create multiple demo environments in a single Github repo using a different branch for each demo, and then launch them separately using Binderhub.
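As a sketch of how the per-branch launch links might be put together, the following builds Binder style URLs for a Github repository and branch. It assumes the `/v2/gh/<owner>/<repo>/<ref>` URL pattern used by the public mybinder.org service; the host and the branch names ("maths", "music") are assumptions for illustration, not necessarily the branches that exist in the repo.

```python
from urllib.parse import quote


def binder_url(owner, repo, ref="master", host="https://mybinder.org"):
    """Build a Binder launch URL for a Github repo at a given branch, tag or commit."""
    return f"{host}/v2/gh/{quote(owner)}/{quote(repo)}/{quote(ref, safe='')}"


# Hypothetical branch names, one per demo environment.
demo_branches = ["maths", "music"]
launch_links = {
    branch: binder_url("psychemedia", "showntell", branch)
    for branch in demo_branches
}

for branch, url in launch_links.items():
    print(f"{branch}: {url}")
```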

What else…?

Oh yes, a new extension gives you a Shiny-like workflow for creating simple apps from a Jupyter notebook: appmode. This seems to complement the Jupyter dashboards approach, by providing an “app view” of a notebook that displays the content of markdown cells and code cell outputs, but hides the code cell contents. So if you’ve been looking for a Jupyter notebook equivalent to R/shiny app development, this may get you some of the way there… (One of the nice things about the app view is that you can easily “View Source” – and modify that source…)

Possibly related to the appmode way of doing things, one thing I showed in the PDS show’n’tell was how notebooks can be used to define simple API services using the jupyter/kernel_gateway (example). These seem to run okay – locally at least – inside Binderhub, although I didn’t try calling a Jupyter API service from outside the container. (Maybe they can be made publicly available via the jupyterhub/nbserverproxy?) Why’s this relevant to appmode? My thinking is that architecturally you could separate out concerns, having one or more notebooks running an API that is consumed from the appmode notebook.

Another recent announcement came from Google in the form of Colaboratory, a “research project created to help disseminate machine learning education and research”. The environment is “a Jupyter notebook environment that requires no setup to use”, although it does require registration to run notebook cells, and there appears to be a waiting list. The most interesting thing, perhaps, is the ability to collaboratively work on notebooks shared with other people across Google Drive. I think this is separate from the jupyterlab-google-drive initiative, which is looking to offer a similar sort of shared working, again through Google Drive?

By the by, it’s probably also worth noting that other big providers make notebooks available, such as Microsoft (notebooks.azure.com) and IBM (eg datascientistworkbench.com, cognitiveclass.ai; digging around, cognitiveclass.ai seems to be a rebranding of bigdatauniversity.com).

There are other hosted notebook servers relevant to education too: CoCalc (previously SageMathCloud) offers a free way in, as does gryd.us if you have a .edu email address. pythonanywhere.com/ offers notebooks to anyone on a paid plan.

It also seems like there are services starting to appear that offer free notebooks as well as compute power for research/scientific computing on a model similar to CoCalc (free tier in, then buy credits for additional services). For example, Kogence.

For sharing notebooks, I also just spotted Anaconda Cloud, which looks like it could be an interesting place to browse every so often…

Interesting times…

Sharing Goes Both Ways – No Secrets Social

A long time ago, I wrote a post on Personal Declarations on Your Behalf – Why Visiting One Website Might Tell Another You Were There that describes how publishers who host third party javascript on their website allow those third parties to track your visits to those websites.

This means I can’t just visit the UK Parliament website unnoticed, for example. Google get told about every page I visit on the site.

(I’m still not clear about the extent to which my personal Google identity (the one I log into Google with), my advertising Google identity (the one that collects information about the ads I’ve been shown and the pages I’ve visited that run Google ads), and my analytics Google identity (the one that collects information about the pages I’ve visited that run Google Analytics and that may be browser specific?) are: a) reconciled? b) reconcilable? I’m also guessing if I’m logged in to Chrome, my complete browsing history in that browser is associated with my Google personal identity?)

The Parliament website is not unusual in this respect. Google Analytics are all over the place.

In a post today linked to by @charlesarthur and yesterday by O’Reilly Radar, Gizmodo describes How Facebook Figures Out Everyone You’ve Ever Met.

One way of doing this is similar to the above, in the sense of other people dobbing you in.

For example, if I appear in the contacts on your phone, and you allowed Facebook to “share” your phone contact details when you installed the Facebook app (which many people do), Facebook gains access firstly to my contact details and secondly to the fact that I stand in some sort of relationship to you.

Facebook also has the potential to log that relationship against my data, even if I have never declared that relationship to Facebook.

So it’s not “my data” at all, in the sense of me having informed Facebook about the fact. It’s data “about me” that Facebook has collected from wherever it can.

I can see what I’ve told Facebook on my various settings pages, but I can’t see the “shadow information” that Facebook has learned about me from other people. Other than through taunts from Facebook about what it thinks it knows about me, such as friend suggestions for people it thinks I probably know (“People You May Know”), for example…

…or facts it might have harvested from people’s interactions with me. When did you, along with others, last wish someone “Happy Birthday” using social media, for example?

Even if individuals are learning how to use social media platforms to keep secrets from each other (Secrets and Lies Amongst Facebook Friends – Surprise Party Planning OpSec), those secrets are not being held from Facebook. Indeed, they may be announcing those secrets to it. (Is there a “secret party” event type?! For example, create a secret party event and then as the first option list the person or persons who should not be party to the details so Facebook can help you maintain the secrecy…?)

Hmm… thinks… when you know everything, you can use that information to help subsets of people keep secrets from intersecting sets of people? This is just like a twist on user and group permissions on multi-user computer systems, but rather than using the system to grant or limit access to resources, you use it to control information flows around a social graph where the users set the access permissions on the information.

This is not totally unlike targeting ads (“dark ads”) to specific user groups, ads that are unseen by anyone outside those groups. Hmmm…

 

See also: Ad-Tech – A Great Way in To OSINT

Keeping Up With What’s Possible – Daily Satellite Imagery from AWS

Via @simonw’s rebooted blog, I spotted this – Landsat on AWS: “Landsat 8 data is available for anyone to use via Amazon S3. All Landsat 8 scenes are available from the start of imagery capture. All new Landsat 8 scenes are made available each day, often within hours of production.”

What do things like this mean for research, and teaching?

For research, I’m guessing we’ve gone from a state 20 years ago – no data [widely] available – to 10 years ago – available under license, with a delay and perhaps as periodic snapshots – to now – daily availability. How does this impact on research, and what sorts of research are possible? And how well suited are legacy workflows and tools to supporting work that can make use of daily updated datasets?

For teaching, the potential is there to do activities around a particular dataset that is current, but this introduces all sorts of issues when trying to write and support the activity (eg we don’t know what specific features the data will turn up in the future). We struggle with this anyway trying to write activities that give students an element of free choice or open-ended exploration where we don’t specifically constrain what they do. Which is perhaps why we tend to be so controlling – there is little opportunity for us to respond to something a student discovers for themselves.

The realtime-ish ness of data means we could engage students with contemporary issues, and perhaps enthuse them about the potential of working with datasets that we can only hint at or provide a grounding for in the course materials. There are also opportunities for introducing students to datasets and workflows that they might be able to use in their workplace, and as such act as a vector for getting new ways of working out of the Academy and out of the tech hinterland that the Academy may be aware of, and into more SMEs (helping SMEs avail themselves of emerging capabilities via OUr students).

At a more practical level, I wonder, if OU academics (research or teaching related) wanted to explore the Landsat 8 data on AWS, would they know how to get started?
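By way of a very rough starting point, the sketch below builds the public HTTP address of a single band image for a scene. It assumes the bucket name (`landsat-pds`) and the `L8/<path>/<row>/<scene id>/` layout described in the Landsat on AWS documentation at the time of writing; both are assumptions that may since have changed, and `scene_band_url` is just an illustrative helper.

```python
# Assumption: the public bucket and key layout documented for Landsat on AWS
# at the time of writing; check the current docs before relying on either.
LANDSAT_BASE = "https://landsat-pds.s3.amazonaws.com"


def scene_band_url(scene_id, band):
    """Build the URL of one band's GeoTIFF for a Landsat 8 scene.

    Scene ids of this form start with a 3 character sensor code ("LC8"),
    followed by a 3 digit WRS path and a 3 digit WRS row.
    """
    wrs_path = scene_id[3:6]
    wrs_row = scene_id[6:9]
    return (
        f"{LANDSAT_BASE}/L8/{wrs_path}/{wrs_row}/"
        f"{scene_id}/{scene_id}_B{band}.TIF"
    )


# Example scene id of the kind used in the AWS documentation
print(scene_band_url("LC81390452014295LGN00", 4))
```

The resulting address could then be handed to whatever raster tooling is to hand; the point is simply that the first step need not involve anything more exotic than constructing a URL.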

What sort of infrastructure, training or support do we need to make this sort of stuff accessible to folk who are interested in exploring it for the first time (other than Jupyter notebooks, RStudio, and Docker of course!;-) ?

PS Alan Levine /@cogdog picks up on the question of what’s possible now vs. then: http://cogdogblog.com/2017/11/landsat-imagery-30-years-later/. I might also note: this is how the blogosphere used to work on a daily basis 10-15 years ago…