
Geographical Rights Management, Mesh based Surveillance, Trickle-Down and Over-Reach

Every so often there’s a flurry of hype around the “internet of things”, but in many respects it’s already here – and has been for several decades. I remember as a kid being intrigued by some technical documents describing some telemetry system or other that remote water treatment plants used to transmit status information back to base. And I vaguely remember, from a Maplin magazine around the same time, an article or two about what equipment you needed to listen in on, and decode, the radio chatter of all manner of telemetry systems.

Perhaps the difference now is a matter of scale – it’s easier to connect to the network, comms are bidirectional (you can receive as well as transmit information), and with code you can effect change on receipt of a message. The tight linkage between software and hardware – bits controlling atoms – also means that we can start to treat more and more things as “plant” whose behaviour we can remotely monitor, and govern.

A good example of how physical, consumer devices can already be controlled – or at least, disabled – by a remote operator is described in a New York Times article that crossed my wires last week, Miss a Payment? Good Luck Moving That Car, which describes how “many subprime borrowers [… in the US] must have their car outfitted with a so-called starter interrupt device, which allows lenders to remotely disable the ignition. Using the GPS technology on the devices, the lenders can also track the cars’ location and movements.” As the loan payment due date looms, it seems that some devices also emit helpful beeps to remind you…. And if your car loan agreement stipulates you’ll only drive within a particular area, I imagine that you could find it’s been geofenced. (A geofence is a geographical boundary that can be used to detect whether a GPS-tracked device has passed into, or exited from, a particular region. When used to disable a device that leaves – or enters – a particular area, as for example drones flying into downtown Washington, we might consider it a form of “location based management” (or “geographical rights management (GRM)”?!), in which someone who claims to control use of that device in a particular space actually exerts that control and disables activity there. (Think: DRM for location…))
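
In passing, the basic geofence check is just a point-in-polygon test against a GPS fix. A minimal sketch using the shapely library – the polygon and coordinates are entirely made up:

from shapely.geometry import Point, Polygon

# Made-up geofence: a rough bounding polygon of (lon, lat) pairs, and a GPS fix
geofence = Polygon([(-0.15, 51.48), (-0.05, 51.48), (-0.05, 51.53), (-0.15, 51.53)])
gps_fix = Point(-0.12, 51.50)

# If the fix falls inside the polygon, the device is "in" the geofenced region
print("inside geofence" if geofence.contains(gps_fix) else "outside geofence")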

One of the major providers of “starter interrupt devices” is a company called PassTime (product list). Their products include:

  • PassTime Plus, the core of their “automated collection technology”.
  • Trax: “PassTime TRAX is the entry level GPS tracking product”. Includes: Pin point GPS location service, Up to Six (6) simultaneous Geo-fences.
  • PassTime GPS: “provides asset protection at an economical price while utilizing the same hardware and software platform of PassTime’s Elite Pro line of products. GPS tracking and remote vehicle disable features offer customers tools for a swift recovery if needed.” Includes: Pin point GPS location service, Remote vehicle disable option, Tow-Detect Notification, Device Tamper Notification, Up to Six (6) simultaneous Geo-fences, 24-Hour Tracking, Automatic Location Heartbeat
  • Elite-Pro: “the ultimate combination of GPS functionality and Automated Collection Technology”. Includes the PassTime GPS features but also mentions “Wireless Command Delivery”.

PassTime seem to like the idea of geofences so much they have patents in related technologies: PassTime Awarded Patent for Geo-Fence and Tamper Notice (US Patent: 8018329). You can find other related patents by looking up other patents held by the inventors (for example…).

You’ll be glad to know that PassTime have UK partners… in the form of The Car Finance Company, who are apparently “the world’s largest user and first company in the UK to start fitting Payment Reminder Technology to your new car”. Largest user?! That claim comes via a recent [March 12, 2015] press release announcing an extension to their agreement that “will bring 70,000 payment assurance and telematics devices to the United Kingdom”.

Here’s how The Car Finance Company spin it: “The Passtime system helps remind you when your repayments are due so you can ensure you stay on track with your loan and help repair and rebuild your credit. The device is only there to help you keep your repayments up to date, it doesn’t affect your car nor does it monitor the way you drive.” From the recent press release, “PassTime has been supplying Payment Assurance and GPS devices to The Car Finance Company since 2009” (my emphasis). I’m not sure if that means the PassTime GPS (with the starter interrupt) or the Trax device? If I were a journalist, rather than a blogger, I’d probably phone them to try to clarify that…

In passing, whilst searching for providers of automotive GPS trackers in the UK (and there are lots of them – search on something like GPS fleet management, for example…) I came across this rather intrusive piece of technology, The TRACKER Mesh Network, which “uses vehicles fitted with TRACKER Locate and TRACKER Plant to pick up reply codes from stolen vehicles with an activated TRACKER unit making them even easier to locate and recover”. Which is to say, this company has an ad hoc, mobile, distributed network of sensors spread across the UK road network that listen out for each other and opportunistically track each other. It’s all good, though:

“The TRACKER Mesh Network will enable the police to extend the network of ‘eyes and ears’ to identify and locate stolen vehicles more effectively using advanced technology and allow us to stay one step ahead of criminals who are becoming more and more adept at stealing cars. This is a real opportunity for the motoring public to help us clamp down on car thieves and raises public confidence in our ability to recover their possessions and bring the offenders to justice.”

(By the by, previous notes on ANPR – Automatic Number Plate Recognition. Also, note the EU eCall accident alerting system that automatically calls for help if you have a car accident [about, UK DfT eCall cost/benefit analysis].)

This conflation of commercial and police surveillance is… to be expected. But the data’s being collected, and it won’t go away. The Snowden revelations showed the scope of security service data collection activities, and chunks of that data won’t be going away either. The scale of the data collection is such that it’s highly unlikely that we’re all being actively tracked, or that this data will ever meaningfully contribute to the detection of conspiracies, but it can and will be used post hoc to create paranoid, data driven fantasies about who could have met whom, when, discussed what, and so on.

I guess where we can practically start to get concerned is in considering the ‘trickle down’ way in which access to this data will increasingly be opened up, and/or sold, to increasing numbers of agencies and organisations, both public and private. As Ed Snowden apparently commented in a session at SXSW (Snowden at SXSW: Be very concerned about the trickle down of NSA surveillance to local police), “[t]hey’ve got everything. The question becomes, Now they’re empowered. They can leak [this stuff]. It does happen at the local level. These capabilities are created. High tech. Super secret. But they inevitably bleed over to law enforcement. When they’re brand new they’re only used in the extremes. But as that transition happens, more and more people get access, they use it in newer and more and more expansive and more abusive ways.”

(Trickle down – or over-reach – applies to legislation too. For example, from a story widely reported in April, 2008: Half of councils use anti-terror laws to spy on ‘bin crimes’, although the legality of such practices was challenged: Councils warned over unlawful spying using anti-terror legislation and guidance brought in in November 2012 that required local authorities to obtain judicial approval prior to using covert techniques. (I realise I’m in danger here of conflating things not specifically related to over-reach on laws “intended” to be limited to anti-terrorism associated activities (whatever they are) with over-reach…) Other reviews: Lords Constitution Committee – Second Report – Surveillance: Citizens and the State (Jan 2009), Big Brother Watch on How RIPA has been used by local authorities and public bodies and Cataloguing the ways in which local authorities have abused their covert surveillance powers. I’m guessing a good all round starting point would be the reports of the Independent Reviewer of Terrorism Legislation.)

When it comes to processing large amounts of data, finding meaningful, rather than spurious, connections between things can be hard… (Correlation is not causation, right? As Spurious Correlations wittily points out…;-)

What is more manageable is dumping people onto lists and counting things… Or querying specifics. A major problem with the extended and extensive data collection activities going on at the moment is that access to the data to allow particular queries to be made will be extended. The problem is not that all your data is being collected now; the issue is that post hoc searches over it could be made by increasing numbers of people in the future. Like bad tempered council officers having a bad day, or loan company algorithms with dodgy parameters.

PS Schneier on connecting the dots: Why Mass Surveillance Can’t, Won’t, And Never Has Stopped A Terrorist

So What Can Text Analysis Do for You?

Despite believing we can treat anything we can represent in digital form as “data”, I’m still pretty flakey on understanding what sorts of analysis we can easily do with different sorts of data. Time series analysis is one area – the pandas Python library has all manner of handy tools for working with that sort of data that I have no idea how to drive – and text analysis is another.

So prompted by Sheila MacNeill’s post about textexture, which I guessed might be something to do with topic modeling (I should have read the about, h/t @mhawksey), here’s a quick round up of handy things the text analysts seem to be able to do pretty easily…

Taking the lazy approach, I had a quick look at the CRAN natural language processing task view to get an idea of what sort of tool support for text analysis there is in R, and a peek through the NLTK documentation to see what sort of thing we might be readily able to do in Python. Note that this take is a personal one, identifying the sorts of things that I can see I might personally have a recurring use for…

First up – extracting text from different document formats. I’ve already posted about Apache Tika, which can pull text from a wide range of documents (PDFs, Word docs, even images), and which seems to be a handy, general purpose tool. (Other tools are available, but I only have so much time, and for now Tika seems to do what I need…)

Second up, concordance views. The NLTK docs describe concordance views as follows: “A concordance view shows us every occurrence of a given word, together with some context.” So for example:

[Image: NLTK concordance view]

This can be handy for skimming through multiple references to a particular item, rather than having to do a lot of clicking, scrolling or page turning.
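
By way of illustration, here’s a minimal NLTK sketch, using one of the sample Gutenberg texts that ships with the NLTK data – the word and corpus are just placeholders:

import nltk
from nltk.text import Text

nltk.download("gutenberg", quiet=True)  # sample corpus bundled with NLTK data
emma = Text(nltk.corpus.gutenberg.words("austen-emma.txt"))
# Print every occurrence of the word, with a little context either side
emma.concordance("however", width=79, lines=5)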

How about if we want to compare the near co-occurrence of words or phrases in a document? One way to do this is graphically: plot the “distance” through the text on the x-axis and then, for each of a set of categorical terms on the y-axis, mark where those terms appear in the text. In NLTK, this is referred to as a lexical dispersion plot:

[Image: NLTK lexical dispersion plot]
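
Again, a minimal NLTK sketch – the corpus and terms are just placeholders, and the plot needs matplotlib installed:

import nltk
from nltk.text import Text

nltk.download("inaugural", quiet=True)  # US inaugural address corpus bundled with NLTK
speeches = Text(nltk.corpus.inaugural.words())
# Mark where each term appears along the "word offset" axis of the corpus
speeches.dispersion_plot(["citizens", "democracy", "freedom", "America"])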

I guess we could then scan across the distance axis using a windowing function to find terms that appear within a particular distance of each other? Or use co-occurrence matrices, for example (eg Co-occurrence matrices of time series applied to literary works), perhaps with overlapping “time” bins? (This could work really well as a graph model – eg for 20 pages, set up page bin nodes 1-2, 2-3, 3-4,.., 18-19, 19-20, then an actor node for each actor, connecting actors to the page bin nodes for the bins in which they occur; then project the bipartite graph onto just the actor nodes, connecting actors who were originally connected to the same page bin nodes. A rough sketch of that approach is below.)
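
Here’s roughly how that bipartite projection might look using networkx – the page bins and actor names are entirely made up:

import networkx as nx
from networkx.algorithms import bipartite

# Made-up data: which actors appear in which (overlapping) page bin
page_bins = {
    "pages_1_2": ["holmes", "watson"],
    "pages_2_3": ["watson", "moriarty"],
    "pages_3_4": ["holmes", "moriarty"],
}

B = nx.Graph()
for bin_name, actors in page_bins.items():
    B.add_node(bin_name, bipartite=0)
    for actor in actors:
        B.add_node(actor, bipartite=1)
        B.add_edge(bin_name, actor)

# Project onto the actor nodes: actors sharing a page bin get linked,
# with an edge weight counting how many bins they share
actor_nodes = {n for n, d in B.nodes(data=True) if d["bipartite"] == 1}
cooccurrence = bipartite.weighted_projected_graph(B, actor_nodes)
print(list(cooccurrence.edges(data=True)))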

Something that could be differently useful is spotting common sentences that appear in different documents (for example, quotations). There are surely tools out there that do this, though offhand I can’t find any..? My gut reaction would be to generate a sentence list for each document (eg using something like the handy looking textblob python library), strip quotation marks and whitespace, etc, sort each list, then run a diff on them and pull out the matched lines. (So a “reverse differ”, I think it’s called?) I’m not sure if you could easily also pull out the near misses? (If you can help me out on how to easily find matching or near matching sentences across documents via a comment or link, it’d be appreciated…:-)
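
For what it’s worth, here’s a rough, untested sketch of that recipe using textblob and a simple set intersection – the filenames are placeholders, and it only finds exact matches, not near misses:

from textblob import TextBlob

def sentence_set(path):
    """Return a normalised set of sentences from a text file."""
    with open(path, encoding="utf-8") as f:
        blob = TextBlob(f.read())
    # Normalise: strip whitespace and quotation marks, lowercase
    return {str(s).strip().strip('"\u201c\u201d').lower() for s in blob.sentences}

common = sentence_set("doc1.txt") & sentence_set("doc2.txt")
for sentence in sorted(common):
    print(sentence)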

The more general approach is to just measure document similarity – TF-IDF (Term Frequency – Inverse Document Frequency) and cosine similarity are key phrases here. I guess this approach could also be applied to sentences to find common ones across documents (eg SO: Similarity between two text documents), though I guess it would require comparing quite a large number of sentences (for ~N sentences in each doc, it’d require N^2 comparisons)? I suppose you could optimise by ignoring comparisons between sentences of radically different lengths? Again, presumably there are tools that do this already?
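
For example, a minimal scikit-learn sketch of the TF-IDF/cosine similarity approach – the documents are placeholders:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "The quick brown fox jumps over the lazy dog.",
    "A quick brown dog jumps over a lazy fox.",
    "Completely unrelated text about data wrangling in Python.",
]

# Weight terms by TF-IDF, dropping common English stop words
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
# Pairwise cosine similarity between the documents
print(cosine_similarity(tfidf))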

Unlike simply counting common words that aren’t stop words in a document to find the most popular words in a doc, TF-IDF moderates the simple count (the term frequency) with the inverse document frequency. If a word is popular in every document, the term frequency is large and the document frequency is large, so the inverse document frequency (one divided by the document frequency) is small – which in turn gives a reduced TF-IDF value. If a term is popular in one document but not in any other, the document frequency is small and so the inverse document frequency is large, giving a large TF-IDF for the term in the document in which it appears. TF-IDF thus helps you spot words that are uncommonly frequent within a particular document relative to how rarely they appear across the document collection as a whole.
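
To make that concrete, a tiny worked example – the numbers are made up, and note that practical implementations usually log-scale the IDF rather than using the raw one-over-document-frequency form described above:

from math import log

N = 4    # documents in a (made up) collection
tf = 10  # the term appears 10 times in the document of interest
df = 2   # ...and appears somewhere in 2 of the 4 documents

simple_tfidf = tf * (1 / df)   # the plain "one divided by document frequency" form
log_tfidf = tf * log(N / df)   # the log-scaled IDF most implementations use

print(simple_tfidf, log_tfidf)  # 5.0 and roughly 6.9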

Topic models: I thought I’d played with these quite a bit before, but if I did the doodles didn’t make it as far as the blog… The idea behind topic modeling is to generate a set of key terms – topics – that provide an indication of the topic of a particular document. (It’s a bit more sophisticated than using a count of common words that aren’t stopwords to characterise a document, which is the approach that tends to be used when generating wordclouds…) There are some pointers in the comments to A Quick View Over a MASHe Google Spreadsheet Twitter Archive of UKGC12 Tweets about topic modeling in R using the R topicmodels package; this ROpenSci post on Topic Modeling in R has code for a nice interactive topic explorer; and this notebook on Topic Modeling 101 looks like a handy intro to topic modeling using the gensim Python package.
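
A minimal gensim sketch, just to show the shape of the API – the toy documents are placeholders and far too small to give sensible topics:

from gensim import corpora, models

# Toy, pre-tokenised "documents"
docs = [
    ["data", "analysis", "python", "pandas"],
    ["football", "match", "goal", "league"],
    ["python", "code", "data", "notebook"],
]

dictionary = corpora.Dictionary(docs)               # map tokens to ids
corpus = [dictionary.doc2bow(doc) for doc in docs]  # bag-of-words representation
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
print(lda.print_topics())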

Automatic summarisation/text summary generation: again, I thought I’d dabbled with this but there’s no sign of it on this blog:-( There are several tools and recipes out there that will generate text summaries of long documents, but I guess they could be hit and miss and I’d need to play with a few of them to see how easy they are to use and how well they seem to work/how useful they appear to be. The python sumy package looks quite interesting in this respect (example usage) and is probably where I’d start. A simple description of a basic text summariser can be found here: Text summarization with NLTK.
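
Going by the example usage linked above, a sumy recipe looks something like this – untested by me, and the long_text file is a placeholder:

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

long_text = open("report.txt", encoding="utf-8").read()  # placeholder document

parser = PlaintextParser.from_string(long_text, Tokenizer("english"))
summarizer = LexRankSummarizer()
# Pick out the three "most central" sentences as the summary
for sentence in summarizer(parser.document, 3):
    print(sentence)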

So – what have I missed?

PS In passing, see this JISC review from 2012 on the Value and Benefits of Text Mining.

Open Practice and My Academic Philosophy, Sort Of… Erm, Maybe… Perhaps..?!

Having got my promotion case through the sub-Faculty level committee (with support and encouragement from senior departmental colleagues), it’s time for another complete rewrite to try to get it through the Faculty committee. Guidance suggests that it is not inappropriate – and may even be encouraged – for a candidate to include something about their academic philosophy, so here are some scribbled thoughts on mine…

One of the declared Charter objects (sic) of the Open University is “to promote the educational well-being of the community generally”, as well as “the advancement and dissemination of learning and knowledge”. Both as a full-time PhD student with the OU (1993-1997), and then as an academic (1999-), I have pursued a model of open practice, driven by the idea of learning in public, with the aim of communicating academic knowledge into, and as part of, wider communities of practice, modeling learning behaviour through demonstrating my own learning processes, and originating new ideas in a challengeable and open way as part of my own learning journey.

My interest in open educational resources is in part a subterfuge, driven by a desire that educators be more open in demonstrating their own learning and critical practices, including the confusion and misconceptions they grapple with along the way, rather than being seen simply as professors of some sort of inalienable academic truth.

My interest in short course development is based on the belief that for the University to contribute effectively to continued lifelong education and professional development, we need to have offerings that are at an appropriate level of granularity as well as academic level. Degrees represent only one – early – part of that journey. Learners are unlikely to take more than one undergraduate degree in their lifetime, but there is no reason why they should not continue to engage in learning throughout their life. Evidence from the first wave of MOOCs suggests that many participants in those courses were already graduates, with an appreciation of the values of learning and the skills to enable them to engage with those offerings. The characterisation of MOOCs as either xMOOCs (traditional course style offerings) or the looser, networked model of “connectivist” cMOOCs [H/T @r3becca in the comments;-)] represents different educational philosophies: the former may cruelly be described as being based on a model in which the learner expects to be taught (and the instructors expect to profess), whereas the latter requires that participants are engaged in a more personal, yet still collaborative, learning journey, where it is up to each participant to make sense of the world in an open and public way, informed and aided, but also challenged, by other participants. That’s how I work every day. I try to make sense of the world to myself, often for a purpose, in public.

Much of my own learning is the direct result of applied problem solving. I try to learn something every day, often as the result of trying to do something each day that I haven’t been able to do before. The OUseful.info blog is my own learning diary and a place I can look to refer to things I have previously learned. The posts are written in a way that reinforces my own learning, as a learning resource. The posts often take longer to write than the time taken to discover or originate the thing learned, because in them I try to represent a reflection and retelling of the rationale for the learning event and the context in which it arose: a problem to be solved, my state of knowledge at the time, the means by which I came to make sense of the situation in order to proceed, and the learning nugget that resulted. The thing I can see or do now but couldn’t before. Capturing the “I couldn’t do X because of Y but now I can, by doing Z” supports a similar form of discovery as the one supported by question and answer sites: the content is auto-optimised to include both naive and expert information, which aids discovery. (It has often amused me that course descriptions tend to be phrased in the terms and language you might expect to know having completed the course. Which doesn’t help the novice discover the course a priori, before they have learned those keywords, concepts or phrases that the course will introduce them to…) The posts also try to model my own learning process, demonstrating the confusion, showing where I had a misapprehension or just plain got it wrong. The blog also represents a telling of my own learning journey over an extended period of time, and as such may be thought of as an uncourse: something that could perhaps be looked at post hoc as a course, but that was originated as my own personal learning journey unfolded.

Hmmm… 1500 words for the whole begging letter, so I need to cut the above down to a sentence…

Code as Magic, and the Vernacular of Data Wrangling Verbs

It’s been some time now since I drafted most of my early unit contributions to the TM351 Data management and analysis course. Part of the point (for me) in drafting that material was to find out what sorts of thing we actually wanted to say and help identify the sorts of abstractions we wanted to then build a narrative around. Another part of this (for me) means exploring new ways of putting powerful “academic” ideas and concepts into meaningful contexts; finding new ways to describe them; finding ways of using them in conjunction with other ideas; or finding new ways of using – or appropriating – them in general (which in turn may lead to new ways of thinking about them). These contexts are often process based, demonstrating how we can apply the ideas or put them to use (make them useful…) or use the ideas to support problem identification, problem decomposition and problem solving. At heart, I’m more of a creative technologist than a scientist or an engineer. (I aspire to being an artist…;-)

Someone who I think has a great take on conceptualising the data wrangling process – in part arising from his prolific tool building approach in the R language – is Hadley Wickham. His recent work for RStudio is built around an approach to working with data that he’s captured as follows (e.g. the “dplyr” tutorial at useR 2014, Pipelines for Data Analysis):

[Image: Hadley Wickham’s data analysis process diagram]

Following an often painful and laborious process of getting data into a state where you can actually start to work with it, you can then enter into an iterative process of transforming the data into various shapes and representations (often in the sense of re-presentations) that you can easily visualise or build models from. (In practice, you may have to keep redoing elements of the tidy step and then re-feed the increasingly cleaned data back into the sensemaking loop.)

Hadley’s take on this is that the visualisation phase can spring surprises on you but doesn’t scale very well, whilst the modeling phase scales but doesn’t surprise you.

To support the different phases of activity, Hadley has been instrumental in developing several software libraries for the R programming language that are particularly suited to the different steps. (For the modeling, there are hundreds of community developed and often very specialised R libraries for doing all manner of weird and wonderful statistics…)

[Image: Hadley Wickham’s data analysis process and the R libraries supporting each step]

In many respects, I’ve generally found the way Hadley has presented his software libraries to be deeply pragmatic – the tools he’s developed are useful and in many senses naturalistic; they help you do the things you need to do in a way that makes practical sense. The steps they encourage you to take are natural ones, and useful ones. They are the sorts of tools that implement the sorts of ideas that come to mind when you’re faced with a problem and you think: this is the sort of thing I need (to be able) to do. (I can’t comment on how well implemented they are; I suspect: pretty well…)

Just as the data wrangling process diagram helps frame the sorts of things you’re likely to do into steps that make sense in a “folk computational” way (in the sense of folk computing or folk IT (also here), a computational correlate to notions of folk physics, for example), Hadley also has a handy diagram for helping us think about the process of solving problems computationally in a more general, problem solving sense:

[Image: Hadley Wickham’s programming process diagram – think it, describe it, do it]

A cognitive think it step, identifying a problem, and starting to think about what sort of answer you want from it, as well as how you might start to approach it; a describe it step, where you describe precisely what it is you want to do (the sort of step where you might start scribbling pseudo-code, for example); and the computational do it step where the computational grunt work is encoded in a way that allows it to actually get done by machine.

I’ve been pondering my own stance towards computing lately, particularly from my own context of someone who sees computery stuff from a more technology, tool building and tool using perspective (that is, using computery things to help you do useful stuff), rather than framing it as a purer computer science or even “trad computing” take on operationalised logic, where the practical why is often ignored.

So I think this is how I read Hadley’s diagram…

[Image: an annotated version of the think it / describe it / do it diagram]

Figuring out what the hell it is you want to do (imagining, the what for a particular why), figuring out how to do it (precisely; the programming step; the how); hacking that idea into a form that lets a machine actually do it for you (the coding step; the step where you express the idea in a weird incantation where every syllable has to be the right syllable; and from which the magic happens).

One of the nice things about Hadley’s approach to supporting practical spell casting (?!) is that the transformation or operational steps his libraries implement are often based around naturalistic verbs. They sort of do what they say on the tin. For example, in the dplyr toolkit, there are the following verbs:

[Image: dplyr’s five verbs – filter, select, mutate, arrange, summarise – plus group_by]

These sort of map onto elements (often similarly named) familiar to anyone who has used SQL, but in a friendlier way. (They don’t SHOUT AT YOU for a start.) It almost feels as if they have been designed as articulations of the ideas that come to mind when you are trying to describe (precisely) what it is you actually want to do to a dataset when working on a particular problem.
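
dplyr itself is R, of course, but for the Pythonistas a rough pandas analogue of those verbs – on a made-up dataframe – looks something like this:

import pandas as pd

# Made-up example data
df = pd.DataFrame({"team": ["A", "A", "B"],
                   "points": [10, 15, 7],
                   "year": [2014, 2015, 2015]})

df[df["year"] == 2015]                      # filter(): pick rows
df[["team", "points"]]                      # select(): pick columns
df.assign(double_points=df["points"] * 2)   # mutate(): add derived columns
df.sort_values("points")                    # arrange(): reorder rows
df.groupby("team")["points"].sum()          # group_by() + summarise(): aggregate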

In a similar way, the ggvis library (the interactive chart reinvention of Hadley’s ggplot2 library) builds on the idea of Leland Wilkinson’s “The Grammar of Graphics” and provides a way of summoning charts from data in an incremental way, as well as a functionally and grammatically coherent way. The words the libraries use encourage you to articulate the steps you think you need to take to solve a problem – and then, as if by magic, they take those steps for you.

If programming is the meditative state you need to get into to cast a computery-thing spell, and coding is the language of magic, things like dplyr help us cast spells in the vernacular.

Quick Note – Apache Tika (Document Text Extraction Service)

I came across Apache Tika a few weeks ago, a service that will tell you what pretty much any document type is based on its metadata, and will have a good go at extracting text from it.

With a prompt and a 101 from @IgorBrigadir, it was pretty easy to get started with it – sort of…

First up, I needed to get the Apache Tika server running. As there’s a containerised version available on dockerhub (logicalspark/docker-tikaserver), it was simple enough for me to fire up a server in a click using tutum (as described in this post on how to run OpenRefine in the cloud in just a couple of clicks and for a few pennies an hour; pretty much all you need to do is fire up a server, start a container based on logicalspark/docker-tikaserver, and tick to make the port public…)

As @IgorBrigadir described, I could ping the server on curl -X GET http://example.com:9998/tika and send a document to it using curl -T foo.doc http://example.com:9998/rmeta.

His suggested recipe for using the python requests library borked for me – I couldn’t get python to open the file to get the data bits to send to the server (file encoding issues; one reason for using Tika is it’ll try to accept pretty much anything you throw at it…)

I had a look at pycurl:

!apt-get install -y libcurl4-openssl-dev
!pip3 install pycurl

but couldn’t make head or tail of how to use it: the pycurl equivalent of curl -T foo.doc http://example.com:9998/rmeta can’t be that hard to write, can it? (Translations appreciated via the comments…;-)
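
One untested thought, if the requests route gets revisited: curl -T does a PUT, so opening the file in binary mode and PUTting it might sidestep the encoding issue mentioned above –

import requests

# Untested sketch: open the file in binary mode and PUT it to the Tika /rmeta endpoint
with open("Text/foo.doc", "rb") as f:
    r = requests.put("http://example.com:9998/rmeta", data=f,
                     headers={"Accept": "application/json"})
print(r.json()[0])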

Instead I took the approach of dumping the result of a curl request on the command line into a file:

!curl -T Text/foo.doc http://example.com:9998/rmeta > tikatest.json

and then grabbing the response out of that:

import json
json.load(open('tikatest.json',encoding='utf-8'))[0]

Not elegant, and far from ideal, but a stop gap for now.

Part of the response from the Tika server is the text extracted from the document, which can then provide the basis for some style free text analysis…

I haven’t tried with any document types other than crappy old MS Word .doc formats, but this looks like it could be a really easy tool to use.

And with the containerised version available, and tutum and Digital Ocean to hand, it’s easy enough to fire up a version in the cloud, let alone my desktop, whenever I need it:-)

Whose Browser (or Phone, or Drone?!) Is It Anyway?

I’m not sure how many Chrome users follow any of the Google blogs that occasionally describe forthcoming updates to Google warez, but if you don’t you perhaps don’t realise quite how frequently things change. My browser, for example, is at something like version 40, even though I never consciously update it.

One thing I only noticed recently is that a tab has appeared in the top right hand corner of the browser showing that I’m logged in (to the browser) with a particular Google account. There doesn’t actually appear to be an option to log out – I can switch user or go incognito – and I’m not sure I remember ever consciously logging in to it (actually, maybe a hazy memory, when I wanted to install a particular extension), and I have no idea what it actually means for me to be logged in?

Via the Google Apps Update blog, I learned today that being logged in to the browser will soon get me seamless synching of my Google docs into my Chrome browser environment (Offline access to Google Docs editors auto-enabled when signing into Chrome browser on the web). Following a pattern popularised by Apple, Google are innovating on our behalf and automatically opting us in to behaviours it thinks make sense for us. So just bear that in mind when you write a ranty resignation letter in Google docs and wonder why it’s synched to your work computer on your office desk:

Note that Google Apps users should not sign into a Chrome browser on public/non-work computers with their Google Apps accounts to avoid unintended file syncing.

If you actually have several Google apps accounts (for example, I have a personal one, and a couple of organisational ones: an OU one, an OKF one), I assume that the only docs that are synched are the ones on an account that matches the account I have signed in to in the browser. That said, synch permissions may be managed centrally for organisational apps accounts:

Google Apps admins can still centrally enable or disable offline access for their domain in the Admin console… Existing settings for domain-level offline access will not be altered by this launch.

I can’t help but admit that even though I won’t have consciously opted in to this feature, just like I don’t really remember logging in to Chrome on my desktop (how do I log out???), and I presumably agreed to something when I installed Chrome to let it keep updating itself without prompting me, I will undoubtedly find it useful one day: on a train, perhaps, when trying to update a document I’d forgotten to synch. It will be so convenient I will find it unremarkable, not noticing I can now do something I couldn’t do as easily before. Or I might notice, with a “darn, I wish I’d…” then an “oh, cool [kewel…], I can…”.

“‘Oceania has always been at war with Eastasia.'” [George Orwell, 1984]

Just like when – after being sure I’d disabled, or explicitly not opted in to, any sort of geo-locating or geo-tracking behaviour on my Android phone – I found I must have left a door open somewhere (or been automatically opted in to something I hadn’t appreciated when agreeing to a particular update; or, by proxy, agreed to allow something to update itself automatically and without prompting, with implied or explicit permission to automatically opt me in to new features…) and found I could locate my misplaced phone using the Android Device Manager (Where’s My Phone?).

This idea of allowing applications to update themselves in the background and without prompting is something we have become familiar with in many web apps, and in desktop apps such as Google Chrome, though many apps do still require the user to either accept the update or take an even more positive action to install an update when notified that one is available. (It seems that ever fewer apps require you to specifically search for updates…)

In the software world, we have gone from a world where the things we bought were immutable, to one where we could search for and install updates (eg to operating systems or software applications), then accept updates when alerted to the fact, to automatically (and invisibly) accepting updates.

In turn, many physical devices have gone from being purely mechanical affairs, to electro-mechanical ones, to logical-electro-mechanical devices (for example, that include logic elements hardwired into silicon), to ones containing factory programmable hardware devices (PROMs, programmable Read Only Memories), to devices that run programmable, and then reprogrammable, firmware (that is to say, software).

If you have a games console, a Roku or MyTV box, or Smart TV, you’ve probably already been prompted to get a (free) online update. I don’t know, but could imagine, new top end cars having engine management system updates at regular service events.

However, one thing perhaps we don’t fully appreciate is that these updates can also be used to limit functionality that our devices previously had. If the updates are done seamlessly (without permission, in the background) this may come as something of a surprise. [Cf. the complementary issue of vendors having access to “their” content on “your” machine, as described here by the Guardian: Amazon wipes customer’s Kindle and deletes account with no explanation]

A good example of loss of functionality arising from an (enforced, though self-applied) firmware update was reported recently in the context of hobbyist drones:

On Wednesday, SZ DJI Technology, the Chinese company responsible for the popular DJI Phantom drones that online retailers sell for less than $500, announced that it had prepared a downloadable firmware update for next week that will prevent drones from taking off in restricted zones and prevent flight into those zones.

Michael Perry, a spokesman for DJI, told the Guardian that GPS locating made such an update possible: “We have been restricting flight near airports for almost a year.”

“The compass can tell when it is near a no-fly zone,” Perry said. “If, for some reason, a pilot is able to fly into a restricted zone and then the GPS senses it’s in a no-fly zone, the system will automatically land itself.”

DJI’s new Phantom drones will ship with the update installed, and owners of older devices will have to download it in order to receive future updates.

Makers of White House drone offer fix to bar Phantom menace from no-fly zones

What correlates might be applied to increasingly intelligent cars, I wonder?! Or at the other extreme, phones..?

PS How to log out of Chrome You need to administer yourself… From the Chrome Preferences Settings (sic), Disconnect your Google account.

Note that you have to take additional action to make sure that you remove all those synched presentations you’d prepared for job interviews at other companies from the actual computer…

Take care out there…!;-)

Paying for Dropbox and Other Useful Bits… (The Cost of Doing Business…)

A couple of years ago or so, Dropbox ran a promotion for academic users granting 15GB of space. Yesterday, I got an email:

As part of your school’s participation in Space Race, you received 15 GB of additional Dropbox space. The Space Race promotional period expires on March 4, 2015, at which point your Dropbox limit will automatically return to 5 GB.

As a friendly reminder, you’re currently using 14.6 GB of Dropbox space. If you’re over your 5 GB limit after March 4, you’ll no longer be able to save new photos, videos, and docs to Dropbox.

Need more space? Dropbox Pro gives you 1 TB of space to keep everything safe, plus advanced sharing controls, remote wipe for lost devices, and priority support. Upgrade before March 4 and we’ll give you 30% off your first year.

My initial thought was to tweet:

but then I thought again… The discounted price on a monthly payment plan is £5.59/month, which on PayPal converted this month to $8.71. I use Dropbox all the time, and it forms part of my workflow for using Leanpub. As it’s the start of the month, I received a small royalty payment for the Wrangling F1 Data With R book. The Dropbox fee is about the amount I’m getting per book sold, so it seems churlish not to subscribe to Dropbox – it is part of the cost of doing business, as it were.

The Dropbox subscription gets me 1TB, so this also got me thinking:

  • space is not now an issue, so I can move the majority of my files to Dropbox, not just a selection of folders;
  • space is not now an issue, so I can put all my github clones into Dropbox;
  • space is not now an issue, so though it probably goes against terms of service, I guess I could set up toplevel “family member” folders and we could all share the one subscription account, just selectively synching our own folders?

In essence, I can pretty much move to Dropbox (save for those files I don’t want to share/expose to US servers etc etc; just in passing, one thing Dropbox doesn’t seem to want to let me do is change the account email to another email address that I have another Dropbox account associated with. So I have a bit of an issue with juggling accounts…)

When I started my Wrangling F1 Data With R experiment, the intention was always to make use of any royalties to cover the costs associated with that activity. Leanpub pays out if you are owed more than $40 collected in the run up to 45 days ahead of a payment date (so the Feb 1st payout was any monies collected up to mid-December and not refunded since). If I reckon on selling 10 books a month, that gives me about $75 at current running. Selling 5 a month (so one a week) means it could be hit or miss whether I make the minimum amount to receive a payment for that month. (I could of course put the price up. Leanpub lets you set a minimum price but allows purchasers to pay what they want. I think $20 is the highest amount paid for a copy I’ve had to date, which generated a royalty of $17.50 (whoever that was – thank you :-)) You can also give free or discounted promo coupons away.) As part of the project is to explore ways of identifying and communicating motorsport stories, I’ve spent royalties so far on:

  • a subscription to GP+ (not least because I aspire to getting a chart in there!;-);
  • a subscription to the Autosport online content, in part to gain access to forix, which I’d forgotten is rubbish;
  • a small donation to sidepodcast, because it’s been my favourite F1 podcast for a long time.

Any books I buy in future relating to sports stats or motorsport will be covered henceforth from this pot. Any tickets I buy for motorsport events, and programmes at such events, will also be covered from this pot. Unfortunately, the price of an F1 ticket/weekend is just too much. A Sky F1 Channel subscription or day passes is also ruled out because I can’t for the life of me work out how much it’ll cost or how to subscribe; but I suspect it’ll be more than the £10 or so I’d be willing to pay per race (where race means all sessions in a race weekend). If my F1 iOS app subscription needs updating that’ll also count. Domain name registration (for example, I recently bought f1datajunkie.com) is about £15/$25 a year from my current provider. (Hmm, that seems a bit steep?) I subscribe to Racecar Engineering (£45/$70 or so per year), the cost of which will get added to the mix. A “big ticket” item I’m saving for (my royalties aren’t that much) on the wants list is a radio scanner to listen in to driver comms at race events (I assume it’d work?). I’d like to be able to make a small regular donation to help keep the ergast site on, but can’t see how to… I need to bear in mind tax payments, but also consider the above as legitimate costs of a self-employed business experiment.

I also figure that as an online publishing venture, any royalties should also go to supporting other digital tools I make use of as part of it. Some time ago, I bought in to the pinboard.in social bookmarking service, I used to have a flickr pro subscription (hmm, I possibly still do? Is there any point…?!) and I spend $13 a year with WordPress.com on domain mapping. In the past I have also gone ad-free ($30 per year). I am considering moving to another host such as Squarespace ($8 per month), because WordPress is too constraining, but am wary of what the migration will involve and how much will break. Whilst self-hosting appeals, I don’t want the grief of doing my own admin if things go pear shaped.

I’m a heavy user of RStudio, and have posted a couple of Shiny apps. I can probably get by on the shinyapps.io free plan for a bit (10 apps) – just – but the step up to the basic plan at $39 a month is too steep.

I used to use Scraperwiki a lot, but have moved away from running any persistent scrapers for some time now. morph.io (which is essentially Scraperwiki classic) is currently free – though looks like a subscription will appear at some point – so I may try to get back into scraping in the background using that service. The Scraperwiki commercial plan is $9/month for 10 scrapers, $29 per month for 100. I have tended in the past to run very small scrapers, which means the number of scrapers can explode quickly, but $29/month is too much.

I also make use of github on a free/open plan, and while I don’t currently have any need for private repos, the entry level micro-plan ($7/month) offers 5. I guess I could use a (private?) github rather than Dropbox for feeding Leanpub, so this might make sense. Of course, I could just treat such a subscription as a regular donation.

It would be quite nice to have access to IPython notebooks online. The easiest solution to this is probably something like wakari.io, which comes in at $25/month, which again is a little bit steep for me at the moment.

In my head, I figure £5/$8/month is about one book per month, £10/$15 is two, £15/$20 is three, £25/$40 is 5. I figure I use these services and I’m making a small amount of pin money from things associated with that use. To help guarantee continuity in provision and maintenance of these services, I can use the first step of a bucket brigade style credit apportionment mechanism to redistribute some of the financial benefits these services have helped me realise.

Ideally, what I’d like to do is spend royalties from 1 book per service per month, perhaps even via sponsored links… (Hmm, there’s a thought – “support coupons” with minimum prices set at the level to cover the costs of running a particular service for one month, with batches of 12 coupons published per service per year… Transparent pricing, hypothecated to specific costs!)

Of course, I could also start looking at running my own services in the cloud, but the additional time cost of getting up and running, as well as hassle of administration, and the stress related to the fear of coping in the face of attack or things properly breaking, means I prefer managed online services where I use them.