Archive for the ‘Thinkses’ Category
I’ve been in a ranty mood all day today, so to finish it off, here are some thoughts about how we can start to use #opendata to hold companies to account. The trigger was finding a dataset released by the Care Quality Commission (CQC) listing the locations of premises registered with the CQC, and the operating companies of those locations (early observations on that data here).
The information is useful because it provides a way of generating aggregated lists of companies that are part of the same corporate group (for example, locations operated by Virgin Care companies, or companies operated by Care UK). When we have these aggregation lists, it means we can start to run the numbers across all the companies in a corporate group, and get some data back about how the companies that are part of a group are operating in general. The aggregated lists thus provide a basis for looking at the gross behaviour of a particular company. We can then start to run league tables against these companies (folk love league tables, right? At least, they do when it comes to public sector bashing). So we can start to see how the corporate groupings compare against each other, and perhaps also against public providers. Of course, there is a chance that the private groups will be shown to be performing better than public sector bodies, but that could be a useful basis for a productive conversation about why…
So what sorts of aggregate lists can we start to construct? The CQC data allows us to get lists of locations associated with various sorts of care delivery (care home, GP services, dentistry, more specialist services) and identify locations that are part of the same corporate group. For example, I notice that filtering the CQC data to care homes, the following are significant operators (the number relates to the number of locations they operate):
Voyage 1 Limited: 273
HC-One Limited: 169
Barchester Healthcare Homes Limited: 168
When it comes to “brands”, we have the following multiple operators:
Four Seasons Group: 346
Voyage: 279
BUPA Group: 246
Priory Group: 183
HC-One Limited: 169
Barchester Healthcare: 168
Care UK: 130
Caretech Community Services: 118
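By way of illustration, here’s roughly how that brand-level roll-up might look in pandas, using a few made-up rows in place of the real CQC download (the column names here are invented for the sketch, not the actual CQC field names):

```python
# Illustrative rows only - the real CQC download has its own column names.
import pandas as pd

locations = pd.DataFrame([
    {"location": "Home A", "provider": "Voyage 1 Limited", "brand": "Voyage"},
    {"location": "Home B", "provider": "Voyage (DCA) Ltd", "brand": "Voyage"},
    {"location": "Home C", "provider": "HC-One Limited",   "brand": "HC-One Limited"},
])

# Locations per operating company...
by_provider = locations["provider"].value_counts()

# ...and per brand, which rolls sibling companies up under one corporate umbrella
by_brand = locations["brand"].value_counts()

print(by_brand)  # Voyage counts two locations, via two different provider companies
```

Grouping on the brand column is what lets sibling companies, each registered separately, roll up into a single corporate total.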
For these operators, we could start to scrape their most recent CQC reports and build up a picture of how well the group as a whole is operating. In the same way that “armchair auditors” (whatever they are?!) are supposed to be able to hold local councils to account, perhaps they can do the same for companies, and give the directors a helping hand… (I would love to see open data activists buying a share and going along to a company shareholder meeting to give some opendata powered grief ;-)
Other public quality data sites provide us with hints at ways of generating additional aggregations. For example, from the Food Standards Agency, we can search on ‘McDonalds’ as a restaurant to bootstrap a search into premises operated by that company (although we’d probably also need to add in searches across takeaways, and perhaps also look for things like ‘McDonalds Ltd’ to catch more of them?).
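A crude first pass at catching those name variants might normalise trading names before matching – strip the punctuation and company-type suffixes and compare what’s left. A sketch only (real FSA records would no doubt need fuzzier matching, and company numbers where available):

```python
import re

# Company suffixes to ignore when matching trading names (not exhaustive)
SUFFIXES = {"ltd", "limited", "plc", "llp"}

def normalise(name):
    # Lowercase, strip punctuation, drop company-type suffixes
    tokens = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)

names = ["McDonalds", "MCDONALDS LIMITED", "McDonald's"]
print({normalise(n) for n in names})  # all three collapse to "mcdonalds"
```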
Note – the CQC data provides a possible steer here for how other data sets might be usefully extended in terms of the data they make available. For example, having a field for “operating company” or “brand” would make for more effective searches across branded or operated food establishments. Having company number (for limited companies and LLPs etc) provided would also be useful for disambiguation purposes.
Hmm, I wonder – would it make sense to start to identify the information that makes registers useful, and that we should start to keep tabs on? We could then perhaps start lobbying for companies to provide that data, and check that such data is being and continues to be collected? It may not be a register of beneficial ownership, but it would provide handy cribs for trying to establish what companies are part of a corporate grouping…
(By the by, picking up on Owen Boswarva’s post The UK National Information Infrastructure: It’s time for the private sector to release some open data too, these registers provide a proxy for the companies releasing certain sorts of data. For example, we can search for ‘Tesco’ as a supermarket on the FSA site. Of course, if companies were also obliged to publish information about their outlets as open data – something you could argue that as a public company they should be required to do, trading their limited liability for open information about where they might exert that right – we could start to run cross-checks (which is the sort of thing real auditors do, right?) and publish complete records of publicly accountable performance in terms of regulated quality inspections.)
The CQC and Food Standards Agency both operate quality inspection registers, so what other registers might we go to to build up a picture of how companies – particularly large corporate groupings – behave?
The Environment Agency publish several registers, including one detailing enforcement actions, which might be interesting to track, though I’m not sure how the data is licensed? The HSE (Health & Safety Executive) publish various notices by industry sector and subsector, but again, I’m not too clear on the licensing? The Chief Fire Officers Association (CFOA) publish a couple of enforcement registers which look as if they cover some of the same categories as the CQC data – though how easy it would be to reconcile the two registers, I don’t know (and again, I don’t know how the register is actually licensed). One thing to bear in mind is that where registers contain personally identifiable information, any aggregations we build that incorporate such data (if we are licensed to build such things) mean (I think) that we become data controllers for the purposes of the Data Protection Act (we are not the maintainers and publishers of the public register, so we don’t benefit from the exemptions associated with that role).
Looking at the above, I’m starting to think it could be a really interesting exercise to pick some of the care home provider groups and have a go at aggregating any applicable quality scores and enforcement notices from the CQC, FSA, HSE and CFOA (and even the EA if any of their notices apply! Hmm… does any HSCIC data cover care homes at all too?) Coupled with this, a trawl of directors data to see how the separate companies in a group connect by virtue of directors (and what other companies may be indicated by common directors in a group?).
Other areas perhaps worth exploring – farms incorporated into agricultural groups? (Where would we find that data? One register that could be used to partially hold those locations to account may be the public register of pesticide enforcement notices, as well as other EA notices?)
As well as registers, are there any other sources of information about companies we can add in to the mix? There’s lots: for limited companies we can pull down company registration details and lists of directors (and perhaps struck off directors) and some accounting information. Data about charities should be available from the Charities Commission. The HSCIC produces care quality indicators for a range of health providers, as well as prescribing data for individual GP practices. Data is also available about some of the medical trials that particular practices are involved in.
At a local council level, local councils maintain and publish a wide variety of registers, including registers of gaming machine licenses, licensed premises and so on. Where the premises are an outlet of a parent corporate group, we may be able to pick up the name of the parent group as the licensee. (Via @OwenBoswarva, it seems the Gambling Commission has a central list of operating license holders and licensed premises.)
Having identified influential corporate players, we might then look to see whether those same bodies are represented on lobbyist groups, such as the EU register of commission expert groups, or as benefactors of UK Parliamentary All Party groups, or as parties to meetings with Ministers etc.
We can also look across all those companies to see how much money the corporate groups are drawing from the public sector, by inspecting who payments are made to in the masses of transparency spending data that councils, government departments, and services such as the NHS publish. (For an example of this, see Spend Small Local Authority Spending Index; unfortunately, the bulk data you need to run this sort of analysis yourself is not openly available – you need to aggregate and clean it yourself.)
Once we start to get data that lists companies that are part of a group, we can start to aggregate open public data about all the companies in the group and look for patterns of behaviour within the groups, as well as across them. Lapses in one part of the group might suggest a weakness in high level management (useful for the financial analysts?), or act as a red flag for inspection and quality regimes.
Hmmm… methinks it’s time to start putting some of this open data to work; but put it to work by focussing on companies, rather than public bodies…
I think I also need to do a little bit of digging around how public registers are licensed? Should they all be licensed OGL by default? And what guidance, if any, is there around how we can make use of such data and not breach the Data Protection Act?
PS via @RDBinns, What do they know about me? Open data on how organisations use personal data, describing some of the things we can find from the data protection notifications published by the ICO [ICO data controller register].
Via Downes, I like this idea of Flipping Bloom’s Taxonomy Triangle which draws on the following inverted pyramid originally posted here: Simplified Bloom’s Taxonomy Visual and comments on a process in which “students are spending the majority of their time in the Creating and Evaluating levels of Bloom’s Taxonomy, and they go down into the lower levels to acquire the information they need when they need it” (from Jonathan Bergmann and Aaron Sams’ Flip Your Classroom: Reach Every Student In Every Class Every Day, perhaps?)
Here’s another example, from a blog post by education consultant Scott Mcleod: Do students need to learn lower-level factual and procedural knowledge before they can do higher-order thinking?, or this one by teacher Shelley Wright: Flipping Bloom’s Taxonomy.
This makes some sort of sense to me, though if you (mistakenly?) insist on reading it as a linear process it lacks the constructivist context that shows how some knowledge and understanding can be used to inform the practice of the playful creating/evaluating/analysing exploratory layer, which might in itself be directed at trying to illuminate a misunderstanding or confusion the learner has with respect to their own knowledge at the understanding level. (In fact, the more I look at any model the more issues I tend to get with it when it comes to actually picking it apart!;-)
As far as “remembering” goes, I think that also includes “making up plausible stories or examples” – i.e. constructed “rememberings” (that is, stories) of things that never happened.
[Thinkses in progress – riffing around the idea that transparency is not reporting. This is all a bit confused atm…]
UK Health Secretary Jeremy Hunt was on BBC Radio 4’s Today programme today talking about a new “open and honest reporting culture” for UK hospitals. Transparency, it seems, is about publishing open data, or at least, putting crappy league tables onto websites. I think: not….
The fact that a hospital has “a number” of mistakes may or may not be interesting. As with most statistics, there is little actual information in a single number. As the refrain on the OU/BBC co-produced numbers programme More or Less goes, ‘is it a big number or a small number?’. The information typically lies in the comparison with other numbers, either across time or across different entities (for example, comparing figures across hospitals). But comparisons may also be loaded. For a fair comparison we need to normalise numbers – that is, we need to put them on the same footing.
[A tweet from @kdnuggets comments: ‘The question to ask is not – “is it a big number or a small number?”, but how it compares with other numbers’. The sense of the above is that such a comparison is always essential. A score of 9.5 in a test is a large number when the marks are out of ten, a small one when out of one hundred. Hence the need for normalisation, or some other basis for generating a comparison.]
The above cartoon from web comic XKCD, with its comment about how reporting raw numbers on a map often just produces a population map, illustrates this well. If town A, with a population of 1 million, has a causal incidence [I made that phrase up: I mean, the town somehow causes the incidence of X at that rate] of some horrible X of 1% (that is, 10,000 people get it as a result of living in town A), and town B, with a population of 50,000, has a causal incidence of 10% (that is, 5,000 people get X), a simple numbers map would make you fearful of living in town A, but you’d likely be worse off moving to town B.
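Putting the town A / town B example into code makes the point starkly:

```python
# Numbers from the example above: raw counts point one way, rates the other.
towns = {
    "A": {"population": 1_000_000, "cases": 10_000},
    "B": {"population": 50_000,    "cases": 5_000},
}

for name, t in towns.items():
    rate = t["cases"] / t["population"]
    print(name, t["cases"], f"{rate:.0%}")

# Raw counts: A (10,000) looks twice as bad as B (5,000).
# Normalised rates: A is 1%, B is 10% - town B is the one to worry about.
```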
Sometimes a single number may appear to be meaningful. I have £2.73 in my pocket so I have £2.73 to spend when I go to the beach. But again, there is a need for comparison here. £2.73 needs to be compared against the price of things it can purchase to inform my purchasing decisions.
In the opendata world, it seems that just publishing numbers is taken as transparency. But that’s largely meaningless. Even being able to compare numbers year on year, or month on month, or hospital on hospital, is largely meaningless, even if those comparisons can be suitably normalised. It’s largely meaningless because it doesn’t help me make sense of the “so what?” question.
Transparency comes from seeing how those numbers are used to support decision making. Transparency comes from seeing how this number was used to inform that decision, and why it influenced the decision in that way.
Transparency comes from unpacking the decisions that are “evidenced” by the opendata, or other data not open, or no data at all, just whim (or bad policy).
Suppose a local council spends £x thousands on an out-of-area placement several hundred miles away. This may or may not be expensive. We can perhaps look at other placement spends and see that the placement hundreds of miles away appears to offer good value for money (it looks cheap compared to other placements; which maybe begs the question of why those other placements are being used if pure cost is a factor). The transparency comes from knowing how the open data contributed to the decision. In many cases, it will be impossible to be fully transparent (i.e. to fully justify a decision based on opendata) because there will be other factors involved, such as a consideration of sensitive personal data (clinical decisions based around medical factors, for example).
So what that there are z mistakes in a hospital, for league table purposes – although one thing I might care about is how z is normalised to provide a basis of comparison with other hospitals in a league table. Because league tables, sort orders, and normalisation make the data political. On the other hand – maybe I absolutely do want to know the number z – and why is it that number? (Why is it not z/2 or 2*z? By what process did z come into being? (We have to accept, unfortunately, that systems tend to incur errors. Unless we introduce self-correcting processes. I absolutely loved the idea of error-correcting codes when I was first introduced to them!) And knowing z, how does that inform the decision making of the hospital? What happens as a result of z? Would the same response be prompted if the number was z-1, or z/2? Would a different response be in order if the number was z+1, or would nothing change until it hit z*2? In this case the “comparison” comes from comparing the different decisions that would result from the number being different, or the different decisions that can be made given a particular number. The meaning of the number then becomes aligned to the different decisions that are taken for different values of that number. The number becomes meaningful in relation to the threshold values that the variable corresponding to that number is set at when it comes to triggering decisions.)
Transparency comes not from publishing open data, but from being open about decision making processes and possibly the threshold values or rates of change in indicators that prompt decisions. In many cases the detail of the decision may not be fully open for very good reason, in which case we need to trust the process. Which means understanding the factors involved in the process. Which may in part be “evidenced” through open data.
Going back to the out of area placement – the site hundreds of miles away may have been decided on by a local consideration, such as the “spot price” of the service provision. If financial considerations play a part in the decision making process behind making that placement, that’s useful to know. It might be unpalatable, but that’s the way the system works. But it begs the question – does the cost of servicing that placement (for example, local staff having to make round trips to that location, opportunity cost associated with not servicing more local needs incurred by the loss of time in meeting that requirement) also form part of the financial consideration made during the taking of that decision? The unit cost of attending a remote location for an intervention will inevitably be higher than attending a closer one.
If financial considerations are part of a decision, how “total” is the consideration of the costs?
That is very real part of the transparency consideration. To a certain extent, I don’t care that it costs £x for spot provision y. But I do want to know that finance plays a part in the decision. And I also want to know how the finance consideration is put together. That’s where the transparency comes in. £50 quid for an iPhone? Brilliant. Dead cheap. Contract £50 per month for two years. OK – £50 quid. Brilliant. Or maybe £400 for an iPhone and a £10 monthly contract for a year. £400? You must be joking. £1250 or £520 total cost of ownership? What do you think? £50? Bargain. #ffs
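For what it’s worth, the arithmetic behind that rant:

```python
def total_cost_of_ownership(handset, monthly, months):
    # The headline price plus everything the contract adds over its life
    return handset + monthly * months

cheap_looking = total_cost_of_ownership(50, 50, 24)   # "£50 quid. Brilliant."
dear_looking = total_cost_of_ownership(400, 10, 12)   # "£400? You must be joking."

print(cheap_looking, dear_looking)  # 1250 520: the "expensive" phone is the cheaper deal
```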
Transparency comes from knowing the factors involved in a decision. Transparency comes from knowing what data is available to support those decisions, and how the data is used to inform those decisions. In certain cases, we may be able to see some opendata to work through whether or not the evidence supports the decision based on the criteria that are claimed to be used as the basis for the decision making process. That’s just marking. That’s just checking the working.
The transparency bit comes from understanding the decision making process and the extent to which the data is being used to support it. Not the publication of the number 7 or the amount £43,125.26.
Reporting is not transparency. Transparency is knowing the process by which the reporting informs and influences decision making.
I’m not sure that “openness” of throughput is a good thing either. I’m not even sure that openness of process is a Good Thing (because then it can be gamed, and turned against the public sector by private enterprise). I’m not sure at all how transparency and openness relate? Or what “openness” actually applies to? The openness agenda creeps (as I guess I am proposing here in the context of “openness” around decision making) and I’m not sure that’s a good thing. I don’t think we have thought openness through and I’m not sure that it necessarily is such a Good Thing after all…
What I do think we need is more openness within organisations. Maybe that’s where self-correction can start to kick in, when the members of an organisation have access to its internal decision making procedures. Certainly this was one reason I favoured openness of OU content (eg Innovating from the Inside, Outside) – not for it to be open, per se, but because it meant I could actually discover it and make use of it, rather than it being siloed and hidden away from me in another part of the organisation, preventing me from using it elsewhere in the organisation.
Managing tracked suggested changes to the same set of docs, along with comments and observations, from multiple respondents is one of the challenges any organisation whose business is largely concerned with the production of documents has to face.
Passing shared/social living documents by reference rather than by value, so that folk don’t have to share multiple physical copies of the same document, each annotated separately, is one way. Tools like track changes in word processor docs, wiki page histories, or git diffs are another.
All documents have an underlying representation – web pages have HTML, Word documents have whatever XML horrors lie under the hood, IPython notebooks have JSON.
Change tracking solutions like git show differences to the raw representation, as in this example of a couple of changes made to a (raw) IPython notebook:
Notebooks can also be saved in a non-executable form that includes previously generated cell outputs as HTML, but again a git view of the differences would reveal changes at the HTML code level, rather than the rendered HTML level. (Tracked changes also include ‘useful’ ones, such as changes to cell contents, and (at a WYSIWYG level at least) irrelevant ‘administrative’ value changes, such as changes to hash values recorded in the notebook source JSON.)
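To see why a raw diff is hard to read, here’s a sketch of what a one-word edit to a markdown cell looks like when diffed at the JSON level (using Python’s difflib to stand in for git):

```python
import difflib
import json

# Two versions of the same markdown cell, differing by one word
before = {"cell_type": "markdown", "metadata": {}, "source": ["Hello world"]}
after = {"cell_type": "markdown", "metadata": {}, "source": ["Hello there"]}

# Diff the serialised JSON, the way git sees the notebook file on disk
diff = difflib.unified_diff(
    json.dumps(before, indent=1).splitlines(),
    json.dumps(after, indent=1).splitlines(),
    lineterm="",
)
print("\n".join(diff))  # the one-word edit is buried among JSON structure lines
```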
However, the change management features are typically implemented through additional metadata/markup in the underlying representation:
For the course we’re working on at the moment, we’re making significant use of IPython notebooks, requiring comments/suggested changes from multiple reviewers over the same set of notebooks.
So I was wondering – what would it take to have an nbviewer-style view in something like GitHub that could render WYSIWYG track-changes style views over a modified notebook, showing just the changes to cell contents and cell outputs?
This SO thread maybe touches on related issues: Using IPython notebooks under version control.
A similar principle would work for HTML too, of course. Hmm, thinks… are there any git previewers for HTML that log edits/diffs at the HTML level but then render those diffs at the WYSIWYG level in a traditional track changes style view?
Hmm… I wonder if a plugin for Atom.io might do this? (Anyone know if atom.io can also run as a service? Eg could I put it onto a VM and then access it through localhost:ATOMIOPORT?)
PS also on the change management thing in IPython Notebooks, and again something that might make sense in a git context, is the management of ‘undo’ features in a cell.
IPython notebooks have a powerful cell-by-cell undo feature that works at least during a current session (if you shut down a notebook and then restart it, I assume the cell history is lost?). [Anyone know a good link describing/summarising the history/undo features of IPython Notebooks?]
I’m keen for students to take ownership of notebooks and try things out within them, but I’m also mindful that sometimes they may make repeated changes to a cell, lose the undo history for whatever reason, and then want to reset the cell to the “original” contents, for some definition of “original” (such as the version that was issued to the learner by the instructor, or the version the learner viewed at their first use of the notebook).
A clunky solution is for students to duplicate each notebook before they start to work on it, so they always have an original copy to hand. I just want an option to reveal a “reset” button by each cell and then be able to reset it. Or perhaps, in line with the other cell operations, reset a specific highlighted cell, reset all cells, or reset all cells above or below a selected cell.
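One way such a reset might be implemented – and this is pure speculation on my part – is to stash the issued source of each cell in the cell metadata when the notebook is distributed, then restore from that stash on demand (the original_source metadata key is made up for the sketch, not part of the notebook spec):

```python
# "original_source" is an invented metadata key, not part of the notebook format.
def snapshot_cells(nb):
    # Record each cell's issued source before the notebook is handed out
    for cell in nb["cells"]:
        cell.setdefault("metadata", {})["original_source"] = list(cell["source"])

def reset_cell(cell):
    # Restore a single cell to its issued source
    cell["source"] = list(cell["metadata"]["original_source"])

nb = {"cells": [{"cell_type": "code", "metadata": {}, "source": ["print(1)"]}]}
snapshot_cells(nb)

nb["cells"][0]["source"] = ["print(999)  # learner's edits"]
reset_cell(nb["cells"][0])
print(nb["cells"][0]["source"])  # back to the issued version
```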
In digital electronics, the notions of fan in and fan out describe, respectively, the number of inputs a gate (or, on a chip, a pin) can handle, or the number of output connections it can drive. I’ve been thinking about this notion quite a bit, recently, in the context of concentrating information, or data, about a particular service.
For example, suppose I want to look at the payments made by a local council, as declared under transparency regulations. I can get the data for a particular council from a particular source. If we consider each organisation that the council makes a payment to as a separate output (that is, as a connection that goes between that council and the particular organisation), the fan out of the payment data gives the number of distinct organisations that the council has made a declared payment to.
One thing councils do is make payments to other public bodies who have provided them with some service or other. This may include other councils (for example, for the delivery of services relating to out of area social care).
Why might this be useful? If we aggregate the payments data from different councils, we can set up a database that allows us to look at all payments from different councils to a particular organisation (which may itself be another council, obliged to publish its transaction data, or a private company, which currently isn’t). (See Using Aggregated Local Council Spending Data for Reverse Spending (Payments to) Lookups for an example of this. I think startup Spend Network are aggregating this data, but they don’t seem to be offering any useful open or free services, or data collections, off the back of it. OpenSpending has some data, but it’s scattergun in what’s there and what isn’t, depending as it does on volunteer data collectors and curators.)
The payments incoming to a public body from other public bodies are therefore available as open data, but not in a generally, or conveniently, concentrated way. The fan in of public payments is given by the number of public bodies that have made a payment to a particular body (which may itself be a public body or may be a private company). If the fan in is large, it can be a major chore searching through the payments data of all the other public bodies trying to track down payments to the body of interest.
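Given an aggregated payments table, the fan out and fan in figures are just distinct counts in each direction. A sketch, with toy rows standing in for the real transparency spending data:

```python
import pandas as pd

# Toy payments rows standing in for aggregated transparency spending data
payments = pd.DataFrame([
    {"payer": "Council A", "payee": "EvilCo Ltd", "amount": 600},
    {"payer": "Council A", "payee": "Council B",  "amount": 1200},
    {"payer": "Council B", "payee": "EvilCo Ltd", "amount": 900},
])

# Fan out: distinct payees per paying body (easy - each payer publishes its own data)
fan_out = payments.groupby("payer")["payee"].nunique()

# Fan in: distinct payers per receiving body (hard without aggregating every payer's data)
fan_in = payments.groupby("payee")["payer"].nunique()

print(fan_in["EvilCo Ltd"])  # 2: traceable only by trawling every public body's data
```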
Whilst I can easily discover fan out payments from a public body, I can’t easily discover the originators of fan in public payments to a body, public or otherwise. Except that I could possibly FOI a public body for this information (“please send me a list of payments you have received from these bodies…”).
As more and more public services get outsourced to private contractors, I wonder if those private contractors will start to buy services off the public providers? I may be able to FOI the public providers for their receipts data (any examples of this, successful or otherwise?), but I wouldn’t be able to find any publicly disclosed payments data from the private provider to the public provider.
The transparency matrix thus looks something like this:
- payment from public body to public body: payment disclosed as public data, receipts available from analysis of all public body payment data (and receipts FOIable from receiver?)
- payment from public body to private body: payment disclosed as public data; total public payments to private body can be ascertained by inspecting payments data of all public bodies. Effective fan-in can be increased by setting up different companies to receive payments, making it harder to aggregate total public monies incoming to a corporate group. (It would be useful if private companies had to disclose: a) the total amount of public monies received from any public source, exceeding some threshold; b) individual payments above a certain value from a public body.)
- payment from private body to public body: receipt FOIable from public body? No disclosure requirement on private body? Private body can effectively reduce fan out (that is, easily identified concentration of outgoing payments) by setting up different companies through which payments are made.
- payment from private body to private body: no disclosure requirements.
I have of course already wondered Do We Need Open Receipts Data as Well as Open Spending Data?. My current take on this would perhaps argue in favour of requiring all bodies, public or private, that receive more than £25,000, for example, in total per financial year from a particular corporate group* to declare all the transactions (over £500, say) from that group. A step on the road towards that would be to require bodies that receive more than a certain amount of receipts, summed across all public bodies, to be subject to FOI at least in respect of payments data received from public bodies.
* We would need to define a corporate group somehow, to get round companies setting up EvilCo Public Money Receiving Company No. 1, EvilCo Public Money Receiving Company No. 2354 Ltd, etc, each of which only ever invoices up to £24,999. There would also have to be a way of identifying payments from the same public body but made through different accounts (for example, different local council directorates).
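The footnote’s point can be made concrete: if the threshold is applied per company, the EvilCo trick works; applied per corporate group, it doesn’t. A sketch with invented figures:

```python
import pandas as pd

# Invented figures: one group splits its receipts across two shell companies
receipts = pd.DataFrame([
    {"company": "EvilCo Receiving Co No. 1", "group": "EvilCo", "total": 24_999},
    {"company": "EvilCo Receiving Co No. 2", "group": "EvilCo", "total": 24_999},
    {"company": "NiceCo Ltd",                "group": "NiceCo", "total": 10_000},
])

THRESHOLD = 25_000

# Per company, nothing crosses the threshold...
flagged_companies = receipts[receipts["total"] > THRESHOLD]

# ...but summed per corporate group, the split is caught
per_group = receipts.groupby("group")["total"].sum()
flagged_groups = per_group[per_group > THRESHOLD]

print(len(flagged_companies), dict(flagged_groups))  # 0 {'EvilCo': 49998}
```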
Whilst this would place a burden on all bodies, it would also start to level out the asymmetry between public body reporting and private body reporting in the matter of publicly funded transactions. At the moment, private company overheads for delivering subcontracted public services are lower than public body overheads for delivering the same services when it comes to, for example, transparency disclosures, placing the public body at a disadvantage. (Note that things may be changing, at least in the FOI stakes… See for example the latter part of Some Notes on Extending FOI.)
One might almost think the government was promoting transparency of public services in the gleeful expectation that, as its privatisation agenda moves on, a decreasing proportion of service providers will actually have to make public disclosures. Again, this asymmetry would make for unfair comparisons between service providers based on publicly available data if only data from public body providers of public services, rather than private providers of tendered public services, had to be disclosed.
So the take home, which has got diluted somewhat, is the proposal that the joint notions of fan in and fan out, when it comes to payment/receipts data, may be useful when it comes to helping us think about how easy it is to concentrate data/information about payments to, or from, a particular body, and how policy can be defined to shine light where it needs shining.
A handful of posts caught my attention yesterday around the whole data thang…
First up, a quote on the New Aesthetic blog: “the state-of-the-art method for shaping ideas is not to coerce overtly but to seduce covertly, from a foundation of knowledge”, referencing an article on Medium: Is the Internet good or bad? Yes. The quote includes mention of an Adweek article (this one? Marketers Should Take Note of When Women Feel Least Attractive; see also a response and the original press release) that “noted that women feel less attractive on Mondays, and that this might be the best time to advertise make-up to them.”
I took this as a cautionary tale about the way in which “big data” – qua theoryless statistical models based on the uncontrolled, if large, samples that make up “found” datasets (to pick up on a phrase used by Tim Harford in Big data: are we making a big mistake? [h/t @schmerg et al]) – can be used to malevolent effect. (Thanks to @devonwalshe for highlighting that it’s not the data we should blame (“the data itself has no agency, so a little pointless to blame … Just sensitive to tech fear. Shifts blame from people to things.”) but the motivations and actions of the people who make use of the data.)
Which is to say – there’s ethics involved. As an extreme example, consider the possible “weaponisation” of data, for example in the context of PSYOP – “psychological operations” (are they still called that?). As the New Aesthetic quote, and the full Medium article itself, explain, the way in which data models allow messages to be shaped, targeted and tailored provides companies and politicians with a form of soft power that encourages us “to click, willingly, on a choice that has been engineered for us”. (This unpicks further – not only are we modelled so that prompts are issued to us at an opportune time, but the choices we are presented with may also have been identified algorithmically.)
So that’s one thing…
Around about the same time, I also spotted a news announcement that Dunnhumby – an early bellwether of how to make the most of #midata consumer data – has bought “advertising technology” firm Sociomantic (press release): “dunnhumby will combine its extensive insights on the shopping preferences of 400 million consumers with Sociomantic’s intelligent digital-advertising technology and real-time data from more than 700 million online consumers to dramatically improve how advertising is planned, personalized and evaluated. For the first time, marketing content can be dynamically created specifically for an individual in real-time based on their interests and shopping preferences, and delivered across online media and mobile devices.” Good, oh…
A post on the Dunnhumby blog (It’s Time to Revolutionise Digital Advertising) provides further insight about what we might expect next:
We have decided to buy the company because the combination of Sociomantic’s technological capability and dunnhumby’s insight from 430m shoppers worldwide will create a new opportunity to make the online experience a lot better, because for the first time we will be able to make online content personalised for people, based on what they actually like, want and need. It is what we have been doing with loyalty programs and personalised offers for years – done with scale and speed in the digital world.
So what will we actually do to make that online experience better for customers? First, because we know our customers, what they see will be relevant and based on who they are, what they are interested in and what they shop for. It’s the same insight that powers Clubcard vouchers in the UK which are tailored to what customers shop for both online and in-store. Second, because we understand what customers actually buy online or in-store, we can tell advertisers how advertising needs to change and how they can help customers with information they value. Of course there is a clear benefit to advertisers, because they can spend their budgets only where they are talking to the right audience in the right way with the right content at the right time, measuring what works, what doesn’t and taking out a lot of guesswork. The real benefit though must be to customers whose online experience will get richer, simpler and more enjoyable. The free internet content we enjoy today is paid for by advertising, we just want to make it advertising offers and content you will enjoy too.
This needs probing further – are Dunnhumby proposing to merge data about actual shopping habits in physical and online stores with user cookies so that ads can be served based on actual consumption? (See for example Centralising User Tracking on the Web. How far has this got, I wonder? Seems like it may be here on mobile devices? Google’s New ‘Advertising ID’ Is Now Live And Tracking Android Phones — This Is What It Looks Like. Here’s the Android developer docs on Advertising ID. See also GigaOm on As advertisers phase out cookies, what’s the alternative?, eg in context of “known identifiers” (like email addresses and usernames) and “stable identifiers” (persistent device or browser level identifiers).)
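To illustrate the sort of linkage a “known identifier” enables, here's a toy sketch. Everything here is invented (the field names, the data, the idea of using a hashed email as the join key), but hashed emails are one commonly described way of bridging an in-store loyalty record to an online advertising profile:

```python
# Toy illustration of a "known identifier" join: loyalty-card purchase
# records linked to ad-platform profiles via a shared hashed email.
# All field names and data are hypothetical.
import hashlib

def email_key(email):
    # Normalise then hash: a typical "known identifier" shared across systems
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

loyalty = [
    {"email": "alice@example.com", "bought": "shampoo"},
    {"email": "bob@example.com", "bought": "razors"},
]
ad_profiles = {
    email_key("alice@example.com"): {"cookie": "c-123"},
}

# Link in-store purchases to online ad profiles where the key matches
linked = []
for record in loyalty:
    profile = ad_profiles.get(email_key(record["email"]))
    if profile:
        linked.append({"cookie": profile["cookie"], "bought": record["bought"]})

print(linked)  # [{'cookie': 'c-123', 'bought': 'shampoo'}]
```

The point of the sketch is just that once both sides hold the same derived key, offline purchase behaviour can silently become an online ad-targeting signal.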
That’s the second thing…
For some reason, it’s all starting to make me think of supersaturated solutions…
PS FWIW, the OU/BBC co-produced Bang Goes the Theory (BBC1) had a “Big Data” episode recently – depending on when you read this, you may still be able to watch it here: Bang Goes the Theory – Series 8 – Episode 3: Big Data
Remember mashups? Five years or so ago they were all the rage. At their heart, they provided ways of combining things that already existed to do new things. This is a lazy approach, and one I favour.
One of the key inspirations for me in this idea of combinatorial tech, or tech combinatorics, is Jon Udell. His Library Lookup project blew me away with its creativity (the use of bookmarklets, the way the project encouraged you to access one IT service from another, the use of “linked data” (common/core-canonical identifiers) to bridge services and leverage or enrich one from another, and so on) and was the spark that fired many of my own doodlings. (Just thinking about it again excites me now…)
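The trick at the heart of Library Lookup – lifting a canonical identifier out of one service's URL and using it to address another – can be sketched in a few lines. The URL patterns below are illustrative only (the original used a bookmarklet against your local library's OPAC, and real catalogue URLs vary):

```python
import re

def extract_isbn(url):
    """Pull an ISBN-10 out of an Amazon-style product URL (pattern illustrative)."""
    m = re.search(r"/dp/(\d{9}[\dXx])", url)
    return m.group(1) if m else None

def library_lookup_url(isbn, catalogue="https://catalogue.example.org/search?isbn="):
    # Hypothetical OPAC search endpoint; substitute your own library's pattern
    return catalogue + isbn

isbn = extract_isbn("https://www.amazon.co.uk/dp/0596007973/ref=xyz")
print(library_lookup_url(isbn))
# https://catalogue.example.org/search?isbn=0596007973
```

The ISBN is doing the “linked data” work here: because both services understand the same canonical identifier, one service's page becomes a doorway into the other's.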
As Jon wrote on his blog yesterday (Shiny old tech) (my emphasis):
What does worry me, a bit, is the recent public conversation about ageism in tech. I’m 20 years past the point at which Vinod Khosla would have me fade into the sunset. And I think differently about innovation than Silicon Valley does. I don’t think we lack new ideas. I think we lack creative recombination of proven tech, and the execution and follow-through required to surface its latent value.
Elm City is one example of that. Another is my current project, Thali, Yaron Goland’s bid to create the peer-to-peer web that I’ve long envisioned. Thali is not a new idea. It is a creative recombination of proven tech: Couchbase, mutual SSL authentication, Tor hidden services. To make Thali possible, Yaron is making solid contributions to Thali’s open source foundations. Though younger than me, he is beyond Vinod Khosla’s sell-by date. But he is innovating in a profoundly important way.
Can we draw a clearer distinction between innovation and novelty?
I often think of this in terms of appropriation (eg Appropriating Technology, Appropriating IT: innovative uses of emerging technologies or Appropriating IT: Glue Steps).
Or repurposing, a form of reuse that differs from the intended original use.
Openness helps here. Open technologies allow users to innovate without permission. Open licensing is just part of that open technology jigsaw; open standards another; open access and accessibility a third. Open interfaces accessed sideways. And so on.
Looking back over archived blog posts from five, six, seven years ago, the web used to be such fun. An open playground, full of opportunities for creative recombination. Now we have Facebook, where authenticated APIs give you access to local social neighbourhoods, but little more. Now we have Google using link redirection and link pollution at every opportunity. Services once open are closed according to economic imperatives (and maybe scaling issues; maybe some creative recombinations are too costly to support when a network scales). Maybe my memory of a time when the web was more open is a false memory?
Creative recombination, ftw.
PS just spotted this (Walking on custard), via @plymuni. If you don’t see why it’s relevant, you probably don’t get the sense of this post!