OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for June 2010

Principles for, and Practicalities of, Open Public Data

with one comment

Following the first meeting of the Public Sector Transparency Board last week, which is tasked with “driv[ing] forward the Government’s transparency agenda, making it a core part of all government business”, a set of 11 draft public data principles has been posted for comment on the data.gov.uk wiki: Draft Public Data Principles [wiki version]

Following the finest Linked Data principles, each draft principle has its own unique URI… err, only it doesn’t… ;-) [here's how they might look on WriteToReply - WTR: Draft Public Data Principles - with unique URLs for each principle;-)]

The principles broadly state that users have a right to open public data, and that data should be published in ways that make it useable and useful (so machine readable, not restrictively licensed, easily discoverable and standards based, timely and fine grained). In addition, data underlying government websites will be made available (as for example in the case of the DirectGov Syndication API?) Other public bodies will be encouraged to publish inventories of their data holdings and make them available for reuse.

A separate blog post on the data.gov.uk blog describes some of the practical difficulties that local government offices might face when opening up their data: Publishing Local Open Data – Important Lessons from the Open Election Data project (Again, unique URLs for individual lessons are unavailable, but here’s an example of how we might automatically generate identifiers for them;-) WTR: Lessons Learned from the Open Election Data Project). The lessons learned include a lack of corporate awareness about open data issues (presumably there is a little more awareness since the PM’s letter to councils on opening up data), a lack of web skills and web publishing resources, not to mention a very limited understanding of, and tools available for the handling of, Linked Data.

As to what data might be opened up, particularly by local councils, Paul Clarke identifies several different classes of data (There’s data, and there’s data):

- historical data;
- planning data;
- infrastructural data;
- operational data.

My own take on it can be seen here:


(An updated version of this presentation, with full annotations, should be available in a week or two!)

Looking around elsewhere, Local government data: lessons from London suggests:

- “don’t start hiring big, expensive consultancy firms for advice”;
- “do draw on the expertise and learning already there”;
- “do remember that putting the data out there of itself is not enough – it must be predicated on a model of engagement.”

Picking up on that last point, the debate regarding the “usefulness” of data has a long way to run, I think? Whilst I would advocate lowering barriers to entry (which means making data discoverable, queryable (so particular views over it can be expressed), and available in “everyday” CSV and Excel spreadsheet formats) there is also the danger that if we put the requirement for data to be demonstrably useful to publishers, this will deter them from opening up data they don’t perceive to be useful. In this respect, the Draft Public Data Principle that states:

Public data policy and practice will be clearly driven by the public and businesses who want and use the data, including what data is released when and in what form – and in addition to the legal Right To Data itself this overriding principle should apply to the implementation of all the other principles.

should help ensure that an element of “pull” can be used to ensure the release of data that others know how to make useful, or need to make something else useful…

On the “usefulness” front, it’s also worth checking out Ingrid Koehler’s post Sometimes you have to make useful yourself, which mentions existence value and accountability value as well as value arising from “meta” operations such as the ability to compare data across organisations operating in similar areas (such as local councils, or wards within a particular council).

For my own part, I’ve recently started looking at ways in which we can generate transparency in reporting and policy development by linking summary statistics back to the original data (e.g. So Where Do the Numbers in Government Reports Come From?), a point also raised in Open data requires responsible reporting… (and the comments that follow it). Providing infrastructure that supports this linkage between summary reported data and the formulas used to generate those data summaries is something that I think would help make open data more useful for transparency purposes, although it sits at a higher level than the principles governing the straightforward publication and release of open public data.

See also:
Ben Goldacre on when the data just isn’t enough without the working…
Publishing itemised local authority expenditure – advice for comment

Written by Tony Hirst

June 30, 2010 at 10:32 am

Posted in Data


Whitelisted Hashtag Retweeter Pipe


Last week, I got an email from Stuart with the following query:

I’m trying to find a way to enable people to post to the OU twitter account from their personal account by using a predefined hashtag. …
We agreed a hashtag #***** which kmi researchers are using from their account if they want to share information to the main OU account.  I pull an RSS feed of this into the OU account and retweet it.  I’m sure you can see the obvious loop that occurs!

… are you aware of anything that will let me retweet a hashtag and strip off that hashtag to avoid the loop?  It would be great to be able to add new hashtags in the future so it could be rolled out to other faculties who might wish to share their news via the OU account just by tweeting from individual faculty members’ accounts.

Here’s what I came up with…

hashtag filter pipe

The first part of the pipe takes the user defined hashtag and creates the URL that will run a search for that hashtag on twitter and the second part of the pipe fetches the feed. The Filter block will only pass through tweets that come from specified twitter users (actually, that isn’t quite true… this pipe is gameable/spammable because of the way I use “contains” in the whitelist filter block… Can you see how?!;-) The regular expression block strips the hashtag out of the retweeted tweets. (For the pipe to work and not get into an infinite loop, this isn’t actually necessary if we’re using the whitelist, because retweeters that make use of the pipe feed should not have their username in the whitelist… That is, if you’re running the whitelist, you can remove the regular expression block and leave the hashtag in the retweet feed. Conversely, if you don’t want to run the whitelist, you can just remove the filter block, although in this case you will need the hashtag stripping regular expression block to prevent infinite retweets… Got that?!;-)
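For anyone who would rather see the filter and regular expression steps outside Yahoo Pipes, here is a minimal Python sketch of the same idea. It is not the pipe itself: the function names, the (author, text) tuple format and the sample tag are all made up for illustration. The `exact` switch shows one way a “contains” style whitelist test could be gamed, since any username that merely contains a whitelisted name gets through.

```python
import re


def strip_hashtag(text, hashtag):
    """Remove the hashtag (case-insensitively) and tidy up leftover spaces."""
    tag = hashtag.lstrip("#")
    pattern = re.compile(r"#" + re.escape(tag) + r"\b", re.IGNORECASE)
    return re.sub(r"\s+", " ", pattern.sub("", text)).strip()


def retweetable(tweets, hashtag, whitelist=None, exact=True):
    """Filter (author, text) pairs by whitelist, then strip the hashtag.

    An empty or missing whitelist lets every tweet through, in which case
    stripping the hashtag is what prevents the retweet loop.
    """
    allowed = {name.lower() for name in (whitelist or [])}
    passed = []
    for author, text in tweets:
        name = author.lower()
        if allowed:
            if exact:
                ok = name in allowed                          # exact username match
            else:
                ok = any(entry in name for entry in allowed)  # "contains" match
            if not ok:
                continue
        passed.append((author, strip_hashtag(text, hashtag)))
    return passed


# Hypothetical sample data
sample = [
    ("alice", "New paper out #demotag"),
    ("notalice_spam", "Buy stuff #demotag"),
    ("bob", "Hello #demotag world"),
]
strict = retweetable(sample, "#demotag", whitelist=["alice"], exact=True)
loose = retweetable(sample, "#demotag", whitelist=["alice"], exact=False)
```

With the exact match only alice’s tweet survives; with the “contains” match the notalice_spam account sneaks through as well.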

You can find the pipe here: Hashtag retweeter pipe

If you want a more “secure” version, i.e. one that does not reveal the identities of people in the whitelist, or the hashtag, use private string blocks (example pipe):

Making strings private to owner in Yahoo pipes

If you want to create your own hashtag retweeter pipe without having to clone and customise your own pipe, use this approach:

Customisable twitter retweet pipe

(NB if you leave either of the username slots blank, then tweets sent by anyone using the hashtag will be passed through the pipe and made available for retweeting.)

Sigh… another claim… 2ZXZGU4TDXK2

Written by Tony Hirst

June 29, 2010 at 9:53 am

Posted in Pipework, Tinkering


So Where Do the Numbers in Government Reports Come From?

with 9 comments

Last week, the COI (Central Office of Information) released a report on the “websites run by ministerial and non-ministerial government departments”, detailing visitor numbers, costs, satisfaction levels and so on, in accordance with COI standards on guidance on website reporting (Reporting on progress: Central Government websites 2009-10).

As well as the print/PDF summary report (Reporting on progress: Central Government websites 2009-10 (Summary) [PDF, 33 pages, 942KB]), a dataset was also released as a CSV document (Reporting on progress: Central Government websites 2009-10 (Data) [CSV, 66KB]).

The summary report is full of summary tables on particular topics, for example:

TABLE 1: REPORTED TOTAL COSTS OF DEPARTMENT-RUN WEBSITES
COI web report 2009-10 table 1

TABLE 2: REPORTED WEBSITE COSTS BY AREA OF SPENDING
COI web report 2009-10 table 2

TABLE 3: USAGE OF DEPARTMENT-RUN WEBSITES
COI website report 2009-10 table 3

Whilst I firmly believe it is a Good Thing that the COI published the data alongside the report, there is still a disconnect between the two. The report is publishing fragments of the released dataset as information in the form of tables relating to particular reporting categories – reported website costs, or usage, for example – but there is no direct link back to the CSV data table.

Looking at the CSV data, we see a range of columns relating to costs, such as:

COI website report - costs column headings

and:

COI website report costs

There are also columns headed SEO/SIO, and HEO, for example, that may or may not relate to costs? (To see all the headings, see the CSV doc on Google spreadsheets).

But how does the released data relate to the summary reported data? It seems to me that there is a huge “hence” between the released CSV data and the summary report. Relating the two appears to be left as an exercise for the reader (or maybe for the data journalist looking to hold the report writers to account?).

The recently published New Public Sector Transparency Board and Public Data Transparency Principles, albeit in draft form, has little to say on this matter either. The principles appear to be focussed on the way in which the data is released, in a context free way (where by “context” I mean any of the uses to which government may be putting the data).

For data to be useful as an exercise in transparency, it seems to me that when government releases reports, or when government, NGOs, lobbyists or the media make claims using summary figures based on, or derived from, government data, the transparency arises from an audit trail that allows us to see where those numbers came from.

So for example, around the COI website report, the Guardian reported that “[t]he report showed uktradeinvest.gov.uk cost £11.78 per visit, while businesslink.gov.uk cost £2.15.” (Up to 75% of government websites face closure). But how was that number arrived at?

The publication of data means that report writers should be able to link to views over original government data sets that show their working. The publication of data allows summary claims to be justified, and contributes to transparency by allowing others to see the means by which those claims were arrived at and the assumptions that went in to making the summary claim in the first place. (By summary claim, I mean things like “non-staff costs were X”, or the “cost per visit was Y”.)

[Just an aside on summary claims made by, or "discovered" by, the media. Transparency in terms of being able to justify the calculation from raw data is important because people often use the fact that a number was reported in the media as evidence that the number is in some sense meaningful and legitimately derived. ("According to the Guardian/Times/Telegraph/FT, etc etc etc".) To a certain extent, data journalists need to behave like academic researchers in being able to justify their claims to others.]

In Using CSV Docs As a Database, I show how by putting the CSV data into a Google spreadsheet, we can generate several different views over the data using the Google Query language. For example, here’s a summary of the satisfaction levels, and here’s one over some of the costs:

COI website report - costs
select A,B,EL,EN,EP,ER,ET

We can even have a go at summing the costs:

COI summed website costs
select A,B,EL+EN+EP+ER+ET
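If you’d rather check the working against a downloaded copy of the CSV file, something like the following rough Python sketch does a similar job to the summing query above. It converts spreadsheet column letters to list positions and adds up the chosen columns for each row. The helper names and the tiny inline dataset are made up for illustration, and treating blank or non-numeric cells as zero is my assumption here; the query language may well handle empty cells differently.

```python
import csv
import io


def column_index(letters):
    """Convert a spreadsheet column label (A, ..., Z, AA, ...) to a 0-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def to_number(cell):
    """Parse a cell as a number; blank or non-numeric cells count as zero (assumption)."""
    try:
        return float(cell.replace(",", ""))
    except ValueError:
        return 0.0


def sum_columns(rows, label_cols, sum_cols):
    """Rough equivalent of: select <label_cols>, <sum_cols joined with +>."""
    for row in rows:
        labels = [row[column_index(c)] for c in label_cols]
        total = sum(to_number(row[column_index(c)]) for c in sum_cols)
        yield labels + [total]


# Hypothetical five-column stand-in for the (much wider) COI dataset
sample = (
    "Dept,Site,Cost1,Cost2,Cost3\n"
    "DeptX,example.gov.uk,100,,50\n"
    'DeptY,other.gov.uk,"1,200",300,0\n'
)
rows = list(csv.reader(io.StringIO(sample)))[1:]  # skip the header row
summed = list(sum_columns(rows, ["A", "B"], ["C", "D", "E"]))
```

For the real file, the same call with `["A", "B"]` and `["EL", "EN", "EP", "ER", "ET"]` would mirror the query, assuming the columns line up with the spreadsheet version.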

In short, it seems to me that releasing the data as data is a good start, but the promise for transparency lies in being able to share queries over data sets that make clear the origins of data-derived information that we are provided with, such as the total non-staff costs of website development, or the average cost per visit to the blah, blah website.

So what would I like to see? Well, for each of the tables in the COI website report, a link to a query over the co-released CSV dataset that generates the summary table “live” from the original dataset would be a start… ;-)

PS In the meantime, to the extent that journalists and the media hold government to account, is there maybe a need for data journalysts (journalist+analyst portmanteau) to recreate the queries used to generate summary tables in government reports to find out exactly how they were derived from released data sets? Finding queries over the COI dataset that generate the tables published in the summary report is left as an exercise for the reader… ;-) If you manage to generate queries, in a bookmarkable form (e.g. using the COI website data explorer (see also this for more hints)), please feel free to share the links in the comments below :-)

Written by Tony Hirst

June 28, 2010 at 9:22 am

Guardian Datastore MPs’ Expenses Spreadsheet as a Database

with 6 comments

Continuing my exploration of what is and isn’t acceptable around the edges of doing stuff with other people’s data(?!), the Guardian datastore have just published a Google spreadsheet containing partial details of MPs’ expenses data over the period July-December 2009 (MPs’ expenses: every claim from July to December 2009):

thanks to the work of Guardian developer Daniel Vydra and his team, we’ve managed to scrape the entire lot out of the Commons website for you as a downloadable spreadsheet. You cannot get this anywhere else.

In sharing the data, the Guardian folks have opted to share the spreadsheet via a link that includes an authorisation token. Which means that if you try to view the spreadsheet just using the spreadsheet key, you won’t be allowed to see it; (you also need to be logged in to a Google account to view the data, both as a spreadsheet, and in order to interrogate it via the visualisation API). Which is to say, the Guardian datastore folks are taking what steps they can to make the data public, whilst retaining some control over it (because they have invested resource in collecting the data in the form they’re re-presenting it, and reasonably want to make a return from it…)

But in sharing the link that includes the token on a public website, we can see the key – and hence use it to access the data in the spreadsheet, and do more with it… which may be seen as providing a value add service over the data, or unreasonably freeloading off the back of the Guardian’s data scraping efforts…

So, I just pasted the spreadsheet key and authorisation token into the cut down Guardian datastore explorer script I used in Using CSV Docs As a Database to generate an explorer for the expenses data.

So, for example, we can run a report to group expenses by category and MP:

MP expenses explorer

Or how about claims over 5000 pounds (also viewing the information as an HTML table, for example).

Remember, on the datastore explorer page, you can click on column headings to order the data according to that column.

Here’s another example – selecting A,sum(E) where E>0, grouping by A and ordering by sum(E) asc, and viewing the result as a column chart:

Datastore exploration

We can also (now!) limit the number of results returned, e.g. to show the 10 MPs with lowest claims to date (the datastore blog post explains why the data is incomplete and to be treated warily).

Limiting results in datastore explorer

Changing the asc order to desc in the above query gives possibly a more interesting result, the MPs who have the largest claims to date (presumably because they have got round to filing their claims!;-)

Datastore exploring
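To make it clearer what those grouped, ordered and limited queries are doing, here is a small plain-Python sketch of the same steps: filter, group and sum, sort, then limit. The function name and the toy claims are invented for illustration and have nothing to do with the real expenses data.

```python
from collections import defaultdict


def sum_by_group(rows, descending=False, limit=None):
    """Rough equivalent of:
    select A, sum(E) where E > 0 group by A order by sum(E) asc|desc limit N
    where each row is an (A, E) pair.
    """
    totals = defaultdict(float)
    for name, amount in rows:
        if amount > 0:               # where E > 0
            totals[name] += amount   # group by A, sum(E)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=descending)
    return ranked if limit is None else ranked[:limit]


# Hypothetical claims: (claimant, amount)
claims = [
    ("MP A", 120.0),
    ("MP B", 40.0),
    ("MP A", 30.0),
    ("MP C", 0.0),
    ("MP B", 10.0),
    ("MP D", 500.0),
]
lowest_two = sum_by_group(claims, limit=2)
highest_one = sum_by_group(claims, descending=True, limit=1)
```

Swapping `descending` is the equivalent of changing asc to desc in the query.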

Okay – enough for now; the reason I’m posting this is in part to ask the question: is this an unfair use of the Guardian datastore data, does it detract from the work they put in that lets them claim “You cannot get this anywhere else”, and does it impact on the returns they might expect to gain?

Should they/could they try to assert some sort of database collection right over the collection/curation and re-presentation of the data that is otherwise publicly available that would (nominally!) prevent me from using this data? Does the publication of the data using the shared link with the authorisation token imply some sort of license with which that data is made available? E.g. by accepting the link by clicking on it, because it is a shared link rather than a public link, could the Datastore attach some sort of tacit click-wrap license conditions over the data that I accept when I accept the shared data by clicking through the shared link? (Does the/can the sharing come with conditions attached?)

PS It seems there was a minor “issue” with the settings of the spreadsheet, a result of recent changes to the Google sharing setup. Spreadsheets should now be fully viewable… But as I mention in a comment below, I think there are still interesting questions to be considered around the extent to which publishers of “public” data can get a return on that data?

Written by Tony Hirst

June 25, 2010 at 12:51 pm

Using CSV Docs As a Database

with 5 comments

Earlier today, my twinterwebs were full of the story about the COI announcing a reduction in the number of government websites:

Multiple sources of news... not...

(They told the story differently, at least…)

A little while later(?), a press release appeared on the COI website: Clamp down on Government websites to save millions, although by that time, via @lesteph, I’d found a copy of the interim report and a CSV dataset Reporting on progress: Central Government websites 2009-10.

So what can we do with that CSV file? How about turning it into a database, simply by uploading it to Google docs? Or importing it live (essentially, synching a spreadsheet with it) from the source URL on the COI website?

Import csv into google docs

I had a quick play pruning the code on my Guardian Datastore explorer, and put together this clunky query tool that lets you explore the COI website data as if it was a database.

COI Website review CSV data explorer

The explorer allows you to bookmark views over the data, to a limited extent (the ascending/descending views aren’t implemented :-( ), so for example, we can see:

- websites with a non-zero “Very Poor Editorial” score
- Age profile of visitors (where available)
- Costs

(Feel free to share bookmarks to other views over the data in the comments to this post.)

Note that the displayed results table is an active one so you can click on column headings to order the results by column values.

Sorting a table by column

Note that there seem to be issues with columns not being recognised as containing numerical data (maybe something to do in part with empty cells in a column?), which means the chart views don’t work, but this page is not trying to do anything clever – it’s just a minimal query interface over the visualisation API from a spreadsheet. (To build a proper explorer for this dataset, we’d check the column data types were correct, and so on.)

Looking at the app, I think it’d probably be useful to display a “human readable” version of the query too, that translates column identifiers to column headings for example, but that’s for another day…

GOTCHAS: use single quote (‘) rather than double quote (“) in the WHERE statements.
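As a hedged illustration of what a bookmarkable query might look like under the hood, the sketch below builds a query URL with the WHERE value in single quotes and lets the standard library do the URL encoding. The endpoint pattern is my assumption about how the Visualisation API query service is addressed for a shared spreadsheet (it is not the explorer’s own URL scheme), and the spreadsheet key, column letters and value are placeholders.

```python
from urllib.parse import urlencode


def query_url(spreadsheet_key, query, out="csv"):
    """Build a query URL for a spreadsheet (assumed endpoint pattern, placeholder key)."""
    base = "https://docs.google.com/spreadsheets/d/{}/gviz/tq".format(spreadsheet_key)
    # urlencode percent-encodes quotes, commas and operators, and turns spaces into +
    return base + "?" + urlencode({"tq": query, "tqx": "out:" + out})


# String literals in the where clause go in single quotes, as per the gotcha
url = query_url("KEY", "select A,B where C = 'DeptX'")
```

Building the URL this way means the single quotes arrive percent-encoded (as %27) rather than being mangled en route.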

Related: Using Google Spreadsheets as a Database with the Google Visualisation API Query Language
Using Google Spreadsheets Like a Database – The QUERY Formula

Written by Tony Hirst

June 25, 2010 at 10:29 am

Posted in Data


What Can Google Do For You?

with one comment

Three or four years ago, I used to do a lot of presentations and workshops around “search”. At the University of Portsmouth Learning and Teaching conference yesterday, I returned to my roots with a presentation on some of the many Google search tools that I use on a daily or weekly basis…

As ever, I’ve popped the presentation onto Slideshare: What Can Google Do for You?

And as for the links to the various services? You’ll just have to google them…!;-)

PS This presentation is a great complement to the above one:

[via the Arcadia Project blog]

Written by Tony Hirst

June 25, 2010 at 9:02 am

Posted in Library, Search


First Glimpses of the OUConf10 Hashtag Community

with 2 comments

Another conference, another hashtag, another opportunity to explore a shortlived emergent network… My default app for this is the (stalled under construction) OUseful hashtag community viewer, which as well as listing my friends and followers using the hashtag, also shows the folk I don’t necessarily share friend/follower relations with, along with a “report” about my reach into the community (and its reach into my Twitter traffic streams):

OUconf10 hashtag report

(Click through and hack the URL to see a report personalised around your account…)

As Martin W expressed an interest in doing some sort of “research” around the currently running OU (Online) Conference, I thought I’d have a little explore yesterday to see whether I could pull a script or two together to look at the network connections across the OUConf hashtag community.

Using the Tweepy Python wrapper for the Twitter API, authenticated to a Twitter account that I think I got whitelisted a year or so ago (so effectively no limit on the number of API calls per hour), I wrote a script that works through each of the users identified in a hashtaggers list. (The list was gleaned via an old hashtaggers Yahoo pipe (?!;-) I had lying around somewhere… as soon as I get started with a Twapperkeeper API key, I’ll use that as my source, although I guess I could use other Martin’s nifty Google spreadsheet twitter archiver in the meantime…) And what did that script do? For each hashtagger, it:

- pulled the list of the hashtaggers’ followers (as Twitter numeric IDs)
- pulled the list of the hashtaggers’ friends (as Twitter numeric IDs)

I then created three output file variants each based on the following sorts of network connections between individuals:

- hashtagger -> friend, so we can look at people friended by the hashtaggers;
- follower -> hashtagger, so we can look at people following the hashtaggers;

The three variants were:

- all connections;
- inner connections (that is, only show the connection if the friend/follower is also a hashtagger); [so, err, maybe the inner friends and inner followers are the same...? I can't think straight!]
- outer connections (that is, only show the connection if the friend/follower is not a hashtagger).
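The edge lists and their three variants can be sketched along the following lines. This is a reconstruction for illustration rather than the script I actually ran: the function names and the toy ID numbers are invented, and the friends/followers dictionaries stand in for whatever the Twitter API calls returned.

```python
def friend_edges(friends):
    """hashtagger -> friend edges, from a dict of hashtagger id -> friend ids."""
    return [(tagger, friend) for tagger, ids in friends.items() for friend in ids]


def follower_edges(followers):
    """follower -> hashtagger edges, from a dict of hashtagger id -> follower ids."""
    return [(follower, tagger) for tagger, ids in followers.items() for follower in ids]


def split_variants(edges, hashtaggers):
    """Split an edge list into the all / inner / outer variants."""
    tagged = set(hashtaggers)
    inner = [(s, t) for s, t in edges if s in tagged and t in tagged]
    outer = [(s, t) for s, t in edges if not (s in tagged and t in tagged)]
    return {"all": list(edges), "inner": inner, "outer": outer}


# Hypothetical data: 1, 2 and 3 are hashtaggers; everyone else is an outsider
hashtaggers = [1, 2, 3]
friends = {1: [2, 10], 2: [1, 3], 3: [11]}
followers = {1: [2, 20], 2: [1], 3: [2, 21]}

friend_variants = split_variants(friend_edges(friends), hashtaggers)
follower_variants = split_variants(follower_edges(followers), hashtaggers)
```

On the aside in the list above: when the friends and followers lists are complete and consistent with each other, as in this toy data, the inner friends and inner followers edge sets do come out the same, because every friending between two hashtaggers appears once in each list. With a partial snapshot they need not match.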

I also created a file containing hashtaggers’ IDs and screennames by making API requests for the user details of each of the hashtaggers.

The snapshot I got of the hashtag community was grabbed yesterday afternoon and is only a partial one (i.e. it only contains about 50-60 of the hashtaggers, compared to the 100 or so we’re currently at…)

So here are a couple of views, using Gephi. Firstly, the internal friends connections:

ouconf10 inner friends

The size of the nodes relates to in-degree, which means that a large node corresponds to an individual who has been friended by lots of the other hashtaggers.

Looking at the betweenness centrality metric (a measure of the extent to which an individual is on the shortest path between two other individuals in the network), we can start to explore the structure of the community a little more.

ouconf10 inner friends betweenness

Here’s a look over the whole set of friends of people in the hashtag network (that is, the people who the hashtaggers may be influenced by):

ouconf friends indegree

By plotting in-degree as the node size, we can see who is most friended by members of the hashtag network, including other members of the hashtag community. If we filter the view to show only nodes with an in-degree greater than 10, we see who is respected by (by virtue of being friended by) members of the hashtag community:

ouconf10 friends in degree more than 10
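Gephi does the counting and filtering for you, but the in-degree measure itself is simple enough to sketch in a few lines of Python. The function names and the toy edge list are made up for illustration; edges are (source, target) pairs such as hashtagger -> friend.

```python
from collections import Counter


def in_degree(edges):
    """Count how many edges point at each node."""
    return Counter(target for _source, target in edges)


def well_friended(edges, threshold):
    """Keep only the nodes whose in-degree is greater than the threshold."""
    return {node: count for node, count in in_degree(edges).items() if count > threshold}


# Hypothetical hashtagger -> friend edges
edges = [(1, 9), (2, 9), (3, 9), (1, 8), (2, 8), (3, 7)]
popular = well_friended(edges, threshold=1)
```

Setting the threshold to 10 on the real friends edge list would correspond to the filtered view above.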

If we look at the “friends outer” network, we get a view over just the people outside the hashtag community who are followed by people inside it:

ouconf10 outerfriends indegree GT 10

As far as reach goes, that is, the number of people who may be seeing hashtag traffic via people they follow, we need to look at the followers of the hashtaggers. Once again, if we look at the “outer” graph, we see the people who are seeing hashtag tweets from their friends, but who aren’t using the hashtag:

ouconf followers all biggest

We can also trivially see which members of the hashtag community have the largest number of followers:

ouconf10 all followers in degree

If we implement an ego filter with depth 1, we can then look to see which followers are connected to a particular individual. (By changing the filter settings, eg going from one person (mweller, say) to another (gconole, say), we can see the similarities and differences between their followers.)
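Outside Gephi, the comparison between two people’s followers boils down to a couple of set operations. A minimal sketch, with an invented function name and made-up follower IDs:

```python
def compare_followers(followers, first, second):
    """Compare the depth-1 follower sets of two individuals."""
    a, b = set(followers[first]), set(followers[second])
    return {
        "shared": a & b,        # people following both
        "only_first": a - b,    # people following just the first
        "only_second": b - a,   # people following just the second
    }


# Hypothetical follower lists keyed by screen name
followers = {"person_a": [1, 2, 3, 4], "person_b": [3, 4, 5]}
overlap = compare_followers(followers, "person_a", "person_b")
```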

Okay, that’s enough for now… no real “research”, but a few really quick examples of how you can use Gephi to start to explore the structure of a hashtag network, and the size of the community around it.

Issues: the friends/followers lists are numeric IDs. Calling the API once per ID to get the screen names would be expensive, but there are a couple of possible heuristics. For example, you can pull back tweets from the 100 most recent friends or followers of an individual (including screen names and IDs) with a single API call, so we can use that to grab names for identifiers across the network. If we pick a hashtagger with the largest spanning coverage over the network, we could pull this information for all their friends/followers, 100 at a time.

Alternatively, we could use something like Gephi to report on the most connected individuals whose names we don’t know, and use that to prioritise which user details we get first to further annotate the visualised graph.
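The prioritisation idea can also be sketched without Gephi: count how many connections each unnamed ID has, look up the best connected ones first, and work through them in batches. The function names, the batch size default and the toy data are assumptions for illustration; no actual API calls are made here.

```python
from collections import Counter


def lookup_priority(edges, known_names):
    """Rank IDs with no known screen name by how many edges they appear in."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    unknown = [(node, count) for node, count in degree.items() if node not in known_names]
    # Most connected first; ties broken by ID so the order is repeatable
    return sorted(unknown, key=lambda item: (-item[1], item[0]))


def batches(ids, size=100):
    """Split a list of IDs into consecutive chunks of at most `size`."""
    ids = list(ids)
    return [ids[i:i + size] for i in range(0, len(ids), size)]


# Hypothetical edges and already-resolved names
edges = [(1, 10), (2, 10), (3, 10), (1, 11), (2, 11), (3, 12)]
known = {1: "tagger_one", 2: "tagger_two", 3: "tagger_three"}
todo = lookup_priority(edges, known)
```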

Written by Tony Hirst

June 23, 2010 at 10:56 am

Posted in Visualisation


uTitle: Anytime Twitter Captioning of Youtube Videos

with one comment

The story so far… a long time ago now, I built a crude proof of concept showing how to annotate Youtube videos with captions extracted from hashtagged Twitter feeds. And now, every time I look at Martin Hawksey’s RSC MASHe blog, he’s pushed the idea on further…

So for example, the latest installment is anytime captioning of Youtube videos – simply start watching a YouTube video in the uTitle environment, and you can tweet along to the video, captioning it as you do so (Convergence @youtube meets @twitter: In timeline commenting of YouTube videos using Twitter [uTitle]):

A great attraction of this service is that it allows a viewer to watch the video at any time, and yet drop twitter captions into the video at the appropriate point. (The original demo grabbed captions from a live hashtag stream to add to video recordings of live presentations, and set the zero time to the start time of the event/recording.)

uTitle integrates with Twapperkeeper, a Twitter archiving service that I think has received some amount of support from JISC, so it’ll be interesting to see if the uTitle use case helps drive innovation on that front as well as in video annotation. (So for example, at the moment, uTitle uses a Youtube video ID hashtag, as well as a time stamp, to identify tweets that are captioning a particular video. As Twitter opens up its annotation service, it’ll be interesting to see if the identifier can be pushed down to the annotation layer (maybe replaced by a blanket #utitle hashtag in the main tweet?) and Twapperkeeper support extended to include annotations.) (I’d also be keen to see Twapperkeeper supporting the archiving of timestamped friends/followers lists, to allow for visualisations and analysis of the growth of networks over time. This may go against Twitter ToS of course (I haven’t checked…).)

Playing with the service just now, it struck me that if I was “live tweeting” along to a video I was watching, by the time I had written a tweet, the time stamp would have moved on. So by the time I post a tweet, it will appear as a caption maybe 10 or 20 seconds after the point in the video it refers to. A simple trick might be to have a setting that would stop the timer in the tweet when someone starts typing a new tweet, so that on playback the tweet appears at the time in the video when the commenter started to write the tweet, rather than when it was finished and posted?
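Here is a minimal sketch of how that “freeze the timer when typing starts” setting might work. It is purely hypothetical (it has nothing to do with how uTitle is actually implemented), and the class and method names are invented; playhead positions are in seconds.

```python
class CaptionTimer:
    """Remember the playhead position at the moment a comment is started."""

    def __init__(self):
        self._frozen_at = None

    def start_typing(self, playhead_seconds):
        # Only the first keystroke of a new tweet freezes the timestamp
        if self._frozen_at is None:
            self._frozen_at = playhead_seconds

    def post(self, playhead_seconds):
        # Use the frozen time if there is one, else the current playhead
        stamp = self._frozen_at if self._frozen_at is not None else playhead_seconds
        self._frozen_at = None  # ready for the next tweet
        return stamp


timer = CaptionTimer()
timer.start_typing(62.0)    # commenter starts typing at 1:02
timer.start_typing(70.0)    # later keystrokes don't move the timestamp
caption_time = timer.post(80.0)
```

In this example the caption is stamped at 62 seconds, when typing began, rather than at the 80 second mark when the tweet was posted.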

(Of course, it’s also possible to pause the video, and even move the playhead back to set the timestamp as required; but I think the above approach is more elegant?)

Another possibly useful tool might be something like the iPod “30s rewind” button, that just nudges the playhead back a few seconds (this might be useful for example if you’re typing a comment as the video plays, and you miss something you want to listen to again…)

There are probably lots of other “freeze time” options that make sense when capturing “live” comments against a recording, but none spring to mind just at the moment!;-)

PS As to where Martin might push uTitle next, I can’t wait to see…:-) Maybe Google will add the idea to Youtube along with Google Moderator and the new Youtube video editor? Or maybe Martin will find some API dangly bits around the Youtube Replay it service that’s just started rolling out as a Google live search feature, and which allows you to “zoom to any point in time and “replay” what people were saying publicly about a topic on Twitter.”

Written by Tony Hirst

June 21, 2010 at 1:35 pm

Posted in Anything you want


A Couple of Things from Last Week’s Independent on Sunday…

with one comment

Really struggling to do anything creative or thought requiring this week, but as I declutter my pockets I find a couple of things I tore out from last weekend’s Independent on Sunday that I wanted to put a marker on.

Firstly, in an article about a fully fledged in-store Tesco Bank (Cheerio then, Sir Terry – you’re a hard act to follow) I spotted this:

Tesco Personal Finance, now wholly owned and renamed Tesco Bank, is being primed to become a major competitor to the high-street banks. Already, six million customers buy Tesco financial products and have deposited £4.4bn with the company, but later this year the group intends to launch a savings account and a range of mortgages; then in the second half of 2011 it will launch a current account.

[W]hereas firms such as Virgin or Metro are seeking branches to build banking empires, Tesco will use its existing stores. The bank will also exploit the data from customers’ Clubcards. [my emphasis]

Oh good… if anyone knows how that might work in practice, I’d love to know… The nightmare scenario (and one I’m sure won’t happen) would be Clubcard operator and data cruncher Dunnhumby trawling through your current account data and working out what you are spending, but not at Tesco, so they can send ever more targeted promotions to you. (So for example, I think they already count the calories you buy from Tesco so they can have a guess at how many you don’t buy… Knowing how much you’re spending at other supermarkets would be a nice trick to have up your sleeve, I should think.)

Spending information might also help with price setting, something it seems as if Dunnhumby are looking to develop further if their acquisition of KSS Retail is anything to go by… (e.g. Tesco Clubcard company Dunnhumby buys KSS Retail).

Supermarkets are wary of price wars, of course (reduces margins and cuts into profits), so finding optimal pricing models (just ponder what “optimal” might mean there…;-), and using those models to also influence shopping behaviour, can generate useful returns.

The other clip I have from the IoS is also price related, a comment by Robert Chalmers in a profile of one time editor of The Sunday Times, Harold Evans (Harold Evans: ‘All I tried to do was shed a little light’):

“So how do you feel about the Murdoch empire now?”

Evans pauses. “I’m not that familiar with the British… OK. Let’s take an alternative scenario. Murdoch never arrives. I manage to take control of The Sunday Times with the management buyout. Then I get defeated by the unions. The Independent wouldn’t be here. Rival papers survived because they got the technology. Thanks to Murdoch.”

Thanks to a man who, by starting the price war, created a situation where profit is driven not by a newspaper’s retail price but by its advertising, to the point that advertisers risk dictating editorial content. Haute-couture houses don’t fancy the idea of photographs of dead Congolese babies next to their latest tanning oil, do they? [My emphasis]

“Thanks to a man who, by starting the price war, created a situation where profit is driven not by a newspaper’s retail price but by its advertising” – brilliant…

My reading of that is that maybe at one time the papers had a pay wall of sorts that kept in balance their reliance on advertising income. Price wars increased the percentage returns from the advertising, and then the internet arrived. The first generation online ad container – banner ads – was as ineffective as any other sort of advertising (I guess – maybe more so?) and I’m guessing didn’t pose a huge threat to ad spend in the newspapers (note to self: look at financial state of news sales and ad industry returns over last 20 years…); but the AdWords container did, because you could start to track what happened to any interest raised by the ad. With ever more sophisticated forms of personalised behavioural advertising (which isn’t just online – it’s what Clubcard does, right?) the route that newspaper advertising provides is threatened from the other side. (That is, AdWords provides trackability, which newspaper ads don’t; behavioural marketing provides more sophisticated segmentation than the crude ABC demographic reach that newspapers provide.)

I’ve no idea how the Times paywall is doing at the moment, but as News Corp makes moves on BSkyB, I wonder if we’re going to see folk taking up (exclusive?) membership subscriptions to cross platform content providers, who maybe also run content access points (iDevices, Sky boxes, etc.) and optionally content generation/commissioning. And who would be in the running? Apple and News Corp/Sky at the very least. (Not Tesco, yet… Err… maybe: Tesco sets up film studio to adapt hit novels;-). I’m certainly watching out for signs of someone making moves on buying up online game distributor Steam, though…

Written by Tony Hirst

June 16, 2010 at 3:20 pm

Posted in Anything you want

Scribbled Ideas for “Research” Around the OU Online Conference…

with 5 comments

So it seems I missed a meeting earlier this week planning a research strategy around the OU’s online conference, which takes place in a couple of weeks or so… (sigh: another presentation to prepare…;-)…

For what it’s worth, here are a few random thoughts about things I’ve done informally around confs before, or have considered doing… I’ve got the lurgy, though, so this is pretty much a raw core dump and is likely to have more typos and quirky grammatical constructions than usual (can’t concentrate at all:-(

- Twitter hashtag communities: I keep thinking I should grab a bit more data (e.g. friends and followers details) around folk using a particular hashtag, and then do some social network analysis on the resulting network so I can write a “proper research paper” about it, but that would be selfish; because I suspect what would be more useful would be to spend that time making it easier for folk to decide whether or not they want to follow other hashtaggers, provide easy ways to create lists of hashtaggers, and so on. (That said, it would be really handy to get hold of the script that Dave Challis cobbled together around Dev8D (here and here) and then used to plot the growth of a twitter community over the course of that event.) What’s required? Find the list of folk using the hashtag and then several times a day just grab a list of all their friends and followers. So we require two scripts: one to grab hashtaggers every hour or so and produce a list of “new” hashtaggers; one to grab the friends and followers of every hashtagger once an hour or so (or every half day; or whatever… if this is a research project, I guess it’d make sense to set quite a high sample rate and then learn from that what an effective sample rate would be?). Then at our leisure we can plot the network, and I guess run some SNA stats on it. (We could also use a hashtagger list to create a twitter map view of where folk might be participating from?) One interesting network view would be to show the growth of new connections between two time periods. I’m not sure if the temporal graphs Gephi supports would be handy here, but it’d be a good reason to learn how to use Gephi’s temporal views:-) If the conf is mainly hashtagged by OU users, then it won’t be interesting, because I suspect the OU hashtag community is already pretty densely interconnected. As the conference is being held (I think) in Elluminate, it might be that a lot of the backchannel chatter occurs in that closed environment…?
Is it possible to set up Elluminate with a panel showing part of someone’s desktop that is streaming the conference hashtag, I wonder – i.e. showing backchannel chat within the Elluminate environment using a backchannel that exists outside Elluminate? (Thinks: would it be worth having a conference twitter user that autofollows anyone using the conf hashtag?) Other twitter stuff we can do is described in Topic and Event based Twittering – Who’s in Your Community?. E.g. from the list of hashtaggers, we could see what other hashtags they were using/have recently used, helping to situate the OU conf in other contexts according to the interests of people talking about it.

- Facebook communities might be another thing to look at. The Netvizz app will grab an individual’s network, and the connections between members of that network (unless recent privacy changes have broken things?). This data is trivially visualised in Gephi, which can also determine various SNA stats. Again it would make sense to grab regular dumps of data in maybe two cases: 1) create a faux Facebook user and get folk to friend it, then grab a dump of its network every hour or so (is it possible to autofriend people back? Or maybe that’s a job for a research monkey…?!); 2) alternatively, get folk to join a conference group and grab a dump of the members of the group every hour or so (or every whenever or so). The only problem with that is if the group has more than 200 members, you only get a dump of a randomly selected 200 members.

- link communities – by which I mean look at activity around links that are being shared via e.g. twitter (extract the links from the (archived) hashtag feed), or bookmarked on delicious. I’ve doodled social life of URL ideas before that might help provide macroscopic views over what links folk are sharing, and who else might be interested in those links (e.g. delicious URL History: Users by Tag or edupunk chatter). From socially bookmarked links, we can also generate tag clouds.

- chatter clouds: term extraction word clouds based on stuff that’s being tweeted with a particular hashtag.

- blog post communities: just cobble together a list of blogs that contain posts written around conf sessions.

- googalytics, bit.lytics: not sure what Google analytics you’d collect from where, but an obvious thing to do with them would be to look at the incoming IP addresses/domains to see whether the audience was largely coming in from educational institutions. (Is there a list of IP ranges for UK HEIs, I wonder?) If any links are shared in the conference context, e.g. by backchannel, it might make sense to shorten all those links on bit.ly with a conf API key, so you could track all click throughs on bit.ly shortened versions of that target link. The point would be to just be able to produce a chart of something like “most clicked through links for this conf”.
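
For what it’s worth, a couple of the ideas above (spotting “new” hashtaggers, finding new connections between two time periods, and counting the links shared in an archived hashtag feed) can be roughed out without touching any API at all. The following is only a minimal sketch under stated assumptions: the function names and data shapes are made up for illustration, a “snapshot” is assumed to be a plain dict mapping each hashtagger to the set of accounts they follow, and the grabbing of that data from Twitter is left out entirely.

```python
import re
from collections import Counter

# Hypothetical data shape (not from any real Twitter API):
# a snapshot maps each hashtagger to the set of accounts they follow.


def new_hashtaggers(seen, current):
    """Return users in the current sample who have not been seen before."""
    return set(current) - set(seen)


def edges(snapshot):
    """Flatten a snapshot into directed (follower, followed) pairs,
    keeping only connections between the hashtaggers themselves."""
    members = set(snapshot)
    return {
        (user, friend)
        for user, friends in snapshot.items()
        for friend in friends
        if friend in members
    }


def new_connections(earlier, later):
    """Connections present in the later snapshot but not in the earlier one."""
    return edges(later) - edges(earlier)


URL_PATTERN = re.compile(r"https?://\S+")


def most_shared_links(tweets, n=3):
    """Count URLs found in archived tweet texts, most common first.
    Trailing punctuation is stripped so 'http://x/a.' matches 'http://x/a'."""
    counts = Counter(
        url.rstrip(".,;:!?)")
        for text in tweets
        for url in URL_PATTERN.findall(text)
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Two made-up samples of the same hashtag community.
    monday = {"alice": {"bob"}, "bob": set()}
    tuesday = {"alice": {"bob", "carol"}, "bob": {"alice"}, "carol": set()}
    print(sorted(new_hashtaggers(monday, tuesday)))
    print(sorted(new_connections(monday, tuesday)))

    # A made-up archive of tweet texts.
    archive = [
        "see http://example.org/a #conf",
        "RT http://example.org/a.",
        "http://example.org/b",
    ]
    print(most_shared_links(archive))
```

Using sets means the “what’s new since last time” question is just a set difference, whatever sample rate ends up being used. The pairs that come back from new_connections could presumably be written out as an edge list for plotting in something like Gephi. Note that shortened links (bit.ly and friends) would need resolving first if the same target is being shared under several short URLs.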

Bleurghhhhh….

Written by Tony Hirst

June 11, 2010 at 12:30 pm

Posted in Anything you want
