Archive for December 2010
Who Can See Whose Conversations In-stream on Twitter?
A week or so ago, I posted a quick hack using the Google Social Graph API showing how to generate a list of Common Friends or Followers on Twitter, so that you could look up which folk would see, in their Twitter timeline, a conversation between two other people on Twitter. (A hosted version of the service is now available here.)
At the time I also generalised the code so that you could look up the extent to which any party could see the conversations between two other parties on Twitter, in stream. This is another single page web app and it can be found here: Twitter in-stream eavesdrop.
It looks like this:
Very simple, very quick… there is nothing more to it than what you find if you View Source… (In fact, there’s more than you need if you View Source – there is also a Google Analytics tracking code in there…)
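The logic behind it is small enough to sketch. Assuming that a conversation between two people only shows up in-stream for folk who follow both of them, the audience is just the intersection of two follower lists. The follower lists below are made up and the function names are mine; the hosted page gets its lists from the Social Graph API rather than from a dictionary like this.

```python
def instream_audience(followers_of_a, followers_of_b):
    """Return the accounts that follow both parties, sorted by name."""
    return sorted(set(followers_of_a) & set(followers_of_b))


def can_eavesdrop(watcher, followers_of_a, followers_of_b):
    """True if watcher follows both parties and so sees their exchange."""
    return watcher in set(followers_of_a) and watcher in set(followers_of_b)


# Made-up follower lists, for illustration only
followers = {
    "alice": ["carol", "dave", "erin"],
    "bob": ["dave", "erin", "frank"],
}

print(instream_audience(followers["alice"], followers["bob"]))
print(can_eavesdrop("carol", followers["alice"], followers["bob"]))
```

Swapping the dictionary for a live follower lookup is the only part that needs a network call; the comparison itself stays the same.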
Scholarly Communication in the Networked Age
Last week, I was fortunate enough to receive an invitation to attend the Texts and Literacy in the Digital Age: Assessing the future of scholarly communication conference at the Dutch National Library in Den Haag (a trip that ended up turning into a weekend break in Amsterdam when my flight was cancelled…)
The presentation can be found here and embedded below, if your feed reader supports it:
One thing I have tried to do is annotate each slide with a short piece of discursive text relating to the slide. I need to find a way of linearising slide shows prepared this way to see if I can find a way of generating blog posts from them, which is a task for next year…
The presentation draws heavily on Martin Belam’s news:rewired presentation from 2009 (The tyranny of chronology), as I try to tease out some of the structural issues that face the presentation of news media in an online networked age, and contrast (or complement) them with issues faced by scholarly publishing.
One of the things I hope to mull over more next year, and maybe communicate in a more principled way rather than via occasional blog posts and tweets, are the ways in which news media and academia can work together to put the news into some sort of deeper context, and maybe even into a learning (resource) context…
Unmeasurable Impact
Lots of deleted stuff I might have regretted posting…
(I also apologise in advance for what some might take to be the self-aggrandising nature of this post…)
Anyway, that’s all as maybe… One of the ideas I started trying to develop in preparing the promotion case was the notion of “influence”, and how online, network based activities might result in payoff for someone else, through being influenced, that could in part trickle back through some sort of recognised acknowledgement, or feed forward into a payoff that makes the academic or host institution more productive.
So here are a handful of examples from the last week or so that provide anecdotal evidence about the influence and reach of posts appearing on OUseful.info:
I flashed up on screen a post from Tony Hirst’s OUseful blog where he confessed to ‘hassling’ Simon Rogers over the formats of some of the information in the Guardian Datastore.
Tony’s contributions are fantastically useful, and the team have now changed some of their workflows to try and include more universal identifiers. On datasets with country lists, for example, they now aim to provide the two letter ISO country code in order to get around confusion when comparing datasets that might feature Burma or Myanmar for example.
[C]hanged some of their workflows… right… so that might make it easier for others, such as academics stooping so low as to use news media published data rather than “original” sources in their own work. Or it might mean that folk who are not academics are putting the data to work because it’s now easier for them to do so, and getting real value out of it.
(Academic bashing? Me? Surely not… Though of course, I have come to realise over the last year that I am absolutely not considered an academic by the academy…)
Here’s the second example, referring to some “work” that resulted from an open exchange over a weekend earlier this year which Cameron Neylon reviewed in A little bit of federated Open Notebook Science. The context is graphing user and compound interactions by extracting the appropriate bipartite graph from a set of open notebooks:
We are very fortunate that Don Pellegrino, an IST student at Drexel, has selected the analysis of networks within Open Notebooks as part of his Ph.D. work. He has started to report his progress on our wiki and is eager to receive any feedback as the work progresses (his FriendFeed account is donpellegrino).
Don’s first report is available here. He is using the Open Source software Gephi for visualization and has provided all of the data and code on the associated wiki page. (also see Tony Hirst’s description of mapping ONS work which provided some very useful insights) Don has provided a detailed report of his findings but I think the most important can be seen in the global plot below.
[S]ome very useful insights – right, a couple of approximately and quickly worked through examples that sketched out some possible ways of looking at this area, as well as crude proof of concept demos; it maybe also identified some dead-ends that might otherwise have been pursued?
Finally, this from Brian Kelly:
Niall Sclater made his point succinctly:
@mweller @psychemedia delicious. i rest my case.
The case Niall was making was, I suspect, that one shouldn’t be promoting use of Cloud services within institutions. This is an argument (although that might be putting it a bit too strongly) which Niall has been having over the past few years with Tony Hirst and Martin Weller, his colleagues at the Open University. As I described in a post on “When Two Tribes Go To War” back in 2007:
Niall Sclater, Director of the OU VLE Programme at the Open University recently pointed out that the Slideshare service was down, using this as an “attempt to inject some reality into the VLEs v Small Pieces debate“. His colleague at the Open University, Tony Hirst responded with a post entitled “An error has occurred whilst accessing this site” in which Tony, with “beautifully sweet irony“, alerted Niall to the fact that the OU’s Intranet was also down.
Back then the specifics related to the reliability of the Slideshare service, with Tony pointing out that the Slideshare service was actually more reliable than the Open University’s Intranet. But that was just a minor detail. The leaked news that Yahoo was, it appeared, intending to close a social bookmarking service which is highly regarded by many of its users, was clearly of much more significance. So is Niall correct to rest his case on this news? Or, as Niall wrote his tweet before we found that the news of Delicious’s death was greatly exaggerated, might we feel that the issue is now simply whether an alternative social bookmarking service should be used?
What this example shows, and maybe the one before it too, is that the very act of working in open and in public means that the process of the work/interaction as well as the “work” itself can become the focus of (authentic) stories in other people’s work. Brian has been telling the above story repeatedly over the last few years, which has the side-effect of raising the OU’s profile as an institution that is *really* engaged with these issues.
None of the above anecdotes has resulted in an academic citation for me, so none of it counts in academic terms. None of the above resulted in the OU being paid for the time I spent engaged in the related activities, so none of it helped the OU bottom line directly (we’re really, really a business now, right?). None of the above ended up in any OU course materials (to my knowledge). It was all, from my perspective looking round my current institutional role, pointless…
PS it’s worth noting that, through trackbacks and email requests, I see these ephemeral “been influenced by” signals on my web radar as a matter of course. But my internal profile is largely below the radar, and these “influence signals” are likely to be even more invisible. This maybe suggests that my reach is only to folk who look outwards (from any institution), using the web, or the people who see me give a presentation (which I do once a month or so)… Hmm…
Broken RSS, and a Comment About Blog Comments
Originally posted as a comment on Brian Kelly’s Is It Too Late To Exploit RSS In Repositories?:
I used to advocate the adoption of RSS a lot, and came across some of the problems you mention repeatedly, such as the inability to consume certain pages in off-the-shelf feed consuming apps.
Many of the problems resulted from non-standard character encodings, or incorrectly encoded item.description text. Links/URLs were occasionally missing or pointless (e.g. pointing to the root domain from which the feed was served, rather than anything relating to the particular feed item). Generating sensible URLs for feed items could also turn up issues with the way pages were served, eg on sites where session variables or other arbitrary keys were required.
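A couple of those link problems are easy enough to test for automatically. What follows is a minimal sketch using only the Python standard library: it flags feed items with no link, items whose link points at the bare site root, and items whose link carries something that looks like a session key. The list of session key names and the sample feed are my own assumptions, and the character encoding problems are not covered at all.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

# Assumed names for session-style query keys; extend to suit the site
SESSION_KEYS = {"sessionid", "jsessionid", "phpsessid", "sid"}


def check_feed_links(rss_text):
    """Return (item title, problem) pairs for suspicious item links."""
    problems = []
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        title = (item.findtext("title") or "(untitled)").strip()
        link = (item.findtext("link") or "").strip()
        if not link:
            problems.append((title, "missing link"))
            continue
        parsed = urlparse(link)
        if parsed.path in ("", "/") and not parsed.query:
            problems.append((title, "link points at site root"))
        elif SESSION_KEYS & {key.lower() for key in parse_qs(parsed.query)}:
            problems.append((title, "link carries a session key"))
    return problems


# Made-up feed exercising each check
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>Good</title><link>http://example.org/posts/1</link></item>
<item><title>Rootish</title><link>http://example.org/</link></item>
<item><title>Linkless</title></item>
<item><title>Sticky</title><link>http://example.org/view?sessionid=abc123&amp;id=7</link></item>
</channel></rss>"""

for title, problem in check_feed_links(SAMPLE):
    print(title, "-", problem)
```

A feed can pass a validator and still trip all three of these checks, which is rather the point.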
The reason the problems were allowed to slip through was because of the context in which the feeds were published. Eg request goes in for ‘we need a feed’; developer adds feed, runs it through validator, job done.
But the job isn’t done, just as the job isn’t done when someone publishes a public/open data set but doesn’t do anything more than that, or someone publishes an OER and considers that now it’s public, it’s useful.
I spend way too much of my time trying to glue things together, and finding more often than not that they don’t play nice. For example, Guardian datastore data often falls just short of being easily combined with other data sets, even other Guardian datastore published datasets, though this is getting better all the time as workflows are tweaked ever so slightly…
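One workflow tweak that makes this sort of gluing easier is keying country data on the two letter ISO country code rather than on the country name, so that a Burma row in one dataset lines up with a Myanmar row in another. Here is a rough sketch of the idea; the alias table is a tiny hand-made sample, the numbers are invented, and the function names are just mine.

```python
# Hand-made sample of name variants mapped onto ISO 3166-1 alpha-2 codes
ALIASES = {
    "burma": "MM",
    "myanmar": "MM",
    "united kingdom": "GB",
    "uk": "GB",
    "france": "FR",
}


def iso_code(name):
    """Look up the two letter code for a country name, or None if unknown."""
    return ALIASES.get(name.strip().lower())


def join_on_country(left, right):
    """Join two {country name: value} dicts on ISO code.

    Returns ({code: (left value, right value)}, [unmatched left names]).
    """
    right_by_code = {}
    for name, value in right.items():
        code = iso_code(name)
        if code is not None:
            right_by_code[code] = value
    joined, unmatched = {}, []
    for name, value in left.items():
        code = iso_code(name)
        if code is not None and code in right_by_code:
            joined[code] = (value, right_by_code[code])
        else:
            unmatched.append(name)
    return joined, unmatched


# Invented values, for illustration only
dataset_a = {"Burma": 1, "UK": 2, "Atlantis": 3}
dataset_b = {"Myanmar": 10, "United Kingdom": 20}

print(join_on_country(dataset_a, dataset_b))
```

The unmatched list is the useful bit for a publisher: anything that ends up in there is a row somebody else will struggle to combine.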
One possible solution, where things are published /with the intentions that others re* them/ is for the publisher to demonstrate a simple remix or combination with at least one other information source.
If you publish an RSS feed, demonstrate one or two off-the-shelf ways of consuming it. This is what any user is likely to try first, so save them the grief of finding out it doesn’t work by making sure it does.
When releasing data, if you’re publishing data relating to countries, for example, see if you can use one of the many services for generating map mashups to map the data. If you can’t, what is it in or missing from your data that’s making it hard to do?
If you’re publishing an OER, big or little, /how/ might you see it being remixed/reused with other OERs? If your content includes lots of diagrams, how easy is it for someone else to reuse that image (with attribution and in compliance with any other license requirements) in their own presentation? If they want to embed it in a blog post (generating not only more views of the content, but also trackable data that you can measure) just try giving a few examples of embedded use. If it’s hard for us as publisher to do the baby steps, why should anyone else bother? (Saying you’re publishing something because you don’t know how other people will use it is not the issue… if it’s hard to do the easy stuff, very few people will bother. The publisher needs to demonstrate the easy stuff, and see it as a way of getting a couple of pragmatic tests implemented as well as a quick tutorial in getting started with re*ing the warez.)
PS one of the things I’m considering doing more next year is comment on other people’s posts directly. The danger with taking such an approach is that those responses get lost (i.e. I can’t easily search for them, and as the major user of this blog as personal notebook, searching over things I’ve previously written is an important feature). Of course, I could blog a response to other people’s posts, but this fractures the conversation somewhat. I also know from experience that whilst folk may read comments on a blog post, they may not always click through on trackbacked links, if such links exist.
So, I’m considering adding a new category to this blog – CommentedElsewhere – that captures the longer comments as reposts here, with a link back to the original comment, and the original context. Good plan, or not? Will it just make OUseful.info even harder to follow? Should I set up a separate “OUsefulComments” blog, repost substantial comments there and then maybe draw a feed into the sidebar here? Your comments would be appreciated…:-)
@SOU_Airport No ads, thanks, just info… A Flight Tracking Autoresponder Would Be Handy Though…
A few minutes or so ago, @sou_airport tweeted:
Welcome to new followers of Southampton Airport. Since the snow, followers have doubled and we will keep you up to date with news and offers
With recent news stories deploring the state of information provision, the occasional tweets from @sou_airport regarding the status of the airport have been handy…
- “Southampton Airport is to open from 06:30am today- some knock-on delays are expected due to the weather. Pls check with airlines for info.”
- “Southampton Airport currently has capacity on Flybe flights to Amsterdam, Paris, Dusseldorf and Brussels up to Christmas.”
If they’re just going to start tweeting ads and offers, though, I’m not interested, and will likely unfollow… Just because a company suddenly opens up a comms channel that folk sign up to doesn’t mean it needs to be a marketing channel – the payoff is in having fewer disgruntled passengers, and fewer folk turning up to fly only to find cancelled flights and adding to the airport’s problems…
If they want to consign me to following a backwater channel, such as @sou_airport_status, that’s fine… Just don’t add noise to the signal if all I want is signal…
Something that would be quite handy would be an autoresponder. The Southampton website already has live flight arrival/departure info, and a form that lets you enter either a flight number or a departure/destination airport for arrivals and departures respectively.
The “accessible” page provides a simpler view of the information, though the URL is not as friendly as it might be…:
http://www.southamptonairport.com/portal/controller/dispatcher.jsp?ChPath=Southampton^General^Flight%20information^Live%20flight%20departures
The search URL is even more hostile:
http://www.southamptonairport.com/portal/site/southampton/template.PAGE/menuitem.eae22a7fd8fc683c63f0ec109328c1a0/?javax.portlet.begCacheTok=token&javax.portlet.endCacheTok=token&javax.portlet.tpst=f8a931aeea8d03f4b03f78109328c1a0&javax.portlet.prp_f8a931aeea8d03f4b03f78109328c1a0_flightRoute=leeds&javax.portlet.prp_f8a931aeea8d03f4b03f78109328c1a0_flightNumber=&javax.portlet.prp_f8a931aeea8d03f4b03f78109328c1a0_flightTerminal=
which looks like it requires the presence of tokens or dynamically created data.
So what might an autoresponder look like? Even if all we do is redirect the flight status information, we might imagine an exchange like the following:
@sou_airport_flightStatus BE173
would return something like:
@example BE173 dep 15.40 Tues 21/10 LEEDS B/FORD SCHEDULED
or for BE3646:
@example BE3646 BERGERAC sch_dep 14:05 arr ESTIMATED 19:03
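The request handling side of that is not much work. As a rough sketch, assuming the flight board has already been scraped into a lookup table (the hard part, given those URLs), the responder only has to pull a flight number out of the tweet and format a reply. The table below just reuses the two example replies above; the flight number pattern (two letters followed by up to four digits) and the "not found" wording are my assumptions.

```python
import re

# Assumed flight number shape: two letters then one to four digits
FLIGHT_RE = re.compile(r"\b([A-Z]{2}\d{1,4})\b", re.IGNORECASE)

# Stand-in for the live departures board, reusing the examples above
BOARD = {
    "BE173": "dep 15.40 Tues 21/10 LEEDS B/FORD SCHEDULED",
    "BE3646": "BERGERAC sch_dep 14:05 arr ESTIMATED 19:03",
}


def reply_to(sender, tweet_text, board=BOARD):
    """Build a reply to a status request, or None if no flight number is found."""
    # Drop @mentions first so account names are never read as flight numbers
    text = re.sub(r"@\w+", " ", tweet_text)
    match = FLIGHT_RE.search(text)
    if match is None:
        return None
    flight = match.group(1).upper()
    status = board.get(flight)
    if status is None:
        return f"@{sender} {flight} not found, pls check with your airline"
    # Keep the reply within the 140 character tweet limit
    return f"@{sender} {flight} {status}"[:140]


print(reply_to("example", "@sou_airport_flightStatus BE173"))
print(reply_to("example", "@sou_airport_flightStatus be3646"))
```

Everything that touches Twitter itself (reading mentions, posting the reply) is left out here; the function just maps an incoming message onto an outgoing one.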
Alternatively, the airport could just re-factor its (paid for) SMS service (which I think is operated by BAA, as it seems to be available for several other UK airports):

Flying Messenger – check flights by SMS
Need flight updates on the move? Flying Messenger lets you check flights at Southampton Airport by mobile phone text message (SMS).
How to use it
Text sou (for Southampton) plus your flight number to 82222.
So for example, if your flight number is BE1846, your text should read:
sou be1846
You’ll receive a reply giving the current status of your flight.
When to use it
This service is available up to 12 hours ahead of the flight’s scheduled time, and up to four hours afterwards. Requests must be sent on the scheduled date.
If you need to set up an alert in advance, please see Flying Messenger PLUS.
What it costs
Each Flying Messenger request costs 25p plus your network’s SMS message rate. If you’re using a pre-pay phone, you need to have enough credit to cover the cost of the service.
Given the amount of distress caused to folk traveling over the last few days, and the stress that several UK airports have been under because of travelers turning up for already cancelled flights, it amazes me that BAA aren’t willing to buy a bundle of free texts and offer a free SMS autoresponder information service…
Google Translate Equilibrium Finder and Google Books Ngrams
A few days ago, in the post Translate to Google Statistical (“Google Standard”?!) English?, I wondered whether there were any apps that looked for convergence of phrases going from one language, to another, and back again until a limit was reached. A comment from Erik at digitalmethods.net posted a link to Translation Party, a single web page app that looks for limit cycles between English and Japanese (as a default).
Having a look at the source, it seems there’s a switch to let you search for limits between English and other languages too, as the following screenshot shows:
(Though I have to admit I don’t fully understand why the phrase in the above example appears to map to two different French translations?!)
Here’s another – timely – example, showing the dangers of this iterative approach to translation…
The switch is the URL argument lang=LANGUAGE_CODE, so for example, the French translation can be cued using http://www.translationparty.com/?lang=fr.
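The search for an equilibrium is easy to sketch in outline. The function below is my own guess at the general approach, not a copy of how Translation Party does it: round-trip the phrase until a version turns up that has been seen before, then report the history and the length of the cycle. The two "translators" are toy lookup tables standing in for a real translation service, which is the bit you would need to plug in.

```python
def find_equilibrium(phrase, to_foreign, to_english, max_rounds=20):
    """Round-trip a phrase until an English version repeats.

    Returns (history, cycle_length). A cycle_length of 1 means a stable
    phrase; None means nothing repeated within max_rounds.
    """
    history = [phrase]
    seen = {phrase: 0}  # phrase -> position of first appearance in history
    current = phrase
    for _ in range(max_rounds):
        current = to_english(to_foreign(current))
        if current in seen:
            history.append(current)
            return history, len(history) - 1 - seen[current]
        seen[current] = len(history)
        history.append(current)
    return history, None


# Toy stand-ins for a translation service (hypothetical, deliberately lossy)
TOY_FR = {"hello there": "bonjour", "hello": "bonjour"}
TOY_EN = {"bonjour": "hello"}


def toy_to_french(text):
    return TOY_FR.get(text, text)


def toy_to_english(text):
    return TOY_EN.get(text, text)


print(find_equilibrium("hello there", toy_to_french, toy_to_english))
```

Tracking every version seen, rather than just comparing with the previous one, means the loop also stops on longer cycles where the phrase flips between two or more forms.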
Another fun toy for the holiday break is the Google Books Ngrams trends viewer, that plots the occurrence of searched for phrases across a sample of books scanned as part of the Google Books project.
Here’s another one:
This is reminiscent of other trendspotting tools such as Google Trends (time series trends in Google search), or Trendistic (time series trends in Twitter), which long-time readers may recall I’ve posted about before. (See also: Trendspotting, the webrhythms hashtag archive.)
On flickr, delicious and Yahoo Pipes…
According to Slideshare, it was four years ago that I ran a series of social bookmarking workshops in the OU:
At the time, I was a fan of delicious (still am), because it did what it did and it did it well enough. As part of the workshop, I tried to encourage folk to use delicious, but I also ran an “OU unofficial” version of Scuttle for folk to use if they preferred using a locally hosted social bookmarking app. (A few did, at first, but the folk who got value from social bookmarking tended to then move on to delicious, so I shut the open.ac.uk hosted version of Scuttle down.)
With the future of delicious uncertain, I wonder whether Scuttle has continued in development, and whether it’s worth setting up again?
As to the continuity of flickr – I guess I need to have a think about what to do if flickr goes down. As a paid up premium user, I have thousands of images on flickr, many of them screenshots which are served to this blog. If flickr were to die, I’d need to get the images moved elsewhere, and the links updated in this blog. I’m not sure how to do this? Anyone got any good ideas given this is a WordPress hosted blog (and the fact I don’t want to have to pay for image storage on WordPress – unless Automattic buy flickr???)
And then there’s Yahoo Pipes. As far as I know, it hasn’t been mentioned in any of the recent reports around Yahoo’s portfolio reorganisation, but who knows how safe it is? I’ve posted before wondering about what happens if yahoo pipes dies?, and thanks to Greg Gaughan there’s now an exporter and partial runner for pipes using Pipe2Py and the Google Apps Pipes Engine. All that’s needed now is for someone to come up with a UI that generates the Pipes JSON export format… There are a few possible candidates out there, but nothing that hits the sweet spot yet, so if you fancy having a go, let me know (I probably won’t be able to help with the code, but I can try out the UI and help test any outputs within Pipe2Py…)
Translate to Google Statistical (“Google Standard”?!) English?
Over the last three or four weeks, I’ve been finding myself on all manner of foreign language (i.e. non-English) web pages, and increasingly accepting Google Chrome’s offer to translate the page to English when it recognises the page isn’t in English…
It’s still a bit ropey (as a close inspection of the above might suggest: ‘select your drive’???!) but as the algorithm used is powered by a Google training algorithm, the quality is likely to improve as the Goog indexes more and better translations of documents:
Anyway – a couple of things came to mind:
- translations aren’t into native speaker English, or German, or French, they’re into Google Statistical English, Google Statistical French etc etc
- I hope that the Goog doesn’t treat its own translations as training documents (though it could end up with some intriguing mistranslations…)
- Mandelbrot comes to mind, and the question whether anyone has done a limit cycle translator that takes a foreign language document, translates it into English, back to French, back to English and so on until the English translation is stable? If the translation at each (English) step was fed into a wiki, could the wiki history be used to compare versions of the document and ‘colour’ different parts of it depending on how quickly those areas of the document converge to a stable translation? Does convergence happen at a different rate if you translate through different routes that appear to be more stable (for example, Austrian-German-English rather than Austrian-English?!)

- Google has started doing “reading levels” as an advanced search switch, so will we start seeing “translate this ‘advanced’ English page into ‘basic’ English”? Or maybe Google will offer the ability to translate all pages, including those originally written in English, into Google Standard English?! The Babelfish browser – *every* page gets auto-translated to Google Standard Foo (where the language “foo” is auto-detected from the search terms you use and the content Google knows you’ve created). If you think Amazon’s wonky recommendations after a present shopping spree can be a little bit irritating, just imagine what would happen if for some reason Google started translating every page that appeared in your browser into Google Standard Teen Edition language?!;-)
PS See also this Twitter auto-translation pipe. (Hint: to translate tweets from a non-English speaking @example, use from:example as the search term.)
PPS Note to self – keep an eye on translate.google.com to see when the English to English translation ceases to be a direct copy…
e.g. in this example:
I introduced some spelling mistakes… will Google Translate start down the path to Google Standard (or Google Reading Level X) translations by introducing a spellchecker and grammar checker?!
Open Data Sceptic(?!)
Answers appreciated in the comments below…;-)
PS a similar question comes to mind with OERs…;-)
PPS In a rare blog post(?!), @ambrouk reminds me of a recent post by Tom Steinberg about Open Data: How Not To Cock It Up. I must have been having a bad day, yesterday, and stand corrected… (or maybe I was wound up by one too many other tweets or blog posts about yet another open data launch making all sorts of vacuous promises and calls to the public for action around this data set that would obviously benefit them…. err…?!;-)
The Problem With Linked Data is That Things Don’t Quite Link Up
A v. quick post this one, because I have other stuff that really needs to be done, but it’s something I want to record as another couple of observations around the practical difficulties of engaging with Linked Data…
Firstly, identifiers for things most of us would probably call councils. The Guardian Datablog has just published data/details of the local council cuts. The associated Datastore Spreadsheet has a column containing council identifiers, as well as the council names:
Adding formal identifiers such as these is something I keep hassling Simon Rogers and the @datastore team about, so it’s great to see the use of a presumably standardised identifier there:-) Only – I can’t see how to join it up to any of the other myriad identifiers that seem to exist for council areas?
So for example, looking up Trafford on the National Statistics Linked Data endpoint identifies it as local-authority-district/00BU and Local education authority 358 – I can’t find R342 anywhere? Nor does R342 appear as an identifier on the OpenlyLocal page for Trafford Council, which is another default place I go to look at for bridging/linking information (but then, maybe a local authority is not a council?)
(A use case for the data might be taking the codes and using them to colour areas on an Ordnance Survey OpenSpace map (ans. 1.17)… This requires a bridge into the namespaces the OS mapping tools recognise.)
I can google “Trafford R342″ and find a couple of other references to this association, but I can’t find a way of linking to entities I know about in the Linked Data world?
But then, maybe the R*** areas don’t match any of the administrative areas that are recorded in any of the other data sources I found…?
So I have an identifier, but I don’t know what it actually refers to/links to, and I don’t know how to make use of it?
And then there’s a second related problem – a mismatch between popular understanding of a term/concept, and its formal use in a defined ontology, which can cause all sorts of problems when naively trying to make use of formally defined data…
Take for example, the case of counties. Following a brief Twitter exchange this morning with the ever helpful @gothwin, it turns out that if you live in somewhere like Southampton (or another unitary authority or metropolitan district), you don’t live in a county… (for example – compare the Ordnance Survey pages for postcode areas SO16 4GU and EX1 1HD). The notion of counties is apparently just a folk convention now, although the Association of British Counties is trying to “promote awareness of the continuing importance of the 86 historic (or traditional) Counties of Great Britain… contend[ing] that Britain needs a fixed popular geography, one divorced from the ever changing names and areas of local government but, instead, one rooted in history, public understanding and commonly held notions of cultural identity.” Which is why they “seek to fully re-establish the use of the Counties as the standard popular geographical reference frame of Britain and to further encourage their use as a basis for social, sporting and cultural activities”. (@gothwin did hint that OS might be “look[ing] at publishing a ‘people’s geography’ with traditional counties”.)
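What I’m after, in effect, is a bridging table: one row per area, one column per identifier scheme, and a way of hopping from one column to another. A minimal sketch of the idea is below. The Trafford values are the ones quoted above, on the (unverified) assumption that R342, 00BU and 358 all refer to the same place; the scheme labels are my own, and R999 is a made-up code used to show what a failed lookup looks like.

```python
# One row per area, one key per identifier scheme. Scheme labels are my own;
# the Trafford values are those quoted above and assumed to be co-referent.
BRIDGE = [
    {"name": "Trafford", "guardian": "R342", "ons_district": "00BU", "lea": "358"},
]


def translate_id(value, from_scheme, to_scheme, bridge=BRIDGE):
    """Map an identifier from one scheme to another, or None if no bridge exists."""
    for row in bridge:
        if row.get(from_scheme) == value:
            return row.get(to_scheme)
    return None


def unbridged(codes, from_scheme, to_scheme, bridge=BRIDGE):
    """List the codes that cannot be mapped into the target scheme."""
    return [code for code in codes
            if translate_id(code, from_scheme, to_scheme, bridge) is None]


print(translate_id("R342", "guardian", "ons_district"))
print(unbridged(["R342", "R999"], "guardian", "ons_district"))  # R999 is made up
```

The code is trivial; the problem is that I can’t find anyone publishing the table, so each row has to be pieced together by hand from search results.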
As it is, for a naive developer, (or random tinkerer, such as myself), struggling to get to grips with the mechanics of Linked Data, it seems that to make any use at all of government Linked Data, you also need a pretty good grasp of the data models before you randomly try hacking together queries or linking stuff together, as the nightmare exposure I had to COINS Linked Data suggests… ;-)
In other words, there are at least two major barriers to entry to using government Linked Data: on the one hand, there’s getting comfortable enough with things like SPARQL to be able to navigate Linked Data datasets and put together sensible queries (the technical problem); on the other hand, there’s understanding the data model and the things it models well enough to articulate even natural language questions that might be asked of a dataset (a domain expertise problem). (And as we try to link across datasets, the domain expertise problem just compounds?) Then all that remains is mapping the natural language query onto the formal query, given the definitions of ontologies being used…
(I know, I know – it’s always rash to query data you don’t understand… but I think a point I’m trying to make is that getting your head round Linked Data is made doubly difficult when things don’t work not because of the way you’ve written the query, but because you don’t understand the way the data has been modeled… (which ends up meaning it is a problem with the way you wrote the query, just not the way you thought…!))












