OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Archive for September 2010

Structural Differences in Hashtag Communities: Highly Interconnected or Not?

with one comment

In several recent posts, I’ve shown a variety of network diagrams based on who’s following whom in various twitter hashtag community networks. In this post, I thought I’d show a couple more, demonstrating the power of the visual approach for getting a quick feel for the structure of a particular community.

First up, here’s the inner friends graph for the #cam23 hashtag, which is used predominantly by a handful of Cambridge University librarians who opted in to their local 23 things programme:

A highly interconnected hashtag community network - cam23

(Node size is proportional to the number of incoming friends links; colour is proportional to the number of outgoing links.)

So what do we see? Pretty much everyone in this network is following a large number of other folk in the network, and is being followed by a large number. The network is highly interconnected. Messages don’t necessarily need tagging in order to ensure that the message gets distributed across the network because most folk are connected to most other folk.

(It’s never that simple of course. The likelihood of someone seeing a message from a particular person in their network is a function of, amongst other things, the number of people they follow, the frequency at which those people post, and so on.)

Now let’s look at a hashtag around a different sort of event – the Isle of Wight #Bestival. Here’s a sample from that hashtag community:

Bestival hashtag community - not so much a twitter community

In this case, we see lots of small blue dots, disconnected from other folk in the network. A couple of nodes are well connected, such as @ventnorblog, the Isle of Wight’s hyperlocal news site. Generally, if we wanted to broadcast a message to the #bestival hashtag community, the only way we could hope to reach them would be by tagging a message appropriately and hoping they had a search running on that tag.

If we run the Gephi connected components tool, we can group nodes that are connected to each other. In the image below, the large blue circle is the “collapsed” network centred around ventnorblog and redfunnel (the table on the left shows that the majority of twitterers sampled fall into this group). Another, smaller network, shown in exploded form, has also been identified:

bestival hashtag community - connected components
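Gephi does the grouping for us. For anyone curious about what is going on underneath, here is a minimal sketch of the same idea in Python. It is not Gephi’s own code. It makes a union-find pass that treats follow links as undirected, so the groups it returns are weakly connected components. The edge list is a toy example. The optional `nodes` argument lets isolated tweeters (the lone blue dots) show up as components of size one.

```python
from collections import defaultdict


def weakly_connected_components(edges, nodes=()):
    """Group people into weakly connected components.

    edges: iterable of (follower, friend) pairs; direction is ignored,
    so two people share a component if any chain of follow links joins them.
    nodes: extra nodes to include even if they have no edges.
    """
    parent = {}

    def find(x):
        # Locate the root of x's set, then compress the path behind us.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b in edges:
        union(a, b)
    for n in nodes:
        find(n)  # registers isolated nodes as their own component

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Largest component first.
    return sorted(groups.values(), key=len, reverse=True)


comps = weakly_connected_components([('a', 'b'), ('b', 'c'), ('d', 'e')], nodes=['f'])
print(comps)
```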

Now let’s go back to the cam23 community, and consider the sociability of everyone in the community. In the following image, node size is proportional to the total number of friends, and colour is proportional to the total number of followers:

Cam23 sociability - node size is tot no. of friends, colour tot followers

So red nodes have a large number of followers/wide broadcast reach outside the hashtag community, and large nodes show that the node has a wide hinterland, and receives messages from a large number of folk outside the hashtag community.

If we now plot “1-total_friends” as the node size and in-degree/incoming links as the colour (incoming links are links from followers), we can get an indication of the extent to which the tweets an individual sees are dominated by tweets from the hashtag community (size) – that is, large size means the person’s network is dominated by folk in the hashtag network – and the extent to which a person’s tweets reach out into the hashtag network (colour; more red means that person’s tweets are seen by more of the hashtag community).

cam23 - node size is 1- total friends; colour is in degree

Small red nodes mean that a person has wide reach into the hashtag community, but that they follow a lot of other people, so hashtagged tweets may be drowned out. Large red nodes show that a person’s friend network is dominated by the hashtag community and that they are widely followed within it.

(Note that originally there were a couple of nodes that looked like “rogue” nodes, so I sized all nodes with zero incoming links to zero size; alternatively I could have filtered the graph to only show nodes with at least one incoming link.)

PS In addition to explicitly opt-in communities, such as hashtag networks, it strikes me that we could also start considering the structure of incidental/passive inclusion topic networks by searching for folk who are using particular key terms, rather than searching over hashtags?

Written by Tony Hirst

September 13, 2010 at 11:36 am

Posted in Visualisation


So What Do Simple Hashtag Community Visualisations Tell Us?

with 9 comments

A really quick post, this one, to explore a couple of select hashtags and get people thinking about whether this approach is useful for anything…

The two hashtags are #alpsp and #jisclms, both to do with academic libraries and publishing. I’ve plotted a couple of graphs (using Gephi) for each.

Firstly, the inner structure of the hashtag community, showing interconnectedness: node size is proportional to the number of hashtaggers following an individual, and colour/heat is proportional to the number of hashtaggers the person is following. In this case, large red means the individual follows and is followed by a large number of the other hashtaggers. Small red means the individual is following lots of the hashtaggers but is not followed by many of them, small blue means the person has little connectedness with any of the hashtaggers, and large blue means lots of hashtaggers are following the individual but not many are being followed back.
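Here is a rough sketch that makes the four size/colour combinations concrete. It computes in-degree and out-degree over (follower, friend) edges within the hashtag community and then labels each person. Gephi colours on a continuous scale. The median split used here to decide what counts as “large” or “red” is my own arbitrary assumption, chosen only for illustration.

```python
from statistics import median


def degree_quadrants(edges):
    """Label each hashtagger 'large/small' + 'red/blue'.

    edges: (follower, friend) pairs restricted to the hashtag community.
    Size tracks in-degree (hashtaggers following you); heat tracks
    out-degree (hashtaggers you follow). Above the median counts as
    large/red, an illustrative threshold only.
    """
    indeg, outdeg = {}, {}
    for follower, friend in edges:
        outdeg[follower] = outdeg.get(follower, 0) + 1
        indeg[friend] = indeg.get(friend, 0) + 1
        indeg.setdefault(follower, 0)
        outdeg.setdefault(friend, 0)
    in_cut = median(indeg.values())
    out_cut = median(outdeg.values())
    labels = {}
    for node in indeg:
        size = 'large' if indeg[node] > in_cut else 'small'
        heat = 'red' if outdeg[node] > out_cut else 'blue'
        labels[node] = size + ' ' + heat
    return labels


labels = degree_quadrants([('a', 'b'), ('c', 'b'), ('d', 'b'),
                           ('b', 'a'), ('a', 'c'), ('a', 'd')])
print(labels)
```

In this toy graph, `a` follows everyone but is followed only by `b`, so it comes out “small red”. `b` is followed by everyone but follows only `a`, so it comes out “large blue”.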

And secondly, the twitterati graph (via @scottbw;-) where node size is proportional to the total number of followers and heat the total number of friends. In this case large red means lots of friends and followers overall, large blue is lots of followers but few friends, small red is lots of friends and few followers.

So here are the graphs for #jisclms; firstly the inner hashtag community graph:

Inner structure of the jisclms community

The twitterati graph:
Jisclms twitterati

And here are the graphs for #alpsp – again, hashtag community first:

A select group - the interconnectedness of the alpsp hashtag community

And then the twitterati graph:

ALPSP twitterati - size is total followers, heat is total friends

I have to go out now, so maybe folk would like to post a comment or two about what these graphs tell us, and I’ll pick up on them in a postscript in a couple of days…

For starters, what sort of interaction do the publishers seem to have with the rest of the #alpsp community?!;-) Can folk be well connected in a hashtag community and insignificant in the twitterati stakes (and if so, what might that mean? The person has just started on twitter and they’re starting within that community?) And so on…

Written by Tony Hirst

September 10, 2010 at 10:09 am

Posted in Tinkering, Visualisation


Additional Thoughts on Generating a Persistent Context from an Event Tag

with 3 comments

In Deriving a Persistent EdTech Context from the ALTC2010 Twitter Backchannel (aka ‘From community folksonomy to epistemology in a few clicks: Possibly the most useful post (ever?)’, via @georgeroberts;-), I showed how we could mine the tweets surrounding an archived hashtag in order to generate a topic based context that would persist after the event had been long gone.

So what else might we do? Here are a couple of quick thoughts…

Firstly, some folk were tweeting links using the hashtag, so we can scrape these from the twapperkeeper archive and maybe use them to feed a facet of the search engine (e.g. relating to sites/links tweeted during the event). In this case, that part of the search engine would correspond to a fragmentary memory of links deemed important at the time of the original event.

Here are a couple of fragments that could form the basis of the generating script. Firstly, a link stripper to extract links from a tweet. Something like this should work:

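import re

string="@sdsd http://sds.sd/sd?&dsd http://sds.sd/?r+dsd"
# match http://, https:// or www. followed by everything up to whitespace or a double quote
print(re.findall(r'(?:https?://|www\.)[^"\s]+', string))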

Secondly, we need to post full links rather than shortened links to the search engine. I noticed @AJCann was using a bit.ly URL in his twitter profile, and also a lot of tweeted links are shortened using bit.ly; so for those at least we can expand the links via the bit.ly API:

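import json
import urllib.parse
import urllib.request

bu='psychemedia'  # bit.ly login
bkey=''           # bit.ly API key

urls=[]
urls.append('http://bit.ly/AJCann')

# the bit.ly v3 expand call can take up to 15 &shortUrl=URL pairs,
# but for simplicity this loop makes one call per short URL
for i in urls:
    url=('http://api.bit.ly/v3/expand?shortUrl='+urllib.parse.quote(i, safe='')
         +'&login='+bu+'&apiKey='+bkey+'&format=json')
    print('url: '+url)
    r=json.load(urllib.request.urlopen(url))
    # each entry in data.expand describes one short URL and its long form
    for j in r['data']['expand']:
        print('long '+j['long_url'])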

(Anyone know of a service that can expand links from the most popular shortening services via a single API, rather than, say, having to call each shortened URL to see where it actually points to?)

In passing, we can probably look to other services, such as delicious, to see who has been bookmarking URLs with the particular tag, and maybe even use these links in a custom search engine (though that may go against Delicious’ terms and conditions.) Similarly for Slideshare.

Just considering delicious again, we could also look to that service to see who bookmarked the ALTC2010 homepage. They may be folk we want to add into our context, but I don’t think delicious profiles include a personal homepage URL? What we can do, though, is look to see what tags folk were using to tag the ALTC2010 homepage (and maybe other links tweeted during the conference) to identify folksonomic keywords to associate with the context. We can pull the tag data for a link in from the delicious API, I think? Here’s an example from years ago (More Hyperbolic Tree Visualisations – delicious URL History: Users by Tag) about the sort of thing we can extract: tags and users around a URL as bookmarked on delicious:

Tags used to describe the altc2010 homepage on delicious

The outer leaves show the users who used the particular tag:

Altc2010 homepage tags by user on delicious

By the by, we can also look at users who bookmarked a link, and the tags they used, via another script (delicious URL History – Hyperbolic Tree Visualisation):

ALTC2010 homepage on delicious - tags by user

Going back to the hashtaggers’ Twitter IDs, we can use the Google social graph API to find “aliases” of twitterati on other services (see for example Time to Get Scared, People? to see what I can find out from just the twitter name “ajcann”). One useful feature of this API is the ability to discover the URLs of several blogs etc that may be associated with an individual, rather than just the single link we get from a person’s Twitter profile.

Finally, George Roberts pointed out that there were “splitters” from the main hashtag who were using alternative forms. If those tags were being archived too, it would be easy enough to merge the archives client side before creating the context. In order to discover those tags, it might be possible to use the delicious tags as a crib, and do a date limited search on those tags on Twapperkeeper just to see if any likely alternative tags turned up. To check that the guessed-at tags were relevant, we might do a quick social analysis of the friends/followers of folk using those tags to see if they have a significant overlap with folk using the “authorised” hashtag; if they do, we might reasonably assume the tag being used is an alternative one.
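Both steps are small enough to sketch. The sketch assumes each archive has been loaded as a list of dicts with an `id` key. That shape is hypothetical, not Twapperkeeper’s actual format. Merging is then a de-duplication by tweet id. For the overlap check, the sketch uses Jaccard similarity between the two friend/follower sets. That is one plausible measure, not the only one. The 0.3 threshold is an arbitrary illustrative value.

```python
def merge_archives(*archives):
    """Merge tweet archives, keeping the first copy of each tweet id."""
    seen, merged = set(), []
    for archive in archives:
        for tweet in archive:
            if tweet['id'] not in seen:
                seen.add(tweet['id'])
                merged.append(tweet)
    return merged


def jaccard(a, b):
    """Jaccard overlap |A & B| / |A | B| of two sets (0.0 if both empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def looks_like_alternative(candidate_network, community_network, threshold=0.3):
    """Guess whether a candidate tag's users move in the same circles.

    candidate_network: friends/followers of folk using the guessed-at tag.
    community_network: friends/followers of folk using the authorised tag.
    threshold: arbitrary illustrative cut-off.
    """
    return jaccard(candidate_network, community_network) >= threshold


merged = merge_archives([{'id': 1}, {'id': 2}], [{'id': 2}, {'id': 3}])
print([t['id'] for t in merged])
```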

Written by Tony Hirst

September 9, 2010 at 8:12 am

Posted in Thinkses


Deriving a Persistent EdTech Context from the ALTC2010 Twitter Backchannel

with 14 comments

So you’ve been to an event where everyone was tweeting, and now what? That stuff’s all in the past, right? Wrong…

Earlier today, I published a short post describing how it was possible to do all sorts of wonderful things around a twitter hashtag community (well I think they’re wonderful – or some of them, at least…). In this post, I’ll give a couple of illustrations using the #altc2010 hashtag from this year’s ALTC conference.

First up, what does the inner structure of the hashtag community look like? That is, of the Twitter folk using the twitter hashtag (in fact, folk who’ve used the hashtag more than three times over the last couple of days), who follows whom? In the following graph, nodes are individual twitterers, edges go from a person to a person they follow, node size and label size is proportional to the number of hashtaggers following the named person (that is, the in degree of the node) and colour is proportional to the number of hashtaggers an individual is following (out degree; red is “hot”/high).

ALTC-2010 hashtag community
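Here is a sketch of the filtering and graph-building step. It is not the actual scripts used, and it rests on two assumptions. The archive has been loaded as a list of dicts with a `from_user` field. Each member’s friends list has already been pulled from the Twitter API into a dict. The sketch keeps folk who used the tag more than three times. It then writes the follow edges between them as a Source,Target CSV, the column naming Gephi’s spreadsheet import recognises.

```python
import csv
import io
from collections import Counter


def hashtag_community_edges(tweets, friends, min_tweets=4):
    """Build the inner 'who follows whom' graph for a hashtag community.

    tweets: list of dicts with a 'from_user' key (assumed archive shape).
    friends: dict mapping each user to the set of users they follow.
    min_tweets=4 keeps folk who used the tag more than three times.
    Returns (members, edges) with edges running follower -> followed.
    """
    counts = Counter(t['from_user'] for t in tweets)
    members = {u for u, n in counts.items() if n >= min_tweets}
    edges = sorted(
        (u, v) for u in members for v in friends.get(u, ()) if v in members
    )
    return members, edges


def edges_to_csv(edges):
    """Serialise edges as a Source,Target CSV for loading into Gephi."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['Source', 'Target'])
    writer.writerows(edges)
    return out.getvalue()


tweets = [{'from_user': 'a'}] * 4 + [{'from_user': 'b'}] * 5 + [{'from_user': 'c'}] * 2
friends = {'a': {'b', 'c'}, 'b': {'a'}, 'c': {'a'}}
members, edges = hashtag_community_edges(tweets, friends)
print(edges_to_csv(edges))
```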

This graph was produced using Gephi, which can also run stats over the graph. So for example, if we size the nodes according to betweenness, we can see which twitterers in the community are likely to be most effective at getting a message out across that community.

ALTC2010 betweenness centrality

Note that the ALT user is far and away the node with the highest betweenness score; the sizes of the other nodes are amplified just so we can see them…
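Gephi computes betweenness for us. As a sketch of what the statistic measures, here is Brandes’ algorithm for an unweighted, directed graph. It is not Gephi’s code. A node scores highly when many shortest follow-paths between other people pass through it. The graph is a dict mapping each person to the people they follow, and the example graphs are toys.

```python
from collections import deque


def betweenness(graph):
    """Unnormalised betweenness centrality (Brandes' algorithm).

    graph: {node: iterable of nodes it links to}, unweighted and directed.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back in reverse BFS order, accumulating dependencies.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


chain = betweenness({'a': ['b'], 'b': ['c']})
diamond = betweenness({'a': ['b', 'c'], 'b': ['d'], 'c': ['d']})
print(chain, diamond)
```

In the chain, `b` sits on the only path from `a` to `c` and scores 1. In the diamond, `b` and `c` share the two shortest paths from `a` to `d` and score 0.5 each.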

If we grab the total number of followers and friends of each user (that is, including folk who have not used the hashtag and are not part of the hashtag community) and use that to set the size (number of followers) and colour (number of friends) of each user, we can see which twitterers are most likely to amplify the event outside of the community.

ALTC2010 total friends/followers

Okay, so what else can we do?

One thing we can do is create a twitter list containing the folk who’ve been using the ALTC2010 hashtag; you can find it here: ALTC2010 List

ALTC2010 hashtaggers list

We can also feed the address of this list into a Yahoo pipe (described here) that will search through recent tweets visible through the list for hashtags. In this way we can use the folk who were twittering around ALTC2010 to act as an early warning beacon for other hashtags or hashtagged events in the educational technology area.

ALTC2010 hashtag community - what else is hot?
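The Pipe itself is not reproduced here. Its core is pulling hashtags out of recent tweets and counting them, which might look something like this in Python. The sketch assumes the tweet texts from the list timeline have already been fetched. The function name and the choice to ignore the event’s own tag are illustrative.

```python
import re
from collections import Counter

HASHTAG = re.compile(r'#(\w+)')


def emerging_hashtags(tweets, ignore=('altc2010',), top=10):
    """Count hashtags in recent tweets from the hashtaggers' list.

    tweets: list of tweet texts. Tags are lower-cased so #JISCLMS and
    #jisclms count together; the event's own tag is skipped.
    """
    ignored = {t.lower() for t in ignore}
    counts = Counter(
        tag.lower()
        for text in tweets
        for tag in HASHTAG.findall(text)
        if tag.lower() not in ignored
    )
    return counts.most_common(top)


recent = ['Off to #jisclms today', 'Slides up #JISCLMS #altc2010', 'More #ple thoughts']
print(emerging_hashtags(recent))
```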

Something else we can do via the twitter list is grab everyone’s personal homepage URL, as declared on their twitter profile, and use these URLs to seed an ALTC2010 custom search engine.

ALTC2010 hashtaggers search engine

That is, a search engine over a good proportion of the personal pages of HE related UK educational technologists, as they declared themselves over Twitter circa September 2010.
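A Google Custom Search engine can be defined from an annotations file. As I understand the format, each site pattern goes in an `<Annotation about="...">` element, and a `<Label name="_cse_...">` child ties it to the engine. Treat the exact format as something to check against Google’s documentation. The label in the example is a placeholder. The sketch turns each profile homepage into a “site/path/*” pattern so that pages under the homepage are searched. It also drops blank profile URLs and duplicates.

```python
from urllib.parse import urlparse
from xml.sax.saxutils import quoteattr


def cse_annotations(homepages, label):
    """Build a custom search annotations file from profile homepage URLs.

    label: the engine's label, '_cse_' plus an engine id (placeholder here).
    """
    patterns = set()
    for url in homepages:
        parts = urlparse(url)
        if parts.netloc:  # skip blank or malformed profile URLs
            patterns.add(parts.netloc + parts.path.rstrip('/') + '/*')
    lines = ['<Annotations>']
    for pattern in sorted(patterns):
        lines.append('  <Annotation about=%s>' % quoteattr(pattern))
        lines.append('    <Label name=%s/>' % quoteattr(label))
        lines.append('  </Annotation>')
    lines.append('</Annotations>')
    return '\n'.join(lines)


out = cse_annotations(['http://a.example.com/', 'http://a.example.com', '',
                       'http://b.example.org/blog'], '_cse_placeholder')
print(out)
```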

[UPDATE: and here's an example of why the community defined custom search engine might be interesting... via @eingang: Ouch! David White And The Dragon Slaying]

So, there we have it. The scripts are in place, so generating the screenshots, and writing this post, took waaaaaaaaaay longer than mining the twapperkeeper archive, setting up the lists, generating the graph files (though I still had to load them into gephi, lay them out and render them “by hand” i.e. by clicking a couple of buttons…) and seeding the custom search engine (which also had to be initially set up by hand).

But why bother? Well, my developing idea is that we can mine events to define (automatically) a context around a particular subject area or domain (for example, a set of people interested in, or expert in, the area), and then draw on this context for search and discovery at a later date (e.g. through monitoring their twitter feeds via an auto-generated list to see what they – as a group of independent individuals – are talking about severally together, or by searching over just their personal webpages).

PS odds on some f*****r has patented this approach; if they have, this was all my own work, and it was bleedin’ obvious, so s***w you, m**********r… sue me.

PPS idly mulling over what else I could do with the custom search engine, I seem to remember that it’s possible to tweak the ranking factors of results returned from particular sites in the CSE definition file… which means we could take things like the number of twitter followers, or the betweenness centrality of everyone within the hashtag community, and use this as a ranking factor? That is, we might use the twitter “reputation” of an individual, either in general terms (overall number of followers, say), or within a community (e.g. betweenness centrality) to boost or reduce the ranking of results returned from their pages within the custom search engine. And if anyone else out there thinks they have a patent on that idea, they can f**k right off too, cos I haven’t got the idea from you, either…
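As I recall, CSE annotations accept an optional `score` attribute between -1 and 1 that boosts or demotes a site. Treat that as an assumption to check. Given that, a minimal sketch of turning a per-person reputation measure into scores is a linear rescale, then writing the score onto each annotation. The patterns, label and metric values below are placeholders.

```python
from xml.sax.saxutils import quoteattr


def reputation_scores(metric, low=-1.0, high=1.0):
    """Linearly rescale a per-site reputation metric into [low, high].

    metric: dict mapping a URL pattern to, e.g., the follower count or
    betweenness centrality of the person whose homepage it is.
    If all values are equal, every site gets the midpoint.
    """
    lo, hi = min(metric.values()), max(metric.values())
    if hi == lo:
        return {k: (low + high) / 2 for k in metric}
    return {k: low + (v - lo) * (high - low) / (hi - lo) for k, v in metric.items()}


def scored_annotation(pattern, label, score):
    """One annotation line carrying the (assumed) score attribute."""
    return ('<Annotation about=%s score="%.2f"><Label name=%s/></Annotation>'
            % (quoteattr(pattern), score, quoteattr(label)))


scores = reputation_scores({'a/*': 0, 'b/*': 5, 'c/*': 10})
print(scored_annotation('b/*', '_cse_x', scores['b/*']))
```

A linear rescale of something as skewed as the betweenness scores above, where the ALT user dwarfs everyone else, would squash most sites towards the bottom. Rescaling ranks, or the logarithm of the metric, might behave better.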

PPPS for a few immediate thoughts about where next with all of this, see Additional Thoughts on Generating a Persistent Context from an Event Tag

Written by Tony Hirst

September 8, 2010 at 8:19 pm

Posted in Tinkering, Visualisation


Discovering Context: Event Focusing

with one comment

For ever, it seems as if we have had a problem of “information overload”. Way back when, in their dusty cells, scholarly monks would spend their days writing digests of books that had gone before, because there were so many books (?!) very few people would be able to read them all, and know what insights they contained. And then, it seems, Shirky came along, amplified by Jarvis, Weinberger and other net culture thinkers and commentators, popularising the notion of “filter failure”.

We’ve also been hearing a lot lately about event amplification.

Almost three years ago now (three years, do you remember what the web was like, and what hadn’t been invented yet, three years ago?!), I presented what felt like an old idea to me, even at the time, at ILI2007 on the topic of “Search Hubs and Custom Search”. The idea was that there are lots of places where context already exists that can be used to mine links that might serve as custom search engines: contexts that by their very nature bring together people and content relating to a particular topic, or domain. At that time, I demoed a Google Custom Search engine that searched over third party content linked to from an OU OpenLearn course, as well as something I’d been dabbling with for over a year even at that time: searchfeedr, a search engine that searched over domains listed in the links of an RSS feed pulled in from wherever (I think it still works? http://searchfeedr.com/), and which itself had origins in a hack I’d called deliSearch, that would search over sites tagged in a particular way on delicious.

Yesterday, whilst reading a post on GigaOm (The Web of Intent is Coming (Sooner Than You Think)), yet another post on how “[t]here’s an emerging opportunity for content publishers (and the publishing technologies they rely upon) to dramatically improve how they filter the stream for the consumers they serve”, it struck me that I’ve never really moved on from thinking about where we might find “discovered search engines” (similar to the sense of “found objects”; I always did like Duchamp…;-).

So for example, over the weekend, I made some glue, pulling together a few scripts I had around what I’ve been calling “hashtag communities”, those groups of people who use a particular hashtag on Twitter, often around an event, and putting together a few new scripts (never more than a few lines of code each).

The scripts were, variously:

- a script for grabbing a hashtag archive via the Twapperkeeper API, and pulling out all the people who had sent more than a certain number of tweets using the particular hashtag;
- a script for taking a list of Twitter user IDs, and grabbing the lists of their friends or followers from the Twitter API;
- a script for identifying the friends and followers of an individual who had used a particular hashtag, for use in the creation of a hashtag community graph, showing the links between folk using a particular hashtag;
- a script for creating a Twitter list containing the folk who had used a particular hashtag from a list of Twitter user names (e.g. as grabbed from Twapperkeeper);
- a script for grabbing the details of members on a Twitter list, and grabbing the number of their friends and followers to add further depth to a hashtag community graph;
- a variant of the Twitter list member detail grabbing script, that pulled out the homepage URLs used on Twitter profiles and generated a Google custom search definition file (so you can easily search over the websites of folk listed in a Twitter list).

So now, given a Twitter hashtag, assuming it’s been archived on Twapperkeeper, I can easily generate network graphs showing the interconnections of folk on Twitter using a particular hashtag, create Twitter lists based on hashtag users, create a custom search engine around the folk who used a particular hashtag.

And what I haven’t done, but could quite easily do, are things like:
- generate a custom search engine around the links tweeted in the context of a hashtag (cf. DeliSearch, where searches were created around links tagged in a particular way in delicious);
- monitor the Twitter list of hashtag users and pull out hashtags they use in the future.
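Here is a sketch of the first of those, in the spirit of searchfeedr. It pulls links out of tweets carrying the hashtag, reduces them to their domains, and keeps the domains linked to more than once as candidate search engine sites. The function name and the `min_count` cut-off are illustrative. As the comment notes, shortened links would need expanding first.

```python
import re
from collections import Counter
from urllib.parse import urlparse

LINK = re.compile(r'https?://[^\s"]+')


def tweeted_domains(tweets, min_count=2):
    """Domains of links tweeted with a hashtag, most frequent first.

    tweets: list of tweet texts. Only domains linked at least min_count
    times are kept. Shortened links should be expanded beforehand,
    otherwise the top 'domain' will just be the shortener.
    """
    counts = Counter(
        urlparse(link).netloc.lower()
        for text in tweets
        for link in LINK.findall(text)
    )
    return [d for d, n in counts.most_common() if n >= min_count]


sample = ['see http://a.example.com/x and http://b.example.org',
          'also http://A.example.com/y', 'http://c.example.net']
print(tweeted_domains(sample))
```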

If we look at how a particular tag is used more widely, e.g. on Slideshare, or delicious, maybe as part of an event amplification strategy, then it’s easy enough to see how we might start to concentrate all these resources: the new Lanyrd service looks like it might start to do something of this, and to a certain extent, Cloudworks also does so. It’s easy enough to hack something similar together too – simply generate a set of URLs around a tag for the RSS feeds from services like delicious, Slideshare, flickr, and so on, generate an OPML file, and for half a dozen lines of code or so you’ve built yourself a source file for something like Netvibes that can provide you with a readymade event monitoring dashboard.
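The OPML half of that recipe is only a few lines. In the sketch below the feed URL templates are hypothetical placeholders, not the real delicious, Slideshare or flickr feed addresses, which would need looking up for each service. Each service contributes one `outline` element of type `rss`, with an `xmlUrl` pointing at its tag feed, and that is what a feed reader dashboard imports.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag feed URL templates; swap in each service's real feed URL.
FEED_TEMPLATES = {
    'Service A': 'http://feeds.example.com/a/tag/{tag}',
    'Service B': 'http://feeds.example.org/b/tag/{tag}',
}


def event_opml(tag, templates=FEED_TEMPLATES):
    """Generate an OPML document listing one RSS feed per service for a tag."""
    opml = ET.Element('opml', version='2.0')
    head = ET.SubElement(opml, 'head')
    ET.SubElement(head, 'title').text = tag + ' event dashboard'
    body = ET.SubElement(opml, 'body')
    for service, template in templates.items():
        ET.SubElement(body, 'outline', type='rss', text=service + ': ' + tag,
                      xmlUrl=template.format(tag=tag))
    return ET.tostring(opml, encoding='unicode')


out = event_opml('altc2010')
print(out)
```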

So the dashboard, then, might be seen as some sort of concentrator of activity around the event. But whereas the dashboard may provide a snapshot of an event, and may be useful in an archival context, the tools I’m interested in are ones where we can mine events to provide a context that can be used in the future: so for example, using this week’s ALTC2010 as an example, I can trivially generate a search engine around UK educational technologists using the recipe described above (if they participate in the use of the appropriate hashtag on Twitter), trivially create a list of ed tech Twitterers that I can monitor for future related events (by extracting hashtags currently in use by that community) and so on.

Whether this is another form of amplification, I’m not sure? I see it more as a way of using an event to focus on, or define, a context that may continue to be useful even when the original event has long been forgotten…

[PS I was going to pepper this post with links, but it'll take too long to add them all; if you pick out likely phrases and search for them on the OUseful search engine, they should (?) turn related blog posts up.]

PPS for a “worked example” of some of the above, see Deriving a Persistent EdTech Context from the ALTC2010 Twitter Backchannel

Written by Tony Hirst

September 8, 2010 at 9:22 am

Posted in Search, Thinkses

A Quick Visualisation of Pingbacked Posts in OUseful.info Using Gephi

with 2 comments

In a couple of recent posts, I’ve shown how it’s possible to extract and visualise the internal link structure (the “autopingback graph”) of a WordPress blog. It’s also easy enough to extract linkage information from a WordPress export file that shows who’s been linking to which posts…

Building on the code I posted in The Structure of OUseful.Info, we can extract the external links data as follows (the else part):

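from urllib.parse import urlparse  # at the top of the script (Python 2: from urlparse import urlparse)

if commentInfo['type']=='pingback' and 'http://blog.ouseful.info' in commentInfo['url']:
	# internal pingback: one OUseful.info post linking to another
	cID=commentInfo['url'].rstrip('/').rpartition('/')[2]  # slug of the linking post
	rID=post["link"].rstrip('/').rpartition('/')[2]  # slug of the linked-to post
	f.write('"'+cID+'"->"'+rID+'"\n')  # dot format edge
	f2.write('"'+cID+'","'+rID+'"\n')  # CSV edge
	#post['comments'].append(comments)
else:
	# external link: record only the linking domain; urlparse is used because
	# str.lstrip('http://') strips a set of characters, not a prefix
	xID=urlparse(commentInfo['url']).netloc
	rID=post["link"].rstrip('/').rpartition('/')[2]
	f3.write('"'+xID+'","'+rID+'"\n')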

To simplify matters, I only record the domain that generated the incoming link, rather than the URL of the particular blog post making the link, for example. Edges go from the linking domain to individual blog posts on OUseful.info.

We can then visualise the graph in Gephi. So here, for example, I have sized the nodes according to the number of inlinks (i.e. in degree of each node); that is, in proportion to the number of times posts on OUseful.info have been linked to from external sites:

OUseful.info - most linked to posts

Alternatively, we can size the nodes according to out degree to see which domains link most frequently to OUseful.info posts:

Who's linking to OUseful.info?
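The same two rankings can be read straight off the `"domain","post-slug"` edge file written by the else branch above, without opening Gephi. Here is a minimal sketch. The rows are placeholders. Repeated links from one domain to the same post are counted each time, which may differ from how Gephi merges parallel edges on import.

```python
import csv
from collections import Counter


def link_degrees(lines):
    """Count in-degree of posts and out-degree of linking domains.

    lines: rows of the '"domain","post-slug"' edge file (no header row).
    """
    post_inlinks, domain_outlinks = Counter(), Counter()
    for domain, post in csv.reader(lines):
        domain_outlinks[domain] += 1
        post_inlinks[post] += 1
    return post_inlinks, domain_outlinks


rows = ['"a.example.com","post-one"', '"a.example.com","post-two"',
        '"b.example.org","post-one"']
inlinks, outlinks = link_degrees(rows)
print(inlinks.most_common(), outlinks.most_common())
```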

If we apply an ego filter to a particular domain, we can see which posts it has linked to:

Ego network filter - which posts has a domain linked to?

We can then increase the depth of the filter to see which other domains have linked to the posts that a particular domain has linked to:

Ouseful info - pingback depth 2
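Gephi’s ego network filter does this interactively. The sketch below does the same neighbourhood expansion as a breadth-first search over the (domain, post) edges, stopping at the chosen depth. It ignores edge direction, which is my assumption about how the filter treats it. The domain and post names are placeholders.

```python
from collections import deque


def ego_network(edges, centre, depth=1):
    """Nodes within `depth` hops of `centre`, ignoring edge direction.

    edges: (domain, post) pairs. Depth 1 gives the posts a domain has
    linked to; depth 2 adds the other domains linking to those posts.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {centre: 0}
    queue = deque([centre])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # don't expand beyond the requested depth
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return set(seen)


links = [('d1', 'p1'), ('d1', 'p2'), ('d2', 'p2'), ('d2', 'p3'), ('d3', 'p3')]
print(ego_network(links, 'd1', depth=2))
```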

When I first started using Gephi, I don’t think I saw it as a tool for exploring the information environment around a blog, but I do now:-)

PS Hmmm…. I wonder if I should have a go at writing a Gephi plugin to consume WordPress export files and visualise the pingback structures contained within them…?

Written by Tony Hirst

September 2, 2010 at 9:48 am

Posted in Tinkering, Visualisation


“Top Level” URL Conventions in Local Council Open Data Websites

with 3 comments

A few days ago, I had reason to start pondering URI schemes for open data released by educational institutions. The OU, like a couple of other HEIs, is looking at structuring – and opening up – various sorts of data, and there are also mutterings around what a data.ac.uk styled site might have to offer.

Being a lazy sort, it seems to me that in figuring out how we might collate data from across the ac.uk environment, we could look to the gov.uk environment. So for example, data.gov.uk acts as a central index over data from both central and local government, which each have their own concerns, and within a type, are likely to share some common features: all local councils will have some of the same sort of data to share, government departments might share some requirements for consistent, centralised reporting (such as website costs and usage) as well as their own peculiar data releases, and so on. In the ac.uk context, we have the HEIs (and FE colleges) in one set, research councils and other project related funding bodies in another.

If we look to local council data, we can also spot intermediate layers appearing that apply a canonical structure to a range of variously published data from the local councils. For example, Openly Local is making a play to act as the canonical source for a whole range of local council data across all the UK’s councils; the Police API “allows you to retrieve information about neighbourhood areas in all 43 English & Welsh police forces”, RateMyPlace is a “one stop shop for information on Food Safety Inspections in Staffordshire”, aggregating information from several councils and representing it via a single API, and so on. (For an example of how different councils can publish ostensibly the same data in a wide variety of formats, see Library Location Data on data.gov.uk).

Looking at the list of local councils with open data sites as collected on the OpenlyLocal open data scoreboard (and as extracted from the OpenlyLocal API via a Yahoo Pipe), are any conventions appearing to emerge in the location of local council open data homepages?

- http://www.aberdeencity.gov.uk/open_data/open_data_home.asp (Aberdeen City Council)
- http://www.bournemouth.gov.uk/Data/ (Bournemouth Borough Council)
- http://www.bristol.gov.uk/opendata (Bristol City Council)
- http://www.darlington.gov.uk/Generic/Info/opendata.htm (Darlington Borough Council)
- http://www.eaststaffsbc.gov.uk/opendata/Pages/default.aspx (East Staffordshire Borough Council)
- http://eastsussex.gov.uk/about/standards/opendata.htm (East Sussex County Council)
- http://www.eden.gov.uk/about-this-site/open-data/ (Eden District Council)
- http://data.london.gov.uk/ (Greater London Authority)
- http://picandmix.org.uk/ (Kent County Council)
- http://www2.lichfielddc.gov.uk/data/ (Lichfield District Council)
- http://data.lincoln.gov.uk/ (Lincoln City Council)
- http://www.brent.gov.uk/xml (London Borough of Brent)
- http://www.hillingdon.gov.uk/data (London Borough of Hillingdon)
- http://www.sutton.gov.uk/index.aspx?articleid=10077 (London Borough of Sutton)
- http://www.rbwm.gov.uk/web/transparency.htm (Royal Borough of Windsor and Maidenhead)
- http://www.salford.gov.uk/opendata.htm (Salford City Council)
- http://www.stratford.gov.uk/opendata (Stratford-on-Avon)
- http://www.sunderland.gov.uk/localpublicdata (Sunderland City Council)
- http://www.trafford.gov.uk/opendata/ (Trafford Council)
- http://opendata.walsall.org.uk/ (Walsall Metropolitan Borough Council)
- http://opendata.warwickshire.gov.uk/ (Warwickshire County Council)
- http://www.westberks.gov.uk/index.aspx?articleid=20365 (West Berkshire Council)

With only a small number of councils fully engaged, as yet, with open data, no dominant top level naming scheme has yet appeared, although there are a couple of early runners:

  • /opendata [3] (e.g. http://www.stratford.gov.uk/opendata)
  • /data [2] (e.g. http://www.hillingdon.gov.uk/data)
  • data. [2] (e.g. http://data.london.gov.uk/)
  • opendata. [2] (e.g. http://opendata.walsall.org.uk/)
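The tally can be automated roughly as follows. The rule is an assumption of my own: a URL is classified by a `data.` or `opendata.` subdomain, or failing that by the first segment of its path. That rule only looks at the first segment. So East Staffordshire’s /opendata/Pages lands under /opendata, giving four there rather than the three counted by hand. Sutton and West Berkshire both show up under /index.aspx.

```python
from collections import Counter
from urllib.parse import urlparse

# Open data homepages as listed above.
COUNCIL_DATA_URLS = [
    'http://www.aberdeencity.gov.uk/open_data/open_data_home.asp',
    'http://www.bournemouth.gov.uk/Data/',
    'http://www.bristol.gov.uk/opendata',
    'http://www.darlington.gov.uk/Generic/Info/opendata.htm',
    'http://www.eaststaffsbc.gov.uk/opendata/Pages/default.aspx',
    'http://eastsussex.gov.uk/about/standards/opendata.htm',
    'http://www.eden.gov.uk/about-this-site/open-data/',
    'http://data.london.gov.uk/',
    'http://picandmix.org.uk/',
    'http://www2.lichfielddc.gov.uk/data/',
    'http://data.lincoln.gov.uk/',
    'http://www.brent.gov.uk/xml',
    'http://www.hillingdon.gov.uk/data',
    'http://www.sutton.gov.uk/index.aspx?articleid=10077',
    'http://www.rbwm.gov.uk/web/transparency.htm',
    'http://www.salford.gov.uk/opendata.htm',
    'http://www.stratford.gov.uk/opendata',
    'http://www.sunderland.gov.uk/localpublicdata',
    'http://www.trafford.gov.uk/opendata/',
    'http://opendata.walsall.org.uk/',
    'http://opendata.warwickshire.gov.uk/',
    'http://www.westberks.gov.uk/index.aspx?articleid=20365',
]


def convention(url):
    """Classify a data homepage by a data./opendata. subdomain, or else
    by the first segment of its path (case preserved, so /Data != /data)."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    for prefix in ('data.', 'opendata.'):
        if host.startswith(prefix):
            return prefix
    first = parts.path.strip('/').split('/')[0]
    return '/' + first if first else '(site root)'


counts = Counter(convention(u) for u in COUNCIL_DATA_URLS)
for pattern, n in counts.most_common():
    print(pattern, n)
```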

The following naming approaches are, as yet, each used by only a single council:

  • /opendata/Pages [1] (e.g. http://www.eaststaffsbc.gov.uk/opendata/Pages/default.aspx)
  • /Data [1] (e.g. http://www.bournemouth.gov.uk/Data/)
  • /localpublicdata [1] (e.g. http://www.sunderland.gov.uk/localpublicdata)
  • /xml [1] (e.g. http://www.brent.gov.uk/xml)
  • /open_data [1] (e.g. http://www.aberdeencity.gov.uk/open_data/)

Several other councils appear to be offering a specific page to handle (at the moment) open data issues (e.g. http://www.salford.gov.uk/opendata.htm or http://www.westberks.gov.uk/index.aspx?articleid=20365), or even separate domains for their data site (e.g. http://picandmix.org.uk/).

Does any of this matter? At the top level, I’m not sure it does, except in setting expectations and providing a sound footing for a scaleable URI scheme. The Cabinet Office Guidance on designing URI sets, which outlines many considerations that need to be taken into account when defining URI schemes particularly for use as identifiers in RDF inspired Linked Data, suggests that domains should “[e]xpect to be maintained in perpetuity” and that “the choice of domain should provide the confidence to the consumer, …, the domain itself … convey[ing] an assurance of quality and longevity.”

In the foreseeable future, I suspect that, pragmatically, the majority of data released will be published as Excel spreadsheets or informally formatted CSV/TSV data, with some sites publishing raw XML. (As Library Location Data on data.gov.uk describes, even when councils ostensibly release the same sort of data, there is no guarantee that they will do it in similar ways: of the 5 councils publishing the locations of local libraries, 5 different data formats were used… ) It is unlikely that councils will be early adopters of Linked Data across the board. (If they were, it might be seen as excluding users in the short term, because while many people are familiar with working with spreadsheets (a widely adopted “end user” technology for people who work with data in their day job), familiar routes in to and out of Linked Data stores are not there yet…) That said, if local councils do end up wanting to publish data with well formed URIs into the Linked Data space, it would be handy if their current URI scheme was designed with that in mind, and in such a way that the minting of future Linked Data URIs isn’t likely to conflict or clash.

Written by Tony Hirst

September 1, 2010 at 12:44 pm

Posted in Policy

