Data DOIs

Okay, here’s another Friday twitter brainstorm capture post, this time arising from my responses to @jimdowning, who made a query in response to a tweet I made about an interesting looking “DOIs for data” proposal…

Here’s what I pondered:

– why might it be useful? Err, “eg allows to resolve either to Guardian data blog data on google docs or national stats copy of a data set?” That is, several of the data sets that have been republished by the Guardian on Google Docs duplicate (I think) data from National Statistics. A data DOI service could resolve to either of these copies depending on a user’s preferences…
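
To make the "resolve according to preference" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `REGISTRY` table, the source labels `guardian` and `national-stats`, the `resolve` function and the example.org URLs are all made up, and no real DOI service is being described.

```python
# Hypothetical registry: one data DOI mapped to several known copies of the
# same data set. The identifier, labels and URLs are placeholders only.
REGISTRY = {
    "data-doi:nnn-nnn.n": {
        "guardian": "https://example.org/guardian-datablog/copy",
        "national-stats": "https://example.org/national-statistics/copy",
    }
}


def resolve(data_doi, preferences):
    """Return the URL of the first preferred copy registered for data_doi."""
    copies = REGISTRY.get(data_doi)
    if copies is None:
        raise KeyError(f"unknown data DOI: {data_doi}")
    # Walk the user's preferences in order and take the first match.
    for source in preferences:
        if source in copies:
            return copies[source]
    # No preference matched: fall back to the first registered copy.
    return next(iter(copies.values()))
```

So a user who lists `national-stats` first would be sent to the National Statistics copy, while a user with no matching preference simply gets whichever copy was registered first.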

Hmmm… ;-)

But I can also imagine derived data DOIs that extend eg journal paper DOIs in a standardised way, and then point to data that relates to the original journal article. So for example, an article DOI such as doi:nnn-nnn.n might be used to generate a data DOI that extends the original DOI, such as doi:nnn-nnn.n-data; or we might imagine a parallel data DOI resolution service that reuses the same DOI: data-doi:nnn-nnn.n.

Where multiple data sets are associated with an article, it might be pragmatic to add a suffix to the DOI, such as data-doi:nnn-nnn.n-M to represent the M’th data set associated with the article? For only one data set, it could be identified as data-doi:nnn-nnn.n-0 (always count from 0, right?;-), with data-doi:nnn-nnn.n (i.e. no suffix) returning the number of data sets associated with the article?
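
The derivation rules sketched above could look something like this in Python. This is only a rough sketch of the naming scheme as I've described it, not anything a real resolver does: the function names (`suffixed_doi`, `data_doi`, `lookup`), the in-memory `DATASETS` table and the example.org URLs are all hypothetical.

```python
# Hypothetical table: article DOI -> ordered list of associated data sets.
DATASETS = {
    "doi:nnn-nnn.n": [
        "https://example.org/data/first.csv",
        "https://example.org/data/second.csv",
    ]
}


def suffixed_doi(article_doi):
    """Option 1: extend the article DOI itself, e.g. doi:nnn-nnn.n-data."""
    return article_doi + "-data"


def data_doi(article_doi, index=None):
    """Option 2: reuse the DOI under a parallel data-doi: scheme.

    With an index, append -M for the M'th data set (counting from 0).
    """
    prefix = "doi:"
    if not article_doi.startswith(prefix):
        raise ValueError(f"expected an identifier starting with {prefix!r}")
    base = "data-doi:" + article_doi[len(prefix):]
    if index is None:
        return base
    if index < 0:
        raise ValueError("data set index must be 0 or greater")
    return f"{base}-{index}"


def lookup(article_doi, index=None):
    """No index: return how many data sets exist. With index: return that one."""
    datasets = DATASETS.get(article_doi, [])
    if index is None:
        return len(datasets)
    if not 0 <= index < len(datasets):
        raise IndexError(f"no data set {index} for {article_doi}")
    return datasets[index]
```

For example, `data_doi("doi:nnn-nnn.n", 0)` gives `data-doi:nnn-nnn.n-0`, and `lookup("doi:nnn-nnn.n")` with no index returns the count of data sets (2 for the made-up table here). Note that the sketch passes the index as a separate argument rather than parsing it back out of the identifier string.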

PS hmmm this reminds me of something the name of which I forget (cf. image extraction from PDFs), where assets associated with an article are unbundled from the article (images, tabulated data and so on); how are these things referenced? Are references derivable from the parent URI?

PPS Maybe related? (I really need to get round to reading this…) How Data RSS Might Work.