Posts Tagged ‘getthedata’
Using GetTheData to Organise Your Data/API FAQs?
It’s generally taken as read that folk hate doing documentation*. This is as true of documenting data and APIs as it is of code. I’m not sure if anyone has yet done a review of “what folk want from published datasets” (JISC? It’s probably worth a quick tender call…?), but there have certainly been a few reports around what developers are perceived to expect of an API and its associated documentation and community support (e.g. UKOLN’s JISC Good APIs Management Report and API Good Practice reports, and their briefing docs on APIs).
* this is one reason why I think bloggers such as myself, Martin Hawksey and Liam Green Hughes offer a useful service: we do quick demos and getting started walkthroughs of newly launched services, demonstrating their application in a “real” context…
At a recent technical advisory group meeting in support of the Resource Discovery Taskforce UK Discovery initiative (which is aiming to improve the discoverability of information resources through the publication of appropriate metadata, and hopefully a bit of thought towards practical SEO…) I suggested that a Q and A site might be in order to support developer activities: content is likely to be relevant, pre-SEOd (blending naive language questions with technical answers), and maintained and refreshed by the community:-)
In much the same way that JISCPress arose organically from the ad hoc initiative between myself and Joss Winn that was WriteToReply, I suggested that the question and answer site with a focus on data that I set up with Rufus Pollock might provide a running start for a UK Discovery Q&A site: GetTheData.
API connections to OSQA, the codebase that underpins GetTheData, are still lacking, but there are mechanisms for syndicating content via RSS feeds (for example, it’s easy enough to get a feed of tagged questions out, or of questions and answers relating to a particular search query); which is to say – we could pull ukdiscovery tagged questions and answers into the UK Discovery website developers’ area.
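By way of illustration, here's a minimal sketch of what pulling tagged questions out of a feed might look like on the consuming site. It assumes a plain RSS 2.0 feed in which each question's tags appear as category elements; the sample feed, its URLs and the function name tagged_items are all made up for the example, and a real OSQA feed may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of an RSS 2.0 feed; a real feed from the site may differ.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Q&amp;A feed</title>
    <item>
      <title>Where can I find library catalogue metadata?</title>
      <link>http://example.org/questions/1</link>
      <category>ukdiscovery</category>
      <category>metadata</category>
    </item>
    <item>
      <title>How do I parse a CSV of exchange rates?</title>
      <link>http://example.org/questions/2</link>
      <category>currency</category>
    </item>
  </channel>
</rss>"""


def tagged_items(rss_xml, tag):
    """Return (title, link) pairs for feed items carrying the given tag."""
    root = ET.fromstring(rss_xml)
    matches = []
    for item in root.iter("item"):
        # Collect this item's tags, normalised to lower case.
        tags = {c.text.strip().lower() for c in item.findall("category") if c.text}
        if tag.lower() in tags:
            matches.append((item.findtext("title"), item.findtext("link")))
    return matches


# List the questions tagged ukdiscovery, ready to drop into a page template.
for title, link in tagged_items(SAMPLE_FEED, "ukdiscovery"):
    print(f"{title} <{link}>")
```

In practice the feed text would be fetched from the Q&A site rather than held in a string, and the result cached so the developers' area doesn't hit the feed on every page view.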
Another issue relates to whether or not developers would actually engage in the asking and answering of questions around UK Discovery technical issues. Something I’ve been mulling over is the extent to which GetTheData could actually be used to provide QandA styled support documentation for published data or data APIs, concentrating a wide range of data related Q&A content on GetTheData (and hence helping to build community/activity through regularly refreshed content and a critical mass of active users) and then syndicating specific content to a publisher’s site.
So for example: if a data/api publisher wants to use GetTheData as a way of supporting their documentation/FAQ effort, we could set them up as an admin and allow them rights over the posting and moderation of questions and answers on the site. (Under the current permissions model, I think we’d have to take it on trust that they wouldn’t mess with other bits of the site in a reckless or malevolent way…;-)
API/data publishers could post FAQ style questions on GetTheData and provide canned, accepted (“official”) answers. Of course, the community could also submit additional answers to the FAQs, and if these improve on the official answer they could be promoted to accepted answers. Through syndication feeds, maybe using a controlled tag filtered through a question submitter filter (i.e. filtering questions by virtue of who posted them), it would be possible to get a “maintained” list of questions out of GetTheData that could then be pulled in via an RSS feed into a third party site – such as the FAQ area of a data/api publisher’s website.
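The tag-plus-submitter filter described above might be sketched along the following lines. Everything here is hypothetical: the Question record, the account names and the tag are invented for the example, and how the author and tags would actually be read from a feed isn't something the site currently documents.

```python
from dataclasses import dataclass


@dataclass
class Question:
    title: str
    author: str
    tags: frozenset
    link: str


def maintained_faq(questions, controlled_tag, publisher_accounts):
    """Questions carrying the controlled tag AND posted by a publisher account."""
    return [
        q for q in questions
        if controlled_tag in q.tags and q.author in publisher_accounts
    ]


def community_questions(questions, controlled_tag, publisher_accounts):
    """Questions carrying the controlled tag but posted by anyone else."""
    return [
        q for q in questions
        if controlled_tag in q.tags and q.author not in publisher_accounts
    ]


# Hypothetical data: one official FAQ, one community question, one unrelated.
QUESTIONS = [
    Question("What format is the API response?", "exampledata_admin",
             frozenset({"exampledata-faq"}), "http://example.org/questions/10"),
    Question("Is there a rate limit?", "curious_dev",
             frozenset({"exampledata-faq"}), "http://example.org/questions/11"),
    Question("Where can I get currency data?", "journalist",
             frozenset({"currency"}), "http://example.org/questions/12"),
]

official = maintained_faq(QUESTIONS, "exampledata-faq", {"exampledata_admin"})
print([q.title for q in official])
```

Keeping the two filters separate means the publisher's site can show the official FAQ by default and choose whether, and where, to surface community-sourced questions.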
Additional activity (i.e. community sourced questions and answers) around the data/API on GetTheData could also be selectively pulled in to the official support site. (We may also be able to pull out the lists of people who are active around a particular tag???) In the medium term, it might also be possible to find a way of supporting remote question submission that could be embedded on the API/data site…
If any data/API publishers would like to explore how they might be able to use GetTheData to power FAQ areas of their developer/documentation sites, please get in touch:-)
And if anyone has comments about the extent to which GetTheData, or OSQA, either is or isn’t appropriate for discovery.ac.uk, please feel free to air them below…:-)
DataFriday on GetTheData…
I’ve hereby arbitrarily decided it’s #dataFriday on GetTheData.org, the question and answer / Q&A site for all your data needs (whether it’s finding a data set, working a data set, parsing a data set, or looking for ways of analysing or visualising a data set).
The idea behind dataFriday is that while we’re growing the community, we need an occasional sprint of Q’ing and A’ing (not least in the hope we might broker a few connections between people by means of a couple of rapid asking and answering exchanges).
So if you work with data, are struggling with data, or are publishing data that meets the needs of someone who’s looking for that data, pop over to getTheData.org right now… (go on, you know you want to… and it’s Friday, right…? Too late to start that new work project, but the perfect opportunity to spend 5 minutes doing a good deed for the web and the day…;-)
First Inklings of a Small Contract Market Around Data Services? And a concern…
A few days ago I was tipped off to a “bounty” request on Scraperwiki, offering 50 quid for a scrape of the DVLA test centres. The request had been posted on the Scraperwiki site, with a bounty attached (to which Scraperwiki seems to add a commission).
Scraperwiki also appears to be offering a “private scraper” service as a business model. Maybe visualisation design around a wiki will be next to be offered on the market?!
Another hint that folk may be willing to pay to get data into a useable form appeared on GetTheData in a request for information about currency data from a professional, non-coder journalist that suggested a payment may be in the offing for anyone who could help.
Given that a lot of the data that is apparently out there is readily scrapeable, but is actually subject to non-commercial, personal use only end user licences, I do wonder if there will be a black market in unlicensed data that gets laundered through a series of steps that don’t respect attribution, let alone other, more stringent licence conditions.
On the other hand, I wonder whether or not GetTheData should have a facility for associating a bounty with a particular query?
And the concern? It’s to do with the ethics of scraping or aggregating large amounts of personal – albeit public – data from folk on social networks. For example, it’s easy enough to find out who’s being wished a happy birthday on Twitter, and I have more than a few tools for grabbing friends and follower lists around hashtags, search terms, Twitter lists and usernames, and so on. Once we start mining data, it may be possible to discover things about folk from the public context they inhabit that maybe reveals something about them they didn’t realise could be deduced from the context? So what should our response be if we get a request on GetTheData asking someone how to mine public social data around a named individual… It may not be phone tapping, but something about that sort of request, should it ever occur, wouldn’t feel quite right to me…?
A Few More Thoughts on GetTheData.org
As we come up to a week in on GetTheData.org, there’s already an interesting collection of questions – and answers – starting to appear on the site, along with a fledgling community (thanks for chipping in, folks:-), so how can we maintain – and hopefully grow – interest in the site?
A couple of things strike me as the most likely things to make the site attractive to folk:
- the ability to find an appropriate – and useful – answer to your question without having to ask it, for example because someone has already asked the same, or a similar, question;
- timely responses to questions once asked (which leads to a sense of community, as well as utility).
I think it’s also worth bearing in mind the context that GetTheData sits in. Many of the questions result in answers that point to data resources that are listed in other directories. (The links may go to either the data home page or its directory page on a data directory site.)
Data Recommendations
One thing I think is worth exploring is the extent to which GetTheData can both receive and offer recommendations to other websites. Within a couple of days of releasing the site, Rufus had added a recommendation widget that could recommend datasets hosted on CKAN that seem to be related to a particular question.
What this means is that even before you get a reply, a recommendation might be made to you of a dataset that meets your requirements.
(As with many other Q&A sites, GetTheData also tries to suggest related questions to you when you enter your question, to prompt you to consider whether or not your question has already been asked – and answered.)
I think the recommendation context is something we might be able to explore further, both in terms of linking to recommendations of related data on other websites, but also in the sense of reverse links from GetTheData to those sites.
For example:
- would it be possible to have a recommendation widget on GetTheData that links to related datasets from the Guardian datastore, or National Statistics?
- are there other data directory sites that can take one or more search terms and return a list of related datasets?
- could a getTheData widget be located on CKAN data package pages to alert package owners/maintainers that a question possibly related to the dataset had been posted on GetTheData? This might encourage the data package maintainer to answer the question on the getTheData site with a link back to the CKAN data package page.
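As a rough sketch of the "search terms in, related datasets out" idea in the list above: CKAN's Action API offers a package_search call that takes a free text query. The snippet below builds such a request and picks dataset titles out of a response. The base URL ckan.example.org and the sample response are invented, and I'm assuming the version 3 Action API here, which may not match what any given CKAN instance exposes.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, terms, rows=5):
    """Build a package_search request URL for a list of search terms."""
    query = urlencode({"q": " ".join(terms), "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{query}"


def related_datasets(response_text):
    """Extract (title, name) pairs from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    return [
        (pkg.get("title") or pkg["name"], pkg["name"])
        for pkg in payload["result"]["results"]
    ]


# Hypothetical response, trimmed to the fields used above.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "road-traffic-counts", "title": "Road Traffic Counts"},
            {"name": "traffic-accidents", "title": ""},
        ],
    },
})

print(package_search_url("https://ckan.example.org/", ["road", "traffic"]))
print(related_datasets(SAMPLE_RESPONSE))
```

The search terms could be taken from a question's tags or title; picking good ones is the harder part, and isn't attempted here.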
As well as recommendations, would it be useful for GetTheData to syndicate new questions asked on the site? For example, I wonder if the Guardian Datastore blog would be willing to add the new questions feed to the other datablogs they syndicate?;-) (Disclosure: data tagged posts from OUseful.info get syndicated in that way.)
Although I don’t have any good examples of this to hand from GetTheData, it strikes me that we might start to see questions that relate to obtaining data which is actually a view over a particular data set. This view might be best obtained via a particular query onto a particular data set, such as a specific SPARQL query on a Linked Data set, or a particular Google query language request to the visualisation API against a particular Google spreadsheet.
If we do start to see such queries, then it would be useful to aggregate these around the datastores they relate to, though I’m not sure how we could best do this at the moment other than by tagging?
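To make the "answer as a query" idea a little more concrete, here's a sketch of how an answer might package up a Google query language request as a shareable URL. The spreadsheet key and the column letters are placeholders, and the URL pattern is the one I understand Google Sheets to use for visualisation API queries; treat both as assumptions rather than gospel.

```python
from urllib.parse import urlencode


def gviz_query_url(spreadsheet_key, query, out="csv"):
    """Build a URL that runs a query language statement against a spreadsheet."""
    params = urlencode({"tq": query, "tqx": f"out:{out}"})
    return f"https://docs.google.com/spreadsheets/d/{spreadsheet_key}/gviz/tq?{params}"


# Hypothetical key; columns A, B and C stand in for whatever the sheet holds.
url = gviz_query_url("EXAMPLE_SPREADSHEET_KEY", "select A, B where C > 100")
print(url)
```

A URL like this is itself the "view": tagging the question with the datastore it queries would give us one way of aggregating such answers.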
News announcements
There are a wide variety of sites publishing data independently, and a fractured network of data directories and data catalogues. Would it make sense for GetTheData to aggregate news announcements relating to the release of new data sets, and somehow use these to provide additional recommendations around data sets?
Hackdays and Data Fridays
As suggested in Bootstrapping GetTheData.org for All Your Public Open Data Questions and Answers:
If you’re running a hackday, why not use GetTheData.org to post questions arising in the scoping of the hacks, tweet a link to the question to your event backchannel and give the remote participants a chance to contribute back, at the same time adding to the online legacy of your event.
Alternatively, how about “Data Fridays”, on the first Friday in the month, where folk agree to check GetTheData two or three times that day and engage in something of a distributed data related Question and Answer sprint, helping answer unanswered questions, and maybe pitching in a few new ones?
Aggregated Search
It would be easy enough to put together a Google custom search engine that searches over the domains of data aggregation sites, and possibly also offer filetype search limits?
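A custom search engine has its domains configured on the Google side, but the same effect can be approximated in an ordinary query using the site: and filetype: search operators. Here's a small sketch of building such a query; the function name and the choice of domains are just for illustration.

```python
def data_search_query(terms, domains, filetype=None):
    """Build a search query limited to the given domains and, optionally, a file type."""
    parts = [" ".join(terms)]
    if domains:
        sites = " OR ".join(f"site:{d}" for d in domains)
        # Bracket the alternatives so the OR only applies to the site limits.
        parts.append(f"({sites})" if len(domains) > 1 else sites)
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


# Example: look for CSV files about unemployment on two data sites.
print(data_search_query(["unemployment", "statistics"],
                        ["data.gov.uk", "ckan.net"], filetype="csv"))
```

The resulting string can be pasted into a search box, or URL encoded and appended to a search URL.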
So What Next?
Err, that’s it for now…;-) Unless you fancy seeing if there’s a question you can help out on right now at GetTheData.org



