Over the weekend, I managed to catch up with open data advocate Rufus Pollock for a bit of a chat on all manner of things. One of the things that came up in conversation related to a practical issue around the ability to preview data quickly and easily without having to download and open large data files that might turn out not to contain the data you were looking for.
When building data handling applications, it can also be useful to get your code working on a small number of sampled data rows, rather than a complete data file.
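As a rough sketch of that sampling idea for a file you already have locally, something like the following would pull out just the header and the first few rows. The function name `head_rows` is made up for illustration, and it assumes the data is plain CSV with a single header row.

```python
import csv
from itertools import islice


def head_rows(lines, n=10):
    """Return the header row plus the first n data rows.

    `lines` is any iterable of CSV text lines, e.g. an open file object.
    Assumes (hypothetically) that the first row is a header.
    """
    reader = csv.reader(lines)
    # islice stops reading after n + 1 rows, so a large file is never
    # loaded into memory in full.
    return list(islice(reader, n + 1))


if __name__ == "__main__":
    sample = ["name,value\n", "a,1\n", "b,2\n", "c,3\n"]
    for row in head_rows(sample, n=2):
        print(row)
```

With a real file you would pass an open file handle (opened with `newline=""`) instead of the list of strings.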
Anyway, here’s one possible solution if your data is in a Google Spreadsheet – a URL pattern that will provide you with an HTML preview of the first ten lines of a sheet:
What it does is look up a particular sheet in a public/published Google spreadsheet, select every column (select *) and then limit the display to the first 10 rows (limit 10; just change the number to preview more or fewer rows). If there is only one sheet in the spreadsheet, or you want to display the first sheet, you can remove the &gid=SHEET_NUMBER part of the URL.
And if you’d rather have CSV data than an HTML preview, just change the out:html switch to out:csv.
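If you want to generate these preview links programmatically, here is a minimal sketch that assembles the pieces described above (the `select * limit N` query, the `out:html` / `out:csv` switch, the spreadsheet key and the optional `gid`). Note that the base endpoint in `ASSUMED_BASE` and the parameter names `tqx`, `tq` and `key` are my assumptions about how the pattern is put together, so check them against a link you know works; the function name `preview_url` is purely illustrative.

```python
from urllib.parse import urlencode

# ASSUMPTION: this base endpoint is a guess at the spreadsheet query
# endpoint and may need changing.
ASSUMED_BASE = "https://spreadsheets.google.com/tq"


def preview_url(key, rows=10, gid=None, out="html", base=ASSUMED_BASE):
    """Build a preview URL for the first `rows` rows of a published sheet.

    key  -- the spreadsheet key (SPREADSHEET_KEY in the pattern)
    gid  -- the sheet number; leave as None for the first/only sheet
    out  -- "html" for an HTML preview, "csv" for CSV data
    """
    params = [
        ("tqx", "out:" + out),                      # output format switch
        ("tq", "select * limit %d" % int(rows)),    # every column, first N rows
        ("key", key),
    ]
    if gid is not None:
        params.append(("gid", str(gid)))
    # urlencode takes care of escaping the spaces, '*' and ':' characters.
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(preview_url("ABCDEFG"))
    print(preview_url("ABCDEFG", rows=25, gid=2, out="csv"))
```

Making the base URL a parameter means the same helper can be pointed at a different endpoint without touching the query-building logic.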
It seems to me that if data is being published in a Google doc, then in some situations it might also make sense to either link to, or display, a sample of the data so that folk can check that it meets their expectations before they download it. (I’ve noticed, for example, that even with CSV, many browsers insist on downloading the doc in response to MIME type or server streaming settings so that you then have to open it up in another application, rather than just letting you preview it in the browser. Which is to say, if you have to keep opening up docs elsewhere, it makes browsing hard and can be a huge time waster. It can also be particularly galling if the downloaded file contains data that you’re not interested in, particularly when it’s a large file you’ve downloaded.)
Just by the by, I had thought that Google did a spreadsheet previewer that could take a link to an online spreadsheet or CSV document and preview it in Google Spreadsheets, but I must have misremembered. The Google Docs Viewer only seems to preview “PDF documents, PowerPoint presentations, and TIFF files”. For reference, the URL pattern is of the form http://docs.google.com/viewer?url=ESCAPED_PDF_URL
However, the Zoho Excel Viewer does preview public online Excel docs, along with CSV and OpenOffice Calc docs, using the URL pattern: http://sheet.zoho.com/view.do?url=SPREADSHEET_URL. (Apparently, you can also import docs into your Zoho account using this construction: http://sheet.zoho.com/import.do?url=SPREADSHEET_URL.) So for example, here’s a preview of the meetings that Cabinet Office ministers have had recently (via the new Number 10 Transparency website):
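For what it's worth, both viewer patterns mentioned above (the Google Docs Viewer and the Zoho one) boil down to passing the document's address in a `url=` argument, so a small helper can build either. This is only a sketch: `viewer_url` is a made-up name, and percent-escaping the document URL is my assumption for the Zoho case (the Google pattern asks for an escaped URL explicitly; the Zoho pattern doesn't say).

```python
from urllib.parse import quote

# Viewer base URLs as quoted in the post.
GOOGLE_VIEWER = "http://docs.google.com/viewer"
ZOHO_VIEWER = "http://sheet.zoho.com/view.do"


def viewer_url(doc_url, viewer=ZOHO_VIEWER):
    """Return a link that opens `doc_url` in the given online viewer.

    The document URL is fully percent-escaped (safe="") so that its own
    '?', '&' and '/' characters are not mistaken for part of the viewer URL.
    """
    return viewer + "?url=" + quote(doc_url, safe="")


if __name__ == "__main__":
    # example.com stands in for a real, publicly reachable file.
    print(viewer_url("http://example.com/data/meetings.csv"))
    print(viewer_url("http://example.com/report.pdf", viewer=GOOGLE_VIEWER))
```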
Finally, one of the issues we’ve been having on WriteToReply is how to handle data nicely. The theme we’re using was conflicting (somehow) with Google spreadsheet embed codes, but when I came across this shortcode plugin earlier today, it struck me that it might help…? It simply takes a Google spreadsheet key (using the pattern [gdoc key="ABCDEFG"]) and then inserts the necessary HTML embedding tags: WordPress plugin: inline Google Spreadsheet Viewer
However, it also struck me that a few tweaks to that plugin could probably also provide a preview of the data, showing for example the first 10 lines or so of a data file. Providing such a preview over a sample of the data, maybe in a section of the page that is collapsed by default, might be something worth exploring in CKAN and data.gov.uk data catalogue pages?
PS it just occurred to me: Scraperwiki offers just this sort of data preview:
Maybe the UI patterns are starting to form and will then start to fall into place…?;-)
UPDATE 28/11/10: I just noticed that the data.gov.uk site is now offering a link to preview xls docs at least on Zoho: