OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Publishing Stats for Analytic Reuse – FAOStat Website and R Package

How can stats and data publishers, from NGOs and (inter)national statistics agencies to scientific researchers, publish their data in a way that supports its analysis directly, as well as in combination with other datasets?

Here’s one approach I learned about from Michael Kao of the UN Food and Agriculture Organisation statistics division, FAOStat.

At first glance, the FAOStat website is a rich site that supports data downloads, previews and simple analysis tools around a wide variety of international food-related datasets:

[Image: FAOStat website]

[Image: FAOStat – graphical tools]

[Image: FAOStat – inline data preview]

[Image: FAOStat – data analysis]

One problem with having so many controls and fields available is that it can be hard to know where (or how) to get started – a bit like the problem of being presented with an empty SPARQL query box…

It would be quite handy to be able to set – and save with meaningful labels – preference sets for the countries you’re interested in, so you don’t have to keep scrolling through long country lists looking for the countries you want to generate reports for. (Support for “standard” groupings of countries might also be useful?) Being able to share URLs to predefined reports might also be handy? But this would possibly make the site even more complex to use!

One easier way of working with FAOStat data, particularly if you access the FAO datasets regularly, might be to take a programmatic route using the FAOStat R package. Making datasets available in ways that bring the data directly into a desktop analysis environment, where it can be worked on without requiring cleaning or other forms of tidying up (which is often needed when data is made available via Excel spreadsheets or CSV files), is a trend I hope we see more of. (That is not to say that data shouldn’t also be published in “generic” document formats…) If you are using a reproducible research strategy, queries to original datasources provide implicit, self-describing metadata about the data source and the query used to return a particular dataset, metadata that is all too easy to lose, or otherwise detach from a dataset, when working with downloaded files.
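As a rough illustration of that last point, here is a minimal Python sketch of keeping the query alongside the data it returned. It is not the FAOStat package's API: `fetch_with_provenance`, `fake_fetch`, the source label and the returned values are all hypothetical stand-ins for whatever client and datasource you actually use.

```python
from datetime import datetime, timezone


def fetch_with_provenance(fetch, source, **query):
    """Run fetch(**query) and return the rows plus a record of how they were obtained."""
    rows = fetch(**query)
    provenance = {
        "source": source,            # which datasource was queried
        "query": dict(query),        # the exact parameters used
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
    }
    return {"provenance": provenance, "rows": rows}


def fake_fetch(item, element, years):
    # Hypothetical stand-in for a real API call; the values are made up.
    return [
        {"year": year, "item": item, "element": element, "value": 100.0 + i}
        for i, year in enumerate(years)
    ]


dataset = fetch_with_provenance(
    fake_fetch,
    source="example-stats-api",      # assumed label, not a real endpoint
    item="wheat",
    element="production",
    years=[2010, 2011],
)
print(dataset["provenance"]["query"])
```

Because the query travels with the rows, anyone picking up `dataset` later can see what was asked for and rerun it, which is exactly what tends to get lost with a downloaded CSV file.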

I haven’t had a chance to play with this package yet – it’s still in testing anyway, I think? – but it looks quite handy at first glance (I need to do a proper review…). As well as providing a way of running data grab queries over the FAO FAOSTAT and World Bank WDI APIs, it seems to provide support for “linkage”. As the draft vignette suggests, “Merge is a typical data manipulation step in daily work yet a non-trivial exercise especially when working with different data sources. The built in mergeSYB function enables one to merge data from different sources as long as the country coding system is identified. … Data from any source with [a] classification [supported by the package] can be supplied to mergeSYB in order to obtain a single merged data. (sic)”. Supported country coding schemes currently include: United Nations M49 country standard [UN_CODE]; FAO country code scheme [FAOST_CODE]; FAO Global Administrative Unit Layers (GAUL) [ADM0_CODE]; ISO 3166-1 alpha-2 [ISO2_CODE]; ISO 3166-1 alpha-2 (World Bank) [ISO2_WB_CODE]; ISO 3166-1 alpha-3 [ISO3_CODE]; ISO 3166-1 alpha-3 (World Bank) [ISO3_WB_CODE].
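To show the general idea behind that sort of linkage, here is a small Python sketch of merging two datasets that identify countries using different coding schemes. It is not how mergeSYB is implemented, and it does not use the package's API: the function names are hypothetical, the concordance covers only three countries and three of the schemes listed above, and the data values are made up.

```python
# Toy concordance between three of the coding schemes listed above.
CONCORDANCE = [
    {"ISO2_CODE": "BR", "ISO3_CODE": "BRA", "UN_CODE": 76},
    {"ISO2_CODE": "FR", "ISO3_CODE": "FRA", "UN_CODE": 250},
    {"ISO2_CODE": "KE", "ISO3_CODE": "KEN", "UN_CODE": 404},
]


def translate(code, from_scheme, to_scheme):
    """Map a country code from one scheme to another, or return None if unknown."""
    for row in CONCORDANCE:
        if row[from_scheme] == code:
            return row[to_scheme]
    return None


def merge_on_country(left, left_scheme, right, right_scheme, target="ISO3_CODE"):
    """Inner-join two lists of records on (country, year) after translating to `target`."""

    def keyed(records, scheme):
        out = {}
        for rec in records:
            country = translate(rec[scheme], scheme, target)
            if country is None:
                continue  # codes missing from the concordance are dropped in this sketch
            values = {k: v for k, v in rec.items() if k not in (scheme, "year")}
            out[(country, rec["year"])] = values
        return out

    left_keyed = keyed(left, left_scheme)
    right_keyed = keyed(right, right_scheme)
    merged = []
    for country, year in sorted(left_keyed.keys() & right_keyed.keys()):
        merged.append({
            target: country,
            "year": year,
            **left_keyed[(country, year)],
            **right_keyed[(country, year)],
        })
    return merged


# Made-up values, purely for illustration.
production = [
    {"UN_CODE": 76, "year": 2010, "wheat_tonnes": 1.0},
    {"UN_CODE": 250, "year": 2010, "wheat_tonnes": 2.0},
]
population = [
    {"ISO2_CODE": "FR", "year": 2010, "population": 3.0},
    {"ISO2_CODE": "KE", "year": 2010, "population": 4.0},
]

merged = merge_on_country(production, "UN_CODE", population, "ISO2_CODE")
print(merged)
```

Only France appears in both toy datasets, so the merged result contains a single row keyed by its ISO 3166-1 alpha-3 code. The fiddly part in practice is the concordance itself, which is presumably why having it maintained inside an official package is attractive.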

It occurs to me that releasing an “official” R package to access the FAOStat API makes it much easier to start building sector-specific Shiny applications around particular datasets. I wonder whether the FAOStat folk have considered developing a small Shiny app or custom client ecosystem around their data, even if it just takes the form of a curated set of gists that can be downloaded directly into RStudio, for example, using runGist?

I don’t know whether the Eurostat EC Statistics database has an associated R package too? (If so, it could be quite interesting trying to tie them together?!) I do note, however, that Eurostat data is available for download (though I haven’t read the terms/license conditions…).

I also note that a Linked Data/SPARQL way in to Eurostat data appears to be available? Eurostat Linked Data.

[Man flu, hence the brevity of the post... skulks back off to sick bed...]

PS By the by, I notice that the NHS are experimenting with making some data releases available via Google Public Data Explorer [scroll down...]

PPS See also this package – Smarter Poland – which provides an API to the Eurostat database.

Written by Tony Hirst

March 8, 2013 at 2:45 pm

Posted in Rstats


3 Responses


  1. Hi Tony,

    Thank you for the post, and your suggestion for a specific country grouping is very valuable, our developers also had a look at shiny from your last suggestion and they are very interested and will try to implement these in the future.

    Thank you again for everything!

    Michael

    March 10, 2013 at 1:12 pm

    • Hi Michael – happy to bounce ideas further if you or your dev team want to chat…:-)

      Tony Hirst

      March 10, 2013 at 6:24 pm

  2. Hi Tony, you might want to take a look at the application at http://stats.270a.info/ . Paper at http://csarven.ca/linked-statistical-data-analysis . GitHub: http://github.com/csarven/lsd-analysis . In a nutshell, it uses Shiny, Shiny Server, R, Apache Jena (for TDB and Fuseki to do federated SPARQL queries). It is “Linked Data” friendly i.e., you can dereference the analysis URIs and get a serialization in your preference. Analysis results are stored in RDF store and data is publicly accessible at http://stats.270a.info/ . The service contains some FAO datasets (retrieved whatever I was able to get from FAO’s [no longer maintained AFAIK] SDMX service). See also http://270a.info/ to get the bigger picture.

    Sarven Capadisli

    January 21, 2014 at 9:10 am

