<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>OUseful.Info, the blog... &#187; toolchain</title>
	<atom:link href="http://blog.ouseful.info/tag/toolchain/feed/?withoutcomments=1" rel="self" type="application/rss+xml" />
	<link>http://blog.ouseful.info</link>
	<description>Trying to find useful things to do with emerging technologies in open education</description>
	<lastBuildDate>Tue, 21 May 2013 08:45:26 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.ouseful.info' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>OUseful.Info, the blog... &#187; toolchain</title>
		<link>http://blog.ouseful.info</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.ouseful.info/osd.xml" title="OUseful.Info, the blog..." />
	<atom:link rel='hub' href='http://blog.ouseful.info/?pushpress=hub'/>
		<item>
		<title>Reshaping Horse Import/Export Data to Fit a Sankey Diagram</title>
		<link>http://blog.ouseful.info/2013/02/18/reshaping-horse-importexport-data-to-fit-a-sankey-diagram/</link>
		<comments>http://blog.ouseful.info/2013/02/18/reshaping-horse-importexport-data-to-fit-a-sankey-diagram/#comments</comments>
		<pubDate>Mon, 18 Feb 2013 10:31:35 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Infoskills]]></category>
		<category><![CDATA[Rstats]]></category>
		<category><![CDATA[ddj]]></category>
		<category><![CDATA[schoolofdata]]></category>
		<category><![CDATA[toolchain]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=9840</guid>
		<description><![CDATA[As the food labeling and substituted horsemeat saga rolls on, I&#8217;ve been surprised at how little use has been made of &#8220;data&#8221; to put the structure of the food chain into some sort of context* (or maybe I&#8217;ve just missed those stories?). One place that can almost always be guaranteed to post a few related [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>As the food labeling and substituted horsemeat saga rolls on, I&#8217;ve been surprised at how little use has been made of &#8220;data&#8221; to put the structure of the food chain into some sort of context* (or maybe I&#8217;ve just missed those stories?). One place that can almost always be guaranteed to post a few related datasets is the Guardian Datastore, who use <a href="https://docs.google.com/spreadsheet/ccc?key=0ArwVnOqE20IkdGFRU3ZxREg4NUttRUp5YllHY095X1E&amp;usp=sharing#gid=3">EU horse import/export data</a> to produce an <a href="http://www.guardian.co.uk/uk/datablog/interactive/2013/feb/15/europe-trade-horsemeat-map-interactive">interactive map of the European trade in horsemeat</a>.</p>
<p><small>*One for the to-do list &#8211; a round-up of &#8220;#ddj&#8221; stories around the episode.</small></p>
<p><a href="http://www.guardian.co.uk/uk/datablog/interactive/2013/feb/15/europe-trade-horsemeat-map-interactive"><img src="http://ouseful.files.wordpress.com/2013/02/guardian-datablog-eu-trade-in-horsemeat.png?w=700&#038;h=715" alt="Guardian datablog - EU trade in horsemeat" width="700" height="715" class="alignnone size-full wp-image-9841" /></a></p>
<p>(The article describes the source of the data as the <a href="http://epp.eurostat.ec.europa.eu/newxtweb/">European Union Eurostat statistics website</a>, although I couldn&#8217;t find a way of recreating the Guardian spreadsheet from that source. When I asked Simon Rogers how he&#8217;d come by the data, he <a href="https://twitter.com/smfrogers/statuses/302577122705293312">suggested</a> putting questions to the Eurostat press office;-)</p>
<p>The data published by the Guardian datastore is a matrix showing the number of horse imports/exports between EU member countries (as well as major traders outside the EU) in 2012:</p>
<p><a href="https://docs.google.com/spreadsheet/ccc?key=0ArwVnOqE20IkdGFRU3ZxREg4NUttRUp5YllHY095X1E&amp;usp=sharing#gid=3"><img src="http://ouseful.files.wordpress.com/2013/02/guardian-datablog-horsemeat-importexport-data.png?w=700&#038;h=377" alt="Guardian Datablog horsemeat importexport data" width="700" height="377" class="alignnone size-full wp-image-9842" /></a></p>
<p>One way of viewing this data structure is as an <em>edge weighted adjacency matrix</em> that describes a graph (a network) in which the member countries are nodes and the cells in the matrix define edge weights between country nodes. The weighted edges are also directed, signifying the flow of animals <em>from</em> one country <em>to</em> another.</p>
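<p>As a minimal sketch of what &#8220;directed&#8221; means here (Python, using made-up toy numbers rather than the Guardian figures, and assuming that rows are the &#8220;from&#8221; countries and columns the &#8220;to&#8221; countries), the row totals and column totals of such a matrix give the total outflow and inflow for each node. <em>flow_totals</em> is a hypothetical helper name used only for this illustration:</p>

```python
# Toy adjacency matrix: invented numbers, NOT the Guardian data.
# Assumption: rows are the "from" countries, columns are the "to" countries.
countries = ["A", "B", "C"]
matrix = [
    [0, 5, 2],  # flows from A to A, B, C
    [1, 0, 0],  # flows from B to A, B, C
    [0, 3, 0],  # flows from C to A, B, C
]

def flow_totals(labels, rows):
    """Return ({country: total outflow}, {country: total inflow})."""
    # Summing along a row gives everything leaving that country
    outflow = {name: sum(row) for name, row in zip(labels, rows)}
    # zip(*rows) transposes the matrix, so each tuple is one column
    inflow = {name: sum(col) for name, col in zip(labels, zip(*rows))}
    return outflow, inflow

outflow, inflow = flow_totals(countries, matrix)
print(outflow)
print(inflow)
```

<p>In the toy matrix the two sets of totals differ for every country, which is exactly the asymmetry that an undirected reading of the matrix would throw away.</p>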
<p>Thinking about trade as <em>flow</em> suggests a variety of different visualisation types that build on the metaphor of flow, such as a Sankey diagram. In a Sankey diagram, edges of different thicknesses connect different nodes, with the edge thickness dependent on the amount of &#8220;stuff&#8221; flowing through that connection. (The Guardian map above also uses edge thickness to identify trade volumes.) Here&#8217;s an example of a Sankey diagram I created around the horse export data:</p>
<p><a href="https://views.scraperwiki.com/run/eu_horse_imports_sankey_diagram/?"><img src="http://ouseful.files.wordpress.com/2013/02/horse-exports-eu-sankey-demo.png?w=700&#038;h=403" alt="Horse exports - EU - Sankey demo" width="700" height="403" class="alignnone size-full wp-image-9844" /></a></p>
<p>(The layout is a little rough and ready &#8211; I was more interested in finding a recipe for creating the base graphic &#8211; <em>sans</em> design tweaks;-) &#8211; from the data as supplied.)</p>
<p>So how did I get to the diagram from the data?</p>
<p>As already mentioned, the data came supplied as an adjacency matrix. The Sankey diagram depicted above was generated by passing data in an appropriate form to the <a href="https://github.com/d3/d3-plugins/tree/master/sankey">Sankey diagram plugin</a> for Mike Bostock&#8217;s d3.js graphics library. The plugin requires data in a JSON data format that describes a graph. I happen to know that the Python networkx library can <a href="http://networkx.github.com/documentation/latest/reference/readwrite.json_graph.html">generate an appropriate data object</a> from a graph modeled using networkx, so I know that if I can generate a graph in networkx I can create a basic Sankey diagram &#8220;for free&#8221;.</p>
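<p>For orientation, here is a rough sketch of the general shape of a node-link description of a graph: a list of named nodes, plus a list of links that refer to nodes by their position in that list. The names and the value are placeholders, and the exact keys (<em>nodes</em>, <em>links</em>, <em>source</em>, <em>target</em>, <em>value</em>) are an assumption about what the plugin reads rather than something taken from its documentation:</p>

```python
import json

# Placeholder graph: the names and the value are illustrative, not real trade data.
# Assumption: links point at nodes by their index in the "nodes" list.
graph = {
    "nodes": [
        {"name": "EXPORTER"},  # index 0
        {"name": "IMPORTER"},  # index 1
    ],
    "links": [
        # One directed, weighted edge from node 0 to node 1
        {"source": 0, "target": 1, "value": 100},
    ],
}

# Serialise to a JSON string that a web page could load
serialised = json.dumps(graph)
print(serialised)
```

<p>Referring to nodes by index rather than by name is what lets two nodes carry similar labels while still being distinct in the graph.</p>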
<p>So how can we create the graph from the data?</p>
<p>The networkx documentation describes a method &#8211; <a href="http://networkx.github.com/documentation/latest/reference/generated/networkx.readwrite.edgelist.read_weighted_edgelist.html">read_weighted_edgelist</a> &#8211; for reading in a weighted edge list from a text file, and creating a network from it. If I used this to read the data in, I would get a directed network with edges going into and out of country nodes showing the number of imports and exports. However, I wanted to create a diagram in which the &#8220;import to&#8221; and &#8220;export from&#8221; nodes were distinct so that exports could be seen to flow across the diagram. The approach I took was to transform the two-dimensional adjacency matrix into a <em>weighted edge list</em> in which each row has three columns: <em>exporting country, importing country, amount</em>, from which I could then build the graph by hand.</p>
<p>So how can we do that?</p>
<p>One way is to use R. Cutting and pasting the export data of interest from the spreadsheet and into a text file (adding in the missing first column header as I did so) gives a <a href="https://dl.dropbox.com/u/1156404/horseexportsEU.txt">source data file</a> that looks something like this:</p>
<p><a href="https://dl.dropbox.com/u/1156404/horseexportsEU.txt"><img src="http://ouseful.files.wordpress.com/2013/02/horse-export-source-data.png?w=700" alt="horse export source data"   class="alignnone size-full wp-image-9846" /></a> </p>
<p>In contrast, the edge list looks something like this:<br />
<a href="http://ouseful.files.wordpress.com/2013/02/reshaped-horse-data.png"><img src="http://ouseful.files.wordpress.com/2013/02/reshaped-horse-data.png?w=700" alt="reshaped horse data"   class="alignnone size-full wp-image-9845" /></a></p>
<p>So how do we get from one to the other?</p>
<p>Here&#8217;s the R script I used &#8211; it reads the file in, does a bit of fiddling to remove commas from the numbers and turn the result into integer based numbers, and then uses the <em>melt</em> function from the <em>reshape</em> library to generate the edge list, finally filtering out edges where there were no exports:</p>
<pre class="brush: r; title: ; notranslate">#R code

horseexportsEU &lt;- read.delim(&quot;~/Downloads/horseexportsEU.txt&quot;)
require(reshape)
#Get a &quot;long&quot; edge list
x=melt(horseexportsEU,id='COUNTRY')
#Turn the numbers into numbers by removing the comma, then casting to an integer
x$value2=as.integer(as.character(gsub(&quot;,&quot;, &quot;&quot;, x$value, fixed = TRUE) ))
#If we have an NA (null/empty) value, make it -1
x$value2[ is.na(x$value2) ] = -1
#read.delim converts spaces in the country column names to dots. Undo that.
x$variable=gsub(&quot;.&quot;, &quot; &quot;, x$variable, fixed = TRUE)
#I want to export a subset of the data
xt=subset(x,value2&gt;0,select=c('COUNTRY','variable','value2'))
#Generate a text file containing the edge list
write.table(xt, file=&quot;foo.csv&quot;, row.names=FALSE, col.names=FALSE, sep=&quot;,&quot;)</pre>
<p>(Another way of getting a directed, weighted edge list from an adjacency table might be to import it into networkx from the weighted adjacency matrix and then export it as a weighted edge list. R also has graph libraries available, such as <em>igraph</em>, that can do similar things. But then, I wouldn&#8217;t have got to show the &#8220;melt&#8221; method of reshaping data;-)</p>
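<p>For comparison, here is a rough Python sketch of the same wide-to-long reshaping step using only the standard library. The tab-separated sample is a tiny made-up stand-in for the source file, and <em>melt_matrix</em> is a hypothetical helper name, not part of any library:</p>

```python
import csv
import io

# Made-up stand-in for the tab-separated source file (two non-empty cells only).
# Assumption: the first column holds the row labels, the header row the column labels.
sample = (
    "COUNTRY\tAUSTRIA\tUNITED KINGDOM\n"
    "ITALY\t\t12,800\n"
    "SLOVENIA\t1,200\t\n"
)

def melt_matrix(text):
    """Turn a wide matrix into (row label, column label, amount) rows,
    dropping empty and zero cells."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    edges = []
    for row in reader:
        row_label = row[0]
        # Pair each cell with the column header it sits under
        for column_label, cell in zip(header[1:], row[1:]):
            # Strip the thousands separator before converting to an integer
            digits = cell.replace(",", "").strip()
            if digits and int(digits) != 0:
                edges.append((row_label, column_label, int(digits)))
    return edges

edges = melt_matrix(sample)
for edge in edges:
    print(edge)
```

<p>Unlike the R version, this sketch walks the matrix row by row, so the edges come out grouped by row label rather than by column.</p>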
<p>Having got the data, I now use a Python script to generate a network, and then export the required JSON representation for use by the d3js Sankey plugin:</p>
<pre class="brush: python; title: ; notranslate">#python code (Python 3)

import csv
import io
import json

import networkx as nx
from networkx.readwrite import json_graph

#Bring in the edge list explicitly
#(... etc: the rows between CYPRUS and ITALY are elided here)
rawdata = '''&quot;SLOVENIA&quot;,&quot;AUSTRIA&quot;,1200
&quot;AUSTRIA&quot;,&quot;BELGIUM&quot;,134600
&quot;BULGARIA&quot;,&quot;BELGIUM&quot;,181900
&quot;CYPRUS&quot;,&quot;BELGIUM&quot;,200600
&quot;ITALY&quot;,&quot;UNITED KINGDOM&quot;,12800
&quot;POLAND&quot;,&quot;UNITED KINGDOM&quot;,129100'''

#We convert the rawdata string into a filestream
f = io.StringIO(rawdata)
#..and then read it in as if it were a CSV file..
reader = csv.reader(f, delimiter=',')

def gNodeAdd(DG,nodelist,name):
    #Nodes are keyed by their position in nodelist; the country name is stored as an attribute
    node=len(nodelist)
    DG.add_node(node,name=name)
    nodelist.append(name)
    return DG,nodelist

nodelist=[]

DG = nx.DiGraph()

#Here's where we build the graph
for item in reader:
    #Even though import and export countries have the same name, we create a unique version depending on
    # whether the country is the importer or the exporter.
    importTo=item[0]+'.'
    exportFrom=item[1]
    #Cast the amount to an integer so that it is serialised as a JSON number rather than a string
    amount=int(item[2])
    if importTo not in nodelist:
        DG,nodelist=gNodeAdd(DG,nodelist,importTo)
    if exportFrom not in nodelist:
        DG,nodelist=gNodeAdd(DG,nodelist,exportFrom)
    DG.add_edge(nodelist.index(exportFrom),nodelist.index(importTo),value=amount)

#Use a name other than &quot;json&quot; so that we don't shadow the json module
jsondata = json.dumps(json_graph.node_link_data(DG))
#The &quot;jsondata&quot; serialisation can then be passed to a d3js containing web page...
</pre>
<p>Once the JSON object is generated, it can be handed over to d3.js. The whole script is available here: <a href="https://scraperwiki.com/views/eu_horse_imports_sankey_diagram/">EU Horse imports Sankey Diagram</a>.</p>
<p>What this recipe shows is how we can chain together several different tools and techniques (Google spreadsheets, R, Python, d3.js) to create a visualisation without too much effort (honestly!). Each step is actually quite simple, and with practice can be achieved quite quickly. The trick to producing the visualisation becomes one of decomposing the problem, trying to find a path from the format the data is in to start with, to a form in which it can be passed directly to a visualisation tool such as the d3js Sankey plugin.</p>
<p>PS In passing, as well as the data tables that can be searched on Eurostat, I also found the <a href="http://epp.eurostat.ec.europa.eu/portal/page/portal/publications/eurostat_yearbook_2012">Eurostat Yearbook</a>, which (for the most recent release at least) includes data tables relating to reported items:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/02/eurostat-yearbook.png"><img src="http://ouseful.files.wordpress.com/2013/02/eurostat-yearbook.png?w=700&#038;h=510" alt="Eurostat Yearbook" width="700" height="510" class="alignnone size-full wp-image-9843" /></a></p>
<p>So it seems that the more I look, the more places seem to be making the data that appears in reports available <em>as data</em>&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/02/18/reshaping-horse-importexport-data-to-fit-a-sankey-diagram/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/guardian-datablog-eu-trade-in-horsemeat.png" medium="image">
			<media:title type="html">Guardian datablog - EU trade in horsemeat</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/guardian-datablog-horsemeat-importexport-data.png" medium="image">
			<media:title type="html">Guardian Datablog horsemeat importexport data</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/horse-exports-eu-sankey-demo.png" medium="image">
			<media:title type="html">Horse exports - EU - Sankey demo</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/horse-export-source-data.png" medium="image">
			<media:title type="html">horse export source data</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/reshaped-horse-data.png" medium="image">
			<media:title type="html">reshaped horse data</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/02/eurostat-yearbook.png" medium="image">
			<media:title type="html">Eurostat Yearbook</media:title>
		</media:content>
	</item>
	</channel>
</rss>
