<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>OUseful.Info, the blog... &#187; Visualising Networks in Gephi via a Scraperwiki Exported GEXF File</title>
	<atom:link href="http://blog.ouseful.info/2012/04/03/visualising-networks-in-gephi-via-a-scraperwiki-exported-gexf-file/feed/?withoutcomments=1" rel="self" type="application/rss+xml" />
	<link>http://blog.ouseful.info</link>
	<description>Trying to find useful things to do with emerging technologies in open education</description>
	<lastBuildDate>Sat, 18 May 2013 08:40:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.ouseful.info' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>OUseful.Info, the blog... &#187; Visualising Networks in Gephi via a Scraperwiki Exported GEXF File</title>
		<link>http://blog.ouseful.info</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.ouseful.info/osd.xml" title="OUseful.Info, the blog..." />
	<atom:link rel='hub' href='http://blog.ouseful.info/?pushpress=hub'/>
		<item>
		<title>Visualising Networks in Gephi via a Scraperwiki Exported GEXF File</title>
		<link>http://blog.ouseful.info/2012/04/03/visualising-networks-in-gephi-via-a-scraperwiki-exported-gexf-file/</link>
		<comments>http://blog.ouseful.info/2012/04/03/visualising-networks-in-gephi-via-a-scraperwiki-exported-gexf-file/#comments</comments>
		<pubDate>Tue, 03 Apr 2012 09:39:15 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[OU2.0]]></category>
		<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[gephi]]></category>
		<category><![CDATA[scraperwiki]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=7435</guid>
		<description><![CDATA[How do you visualise data scraped from the web with Scraperwiki as a network in a graph visualisation tool such as Gephi? One way is to import a two-dimensional data table (i.e. a CSV file) exported from Scraperwiki into Gephi using the Data Explorer, but at times this can be a little fiddly and [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>How do you visualise data scraped from the web with Scraperwiki as a network in a graph visualisation tool such as Gephi? One way is to import a two-dimensional data table (i.e. a CSV file) exported from Scraperwiki into Gephi using the Data Explorer, but at times this can be a little fiddly and may require you to mess around with column names to make sure they&#8217;re the names Gephi expects. Another way is to get the data into a graph-based representation using an appropriate file format such as GEXF or GraphML that can be loaded directly (and unambiguously) into Gephi or other network analysis and visualisation tools.</p>
<p>A quick bit of backstory first&#8230;</p>
<p>For me, a couple of related key features of a &#8220;data management system&#8221; (eg the joint post from Francis Irving and Rufus Pollock on <em>From CMS to DMS: C is for Content, D is for Data</em>) are the ability to put data into shapes that play nicely with predefined analysis and visualisation routines, and the ability to export data in a variety of formats or representations that allow that data to be readily imported into, or used by, other applications, tools, or software libraries. Which is to say, I&#8217;m into <em>glue</em>&#8230;</p>
<p>So here&#8217;s some glue &#8211; a recipe for generating a GEXF formatted file that can be loaded directly into Gephi and used to visualise networks like this one of how OpenLearn units are connected by course code and top level subject area:</p>
<p><a href="http://ouseful.files.wordpress.com/2012/04/openlearn-units-graph-example.png"><img src="http://ouseful.files.wordpress.com/2012/04/openlearn-units-graph-example.png?w=700" alt="" title="OpenLearn Units graph example"   class="alignnone size-full wp-image-7436" /></a></p>
<p>The inspiration for this demo comes from a couple of things: firstly, noticing that <em>networkx</em> is one of the <a href="https://scraperwiki.com/docs/python/python_libraries/">third party supported libraries on ScraperWiki</a> (as of last night, I think the <em>igraph</em> library is also available; thanks @frabcus ;-); secondly, having broken ground for myself on how to get Scraperwiki views to emit data feeds rather than HTML pages (eg <a href="https://scraperwiki.com/views/openlearnglossaryjson_1/">OpenLearn Glossary Items as a JSON feed</a>).</p>
<p>As a rather contrived demo, let&#8217;s look at the data from this <a href="https://scraperwiki.com/scrapers/openlearn-units/">scrape of OpenLearn units</a>, as visualised above:</p>
<p><a href="http://ouseful.files.wordpress.com/2012/04/openlearn-units-on-scraperwiki.png"><img src="http://ouseful.files.wordpress.com/2012/04/openlearn-units-on-scraperwiki.png?w=700&#038;h=201" alt="" title="openlearn units on scraperwiki" width="700" height="201" class="alignnone size-full wp-image-7437" /></a></p>
<p>The data is available from the <em>openlearn-units</em> scraper in the table <em>swdata</em>. The columns of interest are <em>name</em>, <em>parentCourseCode</em>, <em>topic</em> and <em>unitcode</em>. What I&#8217;m going to do is generate a graph file that represents which unitcodes are associated with which parentCourseCodes, and which topics are associated with each parentCourseCode. We can then visualise a network that shows parentCourseCodes by topic, along with the child (unitcode) course units generated from each Open University parent course (parentCourseCode).</p>
<p>From previous dabblings with the networkx library, I knew it&#8217;d be easy enough to generate a graph representation from the data in the Scraperwiki data table. Essentially, two steps are required: 1) create and label nodes, as required; 2) tie nodes together with edges. (If a node hasn&#8217;t been defined when you use it to create an edge, networkx will create it for you.)</p>
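<p>To see that implicit node creation in action, here&#8217;s a minimal sketch using made-up identifiers (<em>unit_1</em> and <em>course_A</em> are placeholders, not values from the OpenLearn data):</p>
<pre class="brush: python; title: ; notranslate">import networkx as nx

# Toy identifiers, invented for illustration only
G = nx.Graph()

# Declare one node explicitly, with a label attribute for Gephi to display
G.add_node('unit_1', label='Unit 1')

# course_A has not been declared; add_edge creates it on the fly, with no attributes
G.add_edge('unit_1', 'course_A')

node_ids = sorted(G.nodes())
print(node_ids)</pre>
<p>Nodes created this way carry no <em>label</em> attribute of their own, which is one reason for declaring in advance the nodes you want to decorate.</p>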
<p>I decided to create and label some of the nodes in advance: unit nodes would carry their name and unitcode; parent course nodes would just carry their parentCourseCode; and topic nodes would carry a newly created ID and the topic name itself. (The topic name is a string of characters and would make for a messy ID for the node!)</p>
<p>To keep Gephi happy, I&#8217;m going to explicitly add a <em>label</em> attribute to some of the nodes that will be used, by default, to label nodes in Gephi views of the network. (Here are some <a href="http://networkx.lanl.gov/tutorial/tutorial.html#nodes">hints on generating graphs in networkx</a>.)</p>
<p>Here&#8217;s how I built the graph:</p>
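<pre class="brush: python; title: ; notranslate">import scraperwiki
import networkx as nx

# Attach the scraper's datastore and pull every row from its swdata table
# (select() prepends the SELECT keyword, so the query starts at the column list)
scraperwiki.sqlite.attach( 'openlearn-units' )
q = '* FROM &quot;swdata&quot;'
data = scraperwiki.sqlite.select(q)

G=nx.Graph()

topics=[]
for row in data:
    # Unit node: labelled with its unitcode, carrying its name and parent course code
    G.add_node(row['unitcode'],label=row['unitcode'],name=row['name'],parentCC=row['parentCourseCode'])
    # Give each distinct topic a short ID based on its position in the topics list
    topic=row['topic']
    if topic not in topics:
        topics.append(topic)
    tID=topics.index(topic)
    topicID='topic_'+str(tID)
    G.add_node(topicID,label=topic,name=topic)
    # Parent course nodes are created implicitly by the edges that mention them
    G.add_edge(topicID,row['parentCourseCode'])
    G.add_edge(row['unitcode'],row['parentCourseCode'])</pre>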
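<p>A possible variation on the topic ID step: a dictionary can hand out the same <em>topic_N</em> style IDs without searching a list on every row. This is just a sketch; the rows are invented stand-ins for the <em>swdata</em> table, not real OpenLearn records:</p>
<pre class="brush: python; title: ; notranslate"># Hypothetical rows standing in for the swdata table; the values are made up
rows = [
    {'unitcode': 'U1', 'parentCourseCode': 'C1', 'topic': 'Science'},
    {'unitcode': 'U2', 'parentCourseCode': 'C1', 'topic': 'Science'},
    {'unitcode': 'U3', 'parentCourseCode': 'C2', 'topic': 'History'},
]

topic_ids = {}  # maps each topic name to a short node ID
for row in rows:
    topic = row['topic']
    # setdefault only stores a new ID the first time a topic is seen;
    # after that it returns the ID already on record
    topic_id = topic_ids.setdefault(topic, 'topic_' + str(len(topic_ids)))
    print(row['unitcode'] + ' belongs to ' + topic_id)</pre>
<p>For a table of this size the difference is unlikely to matter; the list version has the advantage of being obvious at a glance.</p>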
<p>Having generated a representation of the data as a graph using networkx, we now need to export the data. <a href="http://networkx.lanl.gov/search.html?q=output&amp;check_keywords=yes&amp;area=default">networkx supports a variety of export formats</a>, including GEXF. Looking at the <a href="http://networkx.lanl.gov/reference/generated/networkx.readwrite.gexf.write_gexf.html">documentation for the GEXF exporter</a>, we see that it offers methods for exporting the GEXF representation to a file. But for Scraperwiki, we just want to print out the GEXF representation of the graph, not save it to a file. So how do we get hold of an XML representation of the GEXF formatted data so we can print it out? A peek into the <a href="https://networkx.lanl.gov/trac/browser/networkx/networkx/readwrite/gexf.py">source code for the GEXF exporter</a> (other exporter file sources <a href="https://networkx.lanl.gov/trac/browser/networkx/networkx/readwrite/">here</a>) suggests that the pieces we need can be found in the <em>networkx.readwrite.gexf</em> file: a constructor (<em>GEXFWriter</em>), and a method for loading in the graph (<em>.add_graph()</em>). An XML representation of the file can then be obtained and printed out using the ElementTree <em>tostring</em> function.</p>
<p>Here&#8217;s the code I hacked out as a result of that little investigation:</p>
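<pre class="brush: python; title: ; notranslate">import networkx.readwrite.gexf as gf
from xml.etree.cElementTree import tostring

# Build the GEXF document in memory rather than writing it to a file
writer=gf.GEXFWriter(encoding='utf-8',prettyprint=True,version='1.1draft')
writer.add_graph(G)

# Serve the view as XML rather than as the default HTML page
scraperwiki.utils.httpresponseheader(&quot;Content-Type&quot;, &quot;text/xml&quot;)

# writer.xml is the root element of the GEXF document
print tostring(writer.xml)</pre>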
<p>Note the use of <em>scraperwiki.utils.httpresponseheader</em> to set the MIME type of the view. If we don&#8217;t do this, Scraperwiki will by default publish an HTML page view, along with a Scraperwiki logo embedded in the page.</p>
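<p>As an aside, networkx also provides a <em>generate_gexf</em> function that yields the GEXF document line by line, which looks like another way of getting a printable string without driving the writer class directly. A small sketch, using a made-up stand-in graph rather than the OpenLearn one; whether the version of networkx installed on Scraperwiki includes this function is something to check rather than assume:</p>
<pre class="brush: python; title: ; notranslate">import networkx as nx

# Small stand-in graph; node names and labels are illustrative only
G = nx.Graph()
G.add_node('topic_0', label='Science')
G.add_edge('topic_0', 'C1')

# generate_gexf yields the GEXF document one line at a time,
# so joining the lines gives the whole document as a single string
gexf_text = '\n'.join(nx.generate_gexf(G))
print(gexf_text)</pre>
<p>In a Scraperwiki view, the string would be printed after setting the Content-Type header as above.</p>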
<p>Here&#8217;s the <a href="https://scraperwiki.com/views/openlearn_units_graph/">full code for the view</a>.</p>
<p>And here&#8217;s <a href="https://views.scraperwiki.com/run/openlearn_units_graph/">the GEXF view</a>:</p>
<p><a href="http://ouseful.files.wordpress.com/2012/04/scrpaerwiki-gexf-export.png"><img src="http://ouseful.files.wordpress.com/2012/04/scrpaerwiki-gexf-export.png?w=700&#038;h=201" alt="" title="Scraperwiki gexf export" width="700" height="201" class="alignnone size-full wp-image-7438" /></a></p>
<p>Save this file with a <em>.gexf</em> suffix and you can then open the file directly in Gephi.</p>
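<p>One way to sanity check a GEXF export before opening it in Gephi is to read it back in with networkx and confirm that the nodes, edges and labels survive the round trip. The sketch below uses an in-memory buffer as a stand-in for the saved <em>.gexf</em> file, and a made-up three node graph:</p>
<pre class="brush: python; title: ; notranslate">import io
import networkx as nx

# Made-up graph: one topic, one parent course, one unit
G = nx.Graph()
G.add_node('topic_0', label='Science')
G.add_edge('topic_0', 'C1')
G.add_edge('U1', 'C1')

# Write the GEXF into a bytes buffer (standing in for a file on disk)...
buf = io.BytesIO()
nx.write_gexf(G, buf)

# ...then rewind and parse it back into a fresh graph
buf.seek(0)
H = nx.read_gexf(buf)

print(sorted(H.nodes()))
print(H.nodes['topic_0']['label'])</pre>
<p>With a real export, the file path can be passed to <em>read_gexf</em> in place of the buffer.</p>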
<p>Hopefully, what this post shows is how you can generate your own, potentially complex, output file formats within Scraperwiki that can then be imported directly into other tools.</p>
<p>PS see also <a href="http://blog.ouseful.info/2012/04/03/exporting-and-displaying-scraperwiki-datasets-using-the-google-visualisation-api/">Exporting and Displaying Scraperwiki Datasets Using the Google Visualisation API</a>, which shows how to generate Google Visualisation API JSON from Scraperwiki, allowing for the quick and easy generation of charts and tables using Google Visualisation API components.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2012/04/03/visualising-networks-in-gephi-via-a-scraperwiki-exported-gexf-file/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/04/openlearn-units-graph-example.png" medium="image">
			<media:title type="html">OpenLearn Units graph example</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/04/openlearn-units-on-scraperwiki.png" medium="image">
			<media:title type="html">openlearn units on scraperwiki</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/04/scrpaerwiki-gexf-export.png" medium="image">
			<media:title type="html">Scraperwiki gexf export</media:title>
		</media:content>
	</item>
	</channel>
</rss>
