<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>OUseful.Info, the blog... &#187; Tinkering</title>
	<atom:link href="http://blog.ouseful.info/category/tinkering/feed/?withoutcomments=1" rel="self" type="application/rss+xml" />
	<link>http://blog.ouseful.info</link>
	<description>Trying to find useful things to do with emerging technologies in open education</description>
	<lastBuildDate>Wed, 19 Jun 2013 12:19:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.ouseful.info' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>OUseful.Info, the blog... &#187; Tinkering</title>
		<link>http://blog.ouseful.info</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.ouseful.info/osd.xml" title="OUseful.Info, the blog..." />
	<atom:link rel='hub' href='http://blog.ouseful.info/?pushpress=hub'/>
		<item>
		<title>Critiquing Data Stories: Working LibDems Job Creation Data Map with OpenRefine</title>
		<link>http://blog.ouseful.info/2013/06/15/working-jobs-data-with-openrefine/</link>
		<comments>http://blog.ouseful.info/2013/06/15/working-jobs-data-with-openrefine/#comments</comments>
		<pubDate>Sat, 15 Jun 2013 10:49:20 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[School_Of_Data]]></category>
		<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[openrefine]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10793</guid>
		<description><![CDATA[As well as creating data stories, should the role of a data journalist be to critique data stories put out by governments, companies, and political parties? Via a tweet yesterday I saw a link to a data powered map from the Lib Dems (A Million Jobs), which claimed to illustrate how, through a variety of [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#038;blog=325417&#038;post=10793&#038;subd=ouseful&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><em>As well as creating data stories, should the role of a data journalist be to critique data stories put out by governments, companies, and political parties?</em></p>
<p>Via a tweet yesterday I saw a link to a data powered map from the Lib Dems  (<a href="http://www.amillionjobs.org/map">A Million Jobs</a>), which claimed to illustrate how, through a variety of schemes, they had contributed to the creation of a million private sector jobs across the UK. Markers presumably identify where the jobs were created, and a text description pop up provides information about the corresponding scheme or initiative.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/06/libdems-million-jobs.png"><img src="http://ouseful.files.wordpress.com/2013/06/libdems-million-jobs.png?w=700&#038;h=713" alt="libdems million jobs" width="700" height="713" class="alignnone size-full wp-image-10808" /></a></p>
<p>If we view source on the page, we can see where the map &#8211; and maybe the data being used to power it &#8211; comes from&#8230;</p>
<p><a href="http://ouseful.files.wordpress.com/2013/06/libdems-jobs-view-source.png"><img src="http://ouseful.files.wordpress.com/2013/06/libdems-jobs-view-source.png?w=700&#038;h=386" alt="libdems jobs view source" width="700" height="386" class="alignnone size-full wp-image-10807" /></a></p>
<p>Ah ha &#8211; it&#8217;s an embedded map from a Google Fusion Table&#8230;</p>
<p><a href="https://www.google.com/fusiontables/embedviz?q=select+col0+from+1whG2X7lpAT5_nfAfuRPUc146f0RVOpETXOwB8sQ&amp;viz=MAP&amp;h=false&amp;lat=52.5656923458786&amp;lng=-1.0353351498047232&amp;t=1&amp;z=7&amp;l=col0&amp;y=2&amp;tmplt=3"><tt><br />
https://www.google.com/fusiontables/embedviz?q=select+col0+from+1whG2X7lpAT5_nfAfuRPUc146f0RVOpETXOwB8sQ&#038;viz=MAP&#038;h=false&#038;lat=52.5656923458786&#038;lng=-1.0353351498047232&#038;t=1&#038;z=7&#038;l=col0&#038;y=2&#038;tmplt=3<br />
</tt></a></p>
<p>We can view the table itself by grabbing the key &#8211; <tt>1whG2X7lpAT5_nfAfuRPUc146f0RVOpETXOwB8sQ</tt> &#8211; and popping it into a standard URL (grabbed from viewing another Fusion Table within Fusion Tables itself) of the form:</p>
<p><tt><br />
<a href="https://www.google.com/fusiontables/DataSource?docid=" rel="nofollow">https://www.google.com/fusiontables/DataSource?docid=</a><br />
<strong>1whG2X7lpAT5_nfAfuRPUc146f0RVOpETXOwB8sQ</strong></tt></p>
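<p>The key-grabbing step can also be scripted. The following is a minimal sketch, assuming the embed URL always carries the table key as the last term of a <tt>select ... from KEY</tt> statement in its <tt>q</tt> parameter; the function name <tt>table_url_from_embed</tt> is made up for illustration, and only the <tt>q</tt> parameter of the embed URL is reproduced.</p>

```python
from urllib.parse import urlparse, parse_qs

# Embed URL from the page source, trimmed to the q parameter only
EMBED_URL = (
    "https://www.google.com/fusiontables/embedviz"
    "?q=select+col0+from+1whG2X7lpAT5_nfAfuRPUc146f0RVOpETXOwB8sQ"
)

def table_url_from_embed(embed_url):
    """Return the DataSource URL for the table named in an embedviz URL."""
    query = parse_qs(urlparse(embed_url).query)
    # parse_qs turns '+' into spaces, giving "select col0 from KEY"
    select_statement = query["q"][0]
    table_key = select_statement.split(" from ")[1].split()[0]
    return "https://www.google.com/fusiontables/DataSource?docid=" + table_key

table_url = table_url_from_embed(EMBED_URL)
print(table_url)
```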
<p><a href="http://ouseful.files.wordpress.com/2013/06/lib-dems-jobs-fusion-tables.png"><img src="http://ouseful.files.wordpress.com/2013/06/lib-dems-jobs-fusion-tables.png?w=700&#038;h=495" alt="Lib dems jobs Fusion tables" width="700" height="495" class="alignnone size-full wp-image-10806" /></a></p>
<p>The description data is curtailed, but we can see the full description on the card view:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/06/lib-dems-fusion-tables-card.png"><img src="http://ouseful.files.wordpress.com/2013/06/lib-dems-fusion-tables-card.png?w=700&#038;h=324" alt="Lib dems fusion tables card" width="700" height="324" class="alignnone size-full wp-image-10805" /></a></p>
<p>Unfortunately, downloads of the data have been disabled, but with a tiny bit of thought we can easily come up with a tractable, if crude, way  of getting the data&#8230; You may be able to work out how when you see what it looks like when I load it into <a href="http://openrefine.org">OpenRefine</a>.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/06/lib-dems-jobs-data-in-openrefine.png"><img src="http://ouseful.files.wordpress.com/2013/06/lib-dems-jobs-data-in-openrefine.png?w=700&#038;h=313" alt="lib dems jobs data in OpenRefine" width="700" height="313" class="alignnone size-full wp-image-10804" /></a></p>
<p>This repeating pattern of rows is one that we might often encounter in data sets pulled from reports or things like PDF documents. To be able to usefully work with this data, it would be far easier if it was arranged by column, with the groups-of-three row records arranged instead as a single row spread across three columns.</p>
<p>Looking through the OpenRefine column tools menu, we find a transpose tool that looks as if it may help with that:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/06/openrefine-transpose-cell-rows-to-cols2.png"><img src="http://ouseful.files.wordpress.com/2013/06/openrefine-transpose-cell-rows-to-cols2.png?w=700&#038;h=386" alt="OpenRefine transpose cell rows to cols2" width="700" height="386" class="alignnone size-full wp-image-10802" /></a></p>
<p>And as if by magic, we have recreated a workable table:-)</p>
<p><a href="http://ouseful.files.wordpress.com/2013/06/openrefine-rows-transposed-to-cols.png"><img src="http://ouseful.files.wordpress.com/2013/06/openrefine-rows-transposed-to-cols.png?w=700&#038;h=216" alt="Openrefine rows transposed to cols" width="700" height="216" class="alignnone size-full wp-image-10801" /></a></p>
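<p>For anyone who prefers a script to the OpenRefine menu, here is a rough sketch of the same reshaping: a single column of cells is cut into fixed-size groups, one group per output row. It assumes a strict three-line repeat with no missing cells; the cell values are placeholders, not rows from the actual table.</p>

```python
def rows_to_records(cells, width=3):
    """Group a single column of cells into rows of `width` columns."""
    if len(cells) % width != 0:
        raise ValueError("cell count is not a multiple of the record width")
    return [cells[i:i + width] for i in range(0, len(cells), width)]

# Placeholder cells standing in for the repeating groups of three rows
cells = [
    "record 1, field A", "record 1, field B", "record 1, field C",
    "record 2, field A", "record 2, field B", "record 2, field C",
]

records = rows_to_records(cells)
for record in records:
    print(record)
```

<p>Raising an error on a ragged input is a deliberate choice here: if one group is a line short, every later record would be silently misaligned.</p>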
<p>If we generate a text facet on the descriptions, we can look to see how many markers map onto the same description (presumably, the same scheme?).</p>
<p><a href="http://ouseful.files.wordpress.com/2013/06/openrefinelibdem-jobs-text-facet.png"><img src="http://ouseful.files.wordpress.com/2013/06/openrefinelibdem-jobs-text-facet.png?w=700&#038;h=488" alt="openrefinelibdem jobs text facet" width="700" height="488" class="alignnone size-full wp-image-10800" /></a></p>
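<p>A text facet is essentially a count of how often each distinct value appears in a column. A small sketch of that idea outside OpenRefine, using made-up descriptions rather than the real ones:</p>

```python
from collections import Counter

# Hypothetical descriptions, one per marker
descriptions = [
    "Summary: scheme one",
    "Summary: scheme two",
    "Summary: scheme one",
]

# Count markers per distinct description, most common first
facet = Counter(descriptions)
for description, count in facet.most_common():
    print(count, description)
```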
<p>If we peer a bit more closely, we see that some of the numbers relating to job site locations as referred to in the description don&#8217;t seem to tally with the number of markers? So what do the markers represent, and how do they relate to the descriptions? And furthermore &#8211; what do the actual postcodes relate to? And where are the links to formal descriptions of the schemes referred to?</p>
<p><a href="http://ouseful.files.wordpress.com/2013/06/counting-job-sites.png"><img src="http://ouseful.files.wordpress.com/2013/06/counting-job-sites.png?w=700" alt="counting job sites"   class="alignnone size-full wp-image-10799" /></a></p>
<p>What this &#8220;example&#8221; of data journalistic practice by the Lib Dems shows is how it can generate a whole wealth of additional questions, both from a critical reading of just the data itself (for example, trying to match mentions of job locations with the number of markers on the map or rows referring to that scheme in the table), as well as questions that lead on from the data &#8211; where can we find more details about the local cycling and green travel scheme that was awarded £590,000, for example?</p>
<p>Using similar text processing techniques to those described in <a href="http://schoolofdata.org/2013/06/04/analysing-uk-lobbying-data-using-openrefine/">Analysing UK Lobbying Data Using OpenRefine</a>, we can also start trying to pull out some more detail from the data. For example, by observation we notice that the phrase <em>Summary: Lib Dems in Government have given a £</em> starts many of the descriptions:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/06/libdems-have-given-text.png"><img src="http://ouseful.files.wordpress.com/2013/06/libdems-have-given-text.png?w=700&#038;h=525" alt="libdems - have given text" width="700" height="525" class="alignnone size-full wp-image-10798" /></a></p>
<p>Using a regular expression, we can pull out the amounts that are referred to in this way and create a new column containing these values:</p>
<p><tt>import re<br />
tmp=value<br />
tmp = re.sub(r'Summary: Lib Dems in Government have given a £([0-9,\.]*).*', r'\1', tmp)<br />
if value==tmp: tmp=''<br />
tmp = tmp.replace(',','')<br />
return tmp</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2013/06/libdems-have-given-amount.png"><img src="http://ouseful.files.wordpress.com/2013/06/libdems-have-given-amount.png?w=700" alt="libdems have given amount"   class="alignnone size-full wp-image-10797" /></a></p>
<p>Note that there may be other text conventions describing amounts awarded that we could also try to extract as part of this column creation.</p>
<p>If we cast these values to a number:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/06/openrefine-convert-given-to-number.png"><img src="http://ouseful.files.wordpress.com/2013/06/openrefine-convert-given-to-number.png?w=700" alt="openrefine convert given to number"   class="alignnone size-full wp-image-10796" /></a></p>
<p>we can then use a numeric facet to help us explore the amounts.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/06/libdems-value-numeric-facet.png"><img src="http://ouseful.files.wordpress.com/2013/06/libdems-value-numeric-facet.png?w=700&#038;h=538" alt="libdems value numeric facet" width="700" height="538" class="alignnone size-full wp-image-10795" /></a></p>
<p>In this case, we notice that there weren&#8217;t that many <em>distinct</em> factors containing the text construction we parsed, so we may need to do a little more work there to see what else we can extract. For example:</p>
<ul>
<li><em>Summary: Lib Dems in Government have secured a £73,000 grant for &#8230;</em></li>
<li><em>Summary: Lib Dems in Government have secured a share of a £23,000,000 grant for &#8230;</em> &#8211;  we might not want to pull this into a &#8220;full value&#8221; column if they only got a <em>share</em> of the grant?</li>
<li><em>Summary: Lib Dems in Government have given local business AJ Woods Engineering Ltd a £850,000 grant &#8230;</em></li>
<li><em>Summary: Lib Dems in Government have given £982,000 to &#8230;</em></li>
</ul>
<p>Here&#8217;s an improved regular expression for parsing out some more of these amounts:</p>
<p><tt>import re<br />
tmp=value<br />
tmp=re.sub(r'Summary: Lib Dems in Government have given (a )?£([0-9,\.]*).*',r'\2',tmp)<br />
tmp=re.sub(r'Summary: Lib Dems in Government have secured a £([0-9,\.]*).*',r'\1',tmp)<br />
tmp=re.sub(r'Summary: Lib Dems in Government have given ([^a]).* a £([0-9,\.]*) grant.*',r'\2',tmp)</p>
<p>if value==tmp:tmp=''<br />
tmp=tmp.replace(',','')<br />
return tmp</tt></p>
<p>So now we can start to identify some of the bigger grants&#8230;</p>
<p><a href="http://ouseful.files.wordpress.com/2013/06/libdems-jobs-big-amounts.png"><img src="http://ouseful.files.wordpress.com/2013/06/libdems-jobs-big-amounts.png?w=700&#038;h=145" alt="libdems jobs big amounts" width="700" height="145" class="alignnone size-full wp-image-10794" /></a></p>
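<p>The same extraction can be tried outside OpenRefine. This is a sketch rather than a drop-in replacement for the column generator: it assumes the three phrasings listed earlier, tries each pattern in turn, and returns an empty string when none applies. The names <tt>PATTERNS</tt> and <tt>extract_amount</tt> are invented for the example.</p>

```python
import re

PREFIX = r"Summary: Lib Dems in Government have "
AMOUNT = r"£([0-9,.]+)"

# One pattern per phrasing, tried in order
PATTERNS = [
    re.compile(PREFIX + r"given (?:a )?" + AMOUNT),
    re.compile(PREFIX + r"secured a " + AMOUNT),
    re.compile(PREFIX + r"given .*? a " + AMOUNT + r" grant"),
]

def extract_amount(description):
    """Return the first matching amount as a digit string, or ''."""
    for pattern in PATTERNS:
        match = pattern.match(description)
        if match:
            # Drop thousands separators and any trailing full stop
            return match.group(1).replace(",", "").rstrip(".")
    return ""

samples = [
    "Summary: Lib Dems in Government have secured a £73,000 grant for ...",
    "Summary: Lib Dems in Government have given £982,000 to ...",
    "Summary: Lib Dems in Government have secured a share of a £23,000,000 grant for ...",
]
for sample in samples:
    print(repr(extract_amount(sample)))
```

<p>The &#8220;share of a&#8221; phrasing deliberately falls through to the empty string, so partial awards are not mistaken for full ones.</p>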
<p><em>More to add? eg around:<br />
- <tt>...have secured a £150,000 grant...</tt><br />
- <tt>Summary: Lib Dems have given a £1,571,000 grant...</tt><br />
- <tt>Summary: Lib Dems in Government are giving £10,000,000 to...</tt> (though maybe this should go in an &#8216;are giving&#8217; column, rather than &#8216;have given&#8217;, cf. &#8220;will give&#8221; also&#8230;?)<br />
- Here&#8217;s another for a &#8216;possible spend&#8217; column? <tt>Summary: Lib Dems in Government have allocated £300,000 to...</tt></em></p>
<p><em>Note: once you start poking around at these descriptions, you find a wealth of things like: &#8220;Summary: Lib Dems in Government have allocated £300,000 to fund the M20 Junctions 6 to 7 improvement, Maidstone , helping to reduce journey times and create 10,400 new jobs. The project will also help build 8,400 new homes.&#8221; Leading us to ask the question: how many of the &#8220;one million jobs&#8221; arise from improvements to road junctions&#8230;?</em></p>
<p><a href="http://ouseful.files.wordpress.com/2013/06/how-many-jobs-from-road-junction-improvements.png"><img src="http://ouseful.files.wordpress.com/2013/06/how-many-jobs-from-road-junction-improvements.png?w=700" alt="how many jobs from road junction improvements?"   class="alignnone size-full wp-image-10818" /></a></p>
<p>In order to address this question, we might start to have a go at pulling out the number of jobs that it is claimed various schemes will create, as this column generator starts to explore:</p>
<p><tt>import re<br />
tmp=value<br />
tmp = re.sub(r'.* creat(e|ing) ([0-9,\.]*) jobs.*', r'\2', tmp)<br />
if value==tmp:tmp=''<br />
tmp=tmp.replace(',','')<br />
return tmp</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2013/06/lib-dems-jobs-created.png"><img src="http://ouseful.files.wordpress.com/2013/06/lib-dems-jobs-created.png?w=700" alt="Lib dems jobs created"   class="alignnone size-full wp-image-10813" /></a></p>
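<p>One possible variation, sketched here as standalone Python: descriptions such as the M20 example say &#8220;create 10,400 new jobs&#8221;, so the pattern below also allows an optional &#8220;new&#8221; before &#8220;jobs&#8221;. That extra optional word is an assumption about the phrasing, not something checked against the full dataset, and <tt>extract_jobs</tt> is an illustrative name.</p>

```python
import re

# "create" or "creating", a number, then "jobs" or "new jobs"
JOBS = re.compile(r"creat(?:e|ing) ([0-9,]+) (?:new )?jobs")

def extract_jobs(description):
    """Return the claimed number of jobs as an int, or None if absent."""
    match = JOBS.search(description)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))

m20 = (
    "Summary: Lib Dems in Government have allocated £300,000 to fund the "
    "M20 Junctions 6 to 7 improvement, Maidstone , helping to reduce "
    "journey times and create 10,400 new jobs. The project will also help "
    "build 8,400 new homes."
)
jobs_claimed = extract_jobs(m20)
print(jobs_claimed)
```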
<p>If we start to think analytically about the text, we start to see there may be other structures we can attack&#8230; For example:</p>
<ul>
<li><em>£23,000,000 grant for local business ADS Group. &#8230;</em> &#8211; here we might be able to pull out what an amount was awarded for, or to whom it was given.</li>
<li><em>£950,000 to local business/project A45 Northampton to Daventry Development Link &#8211; Interim Solution A45/A5 Weedon Crossroad Improvements to improve local infastructure, creating jobs and growth</em> &#8211; here we not only have the recipient but also the reason for the grant</li>
</ul>
<p>But that&#8217;s for another day&#8230;</p>
<p><em>If you want to play with the data yourself, you can <a href="https://gist.github.com/psychemedia/5787573">find it here</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/06/15/working-jobs-data-with-openrefine/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/libdems-million-jobs.png" medium="image">
			<media:title type="html">libdems million jobs</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/libdems-jobs-view-source.png" medium="image">
			<media:title type="html">libdems jobs view source</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/lib-dems-jobs-fusion-tables.png" medium="image">
			<media:title type="html">Lib dems jobs Fusion tables</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/lib-dems-fusion-tables-card.png" medium="image">
			<media:title type="html">Lib dems fusion tables card</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/lib-dems-jobs-data-in-openrefine.png" medium="image">
			<media:title type="html">lib dems jobs data in OpenRefine</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/openrefine-transpose-cell-rows-to-cols2.png" medium="image">
			<media:title type="html">OpenRefine transpose cell rows to cols2</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/openrefine-rows-transposed-to-cols.png" medium="image">
			<media:title type="html">Openrefine rows transposed to cols</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/openrefinelibdem-jobs-text-facet.png" medium="image">
			<media:title type="html">openrefinelibdem jobs text facet</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/counting-job-sites.png" medium="image">
			<media:title type="html">counting job sites</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/libdems-have-given-text.png" medium="image">
			<media:title type="html">libdems - have given text</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/libdems-have-given-amount.png" medium="image">
			<media:title type="html">libdems have given amount</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/openrefine-convert-given-to-number.png" medium="image">
			<media:title type="html">openrefine convert given to number</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/libdems-value-numeric-facet.png" medium="image">
			<media:title type="html">libdems value numeric facet</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/libdems-jobs-big-amounts.png" medium="image">
			<media:title type="html">libdems jobs big amounts</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/how-many-jobs-from-road-junction-improvements.png" medium="image">
			<media:title type="html">how many jobs from road junction improvements?</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/06/lib-dems-jobs-created.png" medium="image">
			<media:title type="html">Lib dems jobs created</media:title>
		</media:content>
	</item>
		<item>
		<title>To What Extent Do Candidates Support Each Other Redux &#8211; A One-Liner, Thirty Second Route to the Info</title>
		<link>http://blog.ouseful.info/2013/05/08/to-what-extent-do-candidates-support-each-other-redux-a-one-liner-thirty-second-route-to-the-info/</link>
		<comments>http://blog.ouseful.info/2013/05/08/to-what-extent-do-candidates-support-each-other-redux-a-one-liner-thirty-second-route-to-the-info/#comments</comments>
		<pubDate>Wed, 08 May 2013 10:50:40 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[Tutorial]]></category>
		<category><![CDATA[schoolofdata]]></category>
		<category><![CDATA[scraperwiki]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[sqlite]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10643</guid>
		<description><![CDATA[In More Storyhunting Around Local Elections Data Using Gephi – To What Extent Do Candidates Support Each Other? I described a visual route to finding out which local council candidates had supported each other on their nomination papers. There is also a thirty second route to that data that I should probably have mentioned;-) From [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#038;blog=325417&#038;post=10643&#038;subd=ouseful&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>In <a href="http://blog.ouseful.info/2013/05/08/more-storyhunting-around-local-elections-data-using-gephi-to-what-extent-do-candidates-support-each-other/">More Storyhunting Around Local Elections Data Using Gephi – To What Extent Do Candidates Support Each Other?</a> I described a visual route to finding out which local council candidates had supported each other on their nomination papers. There is also a thirty second route to that data that I should probably have mentioned;-)</p>
<p>From the <a href="https://scraperwiki.com/scrapers/iw_poll_notices_scrape/">Scraperwiki database</a>, we need to interrogate the API:</p>
<p><a href="https://scraperwiki.com/scrapers/iw_poll_notices_scrape/"><img src="http://ouseful.files.wordpress.com/2013/05/scraperwiki-api.png?w=700" alt="scraperwiki api"   class="alignnone size-full wp-image-10645" /></a></p>
<p>To do this, we&#8217;ll use a database query language &#8211; SQL.</p>
<p>What we need to ask the database is which of the assentors (members of the <em>support</em> column) are also candidates (members of the <em>candinit</em> column), and just return those rows. The SQL command is simply this:</p>
<p><tt>select * from support where support in (select candinit from support)</tt></p>
<p>Note that &#8220;support&#8221; refers to two things here &#8211; these are columns:</p>
<p><tt>select <strong>*</strong> from support where <strong>support</strong> in (select <strong>candinit</strong> from support)</tt></p>
<p>and this is the table the columns are being pulled from:</p>
<p><tt>select * from <strong>support</strong> where support in (select candinit from <strong>support</strong>)</tt></p>
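<p>To try the query without the ScraperWiki API, here is a small sketch that runs it against an in-memory SQLite table. It assumes a cut-down <em>support</em> table with just the two columns used in the query; the names in the rows are invented.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for the scraper table: candidate and assentor names
conn.execute("create table support (candinit text, support text)")
rows = [
    ("Ann B. Candidate", "Carl D. Voter"),
    ("Ann B. Candidate", "Eve F. Candidate"),
    ("Eve F. Candidate", "Gil H. Voter"),
]
conn.executemany("insert into support values (?, ?)", rows)

# Keep only rows whose assentor also appears as a candidate
query = "select * from support where support in (select candinit from support)"
matches = conn.execute(query).fetchall()
print(matches)
conn.close()
```

<p>Only the row in which one candidate signs for another survives the filter.</p>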
<p>Here&#8217;s the result of <em>Run</em>ning the query:</p>
<p><a href="https://scraperwiki.com/docs/api?name=iw_poll_notices_scrape#sqlite"><img src="http://ouseful.files.wordpress.com/2013/05/sql-select-on-scraperwiki.png?w=700" alt="sql select on scraperwiki"   class="alignnone size-full wp-image-10644" /></a></p>
<p>We can also get a <a href="https://api.scraperwiki.com/api/1.0/datastore/sqlite?format=htmltable&amp;name=iw_poll_notices_scrape&amp;query=select%20*%20from%20%60support%60%20where%20support%20in%20(select%20candinit%20from%20support)">direct link to a tabular view of the data</a> (or generate a link to a CSV output etc from the <em>format</em> selector).</p>
<p><a href="https://api.scraperwiki.com/api/1.0/datastore/sqlite?format=htmltable&amp;name=iw_poll_notices_scrape&amp;query=select%20*%20from%20%60support%60%20where%20support%20in%20(select%20candinit%20from%20support)"><img src="http://ouseful.files.wordpress.com/2013/05/candidates-mutual-table.png?w=700&#038;h=241" alt="candidates mutual table" width="700" height="241" class="alignnone size-full wp-image-10646" /></a></p>
<p>There are 15 rows in this result compared to the 15 edges/connecting lines discovered in the Gephi approach, so each method corroborates the other:</p>
<p><a href="http://blog.ouseful.info/2013/05/08/more-storyhunting-around-local-elections-data-using-gephi-to-what-extent-do-candidates-support-each-other/"><img src="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png?w=700&#038;h=618" alt="Tidier intra-candidate support map" width="700" height="618" class="alignnone size-full wp-image-10602" /></a></p>
<p>Simples:-)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/05/08/to-what-extent-do-candidates-support-each-other-redux-a-one-liner-thirty-second-route-to-the-info/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/scraperwiki-api.png" medium="image">
			<media:title type="html">scraperwiki api</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/sql-select-on-scraperwiki.png" medium="image">
			<media:title type="html">sql select on scraperwiki</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/candidates-mutual-table.png" medium="image">
			<media:title type="html">candidates mutual table</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png" medium="image">
			<media:title type="html">Tidier intra-candidate support map</media:title>
		</media:content>
	</item>
		<item>
		<title>More Storyhunting Around Local Elections Data Using Gephi &#8211; To What Extent Do Candidates Support Each Other?</title>
		<link>http://blog.ouseful.info/2013/05/08/more-storyhunting-around-local-elections-data-using-gephi-to-what-extent-do-candidates-support-each-other/</link>
		<comments>http://blog.ouseful.info/2013/05/08/more-storyhunting-around-local-elections-data-using-gephi-to-what-extent-do-candidates-support-each-other/#comments</comments>
		<pubDate>Wed, 08 May 2013 09:05:06 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[gephi]]></category>
		<category><![CDATA[schoolofdata]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10599</guid>
		<description><![CDATA[In Questioning Election Data to See if It Has a Story to Tell I started to explore various ways in which we could start to search for stories in a dataset finessed out of a set of poll notices announcing the recent Isle of Wight Council elections. In this post, I&#8217;ll do a little more [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#038;blog=325417&#038;post=10599&#038;subd=ouseful&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>In <a href="http://blog.ouseful.info/2013/05/05/questioning-election-data-to-see-if-it-has-a-story-to-tell/">Questioning Election Data to See if It Has a Story to Tell</a> I started to explore various ways in which we could start to search for stories in a dataset finessed out of a set of poll notices announcing the recent Isle of Wight Council elections. In this post, I&#8217;ll do a little more questioning, especially around the assentors (proposers, seconders etc) who supported each candidate, looking to see whether there are any social structures in there resulting from candidates supporting each others&#8217; applications. The essence of what we&#8217;re doing is some simple social network analysis around the candidate/assentor network. (For an alternative route to the result, see <a href="http://blog.ouseful.info/2013/05/08/to-what-extent-do-candidates-support-each-other-redux-a-one-liner-thirty-second-route-to-the-info/">To What Extent Do Candidates Support Each Other Redux – A One-Liner, Thirty Second Route to the Info</a>.)</p>
<p>This is what we&#8217;ll be working towards:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png"><img src="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png?w=700&#038;h=618" alt="Tidier intra-candidate support map" width="700" height="618" class="alignnone size-full wp-image-10602" /></a></p>
<p>If you want to play along, you can get the data from my <a href="https://scraperwiki.com/scrapers/iw_poll_notices_scrape/">IW poll notices scrape</a> on ScraperWiki, specifically the <em>support</em> table.</p>
<p><a href="https://scraperwiki.com/scrapers/iw_poll_notices_scrape/"><img src="http://ouseful.files.wordpress.com/2013/05/scraperwiki-council-elections-assentors.png?w=700&#038;h=275" alt="scraperwiki council elections - assentors" width="700" height="275" class="alignnone size-full wp-image-10620" /></a></p>
<p>Here&#8217;s a reminder of what the <a href="http://www.iwight.com/azservices/documents/1174-Notice%20of%20Poll%20-%20IOWC%20May%202013.pdf">original PDF</a> doc looked like (<a href="https://dl.dropboxusercontent.com/u/1156404/1174-Notice%20of%20Poll%20-%20IOWC%20May%202013.pdf">archive copy</a>):</p>
<p><a href="http://www.iwight.com/azservices/documents/1174-Notice%20of%20Poll%20-%20IOWC%20May%202013.pdf"><img src="http://ouseful.files.wordpress.com/2013/05/iw-poll-notice-assentors.png?w=700&#038;h=538" alt="IW poll notice assentors" width="700" height="538" class="alignnone size-full wp-image-10619" /></a></p>
<p>Checking the extent to which candidates supported each other is something we could do by hand, looking down each candidate&#8217;s list of  assentors for names of other candidates, but it would be a laborious job. It&#8217;s far easier(?!;-) to automate it&#8230;</p>
<p>When we want to compare names using a computer programme or script, the simplest approach is to do an <strong>exact string match</strong> (a <em>string</em> is a sequence of characters). Two strings match if they are exactly the same, so for example: <em>This string</em> is the same as <em>This string</em>, but not <em>this string</em> (they differ in their first character &#8211; upper case <em>T</em> in the first example as compared with lower case <em>t</em> in the last). We&#8217;ll be using exact string matching to identify whether a candidate has the same name as any of the assentors, so on the scraper, I did a little fiddling around with the names, in particular generating a new column that recasts the name of the candidate into the same presentation form used to identify the assentors (<em>Firstname I. Lastname</em>).</p>
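<p>As a rough illustration of why the recasting matters, the sketch below builds a <em>Firstname I. Lastname</em> string and compares it exactly against some assentor names. The helper <tt>presentation_form</tt> and the idea that the name parts are available separately are assumptions made for the example, not a description of how the scraper does it; the names are invented.</p>

```python
def presentation_form(first_name, middle_name, last_name):
    """Build a 'Firstname I. Lastname' string from separate name parts."""
    parts = [first_name]
    if middle_name:
        # Reduce any middle name to an initial followed by a full stop
        parts.append(middle_name[0] + ".")
    parts.append(last_name)
    return " ".join(parts)

candidate = presentation_form("Jane", "Quincy", "Example")
assentors = ["John A. Sample", "Jane Q. Example", "jane q. example"]

# Exact matching is case-sensitive, so only one assentor matches
exact_matches = [name for name in assentors if name == candidate]
print(candidate, exact_matches)
```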
<p>We can download a <a href="https://api.scraperwiki.com/api/1.0/datastore/sqlite?format=csv&amp;name=iw_poll_notices_scrape&amp;query=select+*+from+`support`&amp;apikey=">CSV representation of the data</a> from the scraper directly:</p>
<p><a href="https://api.scraperwiki.com/api/1.0/datastore/sqlite?format=csv&amp;name=iw_poll_notices_scrape&amp;query=select+*+from+`support`&amp;apikey="><img src="http://ouseful.files.wordpress.com/2013/05/scraperwiki-csv-download.png?w=700" alt="Scraperwiki CSV download"   class="alignnone size-full wp-image-10626" /></a></p>
<p>The first thing I want to explore is the extent to which candidates support other candidates to see if we can identify any political groupings. The tool I&#8217;m going to use to visualise the data is Gephi, an open-source cross-platform application (requires Java) that you can download for free from <a href="http://gephi.org">gephi.org</a>.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/gephi-org.png"><img src="http://ouseful.files.wordpress.com/2013/05/gephi-org.png?w=700&#038;h=337" alt="Gephi.org" width="700" height="337" class="alignnone size-full wp-image-10622" /></a></p>
<p>To view the data in Gephi, it&#8217;s easiest if we rename a couple of columns so that Gephi can recognise relations between supporters and candidates; if we open the CSV download file in a text editor, we can rename the <em>candinit</em> as <em>target</em> and the <em></em> column as <em>Source</em> to represent an arrow going from an assentor to a candidate, where the arrow reads something along the lines of &#8220;is a supporter of&#8221;.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/csv-rename.png"><img src="http://ouseful.files.wordpress.com/2013/05/csv-rename.png?w=700" alt="csv rename"   class="alignnone size-full wp-image-10618" /></a></p>
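<p>If you would rather not edit the header row by hand, the renaming can be sketched in Python using the standard <em>csv</em> module. The column names <em>support</em> and <em>party</em> in the sample below are hypothetical stand-ins, as is the helper name <em>rename_columns</em>; substitute whatever headers appear in your own download.</p>

```python
import csv
import io


def rename_columns(csv_text, renames):
    """Return csv_text with header names swapped according to renames."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # Only the first row (the header) is changed
    rows[0] = [renames.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# "support" and "party" are hypothetical column names; the row is made up
sample = "candinit,support,party\nJane A. Smith,John B. Jones,Independent\n"
print(rename_columns(sample, {"candinit": "Target", "support": "Source"}))
```

<p>Columns that are not mentioned in the mapping are passed through untouched, so the remaining data is still available as edge attributes.</p>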
<p>Start Gephi, select the Data Laboratory tab and then New Project from the File menu.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/geohi-data-lab-new-project.png"><img src="http://ouseful.files.wordpress.com/2013/05/geohi-data-lab-new-project.png?w=700&#038;h=279" alt="geohi data lab new project" width="700" height="279" class="alignnone size-full wp-image-10617" /></a></p>
<p>You should now see a toolbar that includes an &#8220;Import Spreadsheet&#8221; option:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/gephi-import-spreadsheet.png"><img src="http://ouseful.files.wordpress.com/2013/05/gephi-import-spreadsheet.png?w=700&#038;h=49" alt="gephi import spreadsheet" width="700" height="49" class="alignnone size-full wp-image-10616" /></a></p>
<p>Import the CSV file as such, identifying it as an <em>Edges Table</em>:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/import-data-into-gephi-ata-laboaratory.png"><img src="http://ouseful.files.wordpress.com/2013/05/import-data-into-gephi-ata-laboaratory.png?w=700" alt="import data into gephi data laboaratory"   class="alignnone size-full wp-image-10615" /></a></p>
<p>You should notice that the Source and Target columns have been identified as such and we have the choice to import the other columns or not &#8211; let&#8217;s bring them in&#8230;</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/source-and-target-recognised.png"><img src="http://ouseful.files.wordpress.com/2013/05/source-and-target-recognised.png?w=700" alt="SOurce and Target recognised"   class="alignnone size-full wp-image-10614" /></a></p>
<p>You should now see the data has been loaded into Gephi&#8230;</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/data-loaded-in.png"><img src="http://ouseful.files.wordpress.com/2013/05/data-loaded-in.png?w=700" alt="Data loaded in"   class="alignnone size-full wp-image-10613" /></a></p>
<p>If you click on the <em>Overview</em> tab button, you should see a mass of nodes/circles representing candidates and assentors with arrows going from assentors to candidates.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/mess1.png"><img src="http://ouseful.files.wordpress.com/2013/05/mess1.png?w=700" alt="mess..."   class="alignnone size-full wp-image-10628" /></a></p>
<p>Let&#8217;s see how they connect &#8211; we can <em>Run</em> the <em>Force Atlas 2</em> <strong>Layout</strong> algorithm for starters. I tweaked the <em>Scaling</em> value and ticked on <em>Stronger Gravity</em> to help shape the resulting layout:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/force-layout-tweaks.png"><img src="http://ouseful.files.wordpress.com/2013/05/force-layout-tweaks.png?w=700&#038;h=492" alt="force layout tweaks" width="700" height="492" class="alignnone size-full wp-image-10611" /></a></p>
<p>If you look closely, you&#8217;ll be able to see that there are many separate groupings of connected circles &#8211; these represent candidates who are supported by folk who are not also candidates (sometimes a node sits on top of a line so it looks as if two nodes are connected when in fact they aren&#8217;t&#8230;)</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/close-up-simple-patterns1.png"><img src="http://ouseful.files.wordpress.com/2013/05/close-up-simple-patterns1.png?w=700" alt="Close up simple patterns"   class="alignnone size-full wp-image-10629" /></a></p>
<p>However, there are also other groupings in which one candidate may support another:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/candidate-support1.png"><img src="http://ouseful.files.wordpress.com/2013/05/candidate-support1.png?w=700" alt="candidate support"   class="alignnone size-full wp-image-10630" /></a></p>
<p>These connections may allow us to see groupings of candidates supporting each other along party lines.</p>
<p>One of the powerful things about Gephi is that it allows us to construct quite complex, nested filters that we can apply to the data based on the properties of the network the data describes, so that we can focus on particular aspects of the network. I&#8217;m going to filter the network so that it shows only those individuals who are supported by at least one person (in-degree 1 or more) <em>and</em> who support at least one person (out-degree 1 or more) &#8211; that is, folk who are candidates (in-degree 1 or more) who also supported (out-degree 1 or more) another candidate. Let&#8217;s also turn labels on to see which candidates the filter identifies, and colour the edges along party lines. We can now see some information about the connectedness a little more clearly:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/lots-going-on.png"><img src="http://ouseful.files.wordpress.com/2013/05/lots-going-on.png?w=700&#038;h=483" alt="lots going on" width="700" height="483" class="alignnone size-full wp-image-10608" /></a></p>
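<p>For anyone who prefers to see the logic spelled out, the same degree-based filter can be sketched in Python over a list of (source, target) pairs. This is an illustration of the idea rather than how Gephi implements it; the function name and the sample names are made up.</p>

```python
from collections import Counter


def candidates_who_support(edges):
    """Return names with in-degree of 1 or more and out-degree of 1 or more.

    edges is a list of (source, target) pairs, each read as
    "source is a supporter of target".
    """
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    # A name must appear at least once as a source and once as a target
    return sorted(set(out_degree).intersection(in_degree))


# Made-up names, for illustration only
edges = [
    ("A. Voter", "B. Candidate"),
    ("B. Candidate", "C. Candidate"),
    ("D. Voter", "C. Candidate"),
]
print(candidates_who_support(edges))  # ['B. Candidate']
```

<p>Here <em>B. Candidate</em> is the only name that both receives support and gives it, which is exactly the condition the Gephi filter picks out.</p>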
<p>Hmmm.. how about if we extend our filter to see who&#8217;s connected to these nodes (this might include other candidates who do not themselves assent to another candidate), and also resize the nodes/labels so we can better see the candidates&#8217; names? The Neighbours Network filter takes the nodes we have and then also finds the nodes that are connected to them, to depth 2 in this case; that is, it brings in nodes connected to the candidates who are also supporters (depth 1), and the nodes connected to those nodes (depth 2). Which is to say, it will bring in the candidates who are supported by candidates, and their supporters:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/a-few-more-tweaks.png"><img src="http://ouseful.files.wordpress.com/2013/05/a-few-more-tweaks.png?w=700&#038;h=401" alt="A few more tweaks" width="700" height="401" class="alignnone size-full wp-image-10607" /></a></p>
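<p>The depth-limited expansion can be sketched as a breadth-first search. The sketch below assumes that connections are followed in both directions (I have not checked exactly how Gephi treats edge direction in this filter), and the function name <em>neighbours_network</em> and sample names are hypothetical.</p>

```python
from collections import defaultdict, deque


def neighbours_network(edges, seeds, depth=2):
    """Return seeds plus every node within `depth` hops, ignoring direction."""
    adjacent = defaultdict(set)
    for source, target in edges:
        adjacent[source].add(target)
        adjacent[target].add(source)

    seen = set(seeds)
    queue = deque((node, 0) for node in seeds)
    while queue:
        node, distance = queue.popleft()
        if distance == depth:
            continue  # do not expand beyond the requested depth
        for other in adjacent[node]:
            if other not in seen:
                seen.add(other)
                queue.append((other, distance + 1))
    return seen


# Made-up names, for illustration only
edges = [
    ("A. Voter", "B. Candidate"),
    ("B. Candidate", "C. Candidate"),
    ("D. Voter", "C. Candidate"),
    ("E. Voter", "F. Candidate"),
]
print(sorted(neighbours_network(edges, ["B. Candidate"], depth=2)))
# ['A. Voter', 'B. Candidate', 'C. Candidate', 'D. Voter']
```

<p>Starting from <em>B. Candidate</em>, depth 1 brings in <em>A. Voter</em> and <em>C. Candidate</em>, and depth 2 adds <em>D. Voter</em>; the unrelated pair is left out.</p>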
<p>That&#8217;s a bit clearer, but there are still overlapping lines, so it may make sense to lay out the network again:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/improve-the-layout.png"><img src="http://ouseful.files.wordpress.com/2013/05/improve-the-layout.png?w=700&#038;h=390" alt="improve the layout" width="700" height="390" class="alignnone size-full wp-image-10606" /></a></p>
<p>We can also experiment with other colourings &#8211; if we go to the Statistics panel, we can run a <em>Connected Components</em> statistic that tries to find nodes that are connected into distinct groups. We can then colour each of the separate groups uniquely:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/colour-the-groups.png"><img src="http://ouseful.files.wordpress.com/2013/05/colour-the-groups.png?w=700&#038;h=410" alt="colour the groups" width="700" height="410" class="alignnone size-full wp-image-10634" /></a></p>
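<p>To make the idea of connected components concrete, here is a small Python sketch that groups nodes into <em>weakly</em> connected components (that is, ignoring the direction of the arrows). It is a plain depth-first search over made-up data, not Gephi&#8217;s own code, and the function name is hypothetical.</p>

```python
def connected_components(edges):
    """Group nodes into weakly connected components (direction ignored)."""
    adjacent = {}
    for source, target in edges:
        adjacent.setdefault(source, set()).add(target)
        adjacent.setdefault(target, set()).add(source)

    components = []
    unvisited = set(adjacent)
    while unvisited:
        start = unvisited.pop()
        component = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for other in adjacent[node]:
                if other in unvisited:
                    unvisited.remove(other)
                    component.add(other)
                    stack.append(other)
        components.append(component)
    return components


# Made-up names, for illustration only
edges = [
    ("A. Voter", "B. Candidate"),
    ("B. Candidate", "C. Candidate"),
    ("D. Voter", "C. Candidate"),
    ("E. Voter", "F. Candidate"),
]
print(sorted(len(group) for group in connected_components(edges)))  # [2, 4]
```

<p>Each component could then be given its own colour, which is in effect what the colouring step does.</p>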
<p>Let&#8217;s reset the colours and go back to colourings along party lines:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/gephi-reset-colours.png"><img src="http://ouseful.files.wordpress.com/2013/05/gephi-reset-colours.png?w=700" alt="Gephi reset colours"   class="alignnone size-full wp-image-10633" /></a></p>
<p>If we go to the <em>Preview</em> view, we can generate a prettified view of the network:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/preview-layout.png"><img src="http://ouseful.files.wordpress.com/2013/05/preview-layout.png?w=700&#038;h=450" alt="Preview layout" width="700" height="450" class="alignnone size-full wp-image-10605" /></a></p>
<p>In it, we can clearly see groupings along party lines (inside the blue boxes). There is something odd, though? There appears to be a connection between UKIP and Independent groupings? Let&#8217;s zoom in:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/this-is-odd.png"><img src="http://ouseful.files.wordpress.com/2013/05/this-is-odd.png?w=700" alt="this is odd"   class="alignnone size-full wp-image-10604" /></a></p>
<p>Going back to the Graph view and zooming in, we see that <em>Paul G. Taylor</em> appears to be supporting two candidates of different parties&#8230; Hmm &#8211; I wonder: are there actually <em>two</em> Paul G. Taylors, with different political preferences? (Note to self: check on Electoral Commission website what regulations there are about assenting. Can you only assent to one person, and then only within the ward in which you are registered to vote? For local elections, could you be registered to vote in more than one electoral division within the same council area?)</p>
<p>To check that there are no other names that support more than one candidate, we can create another, simple filter that just selects nodes with out-degree 2 or more &#8211; that is, who support 2 or more other nodes:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/filter-on-nodes-out-degree-2.png"><img src="http://ouseful.files.wordpress.com/2013/05/filter-on-nodes-out-degree-2.png?w=700" alt="Filter on nodes out degree 2"   class="alignnone size-full wp-image-10600" /></a></p>
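<p>The same check is easy to sketch in Python: collect the distinct candidates each name supports and keep the names with two or more. The function name and the sample names are made up for illustration.</p>

```python
from collections import defaultdict


def multiple_supporters(edges, minimum=2):
    """Map each name to the candidates they support, if at least `minimum`."""
    supported = defaultdict(set)
    for source, target in edges:
        supported[source].add(target)
    return {
        name: sorted(targets)
        for name, targets in supported.items()
        if len(targets) >= minimum
    }


# Made-up names, for illustration only
edges = [
    ("A. Voter", "B. Candidate"),
    ("A. Voter", "C. Candidate"),
    ("D. Voter", "C. Candidate"),
]
print(multiple_supporters(edges))
# {'A. Voter': ['B. Candidate', 'C. Candidate']}
```

<p>Because this relies on exact string matching, two different people who happen to share a name would be merged into one entry, so any hit is a prompt for a manual check rather than proof.</p>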
<p>Just that one then&#8230;</p>
<p>Looking at the fuller chart, it&#8217;s still rather scruffy. We could tidy it by removing assentors who are not themselves candidates (that is, there are no arrows pointing in to them). The way Gephi filters work supports chaining. If you look at the filters, you will see they are nested, much like a nested comment thread in a forum. Filters at the bottom of the tree act on the graph and pass the network as filtered so far up the tree to the next filter. This means we can pass the network as shown above into another filter layer that removes folk who are &#8220;just&#8221; assentors and not candidates.</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/nested-filters.png"><img src="http://ouseful.files.wordpress.com/2013/05/nested-filters.png?w=700" alt="nested filters"   class="alignnone size-full wp-image-10601" /></a></p>
<p>Here&#8217;s the result:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/nesting-filters-in-gephi.png"><img src="http://ouseful.files.wordpress.com/2013/05/nesting-filters-in-gephi.png?w=700&#038;h=396" alt="Nesting filters in gephi" width="700" height="396" class="alignnone size-full wp-image-10603" /></a></p>
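<p>Filter chaining of this kind can be sketched in Python as a series of functions, each taking an edge list and handing a smaller edge list on to the next. The helper names here (<em>chain</em>, <em>only_candidate_sources</em>, <em>without_self_support</em>) are hypothetical, and the second filter is just an extra step invented to show the chaining.</p>

```python
def chain(*filters):
    """Compose filters so that the first one listed runs first."""
    def run(edges):
        for apply_filter in filters:
            edges = apply_filter(edges)
        return edges
    return run


def only_candidate_sources(edges):
    """Drop edges that start at someone nobody supports (a pure assentor)."""
    candidates = {target for _, target in edges}
    return [(s, t) for s, t in edges if s in candidates]


def without_self_support(edges):
    """Drop edges where a name supports itself (a hypothetical extra step)."""
    return [(s, t) for s, t in edges if s != t]


# Made-up names, for illustration only
edges = [
    ("A. Voter", "B. Candidate"),
    ("B. Candidate", "C. Candidate"),
    ("C. Candidate", "C. Candidate"),
]
tidy = chain(without_self_support, only_candidate_sources)
print(tidy(edges))  # [('B. Candidate', 'C. Candidate')]
```

<p>Each filter only ever sees the output of the one before it, which mirrors the way the nested Gephi filters pass a progressively smaller network along.</p>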
<p>And again we can go into Preview mode to generate a nice vectorised version of the graph:</p>
<p><a href="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png"><img src="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png?w=700&#038;h=618" alt="Tidier intra-candidate support map" width="700" height="618" class="alignnone size-full wp-image-10602" /></a></p>
<p>This quite clearly shows several mutual support networks between Labour candidates (red edges), Conservative candidates (blue edges), independents (black edges) and a large grouping of UKIP candidates (purple edges).</p>
<p>So there we have it &#8211; a quick tour of how to use Gephi to look at the co-support structure of a group of local election candidates. Were the highlighted candidates to be successful in their election, it could signify possible factions or groupings within the council, particularly amongst the independents? Along the way we saw how to make use of filters, and spotted something we need to check: whether the same person supported two candidates (if that isn&#8217;t allowed?) or whether they are two different people sharing the same name.</p>
<p>If this all seems like too much effort, remember that there&#8217;s always the <a href="http://blog.ouseful.info/2013/05/08/to-what-extent-do-candidates-support-each-other-redux-a-one-liner-thirty-second-route-to-the-info/">One-Liner, Thirty Second Route to the Info</a>.</p>
<p>PS by the by, a recent FOI request on WhatDoTheyKnow suggests another possible line of enquiry around possible candidates &#8211; if they have been elected to the council before, <a href="https://www.whatdotheyknow.com/request/charles_chapman_former_councillo">how good was their attendance record</a>? (I don&#8217;t think OpenlyLocal scrapes this information? Presumably it is available somewhere on the council website?)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/05/08/more-storyhunting-around-local-elections-data-using-gephi-to-what-extent-do-candidates-support-each-other/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png" medium="image">
			<media:title type="html">Tidier intra-candidate support map</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/scraperwiki-council-elections-assentors.png" medium="image">
			<media:title type="html">scraperwiki council elections - assentors</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/iw-poll-notice-assentors.png" medium="image">
			<media:title type="html">IW poll notice assentors</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/scraperwiki-csv-download.png" medium="image">
			<media:title type="html">Scraperwiki CSV download</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/gephi-org.png" medium="image">
			<media:title type="html">Gephi.org</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/csv-rename.png" medium="image">
			<media:title type="html">csv rename</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/geohi-data-lab-new-project.png" medium="image">
			<media:title type="html">geohi data lab new project</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/gephi-import-spreadsheet.png" medium="image">
			<media:title type="html">gephi import spreadsheet</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/import-data-into-gephi-ata-laboaratory.png" medium="image">
			<media:title type="html">import data into gephi data laboaratory</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/source-and-target-recognised.png" medium="image">
			<media:title type="html">SOurce and Target recognised</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/data-loaded-in.png" medium="image">
			<media:title type="html">Data loaded in</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/mess1.png" medium="image">
			<media:title type="html">mess...</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/force-layout-tweaks.png" medium="image">
			<media:title type="html">force layout tweaks</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/close-up-simple-patterns1.png" medium="image">
			<media:title type="html">Close up simple patterns</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/candidate-support1.png" medium="image">
			<media:title type="html">candidate support</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/lots-going-on.png" medium="image">
			<media:title type="html">lots going on</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/a-few-more-tweaks.png" medium="image">
			<media:title type="html">A few more tweaks</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/improve-the-layout.png" medium="image">
			<media:title type="html">improve the layout</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/colour-the-groups.png" medium="image">
			<media:title type="html">colour the groups</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/gephi-reset-colours.png" medium="image">
			<media:title type="html">Gephi reset colours</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/preview-layout.png" medium="image">
			<media:title type="html">Preview layout</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/this-is-odd.png" medium="image">
			<media:title type="html">this is odd</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/filter-on-nodes-out-degree-2.png" medium="image">
			<media:title type="html">Filter on nodes out degree 2</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/nested-filters.png" medium="image">
			<media:title type="html">nested filters</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/nesting-filters-in-gephi.png" medium="image">
			<media:title type="html">Nesting filters in gephi</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/05/tidier-intra-candidate-support-map.png" medium="image">
			<media:title type="html">Tidier intra-candidate support map</media:title>
		</media:content>
	</item>
		<item>
		<title>Simple Map Making With Google Fusion Tables</title>
		<link>http://blog.ouseful.info/2013/05/01/simple-map-making-with-google-fusion-tables/</link>
		<comments>http://blog.ouseful.info/2013/05/01/simple-map-making-with-google-fusion-tables/#comments</comments>
		<pubDate>Wed, 01 May 2013 21:03:29 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[School_Of_Data]]></category>
		<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[Fusion Table]]></category>
		<category><![CDATA[Google Fusion Table]]></category>
		<category><![CDATA[recipe]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10450</guid>
		<description><![CDATA[A quicker than quick recipe to make a map from a list of addresses in a simple text file using Google Fusion tables&#8230; Here&#8217;s some data (grabbed from The Gravesend Reporter via this recipe) in a simple two column CSV format; the first column contains address data. Here&#8217;s what it looks like when I import [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#038;blog=325417&#038;post=10450&#038;subd=ouseful&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>A quicker than quick recipe to make a map from a list of addresses in a simple text file using Google Fusion tables&#8230;</p>
<p><a href="https://dl.dropboxusercontent.com/u/1156404/gravesendPollingStations.csv">Here&#8217;s some data</a> (grabbed from <a href="http://www.gravesendreporter.co.uk/news/find_your_polling_station_ahead_of_the_kent_county_council_elections_1_2174988">The Gravesend Reporter</a> via <a href="http://blog.ouseful.info/2013/05/01/a-simple-openrefine-example-tidying-cutnpaste-data-from-a-web-page/">this recipe</a>) in a simple two column CSV format; the first column contains address data. Here&#8217;s what it looks like when I import it into Google Fusion Tables:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8698522451/" title="data in a fusion table by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8134/8698522451_dcf1c1a51e_z.jpg" width="640" height="193" alt="data in a fusion table"></a></p>
<p>Now let&#8217;s map it:-)</p>
<p>First of all we need to tell the application which column contains the data we want to geocode &#8211; that is, the addresses we want Fusion Tables to find the latitude and longitude co-ordinates for&#8230;</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8699647770/" title="tweak the column by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8536/8699647770_beddbd001c.jpg" width="330" height="236" alt="tweak the column"></a></p>
<p>Then we say we want the column to be recognised as a <em>Location</em> type column:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8699652382/" title="change name make location by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8263/8699652382_610512e32d.jpg" width="500" height="363" alt="change name make location"></a></p>
<p>Computer says yes, highlighting the location type cells with a yellow background:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8699653502/" title="fusion table.. yellow... by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8551/8699653502_f42978c6d9.jpg" width="488" height="324" alt="fusion table.. yellow..."></a></p>
<p>As if by magic a Map tab appears (though possibly not if you are using Google Fusion Tables as part of a Google Apps account&#8230;) The geocoder also accepts hints, so we can make life easier for it by providing one;-)</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8699657324/" title="map tab... by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8115/8699657324_cbc17b648f_z.jpg" width="640" height="335" alt="map tab..."></a></p>
<p>Once the points have been geocoded, they&#8217;re placed onto a map:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8699661716/" title="mapped by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8115/8699661716_ff9a6c0973.jpg" width="455" height="460" alt="mapped"></a></p>
<p>We can now publish the map in preparation for sharing it with the world&#8230;</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8698539091/" title="publish map by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8260/8698539091_21ed56f84c.jpg" width="259" height="258" alt="publish map"></a></p>
<p>We need to change the visibility of the map to something folk can see!</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8698544583/" title="privacy and link by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8259/8698544583_4a53f5901a.jpg" width="453" height="316" alt="privacy and link"></a></p>
<p>Public on the web, or just via a shared link &#8211; your choice:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8699669550/" title="make seeable by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8267/8699669550_f5f00e8c69.jpg" width="496" height="313" alt="make seeable"></a></p>
<p><a href="https://www.google.com/fusiontables/DataSource?docid=1dofdML42B5Jd5wjOgYRI9q-5CmxXejZmM4Bf2CY">Here&#8217;s my map:-)</a></p>
<p><em>The data used to generate this map was originally grabbed from the Gravesend Reporter: <a href="http://www.gravesendreporter.co.uk/news/find_your_polling_station_ahead_of_the_kent_county_council_elections_1_2174988">Find your polling station ahead of the Kent County Council elections</a>. A walkthrough of how the data was prepared can be found here: <a href="http://blog.ouseful.info/2013/05/01/a-simple-openrefine-example-tidying-cutnpaste-data-from-a-web-page/">A Simple OpenRefine Example – Tidying Cut’n&#8217;Paste Data from a Web Page</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/05/01/simple-map-making-with-google-fusion-tables/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8134/8698522451_dcf1c1a51e_z.jpg" medium="image">
			<media:title type="html">data in a fusion table</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8536/8699647770_beddbd001c.jpg" medium="image">
			<media:title type="html">tweak the column</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8263/8699652382_610512e32d.jpg" medium="image">
			<media:title type="html">change name make location</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8551/8699653502_f42978c6d9.jpg" medium="image">
			<media:title type="html">fusion table.. yellow...</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8115/8699657324_cbc17b648f_z.jpg" medium="image">
			<media:title type="html">map tab...</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8115/8699661716_ff9a6c0973.jpg" medium="image">
			<media:title type="html">mapped</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8260/8698539091_21ed56f84c.jpg" medium="image">
			<media:title type="html">publish map</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8259/8698544583_4a53f5901a.jpg" medium="image">
			<media:title type="html">privacy and link</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8267/8699669550_f5f00e8c69.jpg" medium="image">
			<media:title type="html">make seeable</media:title>
		</media:content>
	</item>
		<item>
		<title>A Simple OpenRefine Example &#8211; Tidying Cut&#8217;n&#039;Paste Data from a Web Page</title>
		<link>http://blog.ouseful.info/2013/05/01/a-simple-openrefine-example-tidying-cutnpaste-data-from-a-web-page/</link>
		<comments>http://blog.ouseful.info/2013/05/01/a-simple-openrefine-example-tidying-cutnpaste-data-from-a-web-page/#comments</comments>
		<pubDate>Wed, 01 May 2013 20:23:14 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[OpenRefine]]></category>
		<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[data cleaning]]></category>
		<category><![CDATA[openrefine]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10446</guid>
		<description><![CDATA[Here&#8217;s a quick walkthrough of how to use OpenRefine to prepare a simple data file. The original data can be found on a web page that looks like this (h/t/ The Gravesend Reporter): Take a minute or two to try to get your head round how this data is structured&#8230; What do you see? I [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#038;blog=325417&#038;post=10446&#038;subd=ouseful&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Here&#8217;s a quick walkthrough of how to use <a href="http://openrefine.org/">OpenRefine</a> to prepare a simple data file. The original data can be found on <a href="http://www.gravesendreporter.co.uk/news/find_your_polling_station_ahead_of_the_kent_county_council_elections_1_2174988">a web page that looks like this</a> (h/t The Gravesend Reporter):</p>
<p><a href="http://www.gravesendreporter.co.uk/news/find_your_polling_station_ahead_of_the_kent_county_council_elections_1_2174988" title="polling station list by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8126/8699573116_895e6e57d4_z.jpg" width="373" height="640" alt="polling station list"></a></p>
<p>Take a minute or two to try to get your head round how this data is structured&#8230; What do you see? I see different groups of addresses, one per line, separated by blank lines and grouped by &#8220;section headings&#8221; (ward names perhaps?). The ward names (if that&#8217;s what they are) are uniquely identified by the colon that ends the line they&#8217;re on. <em>None of the actual address lines contain a colon.</em></p>
<p>Here&#8217;s how I want the data to look after I&#8217;ve cleaned it:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8698522451/" title="data in a fusion table by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8134/8698522451_dcf1c1a51e_z.jpg" width="640" height="193" alt="data in a fusion table"></a></p>
<p>Can you see what needs to be done? <em>Somehow, we need to:</em></p>
<p><em>- remove the blank lines;<br />
- generate a second column containing the name of the ward each address applies to;<br />
- remove the colon from the ward name;<br />
- remove the rows that contained the original ward names.</em></p>
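<p>For comparison with the OpenRefine route, the four steps listed above can be sketched in a few lines of Python. The sketch assumes, as noted earlier, that heading lines end with a colon and that no address line does; the function name and the sample text are made up.</p>

```python
def tidy_polling_stations(text):
    """Turn pasted text into (address, ward) rows.

    Assumes ward headings end with a colon and addresses contain none.
    """
    rows = []
    ward = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # remove the blank lines
        if line.endswith(":"):
            ward = line[:-1].strip()  # remember the ward, minus its colon
            continue  # do not keep the heading as a row of its own
        rows.append((line, ward))  # second column carries the current ward
    return rows


# Made-up sample in the same shape as the pasted page
sample = """Ward One:

1 High Street
2 Low Road

Ward Two:

3 Mill Lane
"""
for row in tidy_polling_stations(sample):
    print(row)
```

<p>Carrying the most recent heading forward onto each address row is the scripted equivalent of a fill down operation on the second column.</p>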
<p>If we highlight the data in the web page, copy it and paste it into a text editor, it looks like this:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8699586136/" title="polling stations by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8266/8699586136_7436bb9248_z.jpg" width="607" height="251" alt="polling stations"></a></p>
<p>We can also paste the data into a new OpenRefine Project:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8698470377/" title="paste data into OpenRefine by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8557/8698470377_9d453f0b48_z.jpg" width="640" height="302" alt="paste data into OpenRefine"></a></p>
<p>We can use OpenRefine&#8217;s import data tools to clean the blank lines out of the original pasted data:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8698472505/" title="OpenRefine parse line data by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8393/8698472505_41889d0061_z.jpg" width="640" height="289" alt="OpenRefine parse line data"></a></p>
<p>But how do we get rid of the section headings, and use them as second column entries so we can see which area each address applies to?</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8698477241/" title="OpenRefine data in - more cleaning required by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8262/8698477241_9cf3c6296c_z.jpg" width="526" height="328" alt="OpenRefine data in - more cleaning required"></a></p>
<p>Let&#8217;s start by filtering the data to show only the rows containing the headers, which we can identify because they are the only rows that contain a colon character. Then we can create a second column that duplicates these values.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8699608072/" title="cleaning data part 1 by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8268/8699608072_5a2664aba8_z.jpg" width="640" height="336" alt="cleaning data part 1"></a></p>
<p>Here&#8217;s how we create the new column, which we&#8217;ll call &#8220;Wards&#8221;; the cell contents are simply a duplicate of the original column.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8699609782/" title="open refine leave the data the same by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8258/8699609782_d558801dd2_z.jpg" width="640" height="314" alt="open refine leave the data the same"></a></p>
<p>If we delete the filter that was selecting rows where the Column 1 value included a colon, we get the original data back along with a second column.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8699612906/" title="delete the filter by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8397/8699612906_45ebccec13_z.jpg" width="640" height="230" alt="delete the filter"></a></p>
<p>Starting at the top of the column, the &#8220;Fill Down&#8221; cell operation will fill empty cells with the value of the cell above.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8698490919/" title="fill down by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8264/8698490919_766f10a3ce.jpg" width="400" height="310" alt="fill down"></a></p>
<p>If we now add the &#8220;colon filter&#8221; back to Column 1, to just show the area rows, we can highlight all those rows, then delete them. We&#8217;ll then be presented with the two column data set without the area rows.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8699625432/" title="reset filter, star rows, then remove them... by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8268/8699625432_3eccb1c157_z.jpg" width="640" height="254" alt="reset filter, star rows, then remove them..."></a></p>
<p>Let&#8217;s just tidy up the Wards column too, by getting rid of the colon. To do that, we can transform the cell&#8230;</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8698507279/" title="we're going to tidy by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8533/8698507279_b8e539e171.jpg" width="322" height="275" alt="we're going to tidy"></a></p>
<p>&#8230;by replacing the colon with nothing (an empty string).</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8698505239/" title="tidy the column by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8395/8698505239_3b844f8f21_z.jpg" width="640" height="470" alt="tidy the column"></a></p>
<p>Here&#8217;s the data &#8211; neat and tidy:-)</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8699635838/" title="Neat and tidy... by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8140/8699635838_e5715f1c1c_z.jpg" width="612" height="227" alt="Neat and tidy..."></a></p>
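<p>For comparison, here&#8217;s a minimal Python sketch of the same recipe (drop the blank lines, spot the heading rows by their colon, fill the ward name down, then drop the heading rows). It is not what OpenRefine does internally; the function name and the sample text are made up for illustration, and it assumes, as above, that only heading rows contain a colon.</p>

```python
def tidy_polling_stations(pasted_text):
    """Turn 'Ward name:' headings followed by address lines into (address, ward) rows."""
    rows = []
    current_ward = None
    for line in pasted_text.splitlines():
        line = line.strip()
        if not line:
            continue  # remove the blank lines
        if ":" in line:
            # A heading row: remember the ward name (minus the colon) so it can be
            # "filled down", but don't keep the heading itself as a data row.
            current_ward = line.replace(":", "").strip()
            continue
        rows.append((line, current_ward))
    return rows


# Hypothetical pasted text, in the same shape as the web page data.
sample = "Ward A:\n\nHall One\n\nHall Two\n\nWard B:\n\nHall Three\n"

for address, ward in tidy_polling_stations(sample):
    print(address, "|", ward)
```

<p>Any address line that appears before the first heading ends up with a ward of <tt>None</tt>, which mirrors what Fill Down would leave blank.</p>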
<p>To finish, let&#8217;s export the data.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8698513997/" title="prepare to export by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8553/8698513997_80bc162e86.jpg" width="286" height="332" alt="prepare to export"></a></p>
<p>How about sending it to a Google Fusion table? (You may be asked to authenticate or verify the request.)</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8699639894/" title="upload to fusion table by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8406/8699639894_523e6eb7e4.jpg" width="365" height="214" alt="upload to fusion table"></a></p>
<p>And here it is:-)</p>
<p><a href="http://www.flickr.com/photos/psychemedia/8698522451/" title="data in a fusion table by psychemedia, on Flickr"><img src="http://farm9.staticflickr.com/8134/8698522451_dcf1c1a51e_z.jpg" width="640" height="193" alt="data in a fusion table"></a></p>
<p>So &#8211; that&#8217;s a quick example of some of the data cleaning tricks and operations that OpenRefine supports. There are many, many more, of course&#8230;;-)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/05/01/a-simple-openrefine-example-tidying-cutnpaste-data-from-a-web-page/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8126/8699573116_895e6e57d4_z.jpg" medium="image">
			<media:title type="html">polling station list</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8134/8698522451_dcf1c1a51e_z.jpg" medium="image">
			<media:title type="html">data in a fusion table</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8266/8699586136_7436bb9248_z.jpg" medium="image">
			<media:title type="html">polling stations</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8557/8698470377_9d453f0b48_z.jpg" medium="image">
			<media:title type="html">paste data into OpenRefine</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8393/8698472505_41889d0061_z.jpg" medium="image">
			<media:title type="html">OpenRefine parse line data</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8262/8698477241_9cf3c6296c_z.jpg" medium="image">
			<media:title type="html">OpenRefine data in - more cleaning required</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8268/8699608072_5a2664aba8_z.jpg" medium="image">
			<media:title type="html">cleaning data part 1</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8258/8699609782_d558801dd2_z.jpg" medium="image">
			<media:title type="html">open refine leave the data the same</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8397/8699612906_45ebccec13_z.jpg" medium="image">
			<media:title type="html">delete the filter</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8264/8698490919_766f10a3ce.jpg" medium="image">
			<media:title type="html">fill down</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8268/8699625432_3eccb1c157_z.jpg" medium="image">
			<media:title type="html">reset filter, star rows, then remove them...</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8533/8698507279_b8e539e171.jpg" medium="image">
			<media:title type="html">we&#039;re going to tidy</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8395/8698505239_3b844f8f21_z.jpg" medium="image">
			<media:title type="html">tidy the column</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8140/8699635838_e5715f1c1c_z.jpg" medium="image">
			<media:title type="html">Neat and tidy...</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8553/8698513997_80bc162e86.jpg" medium="image">
			<media:title type="html">prepare to export</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8406/8699639894_523e6eb7e4.jpg" medium="image">
			<media:title type="html">upload to fusion table</media:title>
		</media:content>

		<media:content url="http://farm9.staticflickr.com/8134/8698522451_dcf1c1a51e_z.jpg" medium="image">
			<media:title type="html">data in a fusion table</media:title>
		</media:content>
	</item>
		<item>
		<title>A Few More Thoughts on the Forensic Analysis of Twitter Friend and Follower Timelines in a MOOCalytics Context</title>
		<link>http://blog.ouseful.info/2013/04/22/a-few-more-thoughts-on-the-forensic-analysis-of-twitter-friend-and-follower-timelines-in-a-moocalytics-context/</link>
		<comments>http://blog.ouseful.info/2013/04/22/a-few-more-thoughts-on-the-forensic-analysis-of-twitter-friend-and-follower-timelines-in-a-moocalytics-context/#comments</comments>
		<pubDate>Mon, 22 Apr 2013 14:35:20 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Thinkses]]></category>
		<category><![CDATA[Tinkering]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10374</guid>
		<description><![CDATA[Immediately after posting Evaluating Event Impact Through Social Media Follower Histories, With Possible Relevance to cMOOC Learning Analytics, I took the dog out for a walk to ponder the practicalities of constructing follower (or friend) acquisition charts for accounts with only a low number of followers, or friends, as might be the case for folk [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Immediately after posting <a href="http://blog.ouseful.info/2013/04/21/evaluating-event-impact-through-social-media-follower-histories-with-possible-relevance-for-mooc-learning-analytics/">Evaluating Event Impact Through Social Media Follower Histories, With Possible Relevance to cMOOC Learning Analytics</a>, I took the dog out for a walk to ponder the practicalities of constructing follower (or friend) acquisition charts for accounts with only a low number of followers, or friends, as might be the case for folk taking a MOOC or who have attended a particular event. One aim I had in mind was to probe the extent to which a MOOC may help develop social ties between folk taking it: whether MOOC participants know each other prior to taking the MOOC, or whether they come to develop social links after taking the MOOC. Another aim was simply to see whether we could identify from changes in velocity or makeup of follower acquisition curves whether particular events led either to growth in follower numbers or community development between followers.</p>
<p>To recap on the approach used for constructing follower acquisition charts (as described in <a href="http://blog.ouseful.info/2013/04/05/estimated-follower-accession-charts-for-twitter/">Estimated Follower Accession Charts for Twitter</a>, and which also works (in principle!) for plotting when Twitter users started following folk):</p>
<ul>
<li>you can&#8217;t start following someone on Twitter until you join Twitter;</li>
<li>follower lists on Twitter are reverse chronological statements of the order in which folk started following the corresponding account;</li>
<li>starting with the first follower of an account (the bottom end of the follower list), we can estimate when they started following the account from the most recent account creation date seen so far amongst people who started following before that user.</li>
</ul>
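<p>As a rough sketch of that recipe in Python (the function name and the dates are invented for illustration, and I&#8217;m assuming we already have the account creation date of each follower in the order Twitter lists them): walk the list from the oldest follower up, keeping a running &#8220;latest creation date seen so far&#8221;. I include each follower&#8217;s own creation date in the running maximum, on the grounds that they can&#8217;t have followed before they joined.</p>

```python
from datetime import date


def estimate_follow_dates(creation_dates_newest_first):
    """Return a 'not earlier than' follow date for each follower.

    Input and output are both in Twitter's follower list order
    (most recent follower first).
    """
    estimates = []
    latest_seen = None
    # Walk from the first (oldest) follower towards the most recent one.
    for created in reversed(creation_dates_newest_first):
        latest_seen = created if latest_seen is None else max(latest_seen, created)
        estimates.append(latest_seen)
    estimates.reverse()  # back into newest-first order
    return estimates


# Hypothetical creation dates, most recent follower first.
followers = [date(2013, 1, 5), date(2010, 3, 1), date(2012, 6, 1), date(2009, 1, 1)]
print(estimate_follow_dates(followers))
```

<p>In this made-up example the account created in March 2010 can&#8217;t have started following before June 2012, because someone who followed earlier only joined Twitter then.</p>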
<p>A methodological problem arises when we have a low number of followers, because we don&#8217;t necessarily have enough newly created (follower) accounts starting to follow a target account soon after the creation of the follower account to give us a solid basis for estimating when folk started following the target account. (If someone creates a new account and then immediately uses it to follow a target account, we get a good sample in time relating to when that follower started following the target account&#8230; If you have lots of people following an account there&#8217;s more of a chance that some of them will be quick-after-creation to start following the target account.)</p>
<p>There may also be methodological problems with trying to run an analysis over a short period of time (too much noise/lack of temporal definition in the follower acquisition curve over a limited time range).</p>
<p>So with low follower numbers, where can we get our timestamps from?</p>
<p>In the context of a MOOC, let&#8217;s suppose that there is a central MOOC account with lots of followers, and those followers don&#8217;t have many friends or followers (certainly not enough for us to be able to generate smooth &#8211; and reliable &#8211;  acquisition curves).</p>
<p>If the MOOC account has lots of followers, let&#8217;s suppose we can generate a reasonable follower acquisition curve from them.</p>
<p>This means that for each follower, fo_i, we can associate with them a time when they started following the MOOC account, fo_i_t. Let&#8217;s write that as fo(MOOC, fo_i)=fo_i_t, where fo(MOOC, fo_i) reads &#8220;the estimated time when MOOC is followed by fo_i&#8221;.</p>
<p>(I&#8217;m making this up as I&#8217;m going along&#8230;;)</p>
<p>If we look at the friends of fo_i (that is, the people they follow), we know that they started following the MOOC account at time fo_i_t. So let&#8217;s write that as fr(fo_i, MOOC)=fo_i_t, where fr(fo_i, MOOC) reads &#8220;the estimated time when fo_i friends MOOC&#8221;.</p>
<p>Since public friend/follower relationships are symmetrical on Twitter (if A friends B, then B is at that instant followed by A), we can also write fr(fo_i, MOOC) = fo(MOOC, fo_i), which is to say that the time when fo_i friends MOOC is the same time as when MOOC is followed by fo_i.</p>
<p>Got that?!;-) (I&#8217;m still making this up as I&#8217;m going along&#8230;!)</p>
<p>We now have a sample in time for calibrating at least a single point in the friend acquisition chart for fo_i. If fo_i follows other &#8220;celebrity&#8221; accounts for which we can generate reasonably sound follower acquisition charts, we should be able to add other timestamp estimates into the friend acquisition timeline.</p>
<p>If fo_i follows three accounts A,B,C in that order, with fr(fo_i,A)=t1 and fr(fo_i,C)=t2, we know that fr(fo_i,B) lies somewhere between t1 and t2, where t1 &lt; t2, let&#8217;s call that [t1,t2], reading it as [not earlier than t1, not later than t2]. Which is to say, fr(fo_i,B)=[t1,t2], or &#8220;fo_i makes friends with B not before t1 and not after t2&#8221;, or more simply &#8220;fo_i makes friends with B somewhen between t1 and t2&#8221;.</p>
<p>Let&#8217;s now look at fo_j, who has only a few followers, one of whom is fo_i. Suppose that fo_j is actually account B. We know that fo(fo_j,fo_i), and furthermore that fo(fo_j,fo_i)=fr(fo_i,fo_j). Since we know that fr(fo_i,B)=[t1,t2], and B=fo_j, we know that fr(fo_i,fo_j)=[t1,t2]. (Just swap the symbols in and out of the equations&#8230;) But what we now also have is a timestamp estimate into the followers list for fo_j, that is: fo(fo_j,fo_i)=[t1,t2].</p>
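<p>Here&#8217;s a hedged Python sketch of that bracketing step. Everything in it is an assumption for the sake of illustration: the function name, the use of plain numbers as timestamps, and the idea that we hold a friend (or follower) list in chronological order with a time estimate where we have one and <tt>None</tt> where we don&#8217;t. The outer bounds are whatever earliest and latest times we are prepared to assume for the whole list.</p>

```python
def bracket_follow_times(known_times, earliest, latest):
    """Bracket each entry of a chronologically ordered friend/follower list.

    known_times: oldest relationship first; a timestamp where we have an
                 estimate, None where we don't.
    Returns a list of (not_before, not_after) pairs, one per entry.
    """
    # Sweep forwards: the most recent known time so far is a lower bound.
    lower = []
    bound = earliest
    for t in known_times:
        if t is not None:
            bound = t
        lower.append(bound)

    # Sweep backwards: the next known time still to come is an upper bound.
    upper = [None] * len(known_times)
    bound = latest
    for i in range(len(known_times) - 1, -1, -1):
        if known_times[i] is not None:
            bound = known_times[i]
        upper[i] = bound

    return list(zip(lower, upper))


# fo_i friends A at t1=10, then B (unknown), then C at t2=20.
print(bracket_follow_times([10, None, 20], earliest=0, latest=100))
# [(10, 10), (10, 20), (20, 20)]
```

<p>The sketch takes it on trust that the known times are themselves in order; it makes no attempt to spot or repair estimates that contradict each other.</p>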
<p>If MOOC has lots of friends, as well as lots of followers, and MOOC has a policy of following back followers immediately, we can use it to generate timestamp probes into the friend timelines of its followers, via fo(MOOC,X)=fr(X,MOOC), and into the follower timelines of its friends, via fr(MOOC,Y)=fo(Y,MOOC). (We should be able to use other accounts with large friend or follower counts and reasonably well defined acquisition curves to generate additional samples?)</p>
<p>We can possibly also start to play off the time intervals from friend and follower curves against each other to try and reduce the uncertainty within them (that is, the range of them).</p>
<p>For example, if we have fr(fo_i,B)=[t1,t2] from the friend curve, and fo(B,fo_i)=[t3,t4] from the follower curve, then if t3 &gt; t1, we can tighten up fr(fo_i,B)=[t3,t2]. Similarly, if t2 &lt; t4, we can tighten up fo(B,fo_i)=[t3,t2]. Which I think in general is:</p>
<p><tt>if fr(A,B)=[t1,t2] and fo(B,A)=[t3,t4], we can tighten up to fr(A,B) = fo(B,A) = [ greater_of(t1,t3), lesser_of(t2,t4) ]</tt></p>
<p>Erm, maybe? (I should probably read through that again to check the logic!) Things also get a little more complex when we only have time range estimates for most of the friends or followers, rather than good single point timestamp estimates for when they were friended or started to follow&#8230;;-) I&#8217;ll leave it as an exercise for the reader to figure out how to write that down and solve it!;-)</p>
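<p>For what it&#8217;s worth, the tightening rule is just an interval intersection, which can be sketched in a few lines of Python. The function name is mine, and raising an error when the two ranges don&#8217;t overlap is my own addition: in that case at least one of the estimates must be wrong.</p>

```python
def tighten(fr_interval, fo_interval):
    """Intersect two [not_before, not_after] estimates of the same event."""
    t1, t2 = fr_interval
    t3, t4 = fo_interval
    start = max(t1, t3)  # greater_of(t1, t3)
    end = min(t2, t4)    # lesser_of(t2, t4)
    if start > end:
        raise ValueError("intervals do not overlap, so the estimates are inconsistent")
    return (start, end)


print(tighten((1, 5), (3, 8)))  # (3, 5)
```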
<p>If this thought experiment does work out, then several rules of thumb jump out if we want to maximise our chances of generating reasonably accurate friend and follower acquisition curves:</p>
<p>- set up your MOOC Twitter account close to the time you want to start using it so its creation date is as late as possible;<br />
- encourage folk to follow the MOOC account, and follow back, to improve the chances of getting reasonable resolution in the follower acquisition curve for the MOOC account. These connections also provide time-estimated probes into follower acquisition curves of friends and friend acquisition curves of followers;<br />
- consider creating new &#8220;fake&#8221; timestamp Twitter accounts that can immediately on creation follow and be friended by the MOOC account to place temporal markers into the acquisition curves;<br />
- if followers follow other celebrity accounts (or are followed (back) by them), we should be able to generate timestamp samples by analysing the celebrity account acquisition curves.</p>
<p>I think I need to go and walk the dog again.</p>
<p>PS a couple more trivial fixed points: for a target account, the earliest time at which they were first followed or when they first friended another account is the creation date of the target account; the latest possible time they acquired their most recent friend or follower is the time at which the data was collected.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/04/22/a-few-more-thoughts-on-the-forensic-analysis-of-twitter-friend-and-follower-timelines-in-a-moocalytics-context/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>
	</item>
		<item>
		<title>By Me, on the Scraperwiki Blog: Glue Logic and Flowable Data</title>
		<link>http://blog.ouseful.info/2013/03/06/by-me-on-the-scraperwiki-blog-glue-logic-and-flowable-data/</link>
		<comments>http://blog.ouseful.info/2013/03/06/by-me-on-the-scraperwiki-blog-glue-logic-and-flowable-data/#comments</comments>
		<pubDate>Wed, 06 Mar 2013 13:13:11 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[elsewhere]]></category>
		<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[scraperwiki]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=10040</guid>
		<description><![CDATA[Regular readers will know I quite often make use of Scraperwiki for grabbing datasets and hosting views over scraped data. A few days ago, I contributed a guest post to the Scraperwiki blog: As well as being a great tool for scraping and aggregating content from third party sites, Scraperwiki can be used as [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Regular readers will know I quite often make use of Scraperwiki for grabbing datasets and hosting views over scraped data. A few days ago, I contributed a guest post to the Scraperwiki blog:</p>
<blockquote><p>As well as being a great tool for scraping and aggregating content from third party sites, Scraperwiki can be used as a transformational “glue logic” tool:  joining together applications that utilise otherwise incompatible data formats. Typically, we might think of using a scraper to pull data into one or more Scraperwiki database tables and then a view to develop an application style view over the data. Alternatively, we might just download the data so that we can analyse it elsewhere. There is another way of using Scraperwiki, though, and that is to give life to data as <em>flowable web data</em>.</p></blockquote>
<p>Read the whole thing here: <a href="http://blog.scraperwiki.com/2013/03/04/glue-logic-and-flowable-data/">Glue Logic and Flowable Data</a>.</p>
<p>PS I hope to write more about &#8220;flowable data&#8221;, feeds, and feed enrichment in a later post here on <a href="http://blog.ouseful.info">OUseful.info</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/03/06/by-me-on-the-scraperwiki-blog-glue-logic-and-flowable-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>
	</item>
		<item>
		<title>Further Dabblings with the Cloudworks API</title>
		<link>http://blog.ouseful.info/2013/01/17/further-dabblings-with-the-cloudworks-api/</link>
		<comments>http://blog.ouseful.info/2013/01/17/further-dabblings-with-the-cloudworks-api/#comments</comments>
		<pubDate>Thu, 17 Jan 2013 23:00:21 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Anything you want]]></category>
		<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[cloudworks]]></category>
		<category><![CDATA[oldsmooc]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=9529</guid>
		<description><![CDATA[Picking up on A Couple of Proof of Concept Demos with the Cloudworks API, and some of the comments that came in around it (thanks Sheila et al:-), I spent a couple more hours tinkering around it and came up with the following&#8230; A prettier view, stolen from Mike Bostock (I think?) I also added [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Picking up on <a href="http://blog.ouseful.info/2013/01/16/a-couple-of-proof-of-concept-demos-with-the-cloudworks-api/">A Couple of Proof of Concept Demos with the Cloudworks API</a>, and some of the comments that came in around it (thanks Sheila et al:-), I spent a couple more hours tinkering around it and came up with the following&#8230;</p>
<p>A prettier view, stolen from Mike Bostock (I think?)</p>
<p><a href="https://views.scraperwiki.com/run/cloudworks_network_d3js_force_directed_view_pretti/?userID=1174&amp;filterNdegree=2&amp;viewtype=usercloudcloudscapefollower"><img src="http://ouseful.files.wordpress.com/2013/01/prettier-view-d3js-force-directed-layout.png?w=700&#038;h=377" alt="prettier view d3js force directed layout" width="700" height="377" class="alignnone size-full wp-image-9530" /></a></p>
<p>I also added a slider to tweak the layout (opening it up by increasing the repulsion between nodes) [h/t @mhawksey for the trick I needed to make this work] but still need to figure this out a bit more&#8230;</p>
<p>I also added in some new parameterised ways of accessing various different views over Cloudworks data using the root <tt><br />
<a href="https://views.scraperwiki.com/run/cloudworks_network_d3js_force_directed_view_pretti/" rel="nofollow">https://views.scraperwiki.com/run/cloudworks_network_d3js_force_directed_view_pretti/</a><br />
</tt></p>
<p>Firstly, we can make calls of the form: <a href="https://views.scraperwiki.com/run/cloudworks_network_d3js_force_directed_view_pretti/?cloudscapeID=2451&amp;viewtype=cloudscapecloudcloudscape"><tt>?cloudscapeID=2451&amp;viewtype=cloudscapecloudcloudscape</tt></a></p>
<p><a href="http://ouseful.files.wordpress.com/2013/01/cloudworks-cloudscapes-by-cloud.png"><img src="http://ouseful.files.wordpress.com/2013/01/cloudworks-cloudscapes-by-cloud.png?w=700&#038;h=543" alt="cloudworks cloudscapes by cloud" width="700" height="543" class="alignnone size-full wp-image-9533" /></a></p>
<p>This grabs the clouds associated with a particular cloudscape (given the cloudscape ID), and then constructs the network containing those clouds and all the cloudscapes they are associated with.</p>
<p>The next view uses a parameter set of the form <a href="https://views.scraperwiki.com/run/cloudworks_network_d3js_force_directed_view_pretti/?cloudscapeID=2451&amp;viewtype=cloudscapecloudtags"><tt>cloudscapeID=2451&amp;viewtype=cloudscapecloudtags</tt></a> and displays the clouds associated with a particular cloudscape (given the cloudscape ID), along with the tags associated with each cloud:</p>
<p><a href="https://views.scraperwiki.com/run/cloudworks_network_d3js_force_directed_view_pretti/?cloudscapeID=2451&amp;viewtype=cloudscapecloudtags"><img src="http://ouseful.files.wordpress.com/2013/01/cloudworks-cloudscape-cloud-tags.png?w=700&#038;h=475" alt="cloudworks cloudscape cloud tags" width="700" height="475" class="alignnone size-full wp-image-9534" /></a></p>
<p>Even though there aren&#8217;t many nodes or edges, this is quite a cluttered view, so I maybe need to rethink how best to visualise this information?</p>
<p>I&#8217;ve also done a couple of views that make use of follower data. For example, here&#8217;s how to call on a view that visualises how the folk who follow a particular cloudscape follow each other (this is actually the default if no <tt>viewtype</tt> is given) -<br />
<a href="https://views.scraperwiki.com/run/cloudworks_network_d3js_force_directed_view_pretti/?cloudscapeID=2451&amp;viewtype=cloudscapeinnerfollowers">cloudscapeID=2451&amp;viewtype=cloudscapeinnerfollowers</a></p>
<p><a href="https://views.scraperwiki.com/run/cloudworks_network_d3js_force_directed_view_pretti/?cloudscapeID=2451&amp;viewtype=cloudscapeinnerfollowers"><img src="http://ouseful.files.wordpress.com/2013/01/cloudworks-cloudscape-innerfollowers.png?w=700" alt="cloudworks cloudscape innerfollowers"   class="alignnone size-full wp-image-9537" /></a></p>
<p>And here&#8217;s how to call a view that grabs a particular user&#8217;s clouds, looks up the cloudscapes they belong to, then graphs those cloudscapes and the people who follow them: <a href="https://views.scraperwiki.com/run/cloudworks_network_d3js_force_directed_view_pretti/?userID=1174&amp;viewtype=usercloudcloudscapefollower">?userID=1174&amp;viewtype=usercloudcloudscapefollower</a></p>
<p><a href="https://views.scraperwiki.com/run/cloudworks_network_d3js_force_directed_view_pretti/?userID=1174&amp;viewtype=usercloudcloudscapefollower"><img src="http://ouseful.files.wordpress.com/2013/01/cloudworks-followers-of-cloudscapes-containing-a-users-clouds.png?w=700&#038;h=579" alt="cloudworks followers of cloudscapes containing a user&#039;s clouds" width="700" height="579" class="alignnone size-full wp-image-9535" /></a></p>
<p>Here&#8217;s another way of describing that graph &#8211; <em>followers of cloudscapes containing a user&#8217;s clouds</em>.</p>
<p>The optional argument <tt>filterNdegree=N</tt> (where N is an integer) will filter the displayed network to remove nodes with degree &lt;=N. Here&#8217;s the above example, but filtered to remove the nodes that have degree 2 or less: <a href="https://views.scraperwiki.com/run/cloudworks_network_d3js_force_directed_view_pretti/?userID=1174&amp;viewtype=usercloudcloudscapefollower&amp;filterNdegree=2"><tt>?userID=1174&amp;viewtype=usercloudcloudscapefollower&amp;filterNdegree=2</tt></a></p>
<p><a href="https://views.scraperwiki.com/run/cloudworks_network_d3js_force_directed_view_pretti/?userID=1174&amp;viewtype=usercloudcloudscapefollower&amp;filterNdegree=2"><img src="http://ouseful.files.wordpress.com/2013/01/cloudworks-graph-filtered.png?w=700" alt="cloudworks graph filtered"   class="alignnone size-full wp-image-9536" /></a></p>
<p>That is, we prune the graph of people who follow no more than two of the cloudscapes to which the specified user has added a cloud. In other words, we depict folk who follow at least three of the cloudscapes to which the specified user has added a cloud.</p>
<p>(Note that on inspecting that graph it looks as if there is at least one node that has degree 2, rather than degree 3 and above. I&#8217;m guessing that it originally had degree 3 or more but that at least one of the nodes it was connected to was pruned out? If that isn&#8217;t the case, something&#8217;s going wrong&#8230;)</p>
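<p>To illustrate that guess, here&#8217;s a small Python sketch of a <em>single pass</em> degree filter over a made-up edge list. This is an assumption about how such a filter might behave, not a copy of the script behind the view; the node names and function names are invented. Degrees are counted on the original graph, so a node that survives the cut can still lose neighbours and end up with a lower degree in the filtered graph.</p>

```python
from collections import Counter


def node_degrees(edges):
    """Count how many edges touch each node."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def filter_by_degree(edges, n):
    """Single pass: drop nodes whose degree in the ORIGINAL graph is n or less."""
    degree = node_degrees(edges)
    keep = {node for node, d in degree.items() if d > n}
    return [(a, b) for a, b in edges if a in keep and b in keep]


# Hypothetical people (P, Q, R) following hypothetical cloudscapes (c1..c4).
edges = [
    ("P", "c1"), ("P", "c2"), ("P", "c4"),
    ("Q", "c1"), ("Q", "c2"), ("Q", "c3"),
    ("R", "c1"), ("R", "c2"), ("R", "c3"),
]

filtered = filter_by_degree(edges, 2)
print(node_degrees(filtered)["P"])  # 2: P started with degree 3 but c4 was pruned
```

<p>Getting rid of such leftovers would mean repeating the filter until nothing changes, which is a different (and harsher) operation.</p>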
<p>Also note that it would be neater to pull in the whole graph and filter the d3.js rendered version interactively, but I don&#8217;t know how to do this?</p>
<p>However&#8230;I also added a parameter to the script that generates the JSON data files from data pulled from the Cloudworks API calls that allows me to generate a GEXF network file that can be saved as an XML file (.gexf suffix, cf. <a href="http://blog.ouseful.info/2012/04/03/visualising-networks-in-gephi-via-a-scraperwiki-exported-gexf-file/">Visualising Networks in Gephi via a Scraperwiki Exported GEXF File</a>) and then visualised using a tool such as Gephi. The trick? Add the URL parameter <tt>&amp;format=gexf</tt> (the (optional) default is <tt>&amp;format=json</tt>) [<a href="https://views.scraperwiki.com/run/cloudworks_network/?format=gexf&amp;userID=1174&amp;viewtype=usercloudcloudscapefollower&amp;filterNdegree=1">example</a>].</p>
<p><a href="http://ouseful.files.wordpress.com/2013/01/gephiview-of-cloudworks-graph.png"><img src="http://ouseful.files.wordpress.com/2013/01/gephiview-of-cloudworks-graph.png?w=700&#038;h=518" alt="gephiview of cloudworks graph" width="700" height="518" class="alignnone size-full wp-image-9531" /></a></p>
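<p>If you want to generate these calls programmatically, a minimal Python sketch follows. It only builds the URL strings from the two roots and the parameters described above; the helper name is made up, and the sketch makes no attempt to fetch anything.</p>

```python
from urllib.parse import urlencode

# Roots taken from the links in this post.
D3_VIEW_ROOT = "https://views.scraperwiki.com/run/cloudworks_network_d3js_force_directed_view_pretti/"
DATA_ROOT = "https://views.scraperwiki.com/run/cloudworks_network/"


def view_url(root, **params):
    """Build a view URL from a root and keyword parameters (kept in the order given)."""
    return root + "?" + urlencode(params)


# The filtered "followers of cloudscapes containing a user's clouds" view.
print(view_url(D3_VIEW_ROOT, userID=1174,
               viewtype="usercloudcloudscapefollower", filterNdegree=2))

# The same sort of query against the data view, asking for GEXF rather than JSON.
print(view_url(DATA_ROOT, format="gexf", userID=1174,
               viewtype="usercloudcloudscapefollower", filterNdegree=1))
```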
<p>Gephi, of course, is a wonderful tool for the interactive exploration of graph-based data sets&#8230;. including a wide range of filters&#8230;</p>
<p>So, where are we at? The d3.js force directed layout is all very shiny but the graphs quickly get cluttered. I&#8217;m not sure if there are any interactive parameter controls I can add, but at the moment the visualisations border on the useless. At the very least, I need to squirt a header into the page from the supplied parameters so we know what the visualisation refers to. (The data I&#8217;ve played with to date &#8211; which has been very limited &#8211; doesn&#8217;t seem to be that interesting either from what I&#8217;ve seen? But maybe the rich structure isn&#8217;t there yet? Or maybe there is nothing to be had from these simple views?)</p>
<p>It may be worth exploring some other visualisation types to see if they are any more legible, at least, though it would be even more helpful if they were simply more informative ;-)</p>
<p>PS just in case, here&#8217;s a link to the <a href="http://cdn.bitbucket.org/cloudengine/cloudengine/downloads/Cloudworks_API-2010-06-15.pdf">Cloudworks API documentation</a>.</p>
<p>PPS if there are any terms of service associated with the API, I didn&#8217;t read them. So if I broke them, oops. But that said &#8211; such is life; never ever trust that anybody you give data to will look after it;-)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/01/17/further-dabblings-with-the-cloudworks-api/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/01/prettier-view-d3js-force-directed-layout.png" medium="image">
			<media:title type="html">prettier view d3js force directed layout</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/01/cloudworks-cloudscapes-by-cloud.png" medium="image">
			<media:title type="html">cloudworks cloudscapes by cloud</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/01/cloudworks-cloudscape-cloud-tags.png" medium="image">
			<media:title type="html">cloudworks cloudscape cloud tags</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/01/cloudworks-cloudscape-innerfollowers.png" medium="image">
			<media:title type="html">cloudworks cloudscape innerfollowers</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/01/cloudworks-followers-of-cloudscapes-containing-a-users-clouds.png" medium="image">
			<media:title type="html">cloudworks followers of cloudscapes containing a user&#039;s clouds</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/01/cloudworks-graph-filtered.png" medium="image">
			<media:title type="html">cloudworks graph filtered</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/01/gephiview-of-cloudworks-graph.png" medium="image">
			<media:title type="html">gephiview of cloudworks graph</media:title>
		</media:content>
	</item>
		<item>
		<title>A Couple of Proof of Concept Demos With the Cloudworks API</title>
		<link>http://blog.ouseful.info/2013/01/16/a-couple-of-proof-of-concept-demos-with-the-cloudworks-api/</link>
		<comments>http://blog.ouseful.info/2013/01/16/a-couple-of-proof-of-concept-demos-with-the-cloudworks-api/#comments</comments>
		<pubDate>Wed, 16 Jan 2013 22:08:46 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[cloudworks]]></category>
		<category><![CDATA[oldsmooc]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=9521</guid>
		<description><![CDATA[Via a tweet from @mhawksey in response to a tweet from @sheilmcn, or something like that, I came across a post by Sheila on the topic of Cloud gazing, maps and networks &#8211; some thoughts on #oldsmooc so far. The post mentioned a prototyped mindmap style browser for Cloudworks, created in part to test out [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#038;blog=325417&#038;post=9521&#038;subd=ouseful&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Via a tweet from @mhawksey in response to a tweet from @sheilmcn, or something like that, I came across a post by Sheila on the topic of <a href="http://blogs.cetis.ac.uk/sheilamacneill/2013/01/14/cloud-gazing-maps-and-networks-some-thoughts-on-oldsmooc-so-far/">Cloud gazing, maps and networks &#8211; some thoughts on #oldsmooc so far</a>. The post mentioned a prototyped mindmap style browser for Cloudworks, created in part to test out the Cloudworks API.</p>
<p>Having tinkered with mindmap style presentations using the d3.js library in the browser before (<a href="http://blog.ouseful.info/2012/05/11/viewing-openlearn-mindmaps-using-d3-js/">Viewing OpenLearn Mindmaps Using d3.js</a>; the app itself may well have rotted by now) I thought I&#8217;d have a go at exploring something similar for Cloudworks. With an API key promptly delivered by Nick Freear, it only took a few minutes to repurpose an old script to cast a test call to the Cloudworks API into a form that could easily be visualised using the d3.js library. The approach I took? To grab JSON data from the API, construct a tree using the Python networkx library, and drop a JSON serialisation of the network into a templated d3.js page. (<a href="http://networkx.lanl.gov/reference/readwrite.json_graph.html">networkx has a couple of JSON export functions</a> that will create tree based and graph/network based JSON data structures that d3.js can feed from.)</p>
<p>Here&#8217;s the Python fragment:</p>
<pre class="brush: python; title: ; notranslate">#http://cloudworks.ac.uk/api/clouds/{cloud_id}.{format}?api_key={api_key}

import urllib2,json, networkx as nx
from networkx.readwrite import json_graph

id=cloudscapeID #need logic

urlstub=&quot;http://cloudworks.ac.uk/api/&quot;
urlcloudscapestub=urlstub+&quot;cloudscapes/&quot;+str(id)
urlsuffix=&quot;.json?api_key=&quot;+str(key)

ctyp=&quot;/clouds&quot;
url=urlcloudscapestub+ctyp+urlsuffix

entities=json.load(urllib2.urlopen(url))

#print entities

#I seem to remember issues with non-ascii before, though maybe that was for XML? Hmmm...
def ascii(s): return &quot;&quot;.join(i for i in s.encode('utf-8') if ord(i)&lt;128)

def graphRoot(DG,title,root=1):
    DG.add_node(root,name=ascii(title))
    return DG,root

def gNodeAdd(DG,root,node,name):
    node=node+1
    DG.add_node(node,name=ascii(name))
    DG.add_edge(root,node)
    return DG,node

DG=nx.DiGraph()
DG,root=graphRoot(DG,id)
currnode=root

#This simple example just grabs a list of clouds associated with a cloudscape
for c in entities['items']:
    DG,currnode=gNodeAdd(DG,root,currnode,c['title'])
    
#We're going to use the tree based JSON data format to feed the d3.js mindmap view
jdata = json_graph.tree_data(DG,root=1)
#print json.dumps(jdata)

#The page template is defined elsewhere.
#It loads the JSON from a declaration in the Javascript of the form: jsonData=%(jdata)s
print page_template % vars()</pre>
<p>The rendered view is something along the lines of:</p>
<p><a href="https://views.scraperwiki.com/run/cloudworks_mindmap/?cloudscapeID=2451"><img src="http://ouseful.files.wordpress.com/2013/01/cloudscapetree.png?w=700&#038;h=643" alt="cloudscapeTree" width="700" height="643" class="alignnone size-full wp-image-9522" /></a></p>
<p>You can find the original code <a href="https://scraperwiki.com/views/cloudworks_mindmap/">here</a>.</p>
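<p>For reference, the tree based JSON that the mindmap view feeds from is just a nested structure in which each node carries its attributes plus a <tt>children</tt> list. A minimal sketch of that shape, built by hand with the standard library rather than via networkx (the <tt>make_tree</tt> helper and the cloud titles are made up for illustration):</p>

```python
import json

def make_tree(title, leaf_names):
    """Build a nested dict with one root node and a flat list of leaf children."""
    return {
        "id": 1,
        "name": title,
        "children": [
            {"id": i, "name": name} for i, name in enumerate(leaf_names, start=2)
        ],
    }

# Hypothetical cloud titles standing in for the items returned by the API
jdata = make_tree("2451", ["First cloud", "Second cloud"])
print(json.dumps(jdata, indent=2))
```

<p>Growing the tree with further API calls would then amount to attaching a <tt>children</tt> list to each of the leaves.</p>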
<p>Now I know that: a) this isn&#8217;t very interesting to look at; and b) it doesn&#8217;t even work as a navigation surface, but my intention was purely to demonstrate a recipe for getting data out of the Cloudworks API and into a d3.js mindmap view in the browser, and it does that. A couple of obvious next steps: i) add in additional API calls to grow the tree (easy); ii) linkify some of the nodes (I&#8217;m not sure I know how to do that at the moment?)</p>
<p>Sheila&#8217;s post ended with a brief reflection: &#8220;I&#8217;m also now wondering if a network diagram of cloudscape (showing the interconnectedness between clouds, cloudscapes and people) would be helpful ? Both in terms of not only visualising and conceptualising networks but also in starting to make more explicit links between people, activities and networks.&#8221; </p>
<p>So here&#8217;s another recipe, again using networkx but this time dropping the data into a graph based JSON format and using the d3.js force based layout to render it. What the script does is grab the followers of a particular cloudscape, grab each of their followers, and then graph how the followers of a particular cloudscape follow each other.</p>
<p>Because I had some problems getting the data into the template, I also used a slightly different wiring approach: </p>
<pre class="brush: python; title: ; notranslate">import urllib2,json,scraperwiki,networkx as nx
from networkx.readwrite import json_graph

id=cloudscapeID #need logic
typ='cloudscape'

urlstub=&quot;http://cloudworks.ac.uk/api/&quot;
urlcloudscapestub=urlstub+&quot;cloudscapes/&quot;+str(id)
urlsuffix=&quot;.json?api_key=&quot;+str(key)

ctyp=&quot;/followers&quot;
url=urlcloudscapestub+ctyp+urlsuffix
entities=json.load(urllib2.urlopen(url))

def ascii(s): return &quot;&quot;.join(i for i in s.encode('utf-8') if ord(i)&lt;128)

def getUserFollowers(id):
    urlstub=&quot;http://cloudworks.ac.uk/api/&quot;
    urluserstub=urlstub+&quot;users/&quot;+str(id)
    urlsuffix=&quot;.json?api_key=&quot;+str(key)

    ctyp=&quot;/followers&quot;
    url=urluserstub+ctyp+urlsuffix
    results=json.load(urllib2.urlopen(url))
    #print results
    f=[]
    for r in results['items']: f.append(r['user_id'])
    return f

DG=nx.DiGraph()

followerIDs=[]

#Seed graph with nodes corresponding to the followers of a cloudscape
for c in entities['items']:
    curruid=c['user_id']
    DG.add_node(curruid,name=ascii(c['name']).strip())
    followerIDs.append(curruid)

#construct graph of how followers of a cloudscape follow each other
for c in entities['items']:
    curruid=c['user_id']
    followers=getUserFollowers(curruid)
    for followerid in followers:
        if followerid in followerIDs:
            DG.add_edge(curruid,followerid)

scraperwiki.utils.httpresponseheader(&quot;Content-Type&quot;, &quot;text/json&quot;)

#Print out the node-link representation of the network/graph as JSON
jdata = json_graph.node_link_data(DG)
print json.dumps(jdata)
</pre>
<p>In <a href="https://scraperwiki.com/views/cloudworks_network/">this case</a>, I generate a JSON representation of the network that is then loaded into a <a href="https://scraperwiki.com/views/cloudworks_network_d3js_force_directed_view/">separate HTML page</a> that uses the d3.js force directed layout to show how the followers of a particular cloudscape follow each other.</p>
<p><a href="https://views.scraperwiki.com/run/cloudworks_network_d3js_force_directed_view/?cloudscapeID=2451"><img src="http://ouseful.files.wordpress.com/2013/01/cloudworks_innerfrendsnet.png?w=700" alt="cloudworks_innerfrendsNet"   class="alignnone size-full wp-image-9523" /></a></p>
<p>This hits the Cloudworks API once for the cloudscape, then once for each follower of the cloudscape, in order to construct the graph and then pass the JSON version to the HTML page.</p>
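<p>To make the call count and the filtering step concrete, here is a small offline sketch of the same logic. It swaps the API for canned data (all ids and names are invented), so <tt>get_user_followers</tt> is a stub rather than the real call, and it emits links as positions in the node list, which is the form I&#8217;d expect the d3.js force layout to accept.</p>

```python
import json

# Canned follower lists standing in for the Cloudworks API (hypothetical data)
CLOUDSCAPE_FOLLOWERS = [
    {"user_id": "1", "name": "Ann"},
    {"user_id": "2", "name": "Bob"},
    {"user_id": "3", "name": "Cy"},
]
USER_FOLLOWERS = {"1": ["2", "99"], "2": ["1", "3"], "3": []}

api_calls = 0

def get_user_followers(user_id):
    """Stub for the per-user followers call; counts how often it is hit."""
    global api_calls
    api_calls += 1
    return USER_FOLLOWERS.get(user_id, [])

def inner_follower_graph(members):
    """Keep only follow edges where both ends follow the cloudscape."""
    ids = [m["user_id"] for m in members]
    index = {uid: n for n, uid in enumerate(ids)}
    links = []
    for uid in ids:
        for follower in get_user_followers(uid):
            if follower in index:  # drop followers outside the cloudscape
                links.append({"source": index[uid], "target": index[follower]})
    nodes = [{"id": m["user_id"], "name": m["name"]} for m in members]
    return {"nodes": nodes, "links": links}

graph = inner_follower_graph(CLOUDSCAPE_FOLLOWERS)
print(json.dumps(graph))
print("per-user API calls:", api_calls)
```

<p>With N followers of the cloudscape, that is N per-user calls on top of the single cloudscape call, so the cost grows linearly with the size of the cloudscape&#8217;s following.</p>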
<p>Again, I&#8217;m posting it as a minimum viable recipe that could be developed as a way of building out Sheila&#8217;s idea (though the graph definition would probably need to be a little more elaborate, eg in terms of node labeling). Some work on the graph rendering probably wouldn&#8217;t go amiss either, eg in respect of node sizing, colouring and labeling.</p>
<p>Still, what do you expect in just a couple of hours?!;-)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/01/16/a-couple-of-proof-of-concept-demos-with-the-cloudworks-api/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/01/cloudscapetree.png" medium="image">
			<media:title type="html">cloudscapeTree</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2013/01/cloudworks_innerfrendsnet.png" medium="image">
			<media:title type="html">cloudworks_innerfrendsNet</media:title>
		</media:content>
	</item>
		<item>
		<title>WordPress Stats in R</title>
		<link>http://blog.ouseful.info/2013/01/09/wordpress-stats-in-r/</link>
		<comments>http://blog.ouseful.info/2013/01/09/wordpress-stats-in-r/#comments</comments>
		<pubDate>Wed, 09 Jan 2013 11:50:14 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Anything you want]]></category>
		<category><![CDATA[Rstats]]></category>
		<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=9447</guid>
		<description><![CDATA[A trackback from Martin Hawksey&#8217;s recent post on Analysing WordPress post velocity and momentum stats with Google Sheets (Spreadsheet), which demonstrates how to pull WordPress stats into a Google Spreadsheet and generates charts and reports therein, reminded me of the WordPress stats API. So here&#8217;s a quick function for pulling WordPress reports into R. (Code [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#038;blog=325417&#038;post=9447&#038;subd=ouseful&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>A <a href="http://blog.ouseful.info/2011/12/15/jisc-project-blog-metrics-making-use-of-wordpress-stats-plus-an-aside/">trackback</a> from Martin Hawksey&#8217;s recent post on <a href="http://mashe.hawksey.info/2013/01/wordpress-postviews-stats/">Analysing WordPress post velocity and momentum stats with Google Sheets (Spreadsheet)</a>, which demonstrates how to pull WordPress stats into a Google Spreadsheet and generate charts and reports therein, reminded me of the WordPress stats API.</p>
<p>So here&#8217;s a quick function for pulling WordPress reports into R.</p>
<pre class="brush: r; title: ; notranslate">#Wordpress Stats
##---------------
#Wordpress Stats API docs (from http://stats.wordpress.com/csv.php)

#You can get a copy of your API key (required) from Akismet:
#Login with your WordPress account: http://akismet.com/account/
#Resend API key: https://akismet.com/resend/

#Required parameters: api_key, blog_id or blog_uri.
#Optional parameters: table, post_id, end, days, limit, summarize.

#Parameters:
#api_key     String    A secret unique to your WordPress.com user account.
#blog_id     Integer   The number that identifies your blog. Find it in other stats URLs.
#blog_uri    String    The full URL to the root directory of your blog. Including the full path.
#table       String    One of views, postviews, referrers, referrers_grouped, searchterms, clicks, videoplays.
#post_id     Integer   For use with postviews table.
#end         String    The last day of the desired time frame. Format is 'Y-m-d' (e.g. 2007-05-01) and default is UTC date.
#days        Integer   The length of the desired time frame. Default is 30. &quot;-1&quot; means unlimited.
#period      String    For use with views table and the 'days' parameter. The desired time period grouping. 'week' or 'month'
#Use 'days' as the number of results to return (e.g. '&amp;period=week&amp;days=12' to return 12 weeks)
#limit       Integer   The maximum number of records to return. Default is 100. &quot;-1&quot; means unlimited. If days is -1, limit is capped at 500.
#summarize   Flag      If present, summarizes all matching records.
#format      String    The format the data is returned in, 'csv', 'xml' or 'json'. Default is 'csv'.
##---------------------------------------------
#NOTE: some of the report calls I tried didn't seem to work properly?
#Need to build up a list of tested calls to the API that actually do what you think they should?
##-----

wordpress.getstats.demo=function(apikey, blogurl, table='postviews', end=Sys.Date(), days='12', period='week', limit='', summarise=''){
  #default parameters gets back last 12 weeks of postviews aggregated by week
  url=paste('http://stats.wordpress.com/csv.php?',
    'api_key=',apikey,
    '&amp;blog_uri=',blogurl,
    '&amp;table=',table,
    '&amp;end=',end,
    '&amp;days=',days,
    '&amp;period=',period,
    '&amp;limit=',limit,
    '&amp;',summarise, #set this to 'summarize' (the API's spelling of the flag) if required
    sep=''
  )
  #Martin's post notes that JSON appears to work better than CSV
  #May be worth doing a JSON parsing version?
  read.csv(url)
}


APIKEY='YOUR-API_KEY_HERE'
#Use the URL of a WordPress blog associated with the same account as the API key
BLOGURL='http://ouseful.wordpress.com'

#Examples
wp.pageviews.last12weeks=wordpress.getstats.demo(APIKEY,BLOGURL)
wp.views.last12weeks.byweek=wordpress.getstats.demo(APIKEY,BLOGURL,'views')
wp.views.last30days.byday=wordpress.getstats.demo(APIKEY,BLOGURL,'views',days=30,period='')
wp.clicks.wpdefault=wordpress.getstats.demo(APIKEY,BLOGURL,'clicks',days='',period='')
wp.clicks.lastday=wordpress.getstats.demo(APIKEY,BLOGURL,'clicks',days='1',period='')
wp.referrers.lastday=wordpress.getstats.demo(APIKEY,BLOGURL,'referrers',days='1',period='')


require(stringr)
getDomain=function(url) str_match(url, &quot;^http[s]?://([^/]*)/.*?&quot;)[, 2]

#We can pull out the domains clicks were sent to or referrals came from
wp.clicks.lastday$domain=getDomain(wp.clicks.lastday$click)
wp.referrers.lastday$domain=getDomain(wp.referrers.lastday$referrer)

require(ggplot2)

#Scruffy bar chart - is there a way of doing this sorted chart using geom_bar? How would we reorder x?
c=as.data.frame(table(wp.clicks.lastday$domain))
ggplot(c)+geom_bar(aes(x=reorder(Var1,Freq),y=Freq),stat='identity')+theme( axis.text.x=element_text(angle=-90))

c=as.data.frame(table(wp.referrers.lastday$domain))
ggplot(c)+geom_bar(aes(x=reorder(Var1,Freq),y=Freq),stat='identity')+theme( axis.text.x=element_text(angle=-90))
</pre>
<p>(Code <a href="https://gist.github.com/4492571">as a gist</a>.)</p>
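<p>For anyone who would rather start from Python, the URL-building half of the function might look something like the sketch below. It follows the parameter names listed in the API notes above, but the <tt>stats_url</tt> helper is my own naming, and leaving out empty parameters (rather than sending them blank, as the R version does) is an assumption about what the API tolerates, not something I have tested against it.</p>

```python
from urllib.parse import urlencode

STATS_ENDPOINT = "http://stats.wordpress.com/csv.php"

def stats_url(api_key, blog_uri, **params):
    """Build a stats URL, leaving out any parameter set to None or ''."""
    query = {"api_key": api_key, "blog_uri": blog_uri}
    query.update({k: v for k, v in params.items() if v not in (None, "")})
    return STATS_ENDPOINT + "?" + urlencode(query)

# Placeholder key and blog URL, as in the R example
url = stats_url("YOUR-API_KEY_HERE", "http://ouseful.wordpress.com",
                table="views", days=12, period="week", limit="")
print(url)
```

<p>Using <tt>urlencode</tt> also takes care of escaping the blog URL, which the simple string pasting in the R function does not.</p>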
<p>I guess there&#8217;s scope for coming up with a set of child functions that pull back specific report types? Also, if we pull in the blog XML archive and extract external links from each page, we could maybe start to analyse which pages are sending traffic where? (Of course, you could use Google Analytics to do this more efficiently, except that hosted WordPress blogs don&#8217;t support Google Analytics, for no very good reason that I can tell&#8230;?)</p>
<p>PS for more WordPress tinkerings, see eg <a href="http://blog.ouseful.info/2012/09/06/how-ouseful-info-posts-link-to-each-other/">How OUseful.Info Posts Link to Each Other…</a>, which links to a Python script for extracting data from WordPress blog export files to show how blog posts in a particular WordPress blog link to each other.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2013/01/09/wordpress-stats-in-r/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>
	</item>
	</channel>
</rss>
