<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>OUseful.Info, the blog... &#187; Working With Excel Spreadsheet Files Without Using Excel&#8230;</title>
	<atom:link href="http://blog.ouseful.info/2012/04/30/working-with-excel-files-without-using-excel/feed/?withoutcomments=1" rel="self" type="application/rss+xml" />
	<link>http://blog.ouseful.info</link>
	<description>Trying to find useful things to do with emerging technologies in open education</description>
	<lastBuildDate>Sat, 18 May 2013 06:08:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.ouseful.info' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>OUseful.Info, the blog... &#187; Working With Excel Spreadsheet Files Without Using Excel&#8230;</title>
		<link>http://blog.ouseful.info</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.ouseful.info/osd.xml" title="OUseful.Info, the blog..." />
	<atom:link rel='hub' href='http://blog.ouseful.info/?pushpress=hub'/>
		<item>
		<title>Working With Excel Spreadsheet Files Without Using Excel&#8230;</title>
		<link>http://blog.ouseful.info/2012/04/30/working-with-excel-files-without-using-excel/</link>
		<comments>http://blog.ouseful.info/2012/04/30/working-with-excel-files-without-using-excel/#comments</comments>
		<pubDate>Mon, 30 Apr 2012 11:58:54 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Infoskills]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=7698</guid>
		<description><![CDATA[One of the most frequently encountered ways of sharing small datasets is in the form of Excel spreadsheet (.xls) files, notwithstanding all that can be said In Praise of CSV;-) The natural application for opening these files is Microsoft Excel, but what if you don&#8217;t have a copy of Excel available? There are other desktop [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>One of the most frequently encountered ways of sharing small datasets is in the form of Excel spreadsheet (.xls) files, notwithstanding all that can be said <a href="http://blog.datamarket.com/2012/04/17/in-praise-of-csv/">In Praise of CSV</a>;-) The natural application for opening these files is Microsoft Excel, but what if you don&#8217;t have a copy of Excel available?</p>
<p>There are other desktop office suites that can open spreadsheet files, of course, such as Open Office. As long as they&#8217;re not too big, spreadsheet files can also be uploaded to and then opened using a variety of online services, such as Google Spreadsheets, Google Fusion Tables or Zoho Sheet. But spreadsheet applications aren&#8217;t the only data wrangling tools that can be used to open xls files&#8230; Here are a couple more that should be part of every data wrangler&#8217;s toolbox&#8230;</p>
<p>(If you want to play along, the file I&#8217;m going to play with is a spreadsheet containing the names and locations of GP practices in England. The file can be found on the <a href="https://indicators.ic.nhs.uk/webview/">NHS Indicators portal</a> &#8211; here&#8217;s <a href="https://indicators.ic.nhs.uk/download/GP%20Practice%20data/summaries/demography/Practice%20Addresses%20Final.xls">the actual spreadsheet</a>.)</p>
<p><a href="http://ouseful.files.wordpress.com/2012/04/gp-practices-location-file.png"><img src="http://ouseful.files.wordpress.com/2012/04/gp-practices-location-file.png?w=700&#038;h=476" alt="" title="GP Practices location file" width="700" height="476" class="alignnone size-full wp-image-7706" /></a></p>
<p>Firstly, <a href="http://code.google.com/p/google-refine/">Google Refine</a>. Google Refine is a cross-platform, browser-based tool that helps with many of the chores involved in tidying up a dataset so that you can use it elsewhere, as well as helping with data reconciliation or augmenting rows with annotations provided by separate online services. You can also use it as a quick-and-dirty tool for opening an xls spreadsheet from a URL, knocking the data into shape, and dumping it to a CSV file that you can use elsewhere. To start with, choose the option to create a project by importing a file from a web address (the XLS spreadsheet URL):</p>
<p><a href="http://ouseful.files.wordpress.com/2012/04/google-refine-import.png"><img src="http://ouseful.files.wordpress.com/2012/04/google-refine-import.png?w=700&#038;h=174" alt="" title="google refine import" width="700" height="174" class="alignnone size-full wp-image-7705" /></a></p>
<p>Once the file has loaded, you get a preview:</p>
<p><a href="http://ouseful.files.wordpress.com/2012/04/google-refine-importing-xls.png"><img src="http://ouseful.files.wordpress.com/2012/04/google-refine-importing-xls.png?w=700&#038;h=398" alt="" title="google refine - importing xls" width="700" height="398" class="alignnone size-full wp-image-7704" /></a></p>
<p>You can tidy up the data that you are going to use in your project via the preview panel. In this case, I&#8217;m going to ignore the leading lines and just generate a dataset that I can export directly as a CSV file once I&#8217;ve got the data into my project.</p>
<p><a href="http://ouseful.files.wordpress.com/2012/04/importing-xls-with-config.png"><img src="http://ouseful.files.wordpress.com/2012/04/importing-xls-with-config.png?w=700&#038;h=380" alt="" title="importing xls with config" width="700" height="380" class="alignnone size-full wp-image-7703" /></a></p>
<p>If I then create a project around this dataset, I can trivially export it again using a format of my own preference:</p>
<p><a href="http://ouseful.files.wordpress.com/2012/04/google-refine-export.png"><img src="http://ouseful.files.wordpress.com/2012/04/google-refine-export.png?w=700&#038;h=283" alt="" title="google refine export" width="700" height="283" class="alignnone size-full wp-image-7708" /></a></p>
<p>So that&#8217;s one way of using Google Refine as a simple file converter service that allows you to preview, and to a certain extent shape, the data in an XLS spreadsheet, as well as converting it to other file types.</p>
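<p>For comparison, the trimming done in the Refine preview panel (ignore the leading lines, treat the next row as the header, export as CSV) can be sketched in a few lines of Python 3 using only the standard library. This is a rough sketch rather than anything Refine does internally: the sample rows and the <code>rows_to_csv</code> helper are made up for illustration, and it assumes the sheet has already been read into a list of rows.</p>
<pre class="brush: python; title: ; notranslate">import csv
import io

def rows_to_csv(rows, skip=0):
    # Drop the first `skip` rows, then write what is left as CSV text.
    # The first remaining row is treated as the header row.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerows(rows[skip:])
    return out.getvalue()

# Made-up rows standing in for a sheet with two lines of preamble
sample = [
    ['Some report title', '', ''],
    ['', '', ''],
    ['Practice Code', 'Practice Name', 'Postcode'],
    ['X00001', 'Example Surgery', 'AB1 2CD'],
    ['X00002', 'Another Surgery, North', 'EF3 4GH'],
]

csv_text = rows_to_csv(sample, skip=2)
print(csv_text)</pre>
<p>Values that contain a comma are quoted by the csv module, so the output should survive a round trip into another tool.</p>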
<p>The second approach I want to mention is to use a really handy Python software library (<a href="https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html">xlrd &#8211; Excel Reader</a>) in Scraperwiki. The <a href="https://scraperwiki.com/docs/python/python_excel_guide/">Scraperwiki tutorial on Excel scraping</a> gives a great example of how to get started, which I cribbed wholesale to produce the following snippet.</p>
<pre class="brush: python; title: ; notranslate">import scraperwiki
import xlrd

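# Adapted from https://scraperwiki.com/docs/python/python_excel_guide/
def cellval(cell):
    # Treat empty cells as missing values (None)
    if cell.ctype == xlrd.XL_CELL_EMPTY:
        return None
    return cell.value

def dropper(table):
    # Drop the named table, ignoring the error raised if it does not exist
    if table != '':
        try:
            scraperwiki.sqlite.execute('drop table &quot;' + table + '&quot;')
        except Exception:
            pass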

def reGrabber():
    #dropper('GPpracticeLookup')
    url = 'https://indicators.ic.nhs.uk/download/GP%20Practice%20data/summaries/demography/Practice%20Addresses%20Final.xls'
    xlbin = scraperwiki.scrape(url)
    book = xlrd.open_workbook(file_contents=xlbin)

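    sheet = book.sheet_by_index(0)

    # The column headings are on the ninth row of the sheet (index 8)
    keys = sheet.row_values(8)
    # Strip the full stop from the second heading
    keys[1] = keys[1].replace('.', '')
    print(keys)

    # The data rows start immediately after the heading row
    for rownumber in range(9, sheet.nrows):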
        # create dictionary of the row values
        values = [ cellval(c) for c in sheet.row(rownumber) ]
        data = dict(zip(keys, values))
        #print data
        scraperwiki.sqlite.save(table_name='GPpracticeLookup',unique_keys=['Practice Code'], data=data)

#Comment out the next line if you don't want to regrab the data from the original spreadsheet
reGrabber()</pre>
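<p>The key step in that loop is pairing each row of cell values with the header row to get one dictionary per record. Here is a minimal, standalone Python 3 sketch of just that step. The header and row values are invented for illustration, the helper names are my own, and no spreadsheet is actually read.</p>
<pre class="brush: python; title: ; notranslate">def clean_keys(keys):
    # Remove full stops from the headings, as the scraper does for one of its headings
    return [k.replace('.', '') for k in keys]

def rows_to_dicts(keys, rows):
    # Pair each row's values with the headings, one dictionary per row
    return [dict(zip(keys, row)) for row in rows]

# Invented values standing in for sheet.row_values(8) and the rows after it
header = ['Practice Code', 'Practice Name.', 'Postcode']
rows = [
    ['X00001', 'Example Surgery', 'AB1 2CD'],
    ['X00002', 'Another Surgery', None],
]

records = rows_to_dicts(clean_keys(header), rows)
for record in records:
    print(record)</pre>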
<p>You can find my scraper here: <a href="https://scraperwiki.com/scrapers/uk_nhs_gp_practices_lookup/">UK NHS GP Practices Lookup</a>. What&#8217;s handy about this approach is that having scraped the spreadsheet data into a Scraperwiki database, I can now query it as database data via the Scraperwiki API.</p>
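<p>To give a feel for what saving rows against a unique key and then querying them looks like, here is a sketch that uses Python&#8217;s built-in sqlite3 module as a stand-in. It is not the Scraperwiki API: the <code>save</code> helper, the table layout, the column names and the sample rows are all assumptions of mine, and I&#8217;m assuming that saving a row whose key already exists replaces the earlier row.</p>
<pre class="brush: python; title: ; notranslate">import sqlite3

# In-memory database standing in for the scraper's datastore
conn = sqlite3.connect(':memory:')
conn.execute('create table practices (code text primary key, name text, postcode text)')

def save(conn, row):
    # Insert the row, replacing any earlier row that has the same code
    conn.execute(
        'insert or replace into practices (code, name, postcode) values (?, ?, ?)',
        (row['code'], row['name'], row['postcode']),
    )

# Invented rows; the second save of X00001 overwrites the first
save(conn, {'code': 'X00001', 'name': 'Example Surgery', 'postcode': 'AB1 2CD'})
save(conn, {'code': 'X00002', 'name': 'Another Surgery', 'postcode': 'EF3 4GH'})
save(conn, {'code': 'X00001', 'name': 'Example Surgery (renamed)', 'postcode': 'AB1 2CD'})

# Once the data is in a table it can be queried like any other database
results = conn.execute(
    'select code, name from practices where postcode like ? order by code', ('AB1%',)
).fetchall()
print(results)</pre>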
<p>(Note that the Google Visualisation API query language would also let me treat the spreadsheet data as a database if I uploaded it to Google Spreadsheets.)</p>
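<p>As a rough illustration of that route, the Python 3 sketch below builds a query URL that asks for the results as CSV. The spreadsheet key and the column letters are placeholders, the <code>gviz_csv_url</code> helper is my own, and the URL pattern is the one I understand Google Sheets to use at the moment, so treat it as an assumption and check it against the current documentation. The spreadsheet would also need to be shared in a way that allows the request.</p>
<pre class="brush: python; title: ; notranslate">from urllib.parse import urlencode

def gviz_csv_url(sheet_key, query):
    # URL-encode the query and ask for the response in CSV format
    params = urlencode({'tqx': 'out:csv', 'tq': query})
    return 'https://docs.google.com/spreadsheets/d/' + sheet_key + '/gviz/tq?' + params

# Placeholder key and column letters; substitute your own
url = gviz_csv_url('YOUR_SPREADSHEET_KEY', 'select A, B limit 10')
print(url)</pre>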
<p>So, if you find yourself with an Excel spreadsheet, but no Microsoft Office to hand, fear not&#8230; There are plenty of other tools out there that you can appropriate to help you get the data out of the file and into a form you can work with:-)</p>
<p>PS R is capable of importing Excel files, I think, but the libraries I found don&#8217;t seem to compile on Mac OS X?</p>
<p>PPS ***DATA HEALTH WARNING*** I haven&#8217;t done much testing of either of these approaches using workbooks containing multiple worksheets, complex linked formulae or macros. They may or may not be appropriate in such cases&#8230; but for simple spreadsheets, they&#8217;re fine&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2012/04/30/working-with-excel-files-without-using-excel/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/abbd9f90565ce9ae4d065d93a81d8c03?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96" medium="image">
			<media:title type="html">Tony Hirst</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/04/gp-practices-location-file.png" medium="image">
			<media:title type="html">GP Practices location file</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/04/google-refine-import.png" medium="image">
			<media:title type="html">google refine import</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/04/google-refine-importing-xls.png" medium="image">
			<media:title type="html">google refine - importing xls</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/04/importing-xls-with-config.png" medium="image">
			<media:title type="html">importing xls with config</media:title>
		</media:content>

		<media:content url="http://ouseful.files.wordpress.com/2012/04/google-refine-export.png" medium="image">
			<media:title type="html">google refine export</media:title>
		</media:content>
	</item>
	</channel>
</rss>
