Calling Amazon Associates/Ecommerce Web Services from a Google Spreadsheet

[UPDATE: Note that since Amazon Product Advertising API, or whatever it’s called now – the thing that was the Amazon E-commerce API – started requiring signed calls (August 2009), this trick has stopped working…]

I've never really been one for using spreadsheets – I'd rather write code in a text environment than macros and formulae in a Microsoft environment (because Excel is the spreadsheet you're likely to have to hand in most cases, right?). But over the last week or so, I've really been switched on to how we might be able to use them as a scribble pad for playing with web services…

So for example, in Viewing Campaign Finance Data In a Google Spreadsheet via the New York Times Campaign Data API I showed how to do what it says on the tin…

… and today I had a look at the Amazon Associates Web Service (formerly known as Amazon ECS, the Amazon eCommerce web services).

Until now, the best way of getting your head round what these services can do has been to use the tools on AWSzone, a playground (or scratchpad) for previewing calls to Amazon web services.

In the case of the REST flavoured web service, the form simply provides a quick’n’easy way of creating the RESTful URL that calls the webservice.

The SubscriptionId is required and can be obtained by registering for access with the Amazon Associates web service.

So just pick the web service/function you want to call (ItemSearch in this case), fill in some necessary details (and some optional ones, if you like…) and view the results:

(You might also notice the scratchpad contains a tab for creating a minimal SOAP request to the web service (and viewing the associated SOAP response) and a tab for creating Java (or C#) code that will call the service). Amusingly, you view the SOAP request and response via a URL ;-)

Whilst the scratchpad makes it easy to construct web service calling URLs, the XML response is still likely to be unmanageable at best (and meaningless at worst) for most people. Which is where using a Google spreadsheet as a display surface comes in.

How so? Like this: take a URL for a (working) webservice call constructed in the AWSzone REST scratchpad and paste it into cell B3 in a new Google spreadsheet (enter "URL" as a label in cell A3).

In cell D3 enter the following formula:
=importXML(B3,"//Item")

This calls the Amazon Associates web service via the RESTful URL in cell B3, and then attempts to display the XML for each "Item" element in the results.

Compare this with the actual XML results file:

The spreadsheet has loaded in the ASIN (the ISBN for each book result) and the DetailPageURL, but the ItemAttributes are not loaded in (or if they are, they aren’t displayed because a single cell can’t display more than a single XML attribute, and it would have to display the Author(s), Manufacturer, ProductGroup and so on).

(Hmm, I wonder what a Treemap spreadsheet would look like? How would it handle the display of XML subtrees?!)

Tweak the formula in D3 so that it says:
=importXML(B3,B4)
Then, in cell A4, enter the label Path, and in B4 enter //Item.

Hopefully the results table will remain the same, only now you can experiment with the path setting easily.

Inspect the ItemAttributes by setting the path (cell B4) to //ItemAttributes.

A single result can be obtained by specifying which result to display. For example, set the path to //Item[1]/ItemAttributes to display the ItemAttributes for just the first ([1]) Item in the results.

By importing several XML results files, you could list the results for the first, second and third Items, for example. By loading formulae with different paths into different cells, you can force different results attributes into particular cells (as sketched below).

For example, set the path to //Item[2]/ItemAttributes to display the ItemAttributes for just the second ([2]) Item in the results.
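By way of illustration, to force a specific attribute of a specific result into a particular cell, formulae along the following lines should work (a sketch only: it assumes the Amazon ItemAttributes element carries a Title child element, and that the webservice URL is still in cell B3):

=importXML($B$3,"//Item[1]/ItemAttributes/Title")
=importXML($B$3,"//Item[2]/ItemAttributes/Title")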

It’s also possible to craft changes that will apply to the web service URL. In cell A2 enter the label Search and in cell B2 a single word search term, such as google.

Cut the URL from cell B3 and replace it with the formula =CONCATENATE(""), then paste the URL back in between the double quotes of the CONCATENATE formula.

Now go into cell B3, and find the part of the URL that encodes the search keywords:

In the example above, I was searching for the keyword mashup – replace the keyword with ",B2," (that is, a closing double quote, a comma, the cell reference B2, another comma and an opening double quote).

What this will do is add the search term in cell B2 into the URL that calls the Amazon web service. So now you can use cell B2 as a search box for a search on the Amazon Associates web service.
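The end result is a formula along these lines (illustrative only – the exact URL and parameters depend on the call you built in the scratchpad, and YOURSUBID stands in for your own SubscriptionId):

=CONCATENATE("http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService&Operation=ItemSearch&SearchIndex=Books&SubscriptionId=YOURSUBID&Keywords=",B2,"&ResponseGroup=Small")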

Note that you can only use single word search terms at the moment – if you want to use multiple search words, you must use + instead of a space between each word (mashup+patterns, for example).

So – that’s how to build a search (of sorts) using Amazon Associates web services in a Google spreadsheet. :-)

PS Now I know that for webservices to count in an academic environment, you've got to use SOAP (this counts for teaching computing just as much as it counts in JISC funded projects!), so I don't expect any of this to count in that environment. But for "mortals", this way of accessing webservices and then doing something useful with the results may actually be a way forward? ;-)

Creating Database Query Forms in Google Spreadsheets – Sort Of

It’s all very well using a Google spreadsheet as a database, but sometimes you just want to provide a simple form to let people run a particular query. Here’s a quick way of doing that within a Spreadsheet…

So for example: Can you help me crowd source a solution? The problem is as follows:

Students will make five choices from a list of over 200 projects that have been anonymised… We will give each project a code, and have already entered all the details into an excel sheet so we can tie the project code to the supervisor.

We need a solution that will enable students to enter their project code and then have the title of the project displayed as a check to make sure they have entered the code correctly. The list of projects is just too long for a drop down list, even when split by department (around 50 in each).

Does anyone have any suggestions of tools that we can use for students to submit this type of information, so that we get it in a format that we can use, and they get confirmation of the project titles they have chosen? A simple google form isn’t going to hack it!

Here’s one way…

Create a “form” – the text entry cell can be highlighted by setting the background colour from the spreadsheet toolbar:

Construct a query. In this case, I need to select three results columns (H, I and J) from another sheet ('Sheet1', the one that acts as the database and contains the project codes), so the query will be of the form select H,I,J where H contains 'BIOCHEM'; the search term ('BIOCHEM') is pulled in from the query form shown above:

=concatenate("select H,I,J where H contains '",B2,"'")

(As a rule of thumb, if you want your query to select cells A, D, AC, the range set in the first part of the query that defines the database should span the first to the last column in the select range (Sheet1!A:AC, for example).)

By using the contains relation, this query will generate a set of results that are, in effect, a list of auto-complete suggestions arising from a search on a partially stated query term.

Assuming I have placed the query in cell A4, I can automatically get the results from the query as follows:
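Here's a minimal sketch of that results formula, assuming the project database occupies columns H to J of Sheet1 and the concatenated query string is in cell A4:

=query(Sheet1!H:J,A4)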

Note that it would be possible to hide the query generator (the contents of cell A4) in another sheet and just have the search box and the results displayed in the user interface sheet.

Another approach is to query the spreadsheet via its API.

So for example, if the original spreadsheet database was published as a public document, we could also grab the results as an HTML table via an API using a URI of the form:

http://spreadsheets.google.com/tq?tqx=out:html
&tq=select%20H%2CI%2CJ%20where%20H%20contains%20%22SEARCHTERM%22
&key=SPREADSHEETKEY

Setting out:csv would return the results in comma-separated value (CSV) format, so we could create a Yahoo Pipes interface to query the form, for example.
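That is, the same query as above, with the output format switched to CSV:

http://spreadsheets.google.com/tq?tqx=out:csv
&tq=select%20H%2CI%2CJ%20where%20H%20contains%20%22SEARCHTERM%22
&key=SPREADSHEETKEY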


What would be really useful would be if the Google/Yahoo widget options for the feed respected the form elements, rather than just generating a widget that displays the feed corresponding to the current Run of the pipe with the provided search terms.

Building such a widget is something I need to put on my to do list, I guess?! Sigh…

Using Data From Linked Data Datastores the Easy Way (i.e. in a spreadsheet, via a formula)

Disclaimer: before any Linked Data purists say I'm missing the point about what Linked Data is, does, or whatever, I don't care, okay? I just don't care. This is for me, and me is someone who can't get to grips with writing SPARQL queries, can't stand the sight of unreadable <rdf> <all:over the="place">, can't even work out how to find what things are queryable in a Linked Data triple store, let alone write queries that link data from one data store with data from another data store (or maybe SPARQL can't do that yet? Not that I care about that either, because I can, in Yahoo Pipes, or Google Spreadsheets, and in a way that's meaningful to me…)

In Mulling Over =datagovukLookup() in Google Spreadsheets, I started wondering about whether or not it would be useful to be able to write formulae to look up "live facts" in various datastores from within a spreadsheet (you know, that Office app that is used pretty much universally in the workplace whenever there is tabular data to hand. That or Access, of course…)

Anyway, I’ve started tinkering with how it might work, so now I can do things like this:

The formulae in columns G, H and I are defined according to a Google Apps Script function that takes a school ID and then returns something linked to it in the data.gov.uk education datastore, such as the name of the school or its total capacity.

Formulae look like:

  • =datagovuk_education(A2,"name")
  • =datagovuk_education(A2,"opendate")
  • =datagovuk_education(A2,"totcapacity")

and are defined to return a single cell element. (I haven’t worked out how to return several cells worth of content from a Google Apps Script yet!)

At the moment, the script is a little messy, taking the form:

function datagovuk_education(id,typ) {
  var ret="";
  var args="";
  switch (typ) {
    case 'totcapacity':
      args= _datagovuk_education_capacity_quri(id);
      break;
    ...
    default:
      //hack something here;
  }
  // POST the encoded SPARQL query to the sparqlproxy webservice
  var x=UrlFetchApp.fetch('http://data-gov.tw.rpi.edu/ws/sparqlproxy.php',{method: 'post', payload: args});
  ret=x.getContentText();
  // parse the XML response and pull out the single result literal
  var xmltest=Xml.parse(ret);
  ret=xmltest.sparql.results.result.binding.literal.getText();

  return ret;
}
function _datagovuk_education_capacity_quri(id){
  return "query=prefix+sch-ont%3A+%3Chttp%3A%2F%2Feducation.data.gov.uk%2Fdef%2Fschool%2F%3E%0D%0ASELECT+%3FschoolCapacity+WHERE+{%0D%0A%3Fschool+a+sch-ont%3ASchool%3B%0D%0Asch-ont%3AuniqueReferenceNumber+"+id+"%3B%0D%0Asch-ont%3AschoolCapacity+%3FschoolCapacity.%0D%0A}+ORDER+BY+DESC%28%3Fdate%29+LIMIT+1&output=xml&callback=&tqx=&service-uri=http%3A%2F%2Fservices.data.gov.uk%2Feducation%2Fsparql";
}

The datagovuk_education(id,typ) function takes the school ID and the requested property, uses the case statement to create an appropriate query string, and then fetches the data from the education datastore, returning the result in an XML format like this. The data is pulled from the datastore via Sparqlproxy, and the query string URIs are generated (at the moment) by adding the school ID number into a query string obtained by running the desired SPARQL query on Sparqlproxy and then grabbing the appropriate part of the URI. (It's early days yet on this hack!;-)

By defining appropriate Apps script functions, I can also create formulae to call other datastores, run queries on Google spreadsheets (e.g. in the Guardian datastore) and so on. I assume similar sorts of functionality would be supported using VB Macros in Excel?

Anyway – this is my starter for ten on how to make live datastore data available to the masses. It'll be interesting to see whether this approach (or one like it) is used in favour of getting temps to write SPARQL queries and RDF parsers… The obvious problem is that my approach can lead to an explosion in the number of formulae and parameters you need to learn; the upside is that I think these could be quite easily documented in a matrix/linked formulae chart. The approach also scales to pulling in data from CSV stores and other online spreadsheets, using spreadsheets as a database via the =QUERY() formula (e.g. Using Google Spreadsheets Like a Database – The QUERY Formula), and so on. There might also be a market for selling prepackaged or custom formulae as script bundles via a script store within a larger Google Apps App store.

PS I’m trying to collect example SPARQL queries that run over the various data.gov.uk end points because: a) I’m really struggling in getting my head round writing my own, not least because I struggle to make sense of the ontologies, if I can find them, and write valid queries off the back of them; even (in fact, especially) really simple training/example queries will do! b) coming up with queries that either have interesting/informative/useful results, or clearly demonstrate an important ‘teaching point’ about the construction of SPARQL queries, is something I haven’t yet got a feel for. If you’ve written any, and you’re willing to share, please post a gist to github and add a link in a comment here.

PPS utility bits, so I don't lose track of them:
– education datastore ontology
– Apps script XML Element class

PPPS Here's how to dump a 2D CSV table into a range of cells: Writing 2D Data Arrays to a Google Spreadsheet from Google Apps Script – Making an HTTP POST Request for CSV Data

Grabbing “Facts” from the Guardian Datastore with a Google Spreadsheets Formula

In Using Data From Linked Data Datastores the Easy Way (i.e. in a spreadsheet, via a formula) I picked up on an idea outlined in Mulling Over =datagovukLookup() in Google Spreadsheets to show how to use Google Apps script to create a formula that could pull in live data from a data.gov.uk datastore.

So just because, here’s how to do something similar with data from a Google spreadsheet in the Guardian datastore. Note that what I’m essentially proposing here is using the datastore as a database…

To ground the example, consider the HE satisfaction tables:

Lots of data about lots of courses on lots of sheets in a single spreadsheet. But how do you compare the satisfaction ratings across subjects for a couple of institutions? How about like this:

Creating Subject comparison tables from Guardian HE data

(We can just click and drag the formula across a range of cells as we would any other formula.)

That is, how about defining a simple spreadsheet function that lets us look up a particular data value for a particular subject and a particular institution? How about being able to write a formula like:
=gds_education_unitable("elecEng","Leeds","NSSTeachingPerCent")
and get the national student satisfaction survey teaching satisfaction result back for students studying Electrical/Electronic Engineering at Leeds University?

Google Apps script provides a mechanism for defining formulae that can do this, and more:

Guardian Datastore as a database

The script takes the arguments and generates a query to the spreadsheet using the spreadsheet's visualisation API, as used in my Guardian Datastore Explorer. The results are pulled back as CSV, run through a CSV-to-JavaScript array function and then returned to the calling spreadsheet. Here's an example of the Apps script:

function gds_education_unitable(sheet,uni,typ){
  var key="phNtm3LmDZEM6HUHUnVkPaA";
  var gid='0';//'Overall Institutional Table';
  var category="C"; //(Average) Guardian teaching score
  switch (sheet){
    case "full":
      gid='0';//'Overall Institutional Table';
      break;
    case "chemEng":
      gid='16';//'15 Chem Eng';
      break;
    case "matEng":
      gid='17';//'16 Mat Eng';
      break;
    case "civilEng":
      gid='18';//'17 Civil Eng';
      break;
    case "elecEng":
      gid='19';//'18 Elec Eng';
      break;
    case "mechEng":
      gid='20';//'19 Mech Eng';
      break;
    default:
  }

  switch (typ){
    case "guardianScore":
      category='C';//(Average) Guardian teaching score
      break;
    case "NSSTeachingPerCent":
      category='D';//
      break;
    case "expenditurePerStudent":
      category='E';//
      break;
    case "studentStaffRatio":
      category='F';//
      break;
    default:
  }

  if (sheet!='full') category=String.fromCharCode(category.charCodeAt(0)+2);

  var url="http://spreadsheets.google.com/tq?tqx=out:csv&tq=select%20B%2C"+category+"%20where%20B%20matches%20%22"+uni+"%22&key="+key+"&gid="+gid;

  var x=UrlFetchApp.fetch(url);
  var ret=x.getContentText();
  ret = CSVToArray( ret, "," ); // CSVToArray is a third-party CSV-to-array helper function, defined elsewhere in the script project
  return ret[1][1];
}

(The column numbering between the first sheet in the spreadsheet and the subject sheets is inconsistent, which is why we need a little bit of corrective fluff (if (sheet!='full')) in the code…)

Of course, we can also include script that will generate calls to other spreadsheets, or as I have shown elsewhere to other data sources such as the data.gov.uk Linked Data datastore.

Something that occurred to me though is if and how Google might pull on such “data formula” definitions to feed apps such as Google Squared (related: =GoogleLookup: Creating a Google Fact Engine Directory and Is Google Squared Just a Neatly Packaged and Generalised =googlelookup Array?).

Using CSV Docs As a Database

Earlier today, my twinterwebs were full of the story about the COI announcing a reduction in the number of government websites:

Multiple sources of news... not...

(They told the story differently, at least…)

A little while later(?), a press release appeared on the COI website: Clamp down on Government websites to save millions, although by that time, via @lesteph, I'd found a copy of the interim report and a CSV dataset, Reporting on progress: Central Government websites 2009-10.

So what can we do with that CSV file? How about turning it into a database, simply by uploading it to Google docs? Or importing it live (essentially, synching a spreadsheet with it) from the source URL on the COI website?
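(For the live import route, a formula of the following form in cell A1 of a sheet keeps the spreadsheet synched with the source file – the URL here is just a placeholder for wherever the CSV actually lives on the COI site:)

=ImportData("http://www.coi.gov.uk/path/to/website-report-data.csv")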

Import csv into google docs

I had a quick play pruning the code on my Guardian Datastore explorer, and put together this clunky query tool that lets you explore the COI website data as if it was a database.

COI Website review CSV data explorer

The explorer allows you to bookmark views over the data, to a limited extent (the ascending/descending views aren't implemented :-(), so for example, we can see:

  • websites with a non-zero "Very Poor Editorial" score
  • Age profile of visitors (where available)
  • Costs

(Feel free to share bookmarks to other views over the data in the comments to this post.)

Note that the displayed results table is an active one, so you can click on column headings to order the results by column values.

Sorting a table by column

Note that there seem to be issues with columns not being recognised as containing numerical data (maybe something to do in part with empty cells in a column?), which means the chart views don’t work, but this page is not trying to do anything clever – it’s just a minimal query interface over the visualisation API from a spreadsheet. (To build a proper explorer for this dataset, we’d check the column data types were correct, and so on.)

Looking at the app, I think it'd probably be useful to display a "human readable" version of the query too, one that translates column identifiers to column headings, for example – but that's for another day…

GOTCHAS: use single quote (‘) rather than double quote (“) in the WHERE statements.
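So, illustratively, a query of the form select B,C where B contains 'Poor' should work, where select B,C where B contains "Poor" may not.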

Related: Using Google Spreadsheets as a Database with the Google Visualisation API Query Language
Using Google Spreadsheets Like a Database – The QUERY Formula

Cobbling Together a Searchable Twitter Friends/Followers Contact List in Google Spreadsheets

Have you ever found yourself in the situation where you want to send someone a Twitter message but you can't remember their Twitter username although you do know their real name? Or where you can't remember their Twitter username or their real name, but you do remember who they work for, or some other biographical fact about them that might appear in their Twitter biography? If that sounds familiar, here's a trick that may help…

… a searchable Twitter friends and followers contact list in Google Spreadsheets.

It’s based on Martin Hawksey’s rather wonderful Export Twitter Followers and Friends using a Google Spreadsheet (I have to admit – Martin has left me way behind now when it comes to tinkering with Google Apps Script…!) To get started, you’ll need a Google docs account, and then have to indulge in a quick secret handshake between Google docs and Twitter, but Martin’s instruction sheet is a joy to follow:-) Follow the *** Google Spreadsheet to Export Twitter Friends and Followers *** link on Martin’s page, then come back here once you’ve archived your Twitter friends and/or followers…

..done that? Here’s how to make the contact list searchable… I thought it should have been trivial, but it turned out to be quite involved!

The first thing I did was create a drop down list to let the user select Friends or Followers as the target of the search. (Martin’s application loads friends and followers into different sheets.)

The next step was to generate a query. To search for a particular term on a specified sheet we can use a QUERY formula that takes the following form:

=query(Friends!B:E,"select B,C,D,E where D contains 'JISC'")

Friends! specifies the sheet we want to search over; B:E says we want to pull columns B, C, D and E from the Friends sheet into the current sheet; the select statement will display results over four columns (B, C, D and E) from Friends for rows where the entry in column D contains the search term JISC.

To pull in the search term from cell D1 we can use a query of the form:

=query(Friends!B:E,concatenate("select B,C,D,E where D contains '",D1,"'"))

The =concatenate formula constructs the search query. Make sure you use the right sort of quotes when constructing the string – Google Spreadsheets seems to prefer the use of double quotes wherever possible!

To search over two columns, (for example, the real name and the description columns of the twitter friends/follower data) we can use a query of the form:

=query(Followers!B:E,concatenate("select B,C,D,E where C contains '",D1,"' or D contains '",D1,"'"))

Again – watch out for the quotes – the result we want from the concatenation is a query string something like:

select B,C,D,E where C contains 'Jisc' or D contains 'Jisc'

so we have to explicitly code the single quotes into the concatenation formula.

Unfortunately, the query formula is case sensitive, which can cause the search to fail because we haven't taken (mis)use of case into account in our search term. This means we need to go defensive in the query formulation – in the following example, I force everything to upper case, search corpus as well as search terms:

=query(Followers!B:E,concatenate("select B,C,D,E where upper(C) contains upper('",D1,"') or upper(D) contains upper('",D1,"')"))

The final step is to define the sheet we want to search – Friends! or Followers! – depending on the setting of cell B1 in our search sheet. I had idly thought I could use a concatenate formula to create this, but concatenate returns a string and we need to define a range. In the end, the workaround I adopted was an if statement that chooses a query with an appropriate range set explicitly/hardwired within the formula, depending on whether we are searching Friends or Followers. Here's the complete formula, which I put into cell E1.

=if(B1="Friends",query(Friends!B:E,concatenate("select B,C,D,E where upper(C) contains upper('",D1,"') or upper(D) contains upper('",D1,"')")),query(Followers!B:E,concatenate("select B,C,D,E where upper(C) contains upper('",D1,"') or upper(D) contains upper('",D1,"')")))

I now have a query sheet defined that allows me to search over my friends or followers, as required, according to their real name or a search term that appears in their biography description.

Reshaping Your Data – Pivot Tables in Google Spreadsheets

One of the slides I whizzed by in my presentation at the OU Statistics conference on "Visualisation and Presentation in Statistics" (tweet-commentary notes/links from other presentations here and here) relates to what we might describe as the shape that data takes.

An easy way of thinking about the shape of a dataset is to consider a simple 2D data table with columns describing the properties of an object and rows typically corresponding to individual things. The structure is often regular, with each cell in the table taking on a valid value. Occasionally, some cells may be blank, in which case we can start to think of the shape of the data getting a little bit ragged.

If you are working with a data table, then on occasion you may want to swap the rows for columns (certain data operations require data to be organised in a particular way). By swapping the rows and columns, we change the shape of the data (for example, going from a table with N rows and M columns to one with M rows and N columns). So that's one way of reshaping your data.
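(In a Google spreadsheet, that particular reshaping is a one-liner; for example, assuming the original table occupies cells A1:M20, the following formula placed in an empty area of the sheet returns the transposed table:)

=TRANSPOSE(A1:M20)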

Many visualisation tools require data to be in a particular shape in order for the data to be visualised appropriately. Look at the template pages on Number Picture, for example – a new site hosting templated visualisations built using Processing that allow you to cut, paste and visualise data – if it is appropriately shaped – at a click.

But where do pivot tables come in? One way is to think of them as a tool for reshaping your data by providing summary reports of your original data set.


What pivot tables allow you to do is generate reports based on the contents of a table, using the values contained within one or more columns to define the columns and rows of a summary table. That is, you can re-present (or re-shape) a table as a new table that summarises data contained in the original table in terms of a rearrangement of the cell values of the original table.

Here’s a quick example. I have a data set that identifies the laptimes of drivers in an F1 race (yes, I know… again!;-) by stint, where different stints are groups of consecutive laps separated by pit stops.

If you look down the stint column you can see how its value groups together blocks of rows. But how do I easily show how much time each driver (column C) spent on each stint? The time the driver spent on each stint is the sum of laptimes by driver within a stint, so for each driver I need to find out the laps associated with each stint, and then sum them. Pivot tables let me do that. Here’s how:
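(As an aside, the same summary can also be sketched with a QUERY formula – purely illustrative, and assuming the stint values are in column B, the car/driver identifiers in column C and the laptimes in column D, with a header row:)

=query(A:D,"select C, sum(D) group by C pivot B",1)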

So how does this work? The columns in the new table are defined according to the unique values found in the stint column of the original table. The rows in the new table are defined according to the unique values found in the car column of the original table. The cell values in the new table for a given row and column are defined as the sum of lapTime data values from rows in the original table where the car and stint values in the row correspond to the row and column values corresponding to each cell in the new table. Got that?

If you’re familiar with databases, you might think of the column and row settings in the new table defining “keys” into rows on the original table. The different car/stint pairs identify different blocks of rows that are processed per block to create the contents of the pivot table.

As well as summing the values from one column based on the values contained in two other columns, the pivot table can do other operations, such as counting the number of rows in the original table containing each "key" value. So for example, if we want to count the number of laps a car was out for by stint, we can do that simply by changing our pivot table Values setting.
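(The QUERY formula sketch of that count, under the same illustrative column assumptions as before:)

=query(A:D,"select C, count(D) group by C pivot B",1)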

Pivot tables can take a bit of time to get your head round… I find playing with them helps… A key thing to remember is: if you want to express a dataset in terms of the unique values contained within a column, the pivot table can help you do that. In the above example, I was essentially generating the row and column values for a new table based on categorical data (driver/car number and stint number). Another example might be sales data where the same categories of item appear across multiple rows and you want to generate reports based on category.

Google Spreadsheets API: Listing Individual Spreadsheet Sheets in R

In Using Google Spreadsheets as a Database Source for R, I described a simple R function for pulling data into R from a Google Visualization/Chart Tools API query language query applied to a Google spreadsheet, given the spreadsheet key and worksheet ID. But how do you get a list of the sheets in a spreadsheet, without opening up the spreadsheet and finding the sheet names or IDs directly? [Update: I'm not sure the query language API call lets you reference a sheet by name…]

The Google Spreadsheets API, that’s how… (see also GData Samples. The documentation appears to be all over the place…)

To look up the sheets associated with a spreadsheet identified by its key value KEY, construct a URL of the form:

http://spreadsheets.google.com/feeds/worksheets/KEY/public/basic

This should give you an XML output. To get the output as a JSON feed, append ?alt=json to the end of the URL.
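That is:

http://spreadsheets.google.com/feeds/worksheets/KEY/public/basic?alt=json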

Having constructed the URL for sheets listing for a spreadsheet with a given key identifier, we can pull in and parse either the XML version, or the JSON version, into R and identify all the different sheets contained within the spreadsheet document as a whole.

First, the JSON version. I use the RJSONIO library to handle the feed:

library(RJSONIO)
sskey='0AmbQbL4Lrd61dDBfNEFqX1BGVDk0Mm1MNXFRUnBLNXc'
ssURL=paste( sep="", 'http://spreadsheets.google.com/feeds/worksheets/', sskey, '/public/basic?alt=json' )
spreadsheet=fromJSON(ssURL)
sheets=c()
for (el in spreadsheet$feed$entry) sheets=c(sheets,el$title['$t'])
as.data.frame(sheets)

Using a variant of the function described in the previous post, we can look up the data contained in a sheet by the sheet ID (I'm not sure you can look it up by name…?) – I'm not convinced that the row number is a reliable indicator of sheet ID, especially if you've deleted or reordered sheets. It may be that you do actually need to go to the spreadsheet to look up the sheet number for the gid, which rather defeats a large part of the purpose behind this hack? :-(

library(RCurl)
gsqAPI = function( key, query,gid=0){ return( read.csv( paste( sep="", 'http://spreadsheets.google.com/tq?', 'tqx=out:csv', '&tq=', curlEscape(query), '&key=', key, '&gid=', curlEscape(gid) ) ) ) }
gsqAPI(sskey,"select * limit 10", 9)

getting a list of sheet names from a goog spreadsheet into R

The second approach is to pull on the XML version of the sheet data feed. (This PDF tutorial got me a certain way along the road: Extracting Data from XML, but then I got confused about what to do next (I still don't have a good feel for identifying or wrangling with R data structures, though at least I now know how to use the class() function to find out what type any given R item is;-) and had to call on the lazy web to work out how to do this in the end!)

library(XML)
ssURL=paste( sep="", 'http://spreadsheets.google.com/feeds/worksheets/', sskey, '/public/basic' )
ssd=xmlTreeParse( ssURL, useInternal=TRUE )
nodes=getNodeSet( ssd, "//x:entry", "x" )
titles=sapply( nodes, function(x) xmlSApply( x, xmlValue ) )
library(stringr)
data.frame( sheetName = titles['content',], sheetId = str_sub(titles['id',], -3, -1 ) )

data frame in r

In this example, we also pull out the sheet ID that is used by the Google spreadsheets API to access individual sheets, just in case. (Note that these IDs are not the same as the numeric gid values used in the chart API query language…)

PS Note: my version of R seemed to choke if I gave it https: headed URLs, but it was fine with http: