Using Google Spreadsheets as a Database with the Google Visualisation API Query Language

Wouldn’t it be handy if we could treat all the public spreadsheets uploaded to Google docs as queryable tables in a database? Well, it appears that you can do so, at least at an individual spreadsheet level: Introducing the Google Visualization API.

Over the weekend, I started exploring the Google Visualisation API Query Language, which is reminiscent of SQL (if that means anything to you!). This language provides a way of interrogating a data source such as a public online Google spreadsheet and pulling back the results of the query as JSON, CSV, or an HTML table.

Got that? I’ll say it again: the Google Visualisation API Query Language lets you use a Google spreadsheet like a database (in certain respects, at least).

Google query languages are defined on a spreadsheet in the following way:

http://spreadsheets.google.com/tq?tq=QUERY&key=SPREADSHEET_ID

Although defined, by default, to return JSON data from a query, wrapped in a pre-defined (and fixed?) callback function (google.visualization.Query.setResponse()), it is also possible to display the results of a query as an HTML table (which is “useful”, as the documentation says, “for debugging”). The trick here is to add another argument to the URL: tqx=out:html, so for example a query would now be defined along the lines of:
http://spreadsheets.google.com/tq?tqx=out:html&tq=QUERY&key=SPREADSHEET_ID

Using the Guardian datastore’s MPs expenses spreadsheet 2007-8 as an example, we can write quite a wide variety of queries, which I’ll show below in their ‘HTML preview’ form.

(In a ‘real’ situation, you are more likely to retrieve the data as JSON and then process it as an object. Or, as I will also demonstrate, take the results of the query as CSV output (tqx=out:csv rather then tqx=out:html) and pull it directly into a service such as Many Eyes WIkified.)

The generic URL is of the form: http://spreadsheets.google.com/tq?tqx=out:html&tq=QUERY&key=phNtm3LmDZEObQ2itmSqHIA.

In the examples, I will just show the unencoded select statement, but the link will be the complete, well-formed link.

So here we go:

  • show everything – fetch the whole table: select * (in a lot of computer languages, ‘*’ often refers to ‘everything and anything’);
  • just show some particular columns, but again for everyone: fetch just columns B (surname), C (first name) and I (total additional costs allowance): select B,C,I
  • only show the names of people who have claimed the maximum additional costs allowance (£23,083): fetch just columns B, C and I where the value in column I is 23083: select B,C,I where I=23083 (column I is the additional costs allowance column);
  • How many people did claim the maximum additional costs allowance? Select the people who claimed the maximum amount (23083) and count them: select count(I) where I=23083
  • So which people did not claim the maximum additional costs allowance? Display the people who did not claim total additional allowances of 23083: select B,C,I where I!=23083 (using <> for ‘not equals’ also works); NB here’s a more refined take on that query: select B,C,I where (I!=23083 and I>=0) order by I
  • search for the name, party (column D) and constituency (column E) of people whose first name is Jane or is recorded as John (rather than “Mr John”, or “Rt Hon John”): select B,C,D,E where (C contains ‘Joan’ or C matches ‘John’)
  • only show the people who have claimed less than £100,000 in total allowances : select * where F<100000
  • what is the total amount of expenses claimed? Fetch the summed total of entries in column I (i.e. the total expenses claimed by everyone): select sum(I)
  • So how many MPs are there? Count the number of rows in an arbitrary column: select count(I)
  • Find the average amount claimed by the MPs: select sum(I)/count(I)
  • Find out how much has been claimed by each party (column D): select D,sum(I) where I>=0 group by D (Setting I>0 just ensures there is something in the column)
  • For each party, find out how much (on average) each party member claims: select D,sum(I)/count(I) where I=0 group by D

To create your own queries, just hack around the URIs.

Many Eyes WIkified is no more…One other trick is to grab a CSV output, rather than an HTML output, and pull it into Many Eyes Wikified, and then visualise it within that environment – so we grab the data (in this case, using select D,sum(I) where I>=0 group byD, i.e. the total amount of additional costs allowance claims by party):

to give this:

and then visualise it in an appropriate way:

So to recap this final case, then, we are running a query on the original spreadsheet that calculates the total additional costs allowance claims per party, and emits the results as CSV. These results are imported into Many Eyes Wikified, and displayed therein.

Now I’m pretty sure that Many Eyes Wikified will continue (how often?) to synch data from a potentially changing data source, which means we should be able to use a similar approach to plot a running total of claims from the Shadow Cabinet Expenses spreadsheet

…but, at the time of writing at least, it seems as if the publication/privacy settings on that spreadsheet are set such that access via th query language is denied…:-(

Anyway – that was a quick intro to the Google Visualisation API Query Language – so go play… ;-)

PS so what other spreadsheets might make for some interesting queries?

PPS @adrianshort has made a valuable point about how easy it is for a publisher to change the order of rows in a spreadsheet, and hence make a nonsense of your query. (Also, I think the approach I’m taking sort of assumes a simple, regular spreadsheet where row 1 is for headers, then the data, and ideally no other text e.g. in cells below the table describing the data in the table.) So always check… ;-)

PPPS If the first row in the table defines column headings, then there are intervening lines (maybe spaces) before the data starts, putting offset N (where N is a number) will skip that many rows before displaying the data.

Something else I noticed on the order by setting, this can be of the form order by COL asc (to sort in ascending order, which is the default) or order by COL desc ( to sort in descending order).

69 comments

  1. Pingback: And a Tweet Later – Querying Shadow Cabinet Expenses on Google Spreadsheets with the Google Query Language « OUseful.Info, the blog…
  2. Pingback: Using Google Spreadsheets as a database (no, it really is very interesting, honest) | Online Journalism Blog
    • simile fan

      @Marko – i m a big fan of simile exhibit, but the query language option is even better then simile exhibit. i have a website also using simile exhibit off a google spreadsheet and given the size of the spreadsheet, the load time my website is very extensive which i have observed makes me lose visitors interest. GQL is better since it actively queries the spreadsheet for only relevant data and hence the load times are much more snappier.

  3. Adrian Short

    Thanks for a great write-up, Tony. I can see this being useful for many things.

    A word of caution though if you’re using regular queries against a spreadsheet as part of something that matters. Spreadsheet layouts are more likely to change than database schema so you need to keep an eye on your sources and output to ensure that you’re getting the results you expect. A spreadsheet owner simply transposing two columns could cause all kinds of grief.

  4. Pingback: iand.net – Google Spreadsheets As A Database
  5. Chris Wallace

    Tony

    Great stuff – I’m sure I’ll use this in teaching next year. Just noticed a typo: in your link to get CSV, I think the URL should contain tqx=out:csv

    Chris

  6. Pingback: Making it a Little Easier to Use Google Spreadsheets as a Database (Hopefully!) « OUseful.Info, the blog…
  7. Pingback: The golden age of computer-assisted reporting is at hand » Nieman Journalism Lab
  8. Pingback: The golden age of data journalism?
  9. Pingback: links for 2009-05-21 «
  10. simile fan

    Excellent writeup – I am planning to change my simile exhibit powered website to use this instead.

    Btw, in relation to your question about the json callback (“Although defined, by default, to return JSON data from a query, wrapped in a pre-defined (and fixed?) callback function…”) the request format does support a custom callback function using the tqx paramter as follows:

    tqx=responseHandler:foo will call “foo” function as follows:

    foo({status:’ok’, … });

  11. CountCulture

    Many thanks for this. Saved me a bunch of time (just starting using it in a small way on theyworkforyoulocal.com — an early-stage project to make local govt data accessible).

    Agree with your comment on later post about data integrity, particularly consistency in naming/unique refs for the bodies concerned, although sometimes this is a problem of the original govt produced data

  12. James

    This is interesting indeed.

    I tried with my own spreadsheet,
    “http://spreadsheets.google.com/tq?tqx=out%3Ahtml&tq=select+count%28A%29+where+A%3DPHP%3Fkey%3D0AipNn919hx-OdHNhTDl1RExJVlFMUXVwOGUtNjdzanc&hl=en”

    but I always got this message:
    “Oops, an error occured.
    Status: error
    Reason: Access denied
    Description: Access denied”

    Anyone can view and edit the spreadsheet via the key I used in this example, how come I still get access denied ?

  13. Tony Hirst

    @James – there are a couple of permissions fields associated with Google spreadsheets – I don’t remember offhand which you need to set for the query language API to work, but you should maybe check them both?

    One is in the “Publish as a web page” Share menu option, the other is in the “Get the link to share” option.

    • Rob

      I am having this problem and getting the Access denied error listed in the above comment. Tony, can you explain how to set permissions and what to set them to?

      Thanks.

      • Tony Hirst

        There are two sorts of permissions available from the Share menu (top right of the page in a Google spreadsheet): “Sharing Settings” and “Publish as web page”.

        If you select the “Publish as web page” option, and actually publish the page, I think you can use the Google visualisation API.

        If there is a problem, go into the other Share menu option – Sharing Settings – and make the spreadsheet public…

        One day I’ll try to find out if a private spreadsheet can be queried via the API… but not right now… ;-)

  14. Pingback: Using Google Spreadsheets Like a Database – The QUERY Formula « OUseful.Info, the blog…
  15. Pingback: links for 2010-02-17 at adam hoyle presents suckmypixel
  16. ppol

    This is nice, but I don’t like the idea of using column letters like A, B, C in queries. Because if the sheet is modified with new columns being inserted, then all the data is shifted and the queries do not make sense any more.

    It is somewhere mentioned in the Google docs documentation something about column ID or column identifier. So that if your column ID is party, then your query can be:

    select party
    instead of
    select D

    So if you insert a new column left of D, the query “select party” remains valid, while “select D” now points so something different than party.

    But I never managed to set column ID / identifier in Google spreadsheets. Does anyone know how to do this?

    • Tony Hirst

      Yes, agreed – it would be far better to query against the column name; when I put this together I couldn’t work out how to do it, and still haven’t found the trick. I am keeping a weather eye out for it tho…

  17. tim

    Super write-up! Thanks so much for sharing your knowledge.

    Somehow I feel lost about getting at the result data, however:

    Example URL below returns custom callback “myResult”, but I’m not able to actually get anything into my DOM ( I use jQuery). What am I missing? It must be something very simple, I’m sure…

    http://spreadsheets.google.com/tq?tqx=responseHandler:myResult&tq=select%20A%2C%20B&key=tSa2oPteZ0x9nqmOs9ldXGA

    returns :

    myResult({version:’0.6′,status:’ok’,sig:’3140154′,table:{cols:[{id:’A’,label:”,type:’string’,pattern:”},{id:’B’,label:”,type:’string’,pattern:”}],rows:[{c:[{v:’first name’},{v:’last name’}]},{c:[{v:’john’},{v:’smith’}]},{c:[{v:’adrian’},{v:’booger’}]},{c:[{v:’sally’},{v:’fields’}]},{c:[{v:’sally’},{v:’gophers’}]}]}});

    I much appreciate any pointers!

    • Tony Hirst

      Have you defined a function called “myResult” that acts on a Javascript object as defined by the format of the returned results?

  18. Dan

    Absolutely fantastic technology and write up- thanks.

    I was thinking about building my database within Google App Engine (“big table”) and then querying against that, but this honestly looks much easier.

  19. fde

    Hi!
    I use the query language in order to do the simplest thing: show a html table of the data extracted from a spreadsheet. However, this data contains URLs of images; I would like the resulting html table to show the image itself instead of its URL. Do you think it would be possible.

    Thanks in advance for any pointer!

  20. Pingback: Web Lego And Format Glue, aka Get Yer Mashup On « OUseful.Info, the blog…
  21. ran_koto

    Hi!
    Can you help me please! i am a novice of visualization

    How do you traslate from mysql at query language visualization ?

    mysql : select T.col1 From Table T Where T.datestart like ‘2010-05-*’ and T.dateFinish like ‘2010-05-‘;
    OR select T.col1 From Table T Where T.date between ‘2010-05-*’ and ‘2010-05-‘;

    query language visualization : “select D Where datestart like ‘2010-05-*’ and T.dateFinish like ‘2010-05-‘ ” ;

    It wrong ?

  22. Pingback: Using CSV Docs As a Database « OUseful.Info, the blog…
  23. Pingback: Links for July 13th through July 15th | daveharte.com
  24. forsdick

    Tony,

    Great writeup. Enabled me to do a lot of things I thought I should be able to do, but couldn’t figure out just reading reference documentation. A true service to the community. Thanks.

    Now, onto my problem: This is a problem that has been spoken about before, but let me describe what I am seeing.

    I have a spreadsheet called Members. I want to present subsets of the data in this spreadsheet by using the query mechanism and then taking the results and embedding them in a spreadsheet.

    I have set up the visibility and protection in the following manner:

    Sharing: Anyone who has the link can view
    Publish to the web: I am automatically publishing all spreadsheets in the workbook

    This is the URL to the published data:

    https://spreadsheets.google.com/pub?key=0ApJDC5yR3H4LdFFvenBpOXdsUTBuMnFfbW9lSDJsd0E&authkey=CL2r6sMH&hl=en&single=true&gid=0&output=html

    When I look at this in a browser when I am not logged into Google, I get the single sheet I expected.

    Here is the schema from the article for the URL to access the sheet with a query:

    http://spreadsheets.google.com/tq?tqx=out:html&tq=QUERY&key=SPREADSHEET_ID

    My QUERY is:

    tq=select%20A,%20B,%20C,%20G,%20H,%20I,%20J,%20K,%20O,%20R%20where%20K%20=%200

    and my SPREADSHEET_ID is:

    key=0ApJDC5yR3H4LdFFvenBpOXdsUTBuMnFfbW9lSDJsd0E

    So, the URL for my query to my spreadsheet is:

    http://spreadsheets.google.com/tq?tqx=out:html&tq=select%20A,%20B,%20C,%20G,%20H,%20I,%20J,%20K,%20O,%20R%20where%20K%20=%200&key=0ApJDC5yR3H4LdFFvenBpOXdsUTBuMnFfbW9lSDJsd0E

    When I put this is the address line of my browser (FireFox), I get the following error message:

    Oops, an error occured.
    Status: error
    Reason: Access denied
    Description: Access denied

    Of course, if I change the Sharing so that everyone can have read access whether or not they are logged in, it works correctly.

    So, my question is, why doesn’t what I am doing work?

    TIA,

    Harry

    • Tony Hirst

      That’s a new gotcha to me – maybe the spreadsheet API permissions are different to the Spreadsheet page view permissions, which is why the full sharing read access is required?

      Could be worth posting this to the Google group…

      tony

    • Amit Singh

      That is because you are not logged-in into your account in the Firefox Browser.
      Solution:
      Log into your Google account (and still keep permissions ‘Private’).
      OR
      Change the permissions to ‘Public’ (no log-in required, can use any Browser).

      Wish it should help you.

  25. forsdick

    @Tony Thanks for your reply. I looked around in Google Docs Help and discovered that appears to be quite a bit of confusion about access control to documents by users, as well as a lot of basic principles of access control being relearned by Google developers.

    Bottom line is that I don’t think that Google Docs is ready for selectively protecting a single spreadsheet that may have public and private information in the same spreadsheet. The alternative, as the expense of ease of management, is to create two spreadsheets where you would like one: one for public information and one for private information. In my application, I have a set of records of a membership database where I want to keep the email address of the member private, but allow selective (through a query) access to all of the other information.

    For the time being I have solved this by separating the single members spreadsheet into two spreadsheets membersPublic and membersPrivate, and repeat several columns (First, Last name) from the membersPublic spreadsheet in the membersPrivate spreadsheet. The query accesses only the membersPublic spreadsheet and so by making membersPublic accessible by the world, things work fine. I can now make the private information protected by adjusting the sharing of the membersPrivate worksheet to include only a selected list of people.

    As I mentioned this makes management of what should be a single database of membership information more complex because a record for a single person appears in two places. But, until Google figures out a way to make the access control finer grain, I will have to live with that.

    Any better ideas?

    — Harry Forsdick

  26. Pingback: Google docs spreadsheet as database « ideaton
  27. vinay doddamane

    Hello
    I have a problem that is i have created a Visualization: Table i need to calculate the total of a perticular column so can any one help me do

    thanks
    vinay

  28. Pingback: First Dabblings With Scraperwiki – All Party Groups « OUseful.Info, the blog…
  29. Seth

    Thanks for writing this article. I am just starting to use Google Spreadsheets, and am an amateur at code. Can you access data from different sheets in the spreadsheet, or just from the first sheet on the spreadsheet? Even when I copy the key from Sheet3 of my spreadsheet, the html still shows the data from Sheet1.

    Thanks,
    Seth

    • Tony Hirst

      I think you reference different sheets the follow way:
      var ss = SpreadsheetApp.getActiveSpreadsheet(); //get a reference to the current spreadsheet
      var sheet = ss.getSheets()[0]; //where [0] is the first sheet, [1] is the 2nd sheet and so on.
      The key is the key/identifier for the whole spreadsheet document, not the sheet within the spreadsheet.

  30. Seth

    Never mind Tony, I found the answer on one of your other articles. I just needed to add “&gid=2″ to the end of my key to get data from the third sheet. Thanks for all of your hard work on these articles. They are helping me a lot

  31. Damian Branigan

    A useful summation of the SQL-Like query API

    Two glaring omissions in Google’s otherwise very useful API, one is the “JOIN” concept – which appears counter to Google’s philosophy – but is still necessary in the real world, to combine ‘tables’ by matching a common value. (In the end I wrote a function to do it).

    The other one is an “IN” or “One of” construct and inverse “NOT IN” as for example
    Select * Where J =’value1′ OR J =’Value2′ OR J = ‘Value3′

    Is more efficiently written as
    Select * Where J in (‘Value1′, ‘Value2′, ‘Value3′)

    Which could with a bit of thought resolve from a series of values in a column or row, rather than as constants, for example I use the following to construct my query taking the values in cells.

    Select * Where J ='”&A2 &”‘ or J ='”&A3&”‘ OR J = ‘”&A4&”‘ or J ='”&A5&”‘ or J ='”&A6&”‘ or J ='”&A7&”‘ “

  32. Pingback: TG SEO » How To Build Agile SEO Tools Using Google Spreadsheets
  33. steppenwolf

    Thanks for a brilliant tutorial. I’ve adapted this for a genealogy database I’m developing.
    I noticed that your filters do not exclude the header row. My header rows get filtered out, and also do not show a background color. How did you fix the header?

    • Tony Hirst

      @Will ….that’s because Many Eyes Wikified is no more… (I’ve amended the post with strikethoughs..) Many Eyes, original edition, still exists, but without the “import from URL” facility :-(

  34. Pingback: Fragments: Glueing Different Data Sources Together With Google Refine « OUseful.Info, the blog…
  35. Pingback: Google Visualisation API Controls Support Interactive Data Queries Within a Web Page « OUseful.Info, the blog…
  36. Pingback: How To Build Agile SEO Tools Using Google Spreadsheets | | distilled
  37. Vishal

    Hello friends,

    this is the greate article i am in need of it..

    its very nice…greate work.

    By using the imformation i am able to query all the information

    but now i need some more things..please help me to do it.

    i need the some solution to print the tabular data to the specific form/report like formate …

    please help me to work out to get html page having the form like data

  38. Taha Asadullah

    Hi,
    Thank you very much for the tutorial.
    I am trying to combine google visualization api with google spreadhseet. I am able to query by column names A, B C etc By typeing
    query.setQuery(“select A, Q “);
    .. But I would like to query by column title.(header)
    Could you please help me in this regard.

    • Tony Hirst

      @taha I’m not sure there’s an easy way of doing that? You might have to grab a copy of the eg a single row of the table along with column headers, then set up a mapping from column IDs to column labels, and then map the labels you want to query on to the column ids? I’m a bit short of time atm, but will try to post an example when I’m back from hols.

    • Taha Asadullah

      Hi Tony,
      thanks for quick reply. Yes, even am working on mapping methods only. Will post it if I succeed.
      The problem is that I have several tables. Each of them is having categories and the cutoff rank for that particular category. Some of the tables have few categories and some of them have more. So, For example if I map column B to General Merit, and column C to Defense category, then if there is no Defense category in the next table then it will lead to erroneous graphs. The other approach would be a lengthy code written specifically for each sheet, which I dont want as an option.
      I have gone through this table:
      http://code.google.com/apis/chart/interactive/docs/querylanguage.html#Table_used_in_Examples
      But am not getting how do I add labels to the table headers.
      Also, I have referred to
      http://code.google.com/apis/chart/interactive/docs/spreadsheets.html
      and added headers=1 in options. But it did not work.
      Thanks once again.
      Greetings from India.

    • Taha Asadullah

      Hi Tony,
      I wrote a small file read write java program which converted all the data uniformly.
      I have another question now. If I made my spreadhseet public, anyone will be able to view it. If I make it private only allowed users can.
      Is it possible to grant access to the spreadsheet if it access via only a particular URL. For example http://www.foo.com. If the spreadsheet query does nor originate from http://www.foo.com, access denied should appear.
      I have another concern. If a person has access to my Data Source URL, he will be able to find out how I made that particular application. Please be patient. Am a newbie here.
      Thanks for all the help.
      I must say I was able to build a small application using your blog. God bless you.
      Rock on
      Greetings from India.

  39. Pingback: Using Google Spreadsheets as a Database Source for R « OUseful.Info, the blog…
  40. stevecaldwells

    if anyone has problems with the header row not returning properly in your results, you can add &headers=1 (or however many header rows you have) to force it to return the header row correctly.

  41. Gareth

    Great article – very helpful – having issues trying to hide the headings though when calling a html table – they always seems to be there despite trying your PPPS suggestions below – which only work on the data – not the headings?! – could it be something to do with the spreadsheet maybe??? Any suggestions welcome!

    “PPPS If the first row in the table defines column headings, then there are intervening lines (maybe spaces) before the data starts, putting offset N (where N is a number) will skip that many rows before displaying the data.”

    Thanks again,

    Gareth

  42. Johny why

    Offset does not seem to work. Any help?

    How can I apply CSS?

    Note, headers does NOT turn the first row of the spreadsheet on/off– it tells the HTML how many rows should be formatted as headers. At 0, your first row of data will be formatted as data.

  43. Ed Beighe

    Hey, this is so cool! Thanks for the examples. when i tried them a couple of weeks ago, everything worked great. Now, for the past day or so they say:

    “Google Docs has encountered a server error. If reloading the page doesn’t help, please contact us. We’re sorry, a server error occurred. Please wait a bit and try reloading your spreadsheet….”

    anybody know why?

  44. Pingback: So Where Do the Numbers in Government Reports Come From? « OUseful.Info, the blog…
  45. Pingback: Plotting Latitude and Longitude with ggplot Map Projections in R « OUseful.Info, the blog…
  46. John

    I have a spreadsheet in google docs which I am trying to fetch data pertaining to a member’s ID #. On our site, members login, and once logged in, they are browsing around on their own replicated version of our website. I am trying to gather info from a google doc spreadsheet and stream it into each back office of each member. Thing is, the area I need to do this in is a GLOBAL area, meaning whatever goes in that area everybody can see.

    I would like to be able to pull rows from the google doc spreadsheet based on Member ID # (example 102365) and have all lines from the spreadsheet that belong to that member appear in their back office. So each member can only see their own info. member 102365 only views submissions from member 102365, and so on.

    Can anybody help with this? We do have tokens that we place on our end to capture logged in member’s ID #’s and complete URLs. The ID # is also it’s own column in the spreadsheet.

    Thanks,

  47. Pingback: Open Research Data Processes: KMi Crunch – Hosted RStudio Analytics Studio « OUseful.Info, the blog…
  48. Pingback: Filtering Guardian University Data Every Which Way You Can… « OUseful.Info, the blog…
  49. Pingback: So It Seems My Ballot Didn’t Count Twice in the PCC Election… « OUseful.Info, the blog…
  50. Pingback: LAK13: Recipes in capturing and analyzing data – Twitter Jisc CETIS MASHe
  51. ImportXML

    We can import strings,tables and Multiple data from a webpages . Like that we can use google Docs as databases by saving websites data and by importing those tables like databases.