Using Google Spreadsheets Like a Database – The QUERY Formula

In this year’s student satisfaction tables, which universities have a good teaching score but low employment prospects? How would you find out? In this post, you’ll find out…

Whether or not it was one of my resolutions, one of the things I want to do more this year is try to try to make more use of stuff that’s already out there, and come up with recipes that hopefully demonstrate to others how to make use of those resources.

So today’s trick is prompted by a request from @paulbradshaw about “how to turn a spreadsheet into a form-searchable database for users” within a Google spreadsheet (compared to querying a google spreadsheet via a URI, as described in Using Google Spreadsheets as a Database with the Google Visualisation API Query Language).

I’m not going to get as far as the form bit, but here’s how to grab details from a Google spreadsheet, such as one of the spreadsheets posted to the Guardian Datastore, and query it as if it was a database in the context of one of your own Google spreadsheets.

This trick actually relies on the original Google spreadsheet being shared in “the right way”, which for the purposes of this post we’ll take to mean – it can be viewed using a URL of the form:

http://spreadsheets.google.com/ccc?key=SPREADSHEETKEY&hl=en

(The &hl=en on the end is superfluous – it doesn’t matter if it’s not there…) The Guardian Datastore folks sometimes qualify this link with a statement of the form Link (if you have a Google Docs account).

If the link is of the form:
http://spreadsheets.google.com/pub?key=SPREADSHEETKEY
just change pub to ccc

So for example, take the case of the 2010-2011 Higher Education tables (described here):

http://spreadsheets.google.com/ccc?key=reBYenfrJHIRd4voZfiSmuw

The first thing to do is to grab a copy of the data into our own spreadsheet. So go to Google Docs, create a new spreadsheet, and in cell A1 enter the formula:
=ImportRange(“reBYenfrJHIRd4voZfiSmuw”,”Institutional Table!A1:K118″)

When you hit return, the spreadsheet should be populated with data from the Guardian Datastore spreadsheet.

So let’s see how that formula is put together.
=ImportRange(“reBYenfrJHIRd4voZfiSmuw”,”Institutional Table!A1:K118″)

Firstly, we use the =ImportRange() formula, which has the form:
=ImportRange(SPREADSHEETKEY, SHEET!RANGE)

This says that we want to import a range of cells from a sheet in another spreadsheet/workbook that we have access to (such as one we own, one that is shared with us in an appropriate way, or a public one). The KEY is the key value from the URL of the spreadsheet we want to import data from. The SHEET is the name of the sheet the data is on:

The RANGE is the range of the cells we want to copy over from the external spreadsheet.

Enter the formula into a single cell in your spreadsheet and the whole range of cells identified in the specified sheet of the original spreadsheet will be imported to your spreadsheet.

Give the sheet a name (I called mine ‘Institutional Table 2010-2011’; the default would be ‘Sheet1’).

Now we’re going to treat that imported data as if it was in a database, using the =QUERY() formula.

Create a new sheet, call it “My Queries” or something similar and in cell A1 enter the formula:

=QUERY(‘Institutional Table 2010-2011’!A1:K118,”Select A”)

What happens? Column A is pulled into the spreadsheet is what. So how does that work?

The =QUERY() formula, which has the basic form =QUERY(RANGE,DATAQUERY), allows us to run a special sort of query against the data specified in the RANGE. That is, you can think of =QUERY(RANGE,) as specifying a database; and DATAQUERY as a database query language query (sic) over that database.

So what sorts of DATAQUERY can we ask?

The simplest queries are not really queries at all, they just copy whole columns from the “database” range into our “query” spreadsheet.

So things like:

  • =QUERY(‘Institutional Table 2010-2011’!A1:K118,“Select C”) to select column C;
  • =QUERY(‘Institutional Table 2010-2011’!A1:K118,“Select C,D,G,H”) to select columns C, D, G and H;

So looking at copy of the data in our spreadsheet, import the columns relating to the Institution, Average Teaching Score, Expenditure per Student and Career Prospects, I’d select columns C, D, F and H:

like this:
=QUERY(‘Institutional Table 2010-2011’!A1:K118,“Select C,D, F,H”)
to give this:

(Remember that the column labels in the query refer to the spreadsheet we are treating as a database, not the columns in the query results sheet shown above.)

All well and good. But suppose we only want to look at institutions with a poor teaching score (column D), less than 40? Can we do that too? Well, yes, we can, with a query of the form:

“Select C,D, F,H where D < 40"

(The spaces around the less than sign are important… if you don’t include them, the query may not work.)

Here’s the result:

(Remember, column D in the query is actually the second selected column, which is placed into column B in the figure shown above.)

Note that we can order the results according to other columns to. So for example, to order the results according to increasing expenditure (column F), we can write:

“Select C,D, F,H where D < 40 order by F asc"

(For decreasing order, use desc.)

Note that we can run more complex queries too. So for example, if we want to find institutions with a high average teaching score (column D) but low career prospects (column H) we might ask:

“Select C,D, F,H where D > 70 and H < 70"

And so on…

Over the nect week or two, I’ll post a few more examples of how to write spreadsheet queries, as well as showing you a trick or two about how to build a simple form like interface within the spreadsheet for constructing queries automatically; but for now, why try having a quick play with the =QUERY() formula yourself?

37 comments

  1. Pingback: news:rewired #3: data mashing panel « Librarian of tomorrow
  2. Pingback: Creating Query Forms in Google Spreadsheets – Sort Of « OUseful.Info, the blog…
  3. Pingback: Creating a Winter Olympics 2010 Medal Map In Google Spreadsheets « OUseful.Info, the blog…
  4. Pingback: Using Data From Linked Data Datastores the Easy Way… « OUseful.Info, the blog…
  5. Pingback: A Rosetta Stone for Guardian Datastore UK Higher Education Data « OUseful.Info, the blog…
  6. Steve Trefethen

    Thanks! Great, post. I’m working on publishing some data from a spreadsheet and have several columns that I do not want published thus I’ve been looking for ways to manage the data.

  7. ctnablog

    This is great – thanks. Can you help with constructing a query based on selecting a range of data constrained by date – ie all future entries? I’ve tried playing around with

    =QUERY(‘Basic’!A1:M853,”Select A,B,C,D,E,F,G,H,I,J,K,L,M where A > ‘now()'”)

    but without success. (Column A has dates.) Many thanks.

    • Tony Hirst

      You could put =now() in a cell on its own, then concatenate it into the query; eg if =now() in C4, then you could have something like:
      =CONCATENATE(“Select A,B,C,D,E,F,G,H,I,J,K,L,M where A > ‘”,F3,”‘”)
      in another cell to construct the query string?

  8. ctnablog

    Thanks for this. I’ve eventually tracked down a convoluted but more or less logical way of filtering everything after (and including) today from a table:

    select A,B,C,D,E,F,G,H,I,K,L where (A >= DATE “&TEXT( DATE( YEAR(TODAY()) ; MONTH(TODAY()) ; DAY(TODAY()) ),””YYYY-MM-DD””)&”

    Not sure quite how it works – but it does!

  9. Pingback: Web Lego And Format Glue, aka Get Yer Mashup On « OUseful.Info, the blog…
  10. Pingback: Using CSV Docs As a Database « OUseful.Info, the blog…
  11. felipe

    Really cool, I am using this technology and i am triyng to compare some dates what i want to do it is to get the closer date to the date where at real time from a column… Can someone help me how can i do this…?

  12. Pingback: First Dabblings WIth Scraperwiki – All Party Groups « OUseful.Info, the blog…
  13. Seth

    Thanks for the article. I set up a spreadsheet and form that uses imported data from another spreadsheet. The form asks the user for the conditions they want. I use a script to set the formula for a cell to =QUERY(‘Sheet2’!A2:H5001,”Select * where E > 17 and F > 65 order by E,F asc”). The 17 & 65 have been entered from the form. If I just enter this manually in the cell, it works, but when it is submitted from the form, I get #NAME? error:Unknown range name ‘Sheet2’!A2. Any ideas? Thanks

  14. Todd Katz

    Regarding the ‘greater than now’ problem, the Google spreadsheet guru Ahab posted some solutions and for my today-or-later select statement I found that:

    SELECT A WHERE A >= todate(now())

    worked. Apparently there are some Google server issues related to which time zone is now(), so this may not be a good solution for everyone.

    Thanks again for all the helpful pointers!

    • Todd Katz

      So, the above does work for some calculated date fields … but for others not.

      Relying again on Google spreadsheet gurus notes, it seems that the following does work insofar as it will retrieve all instances of a date after today:

      =query(E2:E;”SELECT E where E >= datetime “&TEXT(today(),””YYYY-MM-DD HH:mm:ss””)&””)

  15. Pingback: Google Visualisation API Controls Support Interactive Data Queries Within a Web Page « OUseful.Info, the blog…
  16. Benjamín

    Hi: I’m very interested in using SQL to retrive data from spreadsheets. I’m now working with Google Apps Script in Google Sites. My question is: Is it possible to use some ‘Query’ formula to retrieve data from Spreadsheets (in an Apps Script) and save the results into an array? I would use the result to populate diferent listboxex widgets.
    Thank you very much.
    Benjamin.

  17. kenly

    Hi.
    I sort of was getting very comfortable using Google Spreadsheets as Database, but what a surprise I just got in the moment I thought everything was going just perfect, well I got this:
    Oops
    “At this time, spreadsheets in Google Docs only support up to 50 ImportRange functions in a single spreadsheet.”

    Please any ideas of which other function can I use besides ImportRange?

    Thanks?
    ;-)

    • Tony Hirst

      @kenly do you need to import >50 separate ranges?! Maybe one apporach would be to populate the spreadsheet/model the importRange using Google Apps script?

    • valex

      You’re right, 50 only but consider you can use ranges so if you need to import from cell A1 to B2 you do not need 4 of them but one as a range (A1:B2)…

  18. Shawn Hoefer

    This is very cool. Thanks… What I’m looking for is the ability to load a form with data ready for editing from a spreadsheet based on a query. i.e. a search field at the top of the page with a form below. When someone enters a name, the spreadsheet row for that name populates the form. Make sense? If so, is this possible?

  19. Conor

    Great post.

    I’m trying to create a google spreadsheet that produces charts of amounts (of a few categories) against time. The amounts along with their associated dates are input from a form. As I’m using a form, I think that I need to use QUERY to ensure that the charts pick up the changes. I have a QUERY returning nearly what I need for the chart, bar the following:

    1. I’m making a bar chart with a bar for each week number of the year summing the amount for that week. Getting the week of the year is normally easy, but is difficult when limited to the query function. I’ve managed to get a column with the month of the year and another with the week of the month. These could suffice but I would need to format them with leading zeros and concatenate them… in a query. Is this possible? I think I’m having issues running a query on a RANGE that is itself a result of a query (so under the hood it’s a CONTINUE).

    2. Where there are no amount entries submitted for a particular week, I would like to show a gap in the chart. i.e. an amount of zero for a week not specified in my form data. Is this possible?

    Thanks in advance.

  20. Pingback: So Where Do the Numbers in Government Reports Come From? « OUseful.Info, the blog…
  21. Ben

    What about a dynamic range? Let’s say I have a form that is adding to the range. How do I automatically get the formula to cover the correct range?

  22. Bethann

    I have a similar ? to Ben’s, r.e. dynamic ranges.

    In addition, if someone does not fill in data, I want the query to exclude them. Therefore, I’ve tried something like this: =QUERY(‘USK Self-ID data’!A3:N501, (IF(C>0,”Select D,E,F,G”>0)))

    However, each time I try, I get an error message saying “Circular dependency detected.”

    Any suggestions for how to get the query to only return data if the cell in column C is >0?

    Thank you!

  23. Panos

    Although I browsed through the previous comments I did’t find any solution to my case:
    I have a primary and a secondary spreadsheet where the primary one has a column that would like to use as a primary key to MATCH VALUES (multiple for each row) from the secondary sheet (it also has a corresponding column as a foreign key), and return them inside a SINGLE CELL. I have been able to do this with a query formula, but can’t make it to update itself as more rows are added. I don’t want to have to drag it downwards in order to update itself.

    Any ideas?

  24. Pingback: Reshaping importHTML data in Google Spreadsheet using QUERY and TRANSPOSE formula JISC CETIS MASHe
  25. Pingback: WordPress – Weekly Dump (weekly) | ResponsiPaul
  26. VinaPlace

    Hi. Can I use “Notification rules” on on cell range what are coming from QUERY? Ex. I want that my sales team will be notified everything what happens in “sheet!A:A”.

  27. Pingback: Asking Questions of Data – Garment Factories Data Expedition | archaeoinaction.info
  28. Sarah

    I am trying to create an import query based on user selection. The part that I can’t get to work is when the user enters a value in cell F2 nothing gets returned (yes the key Col 5 is in currency format). Below is my code; any help would be greatly appreciated:
    =if(B1=K4,Query(importRange(“0AifmgnvG8i_kdEY1NGN4a1NJZThFbEZxMWFrWDFTcFE”,”Customer Experience-CONSO!A2:g37″),”select Col1, Col2, Col3, Col4, Col5, Col6,Col7 where Col5 >'”&F2&”‘ order by Col1″,2),if(B1=K5,Query(importRange(“0AifmgnvG8i_kdEZVTG5SdXNwd3BCNjBPLWFtR0Jra1E”,”Residential Sales-ONT!A2:g37″),”select Col1, Col2, Col3, Col4, Col5, Col6, Col7 where Col5 > ‘”&F2&”‘ order by Col1″,2),Query(importRange(“0AifmgnvG8i_kdEZVTG5SdXNwd3BCNjBPLWFtR0Jra1E”,”Residential Sales-QC!A2:g37″),”select Col1, Col2, Col3, Col4, Col5, Col6, Col7 where Col5 >'”&F2&”‘ order by Col1″,2)))

  29. Pingback: Using Google Spreadsheets Like a Database – The QUERY Formula | OUseful.Info, the blog… | timcoleman3d
  30. Pingback: Google: Your Database | The Braindoc's Blog