OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Using Google Spreadsheets Like a Database – The QUERY Formula

In this year’s student satisfaction tables, which universities have a good teaching score but low employment prospects? How would you find out? In this post, you’ll find out…

Whether or not it was one of my resolutions, one of the things I want to do more this year is try to try to make more use of stuff that’s already out there, and come up with recipes that hopefully demonstrate to others how to make use of those resources.

So today’s trick is prompted by a request from @paulbradshaw about “how to turn a spreadsheet into a form-searchable database for users” within a Google spreadsheet (compared to querying a google spreadsheet via a URI, as described in Using Google Spreadsheets as a Database with the Google Visualisation API Query Language).

I’m not going to get as far as the form bit, but here’s how to grab details from a Google spreadsheet, such as one of the spreadsheets posted to the Guardian Datastore, and query it as if it was a database in the context of one of your own Google spreadsheets.

This trick actually relies on the original Google spreadsheet being shared in “the right way”, which for the purposes of this post we’ll take to mean – it can be viewed using a URL of the form:

http://spreadsheets.google.com/ccc?key=SPREADSHEETKEY&hl=en

(The &hl=en on the end is superfluous – it doesn’t matter if it’s not there…) The Guardian Datastore folks sometimes qualify this link with a statement of the form Link (if you have a Google Docs account).

If the link is of the form:
http://spreadsheets.google.com/pub?key=SPREADSHEETKEY
just change pub to ccc

So for example, take the case of the 2010-2011 Higher Education tables (described here):

http://spreadsheets.google.com/ccc?key=reBYenfrJHIRd4voZfiSmuw

The first thing to do is to grab a copy of the data into our own spreadsheet. So go to Google Docs, create a new spreadsheet, and in cell A1 enter the formula:
=ImportRange(“reBYenfrJHIRd4voZfiSmuw”,”Institutional Table!A1:K118″)

When you hit return, the spreadsheet should be populated with data from the Guardian Datastore spreadsheet.

So let’s see how that formula is put together.
=ImportRange(“reBYenfrJHIRd4voZfiSmuw”,”Institutional Table!A1:K118″)

Firstly, we use the =ImportRange() formula, which has the form:
=ImportRange(SPREADSHEETKEY, SHEET!RANGE)

This says that we want to import a range of cells from a sheet in another spreadsheet/workbook that we have access to (such as one we own, one that is shared with us in an appropriate way, or a public one). The KEY is the key value from the URL of the spreadsheet we want to import data from. The SHEET is the name of the sheet the data is on:

The RANGE is the range of the cells we want to copy over from the external spreadsheet.

Enter the formula into a single cell in your spreadsheet and the whole range of cells identified in the specified sheet of the original spreadsheet will be imported to your spreadsheet.

Give the sheet a name (I called mine ‘Institutional Table 2010-2011′; the default would be ‘Sheet1′).

Now we’re going to treat that imported data as if it was in a database, using the =QUERY() formula.

Create a new sheet, call it “My Queries” or something similar and in cell A1 enter the formula:

=QUERY(‘Institutional Table 2010-2011’!A1:K118,”Select A”)

What happens? Column A is pulled into the spreadsheet is what. So how does that work?

The =QUERY() formula, which has the basic form =QUERY(RANGE,DATAQUERY), allows us to run a special sort of query against the data specified in the RANGE. That is, you can think of =QUERY(RANGE,) as specifying a database; and DATAQUERY as a database query language query (sic) over that database.

So what sorts of DATAQUERY can we ask?

The simplest queries are not really queries at all, they just copy whole columns from the “database” range into our “query” spreadsheet.

So things like:

  • =QUERY(‘Institutional Table 2010-2011’!A1:K118,“Select C”) to select column C;
  • =QUERY(‘Institutional Table 2010-2011’!A1:K118,“Select C,D,G,H”) to select columns C, D, G and H;

So looking at copy of the data in our spreadsheet, import the columns relating to the Institution, Average Teaching Score, Expenditure per Student and Career Prospects, I’d select columns C, D, F and H:

like this:
=QUERY(‘Institutional Table 2010-2011’!A1:K118,“Select C,D, F,H”)
to give this:

(Remember that the column labels in the query refer to the spreadsheet we are treating as a database, not the columns in the query results sheet shown above.)

All well and good. But suppose we only want to look at institutions with a poor teaching score (column D), less than 40? Can we do that too? Well, yes, we can, with a query of the form:

“Select C,D, F,H where D < 40"

(The spaces around the less than sign are important… if you don’t include them, the query may not work.)

Here’s the result:

(Remember, column D in the query is actually the second selected column, which is placed into column B in the figure shown above.)

Note that we can order the results according to other columns to. So for example, to order the results according to increasing expenditure (column F), we can write:

“Select C,D, F,H where D < 40 order by F asc"

(For decreasing order, use desc.)

Note that we can run more complex queries too. So for example, if we want to find institutions with a high average teaching score (column D) but low career prospects (column H) we might ask:

“Select C,D, F,H where D > 70 and H < 70"

And so on…

Over the nect week or two, I’ll post a few more examples of how to write spreadsheet queries, as well as showing you a trick or two about how to build a simple form like interface within the spreadsheet for constructing queries automatically; but for now, why try having a quick play with the =QUERY() formula yourself?

Written by Tony Hirst

January 19, 2010 at 2:21 pm

37 Responses

Subscribe to comments with RSS.

  1. fab stuff – can’t wait for more, thank you
    Mark

    scipmark

    January 19, 2010 at 10:35 pm

  2. […] then write queries on it (see Hirst’s recent blog for further […]

  3. […] 2010 Pipework Leave a Comment Tags: Google spreadsheet It’s all very well using a Google spreadsheet as a database, but sometimes you just want to provide a simple form to let people run a particular query. […]

  4. […] do this using a QUERY formula that interrogates a list of ISO country codes in another sheet (see Using Google Spreadsheets Like a Database – The QUERY Formula for more on the Google Spreadsheets QUERY […]

  5. […] and other online spreadsheets, using spreadsheets as a database via the =QUERY() formula (e.g. Using Google Spreadsheets Like a Database – The QUERY Formula), and so on. There might also be a market for selling prepackaged or custom formulae as script […]

  6. […] So how can we use the Rosetta Table? One way is to use a =QUERY() formula, building on the ideas explored in Using Google Spreadsheets Like a Database – The QUERY Formula. […]

  7. Thanks! Great, post. I’m working on publishing some data from a spreadsheet and have several columns that I do not want published thus I’ve been looking for ways to manage the data.

    Steve Trefethen

    April 8, 2010 at 4:18 pm

  8. This is great – thanks. Can you help with constructing a query based on selecting a range of data constrained by date – ie all future entries? I’ve tried playing around with

    =QUERY(‘Basic’!A1:M853,”Select A,B,C,D,E,F,G,H,I,J,K,L,M where A > ‘now()'”)

    but without success. (Column A has dates.) Many thanks.

    ctnablog

    April 19, 2010 at 6:57 pm

    • You could put =now() in a cell on its own, then concatenate it into the query; eg if =now() in C4, then you could have something like:
      =CONCATENATE(“Select A,B,C,D,E,F,G,H,I,J,K,L,M where A > ‘”,F3,”‘”)
      in another cell to construct the query string?

      Tony Hirst

      April 19, 2010 at 7:43 pm

  9. Thanks for this. I’ve eventually tracked down a convoluted but more or less logical way of filtering everything after (and including) today from a table:

    select A,B,C,D,E,F,G,H,I,K,L where (A >= DATE “&TEXT( DATE( YEAR(TODAY()) ; MONTH(TODAY()) ; DAY(TODAY()) ),””YYYY-MM-DD””)&”

    Not sure quite how it works – but it does!

    ctnablog

    April 21, 2010 at 8:17 pm

  10. […] NB there are are actually two ways of using a Google Spreadsheet as a database: form a third party page via a query API; and within a spreadsheet using a =QUERY() formula; […]

  11. […] Related: Using Google Spreadsheets as a Database with the Google Visualisation API Query Language Using Google Spreadsheets Like a Database – The QUERY Formula […]

  12. Really cool, I am using this technology and i am triyng to compare some dates what i want to do it is to get the closer date to the date where at real time from a column… Can someone help me how can i do this…?

    felipe

    August 11, 2010 at 10:26 pm

  13. […] data into that environment. Once there it becomes a database, as I’ve described before (e.g. Using Google Spreadsheets Like a Database – The QUERY Formula and Using Google Spreadsheets as a Database with the Google Visualisation API Query […]

  14. Thanks for the article. I set up a spreadsheet and form that uses imported data from another spreadsheet. The form asks the user for the conditions they want. I use a script to set the formula for a cell to =QUERY(‘Sheet2’!A2:H5001,”Select * where E > 17 and F > 65 order by E,F asc”). The 17 & 65 have been entered from the form. If I just enter this manually in the cell, it works, but when it is submitted from the form, I get #NAME? error:Unknown range name ‘Sheet2’!A2. Any ideas? Thanks

    Seth

    December 16, 2010 at 2:49 am

    • Of course. Forms do not allow code. This is a basic security req on all dbs

      Jose

      January 5, 2011 at 2:46 am

  15. Regarding the ‘greater than now’ problem, the Google spreadsheet guru Ahab posted some solutions and for my today-or-later select statement I found that:

    SELECT A WHERE A >= todate(now())

    worked. Apparently there are some Google server issues related to which time zone is now(), so this may not be a good solution for everyone.

    Thanks again for all the helpful pointers!

    Todd Katz

    February 10, 2011 at 11:42 pm

    • So, the above does work for some calculated date fields … but for others not.

      Relying again on Google spreadsheet gurus notes, it seems that the following does work insofar as it will retrieve all instances of a date after today:

      =query(E2:E;”SELECT E where E >= datetime “&TEXT(today(),””YYYY-MM-DD HH:mm:ss””)&””)

      Todd Katz

      February 12, 2011 at 8:31 pm

  16. […] PS here are a couple of other ways of using a Google spreadsheet as a database: – Using Google Spreadsheets as a Database with the Google Visualisation API Query Language – Using Google Spreadsheets Like a Database – The QUERY Formula […]

  17. Hi: I’m very interested in using SQL to retrive data from spreadsheets. I’m now working with Google Apps Script in Google Sites. My question is: Is it possible to use some ‘Query’ formula to retrieve data from Spreadsheets (in an Apps Script) and save the results into an array? I would use the result to populate diferent listboxex widgets.
    Thank you very much.
    Benjamin.

    Benjamín

    July 18, 2011 at 9:40 am

  18. Hi.
    I sort of was getting very comfortable using Google Spreadsheets as Database, but what a surprise I just got in the moment I thought everything was going just perfect, well I got this:
    Oops
    “At this time, spreadsheets in Google Docs only support up to 50 ImportRange functions in a single spreadsheet.”

    Please any ideas of which other function can I use besides ImportRange?

    Thanks?
    ;-)

    kenly

    August 23, 2011 at 10:47 am

    • @kenly do you need to import >50 separate ranges?! Maybe one apporach would be to populate the spreadsheet/model the importRange using Google Apps script?

      Tony Hirst

      August 23, 2011 at 7:28 pm

    • You’re right, 50 only but consider you can use ranges so if you need to import from cell A1 to B2 you do not need 4 of them but one as a range (A1:B2)…

      valex

      February 27, 2012 at 3:10 pm

  19. This is very cool. Thanks… What I’m looking for is the ability to load a form with data ready for editing from a spreadsheet based on a query. i.e. a search field at the top of the page with a form below. When someone enters a name, the spreadsheet row for that name populates the form. Make sense? If so, is this possible?

    Shawn Hoefer

    September 11, 2011 at 4:12 am

  20. how to make an embed spreadsheet has a search/filter/sort box ?

    anjie

    December 15, 2011 at 1:53 am

  21. Great post.

    I’m trying to create a google spreadsheet that produces charts of amounts (of a few categories) against time. The amounts along with their associated dates are input from a form. As I’m using a form, I think that I need to use QUERY to ensure that the charts pick up the changes. I have a QUERY returning nearly what I need for the chart, bar the following:

    1. I’m making a bar chart with a bar for each week number of the year summing the amount for that week. Getting the week of the year is normally easy, but is difficult when limited to the query function. I’ve managed to get a column with the month of the year and another with the week of the month. These could suffice but I would need to format them with leading zeros and concatenate them… in a query. Is this possible? I think I’m having issues running a query on a RANGE that is itself a result of a query (so under the hood it’s a CONTINUE).

    2. Where there are no amount entries submitted for a particular week, I would like to show a gap in the chart. i.e. an amount of zero for a week not specified in my form data. Is this possible?

    Thanks in advance.

    Conor

    January 6, 2012 at 3:44 pm

  22. […] Google Spreadsheets as a Database with the Google Visualisation API Query Language (via an API) and Using Google Spreadsheets Like a Database – The QUERY Formula (within Google spreadsheets […]

  23. What about a dynamic range? Let’s say I have a form that is adding to the range. How do I automatically get the formula to cover the correct range?

    Ben

    March 17, 2012 at 6:25 am

  24. I have a similar ? to Ben’s, r.e. dynamic ranges.

    In addition, if someone does not fill in data, I want the query to exclude them. Therefore, I’ve tried something like this: =QUERY(‘USK Self-ID data’!A3:N501, (IF(C>0,”Select D,E,F,G”>0)))

    However, each time I try, I get an error message saying “Circular dependency detected.”

    Any suggestions for how to get the query to only return data if the cell in column C is >0?

    Thank you!

    Bethann

    September 10, 2012 at 1:52 am

  25. Although I browsed through the previous comments I did’t find any solution to my case:
    I have a primary and a secondary spreadsheet where the primary one has a column that would like to use as a primary key to MATCH VALUES (multiple for each row) from the secondary sheet (it also has a corresponding column as a foreign key), and return them inside a SINGLE CELL. I have been able to do this with a query formula, but can’t make it to update itself as more rows are added. I don’t want to have to drag it downwards in order to update itself.

    Any ideas?

    Panos

    September 18, 2012 at 11:29 am

  26. […] Tony Hirst has also  written about Using Google Spreadsheets Like a Database – The QUERY Formula and this is a place if you want some more query […]

  27. […] Using Google Spreadsheets Like a Database – The QUERY Formula « OUseful.Info, the blog… […]

  28. Hi. Can I use “Notification rules” on on cell range what are coming from QUERY? Ex. I want that my sales team will be notified everything what happens in “sheet!A:A”.

    VinaPlace

    May 22, 2013 at 5:37 pm

  29. […] data is being published via a Google Spreadsheet, which means we can also treat it as a database, asking database like queries either within the spreadsheet itself, or via a […]

  30. I am trying to create an import query based on user selection. The part that I can’t get to work is when the user enters a value in cell F2 nothing gets returned (yes the key Col 5 is in currency format). Below is my code; any help would be greatly appreciated:
    =if(B1=K4,Query(importRange(“0AifmgnvG8i_kdEY1NGN4a1NJZThFbEZxMWFrWDFTcFE”,”Customer Experience-CONSO!A2:g37″),”select Col1, Col2, Col3, Col4, Col5, Col6,Col7 where Col5 >'”&F2&”‘ order by Col1″,2),if(B1=K5,Query(importRange(“0AifmgnvG8i_kdEZVTG5SdXNwd3BCNjBPLWFtR0Jra1E”,”Residential Sales-ONT!A2:g37″),”select Col1, Col2, Col3, Col4, Col5, Col6, Col7 where Col5 > ‘”&F2&”‘ order by Col1″,2),Query(importRange(“0AifmgnvG8i_kdEZVTG5SdXNwd3BCNjBPLWFtR0Jra1E”,”Residential Sales-QC!A2:g37″),”select Col1, Col2, Col3, Col4, Col5, Col6, Col7 where Col5 >'”&F2&”‘ order by Col1″,2)))

    Sarah

    July 15, 2013 at 2:17 pm

  31. […] Using Google Spreadsheets Like a Database – The QUERY Formula | OUseful.Info, the blog…. […]

  32. […] Using Google Spreadsheets Like a Database – The QUERY Formula […]


Comments are closed.

Follow

Get every new post delivered to your Inbox.

Join 866 other followers

%d bloggers like this: