Some Notes on Churnalism and a Question About Two Sided Markets

Whilst recently pondering automated content generation from original data sources again (eg as per Data Textualisation – Making Human Readable Sense of Data, or Notes on Narrative Science and Automated Insights), along with other forms of so-called “robot journalism”, I started framing for myself some of the risks associated with that approach in the context of churnalism, “the use of unchecked PR material in news” (Daniel Jackson and Kevin Moloney, ‘Inside Churnalism: PR, journalism and power relationships in flux’, Journalism Studies, 2015), “the passive processing of material which overwhelmingly tends to be supplied for them by outsiders, particularly wire agencies and PR” (Nick Davies, ‘Flat Earth News’, Vintage Books, 2009 p73), “journalists failing to perform the simple basic functions of their profession. … journalists who are no longer out gathering news but who are reduced instead to passive processors of whatever material comes their way, churning out stories, whether real event or PR artifice, important or trivial, true or false” (ibid, p59).

Davies (ibid) goes on to suggest that: “the churnalists working on the assembly line in the news factory construct national news stories from raw material which arrives along two primary conveyor belts: the Press Association and public relations” (p74).

The quality of these sources differs in the following ways: “PA is a news agency, not a newspaper. It is not attempting, nor does it claim to be attempting, to tell people the truth about the world. … The PA reporter goes to the press conference with the intention of capturing an accurate record of what is said. Whether what is said is itself a truthful account of the world is simply not their business” (p83). If this is a fair representation of what PA does, we might then expect the journalist to do some journalistic work around the story, contextualising it, perhaps seeking comment, alternative views or challenges to it.

On the other hand, PR content is material that is “clearly inherently unreliable as a source of truth, simply because it is designed to serve an interest” (p89), “whether or not it is truthful, [because it] is designed specifically to promote and to suppress stories in order to serve the interests of political, commercial and other groups” (p91). This sort of content requires additional journalistic effort in terms of verification and some reflection on the rationale behind the press release when trying to work out the extent to which it is newsworthy or might feed into a story that is newsworthy (a press release or flurry of press releases might give you a feeling that there is a deeper story…).

The two input drivers of churnalism claimed by Davies – wire copy and PR – both play a significant role in influencing what content goes into a story. For the publisher of web-mediated news content, another filtering process may influence what content the audience sees in the form of algorithms that prioritise the placement of news stories on a website, such as the “most popular story” links. Mindful of web traffic stats, the churnalist might also be influenced by this third “reverse input” in their selection of what content is likely to do well when posting a story.

According to Jackson & Moloney (p2), “[t]he classic sociological conceptualisation of [the] process [in which “the PR practitioner trades data and opinion with journalists in exchange for favourable publicity”] is the information subsidy (Gandy 1982, Gandy, Oscar H. 1982. Beyond Agenda Setting: Information Subsidies and Public Policy)” [my emphasis]. The notion of the information subsidy was new to me, and I think is a useful one; it is explored in Gandy, Oscar H. “Information in health: Subsidised news.” Media, Culture & Society 2.2 (1980), pp103-115 as follows:

“We have suggested that the news media, traditionally seen as an independent and highly credible source for information about the environment, is in fact, dominated by purposive information supplied by PRs interested in influencing private and public decision making. We have suggested further that information subsidies of journalists and other gatekeepers operate on the basis of simple economic rules. Journalists need news, however defined, and routine sources are the easiest ways to gain that information. We have suggested further that success in providing information subsidies to one’s chosen targets is closely tied to the resources available to the subsidy giver, since considerable resources are necessary for the creation of pseudo-events dramatic enough to lure a harried reporter away from a well written press release, or from the cocktails which often accompany a special press briefing. Media visibility breeds more visibility. Appearance in the press lends status, and that status leads more quickly to a repeat appearance” p106.

Economically speaking, the media is often regarded as operating in a two sided market (for example, I’ve just popped the following onto my to-read list: Two-Sided Markets: An Overview, Jean-Charles Rochet, Jean Tirole, 2004, updated as Two-Sided Markets: A Progress Report, Jean-Charles Rochet, Jean Tirole, November 29, 2005, Rochet, Jean‐Charles, and Jean Tirole. “Platform competition in two‐sided markets.” Journal of the European Economic Association 1.4 (2003): 990-1029 and Parker, Geoffrey G., and Marshall W. Van Alstyne. “Two-sided network effects: A theory of information product design.” Management Science 51.10 (2005): 1494-1504). In the context of two sided markets in publishing, the publisher itself can be seen as operating as a platform selling content to readers (one side of the market), content which includes adverts from advertisers, and selling advertising space – and audience – to advertisers (the other side of the market).

To the extent the costs of running the platform – and hence the profitability of generating content that satisfies audience and generates audience figures (and perhaps conversion rates) that satisfies advertisers – are reduced through the provision of ready made content by PR firms, we perhaps see how the PR players might be modelled as advertisers who, rather than paying cash to the platform for access to audience, instead subsidise the costs of producing content by providing it directly. That is, to the extent that advertisers subsidise the monetary cost of accessing content by an audience (to the level of free in many cases), PR firms also subsidise the cost of accessing content by an audience by reducing the production costs of the platform. Maybe? (I’m not an economist and haven’t really delved very far into the two sided market literature… But when I did look (briefly), I didn’t find any two sided market analyses explicitly covering PR, churnalism or information subsidies, although there are papers that do consider information subsidies in a market context, eg Patricia A. Curtin (1999) “Reevaluating Public Relations – Information Subsidies: Market-Driven Journalism and Agenda-Building Theory and Practice”, Journal of Public Relations Research, 11:1, 53-90?) So my question here is: are there any economists that do explore the idea of “information subsidy” in the context of two sided market models?

It seems to me that the information subsidy provided by PR or wire copy represents a direct efficiency or timesaving for the journalist. It is not hard to understand why journalists feel pressured into working this way:

“Despite operating in a highly competitive marketplace driven by new technology, conglomeration, deregulation, competition from free newspapers and declining circulations, newspapers’ managements have squared the circle by paying staff low salaries, shedding staff and cutting training, while simultaneously increasing output, including online content (Curran and Seaton, 1997; Davis 2003; Franklin, 1997; Murphy, 1998; Tunstall, 1996, Williams and Franklin, 2007). Time available for journalists to speak to contacts, nurture sources, become familiar with a ‘patch’ and uncover and follow up leads has become a ‘luxury’ (Pecke, 2004, p. 30)” p489.

“In such a pressurised and demoralised working environment it is all too easy for journalists to become dependent on the pre-fabricated, pre-packaged ‘news’ from resource-rich public relations organisations or the familiar and easily accessed routine source or re-writes of news agency copy” p489

The Passive Journalist: How sources dominate local news, Deirdre O’Neill and Catherine O’Connor, Journalism Practice, Vol. 2, No 3, 2008 pp487-500

Writing almost 40 years ago, David Murphy (David Murphy, “The silent watchdog: the press in local politics”, Constable, 1976) described the situation in local newsrooms then in ways that perhaps still ring true today:

When the local newspaper editor comes to create his edition for the week or the evening he has to assess: (a) the raw material – bits of information – in the light of how much space he has available, which is calculated on the ratio of news to advertisements, the advertisements being a controlling factor; (b) the cost of particular kinds of coverage; (c) the circulation pull of any particular coverage, bearing in mind the audience to which it will be directed; (d) the need to have something in the paper by the deadline, which is at the same time up-to-date. The reporter is aware when collecting his data of these sorts of factors, because he is acquainted with the news editor’s or editor’s previous responses to similar material.

This creates a situation in which the ideal type of story is one which involves the minimum amount of investigation – preferably a single interview – or the redrafting of a public relations handout, which can be written quickly and cut from the end backwards towards the beginning without making it senseless. It must also have the strongest possible readership pull. This is the minimum-cost, maximum-utility news story (p17).

Returning to Jackson & Moloney:

“The habitual incorporation of media releases and other PR material into the news by journalists is not a new phenomenon, but the apparent change is in the scale and regularity in which this is now happening” p3.

“From time immemorial, PR practitioners have been attempting to get their copy into the news. As discussed earlier, this is typically considered an information subsidy, where the PR practitioner acts as a sort of ‘pre-reporter’ for the journalist (Supa and Zoch 2009. Supa, Dustin W., and Lynn M. Zoch. 2009. “Maximizing Media Relations through a Better Understanding of the Public Relations–journalist Relationship: A Quantitative Analysis of Changes Over the Past 23 Years.” Public Relations Journal 3 (4)). In exchange for sending them pre-packaged information that the journalist can use to write a story, the PR hopes to gain favourable coverage of their client” p7.

They also extend the idea of an information subsidy to a more pernicious one of an editorial subsidy in which the content is so well packaged that it is ready-to-go without any additional input from the journalist, placing the journalist more squarely in the role of a gatekeeper than an interpreter or contextualiser:

“Our findings on PR practice in 2013 are quite clear: for the practitioners we spoke to, the days of the monolithic media release sent to all news desks are largely over. They are preparing page-ready content customised for each publication, which is carefully targeted. They are thinking like journalists—starting with the news hook, then working in their PR copy backwards from there. Alongside the body of work that documents the growing influence of PR material in the news, we believe that the concept of the information subsidy may need expanding in light of this. The implication of churnalism is that there is more than an information subsidy taking place. Where journalists copy-paste, there is an editorial subsidy occurring too. This is significant when we think about the agenda-building process, and its associated power dimension. An editorial subsidy implies more than just setting the agenda and providing building blocks for a news story (such as basic facts, statistics, or quotes) for the journalist to add editorial framing (see Reich 2010, Reich, Zvi. 2010. ‘Measuring the Impact of PR on Published News in Increasingly Fragmented News Environments: A Multifaceted Approach.’ Journalism Studies 11 (6): 799–816.). It means a focus on the more sacred editorial element of framing stories too, which for our participants usually meant positive coverage of their client and the delivery (in print or on air) of the key campaign messages. But for most of our participants, achieving the editorial subsidy was dependent on the (journalistic) style in which it was written, and it was this that they seemed most preoccupied with when discussing their media relations practice” p13.

(In a slightly different context, this reminds me in part of The University Expert Press Room and to a lesser extent Social Media Releases and the University Press Office.)

If journalists are simply acting as gatekeepers for PR copy, then we need to consider what news values, and what editorial values, they then apply to the content they are simply passing on. As Peter Bro & Filip Wallberg describe in “Gatekeeping in a Digital Era”, Journalism Practice, 9:1, pp92-105, (2015):

“When the concept of gatekeeping was originally introduced within journalism studies, it was employed to describe a process where a wire editor received telegrams from the wire services. From these telegrams the wire editor, known as Mr. Gates in David Manning White’s (1950 White, David Manning. 1964. ‘Introduction to the Gatekeeper.’ In People, Society, and Mass Communication, edited by Lewis Anthony Dexter and David Manning White, 160–161. New York: Macmillan) seminal study, selected what to publish. This capacity to select and reject content for publication has become a popular way of portraying the function of news reporters. In time, however, telegraphy has been succeeded by new technologies, and they have inspired new practices and principles when it comes to producing, publishing and distributing news stories.

This is a technological development that challenges us to revisit, reassess and rethink the process of gatekeeping in a digital era” p93.

“What White (1950, 384) described as a daily ‘avalanche’ from the wire services, such as United Press and Associated Press, has not vanished in news organizations even though the technological platform has changed. Many news media still publish news to their readers, listeners and viewers by way of a one-way linear process, where persons inside the newsrooms are charged with the function of selecting or rejecting news stories for publication” p96.

In the linear model where the journalist acts not simply as a gatekeeper but plays a more creative, journalistic role, one of the costs associated with performing “the simple basic functions of their profession”, as Davies put it, is checking the veracity of a story via a second source. In their paper on “The Passive Journalist”, which looked at how reporters operate in local and regional news, O’Neill & O’Connor “wished to know the extent to which sources were influencing the selection and production of news and rendering the role of the local journalist essentially passive or reactive, with all the subsequent implications for the quality of local reporting and the public interest” (p490).

“The findings suggest that journalists’ reliance on a single source for stories, possibly reflecting shortage of time and resources, combined with sources’ skills in presenting positive public images, is a significant contributory factor to uncritical local press reporting. … Of the 24 per cent of articles with a secondary source, most were still framed by a primary source, with a brief alternative quote included at the end of the report. What this means in practice is a formulaic style, superficially giving the appearance of ‘objective news’, but which fails to get to the heart of the issue, or misses the real story. There was little evidence of the sifting of conflicting information or contextualising that assists readers’ understanding and makes for good journalism (Williams, 2007)” p493.

“Th[e] study found that almost two-thirds (61 per cent) of local government-sourced stories (one of the main routine source categories) had no discernible secondary sources and suggests a significant unquestioning reliance on council press officers or press releases … For example, the Yorkshire Evening Post covered a story on 22 February 2007 about local authority performance league tables (‘Three-star Rating for City Council’s Good Showing’), but framed it only in terms of the report and the views of the council leader and chief executive, with no alternative or dissenting views presented, despite the fact that the authority had dropped one star in the ratings” p493.

Routine Sources, Court Reporting, the Data Beat and Metadata Journalism

In The Re-Birth of the “Beat”: A hyperlocal online newsgathering model (Journalism Practice 6.5-6 (2012): 754-765), Murray Dick cites various others to suggest that routine sources are responsible for generating a significant percentage of local news reports:

“Schlesinger [Schlesinger, Philip (1987) Putting ‘Reality’ Together: BBC News. Taylor & Francis: London] found that BBC news was dependent on routine sources for up to 80 per cent of its output, while later [Franklin, Bob and Murphy, David (1991) Making the Local News: Local Journalism in Context. Routledge: London] established that local press relied upon local government, courts, police, business and voluntary organisations for 67 per cent of their stories (in [Keeble, Richard (2009) Ethics for Journalists, 2nd Edition. Routledge: London], p114-15)”.

As well as human sources, news gatherers may also look to data sources at either a local level, such as local council transparency (that is, spending data), or national data sources with a local scope as part of a regular beat. For example, the NHS publish accident and emergency statistics at the provider organisation level on a weekly basis, and nomis, the official labour market statistics publisher, publish unemployment figures at a local council level on a monthly basis. Ratings agencies such as the Care Quality Commission (CQC) and the Food Standards Agency (FSA) publish inspections data for local establishments as it becomes available, and other national agencies publish data annually that can be broken down to a local level: if you want to track car MOT failures at the postcode region level, the DVLA have the data that will help you do it.

To a certain extent, adding data sources to a regular beat, or making a beat purely from data sources enables the automatic generation of data driven press releases that can be used to shorten the production process of news reports about a particular class of routine stories that are essentially reports about “the latest figures” (see, for example, my nomis Labour Market Statistics textualisation sketch).
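As a rough illustration of what a “latest figures” textualisation might look like, here is a minimal Python sketch that turns two consecutive values into a sentence. The function name, the wording template and the figures are all made up for the example; it is not the code behind the nomis sketch mentioned above.

```python
def describe_change(area, current, previous, measure="claimant count"):
    """Turn two consecutive figures into one human-readable sentence.

    Assumes previous is non-zero so that a percentage change can be computed.
    """
    diff = current - previous
    if diff == 0:
        return f"The {measure} in {area} was unchanged at {current:,}."
    direction = "rose" if diff > 0 else "fell"
    pct = abs(diff) / previous * 100
    return (f"The {measure} in {area} {direction} by {abs(diff):,} "
            f"({pct:.1f}%) to {current:,}.")


# Hypothetical figures, for illustration only
print(describe_change("Exampleshire", 1250, 1200))
```

A template like this is only a starting point: the editorial judgement about whether the change is actually newsworthy still sits with the journalist.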

Data sources can also be used to support the newsgathering process by processing the data in order to raise alerts or bring attention to particular facts that might otherwise go unnoticed. Where the data has a numerical basis, this might relate to sorting a national dataset on the basis of some indicator value or other and highlighting to a particular local news outlet that their local X is in the top M or bottom N of similar establishments in the rest of the country, and that there may be a story there. Where the data has a text basis, looking for keywords might pull out paragraphs or records that are of particular interest, or running a text through an entity recognition engine such as Thomson Reuters’ OpenCalais might automatically help identify individuals or organisations of interest.
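The “top M or bottom N” alerting idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the national dataset is a list of (name, indicator value) pairs, higher values rank first, and the establishment names and scores are invented.

```python
def rank_alerts(records, local_names, top_m=3, bottom_n=3):
    """Flag local establishments that fall in the national top M or bottom N.

    records: list of (name, value) pairs covering the whole country.
    local_names: set of names that a particular local outlet cares about.
    """
    ranked = sorted(records, key=lambda r: r[1], reverse=True)
    total = len(ranked)
    alerts = []
    for position, (name, _value) in enumerate(ranked, start=1):
        if name not in local_names:
            continue
        if position <= top_m:
            alerts.append((name, f"ranked {position} of {total} (top {top_m})"))
        elif position > total - bottom_n:
            alerts.append((name, f"ranked {position} of {total} (bottom {bottom_n})"))
    return alerts


# Hypothetical national dataset and local patch
national = [("A", 91), ("B", 85), ("C", 77), ("D", 60), ("E", 42), ("F", 30)]
for name, note in rank_alerts(national, {"B", "C", "F"}, top_m=2, bottom_n=2):
    print(name, note)
```

An alert of this sort says only that there may be a story; it says nothing about why the establishment ranks where it does.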

In the context of this post, I will be considering the role that metadata about court cases that is contained within court lists and court registers might have to play in helping news media identify possibly newsworthy stories arising from court proceedings. I will also explore the extent to which the metadata may be processed, both in order to help identify court proceedings that may be worth reporting on, as well as to produce statistical summaries that may in themselves be newsworthy and provide a more balanced view over the activity of the courts than the impression one might get about their behaviour simply from the balance of coverage provided by the media.


Data Driven Journalism – Survey

The notion of data driven journalism appears to have some sort of traction at the moment, not least as a recognised use context of some very powerful data handling tools, as Simon “Guardian Datastore” Rogers’ appearance at Google I/O suggests:


(Simon’s slot starts about 34:30 in, but there’s a good tutorial intro to Fusion Tables from the start…)

As I start to doodle ideas for an open online course on something along the lines of “visually, data” to run October-December, data journalism is going to provide one of the major scenarios for working through ideas. So I guess it’s in my interest to promote this European Journalism Centre: Survey on Data Journalism to try to find out what might actually be useful to journalists…;-)

[T]he survey Data-Driven Journalism – Your opinion aims to gather the opinion of journalists on the emerging practice of data-driven journalism and their training needs in this new field. The survey should take no more than 10 minutes to complete. The results will be publicly released and one of the entries will win a EUR 100 Amazon gift voucher

I think the EJC are looking to run a series of data-driven journalism training activities/workshops too, so it’s worth keeping an eye on the EJC site if #datajourn is your thing…

PS related: the first issue of Google’s “Think Quarterly” magazine was all about data: Think Data

PPS Data in journalism often gets conflated with data visualisation, but that’s only a part of it… Where the visualisation is the thing, then here are a few things to think about…


Ben Fry interviewed at Where 2.0 2011

UK Journalists on Twitter

A post on the Guardian Datablog earlier today took a dataset collected by the Tweetminster folk and graphed the sorts of thing that journalists tweet about (Journalists on Twitter: how do Britain’s news organisations tweet?).

Tweetminster maintains separate lists of tweeting journalists for several different media groups, so it was easy to grab the names on each list, use the Twitter API to pull down the names of people followed by each person on the list, and then graph the friend connections between folk on the lists. The result shows that the hacks follow each other quite closely:

UK Media Twitter echochamber (via tweetminster lists)

Nodes are coloured by media group/Tweetminster list, and sized by PageRank, as calculated over the network using the Gephi PageRank statistic.
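For anyone wanting to try something similar, here is a minimal Python sketch of the graph-building step: keep only the “follows” links where both ends are on one of the lists, then write the result out in Gephi’s GDF text format. It assumes the list memberships and friend lists have already been fetched (the Twitter API calls are not shown), the names and data are hypothetical, and the GDF column definitions follow the format as I understand it rather than the exact file used for the chart above.

```python
def internal_edges(lists, friends):
    """Find directed follower -> followed edges between list members.

    lists: dict mapping a media group name to the screen names on its list.
    friends: dict mapping a screen name to the names that account follows.
    If a name appears on two lists, the last list seen wins.
    """
    group_of = {name: group for group, names in lists.items() for name in names}
    edges = [(src, dst)
             for src in sorted(group_of)
             for dst in sorted(friends.get(src, []))
             if dst in group_of and dst != src]
    return group_of, edges


def to_gdf(group_of, edges):
    """Serialise nodes (with their group) and directed edges as GDF text."""
    lines = ["nodedef>name VARCHAR,label VARCHAR,mediagroup VARCHAR"]
    lines += [f"{name},{name},{group_of[name]}" for name in sorted(group_of)]
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,directed BOOLEAN")
    lines += [f"{src},{dst},true" for src, dst in edges]
    return "\n".join(lines)


# Hypothetical lists and friend data, for illustration only
lists = {"paper_a": ["alice", "bob"], "broadcaster_b": ["carol"]}
friends = {"alice": ["bob", "carol", "someone_else"], "bob": ["alice"], "carol": []}

group_of, edges = internal_edges(lists, friends)
print(to_gdf(group_of, edges))
```

Loaded into Gephi, the mediagroup column can be used for colouring nodes, and the PageRank statistic can then be run over the directed edges.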

The force directed layout shows how folk within individual media groups tend to follow each other more intensely than they do people from other groups, but that said, inter-group following is still high. The major players across the media tweeps as a whole seem to be @arusbridger, @r4today, @skynews, @paulwaugh and @BBCLauraK.

I can generate an SVG version of the chart, and post a copy of the raw Gephi GDF data file, if anyone’s interested…

PS if you’re interested in trying out Gephi for yourself, you can download it from gephi.org. One of the easiest ways in is to explore your Facebook network

PPS for details on how the above was put together, here’s a related approach: Doodlings Around the Data Driven Journalism Round Table Event Hashtag Community.

For a slightly different view over the UK political Twittersphere, see Sketching the Structure of the UK Political Media Twittersphere. And for the House and Senate in the US: Sketching Connections Between US House and Senate Tweeps

Creating Database Query Forms in Google Spreadsheets – Sort Of

It’s all very well using a Google spreadsheet as a database, but sometimes you just want to provide a simple form to let people run a particular query. Here’s a quick way of doing that within a Spreadsheet…

So for example: Can you help me crowd source a solution?. The problem is as follows:

Students will make five choices from a list of over 200 projects that have been anonymised… We will give each project a code, and have already entered all the details into an excel sheet so we can tie the project code to the supervisor.

We need a solution that will enable students to enter their project code and then have the title of the project displayed as a check to make sure they have entered the code correctly. The list of projects is just too long for a drop down list, even when split by department (around 50 in each).

Does anyone have any suggestions of tools that we can use for students to submit this type of information, so that we get it in a format that we can use, and they get confirmation of the project titles they have chosen? A simple google form isn’t going to hack it!

Here’s one way…

Create a “form” – the text entry cell can be highlighted by setting the background colour from the spreadsheet toolbar:

Construct a query. In this case, I need to select three results columns (H, I and J) from another sheet (‘Sheet1’, the one that acts as the database and contains the project codes) so the query will be of the form “select H,I,J where H contains ‘BIOCHEM’”; the search term (“BIOCHEM”) is pulled in from the query form shown above:

=concatenate("select H,I,J where H contains '",B2,"'")

(As a rule of thumb, if you want your query to select cells A, D, AC, the range set in the first part of the query that defines the database should span the first to the last column in the select range (Sheet1!A:AC, for example).)

By using the contains relation, this query will generate a set of results that are, in effect, a list of auto-complete suggestions as the result of searching on a partially stated query term.
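To make the logic of the query builder a little more concrete, here is a small Python sketch of the same two steps: building the query text from the search term, and doing a “contains” style match over some rows. The function names and project rows are hypothetical, the match is assumed to be case sensitive, and the quote check is my own addition rather than anything the spreadsheet formula does.

```python
def build_query(term, columns=("H", "I", "J"), match_column="H"):
    """Build the query text that the concatenate formula produces."""
    # Single quotes delimit the search term, so reject terms containing one
    if "'" in term:
        raise ValueError("search term must not contain a single quote")
    return f"select {','.join(columns)} where {match_column} contains '{term}'"


def contains_matches(rows, term):
    """Return the rows whose project code (first item) contains the term."""
    return [row for row in rows if term in row[0]]


# Hypothetical project rows: (code, title, department)
projects = [
    ("BIOCHEM01", "Enzyme kinetics", "Biochemistry"),
    ("BIOCHEM02", "Protein folding", "Biochemistry"),
    ("PHYS07", "Optical traps", "Physics"),
]

print(build_query("BIOCHEM"))
for row in contains_matches(projects, "BIOCHEM"):
    print(row)
```

Searching on a partial code such as “BIO” returns both biochemistry projects, which is the auto-complete-like behaviour described above.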

Assuming I have placed the query in cell A4, I can automatically get the results from the query as follows:

Note that it would be possible to hide the query generator (the contents of cell A4) in another sheet and just have the search box and the results displayed in the user interface sheet.

Another approach is to query the spreadsheet via its API.

So for example, if the original spreadsheet database was published as a public document, we could also grab the results as an HTML table via an API using a URI of the form:

http://spreadsheets.google.com/tq?tqx=out:html
&tq=select%20H%2CI%2CJ%20where%20H%20contains%20%22SEARCHTERM%22
&key=SPREADSHEETKEY
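Rather than hand-encoding the query, the URI can be assembled programmatically. Here is a minimal Python sketch that percent-encodes the query and slots it into a URI of the form shown above; the endpoint is simply the one quoted in this post, and SPREADSHEETKEY and SEARCHTERM are placeholders, not real values.

```python
from urllib.parse import quote

BASE_URL = "http://spreadsheets.google.com/tq"  # endpoint as quoted in the post


def query_url(key, query, out="html"):
    """Build a query URI; out selects the output format (e.g. html or csv)."""
    # safe="" makes quote() encode commas and quotes as well as spaces
    return f"{BASE_URL}?tqx=out:{out}&tq={quote(query, safe='')}&key={key}"


url = query_url("SPREADSHEETKEY", 'select H,I,J where H contains "SEARCHTERM"')
print(url)
```

Passing out="csv" gives the comma separated variant of the same request.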

Setting out:csv would return the results in comma separated values format, so we could create a Yahoo Pipes interface to query the form, for example:

Here’s how:

What would be really useful would be if the Google/Yahoo widget options for the feed respected the form elements, rather than just generating a widget that displays the feed corresponding to the current Run of the pipe with the provided search terms.

Building such a widget is something I need to put on my to do list, I guess?! Sigh…

Using Google Spreadsheets Like a Database – The QUERY Formula

In this year’s student satisfaction tables, which universities have a good teaching score but low employment prospects? How would you find out? In this post, you’ll find out…

Whether or not it was one of my resolutions, one of the things I want to do more this year is try to make more use of stuff that’s already out there, and come up with recipes that hopefully demonstrate to others how to make use of those resources.

So today’s trick is prompted by a request from @paulbradshaw about “how to turn a spreadsheet into a form-searchable database for users” within a Google spreadsheet (compared to querying a google spreadsheet via a URI, as described in Using Google Spreadsheets as a Database with the Google Visualisation API Query Language).

I’m not going to get as far as the form bit, but here’s how to grab details from a Google spreadsheet, such as one of the spreadsheets posted to the Guardian Datastore, and query it as if it was a database in the context of one of your own Google spreadsheets.

This trick actually relies on the original Google spreadsheet being shared in “the right way”, which for the purposes of this post we’ll take to mean – it can be viewed using a URL of the form:

http://spreadsheets.google.com/ccc?key=SPREADSHEETKEY&hl=en

(The &hl=en on the end is superfluous – it doesn’t matter if it’s not there…) The Guardian Datastore folks sometimes qualify this link with a statement of the form Link (if you have a Google Docs account).

If the link is of the form:
http://spreadsheets.google.com/pub?key=SPREADSHEETKEY
just change pub to ccc

So for example, take the case of the 2010-2011 Higher Education tables (described here):

http://spreadsheets.google.com/ccc?key=reBYenfrJHIRd4voZfiSmuw

The first thing to do is to grab a copy of the data into our own spreadsheet. So go to Google Docs, create a new spreadsheet, and in cell A1 enter the formula:
=ImportRange("reBYenfrJHIRd4voZfiSmuw","Institutional Table!A1:K118")

When you hit return, the spreadsheet should be populated with data from the Guardian Datastore spreadsheet.

So let’s see how that formula is put together.
=ImportRange("reBYenfrJHIRd4voZfiSmuw","Institutional Table!A1:K118")

Firstly, we use the =ImportRange() formula, which has the form:
=ImportRange(SPREADSHEETKEY, SHEET!RANGE)

This says that we want to import a range of cells from a sheet in another spreadsheet/workbook that we have access to (such as one we own, one that is shared with us in an appropriate way, or a public one). The KEY is the key value from the URL of the spreadsheet we want to import data from. The SHEET is the name of the sheet the data is on:

The RANGE is the range of the cells we want to copy over from the external spreadsheet.

Enter the formula into a single cell in your spreadsheet and the whole range of cells identified in the specified sheet of the original spreadsheet will be imported to your spreadsheet.

Give the sheet a name (I called mine ‘Institutional Table 2010-2011’; the default would be ‘Sheet1’).

Now we’re going to treat that imported data as if it was in a database, using the =QUERY() formula.

Create a new sheet, call it “My Queries” or something similar and in cell A1 enter the formula:

=QUERY('Institutional Table 2010-2011'!A1:K118,"Select A")

What happens? Column A is pulled into the spreadsheet is what. So how does that work?

The =QUERY() formula, which has the basic form =QUERY(RANGE,DATAQUERY), allows us to run a special sort of query against the data specified in the RANGE. That is, you can think of =QUERY(RANGE,) as specifying a database; and DATAQUERY as a database query language query (sic) over that database.

So what sorts of DATAQUERY can we ask?

The simplest queries are not really queries at all, they just copy whole columns from the “database” range into our “query” spreadsheet.

So things like:

  • =QUERY('Institutional Table 2010-2011'!A1:K118,"Select C") to select column C;
  • =QUERY('Institutional Table 2010-2011'!A1:K118,"Select C,D,G,H") to select columns C, D, G and H;

So looking at the copy of the data in our spreadsheet, to import the columns relating to the Institution, Average Teaching Score, Expenditure per Student and Career Prospects, I’d select columns C, D, F and H:

like this:
=QUERY('Institutional Table 2010-2011'!A1:K118,"Select C,D,F,H")
to give this:

(Remember that the column labels in the query refer to the spreadsheet we are treating as a database, not the columns in the query results sheet shown above.)
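If it helps to see what a column-only “Select” is doing, here is a rough Python analogue: convert each column letter to a position in the source row, then copy just those positions across. This is an illustration of the idea, not how the =QUERY() formula is implemented, and the function names and sample row are made up.

```python
def column_index(letters):
    """Convert a spreadsheet column label (A, D, AC, ...) to a 0-based index."""
    index = 0
    for char in letters.upper():
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def select(rows, labels):
    """Keep only the labelled columns from each row, in the order given."""
    picks = [column_index(label) for label in labels]
    return [[row[i] for i in picks] for row in rows]


# One hypothetical source row with columns A to H
source = [["a", "b", "c", "d", "e", "f", "g", "h"]]
print(select(source, ["C", "D", "F", "H"]))
```

Note how the results come out in positions 0 to 3 regardless of where they sat in the source, which is why the labels in a query always refer to the “database” columns rather than the results columns.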

All well and good. But suppose we only want to look at institutions with a poor teaching score (column D), less than 40? Can we do that too? Well, yes, we can, with a query of the form:

"Select C,D,F,H where D < 40"

(The spaces around the less than sign are important… if you don’t include them, the query may not work.)

Here’s the result:

(Remember, column D in the query is actually the second selected column, which is placed into column B in the figure shown above.)

Note that we can order the results according to other columns too. So for example, to order the results according to increasing expenditure (column F), we can write:

"Select C,D,F,H where D < 40 order by F asc"

(For decreasing order, use desc.)
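The “where … order by …” pattern is just a filter followed by a sort. Here is a minimal Python sketch of that, using a few invented rows in place of the real table (the institution names and scores are hypothetical, as is the helper function).

```python
# Hypothetical rows: (institution, teaching score, spend per student, career prospects)
rows = [
    ("Univ A", 35, 4.2, 55),
    ("Univ B", 62, 6.1, 70),
    ("Univ C", 38, 3.1, 48),
]


def where_order(rows, keep, sort_index, descending=False):
    """Filter rows with keep(), then sort on one column (ascending by default)."""
    kept = [row for row in rows if keep(row)]
    return sorted(kept, key=lambda row: row[sort_index], reverse=descending)


# Equivalent in spirit to: select C,D,F,H where D < 40 order by F asc
poor_teaching = where_order(rows, lambda row: row[1] < 40, sort_index=2)
for row in poor_teaching:
    print(row)
```

Passing descending=True plays the part of desc.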

Note that we can run more complex queries too. So for example, if we want to find institutions with a high average teaching score (column D) but low career prospects (column H) we might ask:

"Select C,D,F,H where D > 70 and H < 70"
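If you are more at home with “real” databases, the same question can be put to an in-memory SQLite table, which shows how close the spreadsheet query language is to SQL. This is a sketch using Python’s built-in sqlite3 module; the table name, column names and rows are all invented for the example.

```python
import sqlite3

# Hypothetical rows: (institution, teaching score, spend per student, career prospects)
sample = [
    ("Univ A", 75, 5.0, 60),
    ("Univ B", 80, 7.0, 85),
    ("Univ C", 65, 4.0, 50),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE institutions (name TEXT, teaching REAL, spend REAL, career REAL)"
)
conn.executemany("INSERT INTO institutions VALUES (?, ?, ?, ?)", sample)

# Good teaching score but low career prospects
result = conn.execute(
    "SELECT name, teaching, spend, career FROM institutions "
    "WHERE teaching > 70 AND career < 70 ORDER BY name"
).fetchall()
conn.close()

for row in result:
    print(row)
```

The main differences are that SQL needs a named table and named columns, where the spreadsheet query works over a cell range and column letters.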

And so on…

Over the next week or two, I’ll post a few more examples of how to write spreadsheet queries, as well as showing you a trick or two about how to build a simple form like interface within the spreadsheet for constructing queries automatically; but for now, why not try having a quick play with the =QUERY() formula yourself?

My Presentation for News:Rewired – Doing the Data Mash

For once, I didn’t put links into a presentation, so here instead are the link resources for my News:Rewired presentation:

(If I get a chance over the next week or so, I may even try to make a slidecast out of the above…)

The link story for the presentation goes something like this:

If there’s something “dataflow” related you’d like to see explored here, please leave a request as a comment and I’ll see what I can do :-) I’ve also started a newsrw category (view it here) which I’ll start posting relevant content to (see also the datajourn tag).