OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Disposable Visual Data Explorers with Shiny – Guardian University Tables 2014

Have data – now what? Building your own interactive data explorer need not be a chore with the R shiny library… Here’s a quick walkthrough…

In Datagrabbing Commonly Formatted Sheets from a Google Spreadsheet – Guardian 2014 University Guide Data, I showed how to grab some data from several dozen commonly formatted sheets in a Google spreadsheet, and combine them to produce a single monolithic data set. The data relates to UK universities and provides several quality/satisfaction scores for each of the major subject areas they offer courses in.

We could upload this data to something like Many Eyes in order to generate visualisations over it, or we could create a visual data explorer app of our own. It needn’t take too long, either…

Here’s an example, the Simple Guardian University Rankings 2014 Explorer, that lets you select a university and then generate a scatterplot that shows how different quality/ranking scores vary for that university by subject area:

[Screenshot: crude data explorer, Guardian 2014 uni stats]

The explorer allows you to select a university and then generate a scatterplot based around selected quality scores. The label size is also set relative to a selected quality score.

The application is built up from three files. The first is a generic file, global.R, that we use to load in the source data (in this example I pull it from a file, though we could bring it in live from the Google spreadsheet).

##global.R
load("guardian2014unidata.Rda")
#In this case, the data is loaded into the dataframe: gdata
#Once it's loaded, we tidy it (I should have tidied the saved data really!)
gdata[, 4:11] <- sapply(gdata[, 4:11], as.numeric)
gdata$Name.of.Institution=as.factor(gdata$Name.of.Institution)
gdata$subject=as.factor(gdata$subject)
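
A caveat on the tidy-up step, offered as a sketch rather than a description of the saved data: if any of the score columns had been read in as factors (an assumption on my part; they may well be character strings), as.numeric() would return the internal level codes rather than the printed values. Going via as.character() avoids that. The toy data frame demo below is made up purely for illustration.

```r
# Hypothetical toy data: one score column stored as a factor
demo <- data.frame(score = factor(c("85.2", "91", "78.5")))

# as.numeric() on a factor returns the level codes, not the values
codes <- as.numeric(demo$score)

# Converting via as.character() first recovers the printed values
values <- as.numeric(as.character(demo$score))

print(codes)
print(values)
```

If the columns are already character strings, the sapply() call in global.R does what we want as it stands.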

The second is a “server” file, server.R, that takes input from the user interface elements on the left of the app and generates the displayed chart:

##server.R
library(shiny)
library(ggplot2)
 
# Define server logic
shinyServer(function(input, output) {
  
  #Simple test plot
  output$testPlot = renderPlot( {
    pdata=subset(gdata, Name.of.Institution==input$tbl)
    #g=ggplot(pdata) + geom_text(aes(x=X..Satisfied.with.Teaching,y=X..Satisfied.with.Assessment,label=subject,size=Value.added.score.10))
    g=ggplot(pdata) + geom_text(aes_string(x=input$tblx,y=input$tbly,size=input$tbls, label='subject'))
    g=g+labs(title=paste("Guardian University Tables 2014:",input$tbl))
    print(g)
  })
  
})
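
A note for anyone running this with a recent ggplot2: aes_string() has since been deprecated, and column names held in strings can instead be looked up with the .data pronoun inside aes(). The following is a minimal sketch of the same plot outside Shiny, not the code the app uses; the pdata rows and the input list are made-up stand-ins for the filtered data and the selector values.

```r
library(ggplot2)

# Hypothetical stand-ins for the filtered data and the Shiny input values
pdata <- data.frame(
  subject = c("Physics", "History", "Law"),
  X..Satisfied.with.Teaching = c(88, 91, 84),
  X..Satisfied.with.Assessment = c(70, 75, 66),
  Value.added.score.10 = c(6, 8, 5)
)
input <- list(tblx = "X..Satisfied.with.Teaching",
              tbly = "X..Satisfied.with.Assessment",
              tbls = "Value.added.score.10")

# Look columns up by name with the .data pronoun instead of aes_string()
g <- ggplot(pdata) +
  geom_text(aes(x = .data[[input$tblx]],
                y = .data[[input$tbly]],
                size = .data[[input$tbls]],
                label = subject)) +
  # Set the axis and legend titles explicitly from the selected column names
  labs(x = input$tblx, y = input$tbly, size = input$tbls)
```

Calling print(g) then draws the chart, just as in the renderPlot() block.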

The third is a user interface definition file, ui.R, that lays out the selectors and the chart area:

##ui.R
library(shiny)
 
#Generate a list containing the names of the institutions
uList=levels(gdata$Name.of.Institution)
names(uList) = uList
 
#Generate a list containing the names of the quality/ranking score columns by column name
cList=colnames(gdata[c(1,3:11)])
names(cList) = cList
 
# Define UI for the university tables explorer
shinyUI(pageWithSidebar(
  
  # Application title
  headerPanel("Guardian 2014 University Tables Explorer"),
  
  sidebarPanel(
    #Which table do you want to view, based on the list of institution names?
    selectInput("tbl", "Institution:",uList),

    #Also let the user select the x, y and label size, based on quality/ranking columns
    selectInput("tblx", "x axis:",cList,selected = 'X..Satisfied.with.Teaching'),
    selectInput("tbly", "y axis:",cList,selected='X..Satisfied.with.Assessment'),
    selectInput("tbls", "Label size:",cList,selected = 'Value.added.score.10'),
    
    div("This demo provides a crude graphical view over data extracted from",
        a(href='http://www.guardian.co.uk/news/datablog/2013/jun/04/university-guide-2014-table-rankings',
          "Guardian Datablog: University guide 2014 data tables") )
    
  ),
  
  #The main panel is where the "results" charts are plotted
  mainPanel(
    plotOutput("testPlot")#,
    #tableOutput("view")  
  )
))

And that’s it… If we pop these files into a single gist – such as the one at https://gist.github.com/psychemedia/5824495, which also includes code for grabbing the data from the Google spreadsheet – we can run the application from the RStudio command line as follows:

library(shiny)
runGist('5824495')

(Hit “escape” to stop the script running.)

With a minor tweak, we can get a list of unique subjects rather than institutions, and allow the user to compare courses across institutions by subject, rather than across subject areas within an institution.
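
As a rough illustration of that tweak outside Shiny, the sketch below uses a made-up miniature of the data (toy, with the same column names as gdata but invented rows) to pull out the subject list and filter on subject instead of institution.

```r
# Hypothetical miniature of gdata, using the same column names as the post
toy <- data.frame(
  Name.of.Institution = factor(c("Uni A", "Uni A", "Uni B", "Uni B")),
  subject = factor(c("Law", "Physics", "Law", "History")),
  X..Satisfied.with.Teaching = c(84, 88, 90, 79)
)

# The list of unique subjects comes from the factor levels, as for institutions
sList <- levels(toy$subject)
names(sList) <- sList

# Filter on subject rather than institution...
bySubject <- subset(toy, subject == "Law")
# ...and label points by institution instead of by subject
labelColumn <- "Name.of.Institution"

print(sList)
print(bySubject[, c(labelColumn, "X..Satisfied.with.Teaching")])
```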

We can then combine the two approaches into a single interface, Guardian 2014 university table explorer v2. It’s not ideal: we should really grey out the inactive selector (institution or subject area, according to which one hasn’t been selected via the radio button).

[Screenshot: Guardian uni 2014 explorer v2]

The global.R file is the same, although we need to tweak the ui.R and server.R files.

To the UI file, we add a radio button selector and an additional menu (for subjects):

##ui.R
library(shiny)

uList=levels(gdata$Name.of.Institution)
names(uList) = uList

#Pull out the list of subjects
sList=levels(gdata$subject)
names(sList) = sList

cList=colnames(gdata[c(1,3:11)])
names(cList) = cList

# Define UI for the university tables explorer (v2)
shinyUI(pageWithSidebar(
  
  # Application title
  headerPanel("Guardian 2014 University Tables Explorer v.2"),
  
  sidebarPanel(
    #Add in a radio button selector
    radioButtons("typ", "View by:",
                 list("Institution" = "inst",
                      "Subject" = "subj")),
    
    #Selector for the institution list
    selectInput("tbli", "Institution:",uList),
    #Add a selector for the subject list
    selectInput("tblb", "Subject:",sList),
    selectInput("tblx", "x axis:",cList,selected = 'X..Satisfied.with.Teaching'),
    selectInput("tbly", "y axis:",cList,selected='X..Satisfied.with.Assessment'),
    selectInput("tbls", "Label size:",cList,selected = 'Value.added.score.10'),
    
    div("This demo provides a crude graphical view over data extracted from",
        a(href='http://www.guardian.co.uk/news/datablog/2013/jun/04/university-guide-2014-table-rankings',
          "Guardian Datablog: University guide 2014 data tables") )
    
  ),
  
  #The main panel is where the "results" charts are plotted
  mainPanel(
    plotOutput("testPlot")#,
    #tableOutput("view")
  )
))
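
On the inactive selector: one possible approach, sketched here and not part of the gist, is to hide it rather than grey it out, using shiny's conditionalPanel(). The condition is a JavaScript expression evaluated in the browser against the current input values. The uList and sList values below are placeholders standing in for the lists built at the top of ui.R.

```r
library(shiny)

# Placeholder choices; in the app these come from the factor levels of gdata
uList <- c("Uni A" = "Uni A", "Uni B" = "Uni B")
sList <- c("Law" = "Law", "Physics" = "Physics")

# Show only the selector that matches the current radio button state
selectors <- tagList(
  radioButtons("typ", "View by:",
               list("Institution" = "inst", "Subject" = "subj")),
  conditionalPanel(
    condition = "input.typ == 'inst'",
    selectInput("tbli", "Institution:", uList)
  ),
  conditionalPanel(
    condition = "input.typ == 'subj'",
    selectInput("tblb", "Subject:", sList)
  )
)
```

The selectors object would take the place of the radio button and the two selectInput() calls inside sidebarPanel(); the server code would not need to change, because the hidden input keeps its value.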

To the server file, we add a level of indirection: reactive expressions pick out the data, the title and the label column according to the UI selectors, and the chart generator code then calls these expressions rather than reading the selectors directly.

##server.R
library(shiny)
library(ggplot2)

# Define server logic
shinyServer(function(input, output) {

  #We introduce a level of indirection, creating routines that set state within the scope of the server based on UI actions
  #If the radio button state changes, reset the data filter
  pdata <- reactive({
    switch(input$typ,
           inst=subset(gdata, Name.of.Institution==input$tbli),
           subj=subset(gdata, subject==input$tblb)
    )
  })
  
  #Make sure we use the right sort of label (institution or subject) in the title
  ttl <- reactive({
    switch(input$typ,
           inst=input$tbli,
           subj=input$tblb
    )
  })
 
  #Make sure we display the right sort of label (by institution or by subject) in the chart
  lab <- reactive({
    switch(input$typ,
           inst='subject',
           subj='Name.of.Institution'
           )
  })
  
  #Simple test plot
  output$testPlot = renderPlot({
    
    #g=ggplot(pdata) + geom_text(aes(x=X..Satisfied.with.Teaching,y=X..Satisfied.with.Assessment,label=subject,size=Value.added.score.10))
    g=ggplot(pdata()) + geom_text(aes_string(x=input$tblx,y=input$tbly,size=input$tbls, label=lab()  ))
    g=g+labs(title=paste("Guardian University Tables 2014:",ttl()))
    print(g)
  })
  
})
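
The switch() calls carry most of the logic here, so it may help to see them outside the reactive wrappers. This is a plain-function analogue with made-up helper names (titleFor, labelFor), not code from the app.

```r
# Plain-function analogues of the ttl and lab reactive expressions
# (hypothetical helper names, for illustration only)
titleFor <- function(typ, institution, subject) {
  switch(typ, inst = institution, subj = subject)
}

labelFor <- function(typ) {
  switch(typ, inst = "subject", subj = "Name.of.Institution")
}

# Viewing by institution: title is the institution, points are labelled by subject
print(titleFor("inst", "Uni A", "Law"))
print(labelFor("inst"))

# Viewing by subject: title is the subject, points are labelled by institution
print(titleFor("subj", "Uni A", "Law"))
print(labelFor("subj"))
```

With a character argument and no default branch, switch() returns NULL when nothing matches, so an unexpected radio button value would leave the title and label empty rather than raise an error at that point.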

What I hoped to show here was how it’s possible to create a quick visual explorer interface over a dataset using the R shiny library. Many users will be familiar with using wizards to create charts in spreadsheet programmes, but may get stuck when it comes to figuring out how to generate large numbers of charts. As a quick and dirty tool, shiny provides a great environment for knocking up disposable interfaces that provide you with a playground for checking out a wide range of chart data settings from automatically populated list selectors.

With a few more tweaks, we could add in the option to download data by subject or institution, add range selectors to allow us to view only results where a score falls within a particular range, and so on. We can also define new charts and displays (including tabular data displays) to view the data, just slotting them in with very simple UI and server components as used in the original example.
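
As a sketch of what the data side of those tweaks might look like (the helper name filterByRange and the toy rows are invented for illustration), a range filter is just a row selection, and a data dump is just a CSV write. In an app, the range would presumably come from something like a sliderInput() and the write would sit inside a downloadHandler().

```r
# Hypothetical helper: keep rows whose score lies within range[1]..range[2]
filterByRange <- function(df, column, range) {
  keep <- !is.na(df[[column]]) &
    df[[column]] >= range[1] &
    df[[column]] <= range[2]
  df[keep, , drop = FALSE]
}

# Made-up rows using one of the column names from the post
toy <- data.frame(
  subject = c("Law", "Physics", "History"),
  X..Satisfied.with.Teaching = c(84, 88, 79)
)

# Keep only rows with a teaching satisfaction score between 80 and 90
filtered <- filterByRange(toy, "X..Satisfied.with.Teaching", c(80, 90))

# Write the filtered rows out as CSV (to a temporary file here)
outFile <- tempfile(fileext = ".csv")
write.csv(filtered, outFile, row.names = FALSE)
```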

PS this post isn’t necessarily intended to say that we should just be adding to the noise by publishing interactive data explorers that folk don’t know how to use to support “datajournalism” stories or research dissemination (the above apps are way too scruffy for that, and the charts potentially too confusing or cluttered for the uninitiated to make sense of). Rather, I suggest that journalists, researchers etc should feel as if they are in a position to knock up their own data exploration tools as part of the “homework” involved in prepping for a conversation with a data source. The tool building also becomes an extension of the conversation with the data. Complete/complex apps aren’t built in one go. The example described here was built up in baby steps, starting with the data grab and an initial chart in the previous post, moving on to a simple interactive chart at the start of this post, then starting to evolve into a more complex tool through the addition of further features.

If the app gets more complex, eg in response to me wanting to be able to ask more refined questions of the data, or take filtered data dumps from it, (for example, for use in other charting applications, such as datawrapper.de), this just represents an evolution of, or increase in depth of, the conversation I am having with the data and the notes I am taking of it.

Written by Tony Hirst

June 21, 2013 at 10:24 am

Posted in Rstats
