OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Posts Tagged ‘Shiny’

Disposable Visual Data Explorers with Shiny – Guardian University Tables 2014

Have data – now what? Building your own interactive data explorer need not be a chore with the R shiny library… Here’s a quick walkthrough…

In Datagrabbing Commonly Formatted Sheets from a Google Spreadsheet – Guardian 2014 University Guide Data, I showed how to grab some data from several dozen commonly formatted sheets in a Google spreadsheet, and combine them to produce a single monolithic data set. The data relates to UK universities and provides several quality/satisfaction scores for each of the major subject areas they offer courses in.

We could upload this data to something like Many Eyes in order to generate visualisations over it, or we could create a visual data explorer app of our own. It needn’t take too long, either…

Here’s an example, the Simple Guardian University Rankings 2014 Explorer, that lets you select a university and then generate a scatterplot that shows how different quality/ranking scores vary for that university by subject area:

Crude data explorer - guardian 2014 uni stats

The explorer allows you to select a university and then generate a scatterplot based around selected quality scores. The label size is also set relative to a selected quality score.

The application is built up from three files. The first is a generic file, global.R, that we use to load the source data in (in this example I pull it from a file, though we could bring it in live from the Google spreadsheet).

##global.R
load("guardian2014unidata.Rda")
#In this case, the data is loaded into the dataframe: gdata
#Once it's loaded, we tidy it (I should have tidied the saved data really!)
gdata[, 4:11] <- sapply(gdata[, 4:11], as.numeric)
gdata$Name.of.Institution=as.factor(gdata$Name.of.Institution)
gdata$subject=as.factor(gdata$subject)
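
One thing to watch for in the tidying step: if any of the score columns happened to be loaded as factors (I haven't checked whether that is the case for the saved data, so treat this as an assumption), as.numeric() returns the internal level codes rather than the printed numbers. Here's a minimal sketch using a made-up scores vector that shows the difference, and the usual workaround of going via as.character():

```r
# Toy factor standing in for a score column (values are made up)
scores <- factor(c("71.5", "88", "n/a", "64.2"))

# as.numeric() on a factor gives the integer level codes, not the scores
codes <- as.numeric(scores)

# Converting via character recovers the numbers; unparseable entries become NA
values <- suppressWarnings(as.numeric(as.character(scores)))

print(data.frame(original = scores, codes = codes, values = values))
```

If the columns are already character or numeric, the sapply() line in global.R behaves as intended.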

A “server” file that takes input from the user interface elements on the left of the app and generates the displayed chart:

##server.R
library(shiny)
library(ggplot2)
 
# Define server logic
shinyServer(function(input, output) {
  
  #Simple test plot
  output$testPlot = renderPlot( {
    pdata=subset(gdata, Name.of.Institution==input$tbl)
    #g=ggplot(pdata) + geom_text(aes(x=X..Satisfied.with.Teaching,y=X..Satisfied.with.Assessment,label=subject,size=Value.added.score.10))
    g=ggplot(pdata) + geom_text(aes_string(x=input$tblx,y=input$tbly,size=input$tbls, label='subject'))
    g=g+labs(title=paste("Guardian University Tables 2014:",input$tbl))
    print(g)
  })
  
})

A user interface definition file.

##ui.R
library(shiny)
 
#Generate a list containing the names of the institutions
uList=levels(gdata$Name.of.Institution)
names(uList) = uList
 
#Generate a list containing the names of the quality/ranking score columns by column name
cList=colnames(gdata[c(1,3:11)])
names(cList) = cList
 
# Define UI for the university tables explorer
shinyUI(pageWithSidebar(
  
  # Application title
  headerPanel("Guardian 2014 University Tables Explorer"),
  
  sidebarPanel(
    #Which table do you want to view, based on the list of institution names?
    selectInput("tbl", "Institution:",uList),

    #Also let the user select the x, y and label size, based on quality/ranking columns
    selectInput("tblx", "x axis:",cList,selected = 'X..Satisfied.with.Teaching'),
    selectInput("tbly", "y axis:",cList,selected='X..Satisfied.with.Assessment'),
    selectInput("tbls", "Label size:",cList,selected = 'Value.added.score.10'),
    
    div("This demo provides a crude graphical view over data extracted from",
        a(href='http://www.guardian.co.uk/news/datablog/2013/jun/04/university-guide-2014-table-rankings',
          "Guardian Datablog: University guide 2014 data tables") )
    
  ),
  
  #The main panel is where the "results" charts are plotted
  mainPanel(
    plotOutput("testPlot")#,
    #tableOutput("view")  
  )
))

And that’s it… If we pop these files into a single gist – such as the one at https://gist.github.com/psychemedia/5824495, which also includes code for grabbing the data from the Google spreadsheet – we can run the application from the RStudio command line as follows:

library(shiny)
runGist('5824495')

(Hit “escape” to stop the script running.)

With a minor tweak, we can get a list of unique subjects, rather than institutions, and allow the user to compare courses across institutions by subject, rather than across subject areas within an institution.
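
As a rough sketch of that tweak outside of Shiny, assuming a toy data frame that borrows three of the gdata column names (the rows themselves are invented), we can build the subject list in the same way as the institution list and then filter on subject instead:

```r
# Toy stand-in for gdata; the institutions and scores are made up
toy <- data.frame(
  Name.of.Institution = factor(c("Uni A", "Uni A", "Uni B", "Uni B")),
  subject = factor(c("Physics", "History", "Physics", "Law")),
  X..Satisfied.with.Teaching = c(88, 91, 85, 79)
)

# Named list of unique subjects, suitable for passing to selectInput()
sList <- levels(toy$subject)
names(sList) <- sList

# Filter by subject rather than by institution
by_subject <- subset(toy, subject == "Physics")
print(by_subject)
```

In the app, the hard-coded "Physics" would be replaced by the value of the subject selector.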

We can then combine the two approaches into a single interface, Guardian 2014 university table explorer v2. It’s not ideal: we should really grey out whichever selector (institution or subject area) hasn’t been chosen via the radio button.

guardian uni 2014 explorer 2

The global.R file is the same, although we need to tweak the ui.R and server.R files.

To the UI file, we add a radio button selector and an additional menu (for subjects):

##ui.R
library(shiny)

uList=levels(gdata$Name.of.Institution)
names(uList) = uList

#Pull out the list of subjects
sList=levels(gdata$subject)
names(sList) = sList

cList=colnames(gdata[c(1,3:11)])
names(cList) = cList

# Define UI for the university tables explorer (v2)
shinyUI(pageWithSidebar(
  
  # Application title
  headerPanel("Guardian 2014 University Tables Explorer v.2"),
  
  sidebarPanel(
    #Add in a radio button selector
    radioButtons("typ", "View by:",
                 list("Institution" = "inst",
                      "Subject" = "subj")),
    
    #Just a single selector here - which table do you want to view?
    selectInput("tbli", "Institution:",uList),
    #Add a selector for the subject list
    selectInput("tblb", "Subject:",sList),
    selectInput("tblx", "x axis:",cList,selected = 'X..Satisfied.with.Teaching'),
    selectInput("tbly", "y axis:",cList,selected='X..Satisfied.with.Assessment'),
    selectInput("tbls", "Label size:",cList,selected = 'Value.added.score.10'),
    
    div("This demo provides a crude graphical view over data extracted from",
        a(href='http://www.guardian.co.uk/news/datablog/2013/jun/04/university-guide-2014-table-rankings',
          "Guardian Datablog: University guide 2014 data tables") )
    
  ),
  
  #The main panel is where the "results" charts are plotted
  mainPanel(
    plotOutput("testPlot")#,
    #tableOutput("view")
  )
))

To the server file, we add a level of indirection, setting local state variables based on the UI selectors, and then using the value of these variables within the chart generator code itself.

##server.R
library(shiny)
library(ggplot2)

# Define server logic
shinyServer(function(input, output) {

  #We introduce a level of indirection, creating routines that set state within the scope of the server based on UI actions
  #If the radio button state changes, reset the data filter
  pdata <- reactive({
    switch(input$typ,
           inst=subset(gdata, Name.of.Institution==input$tbli),
           subj=subset(gdata, subject==input$tblb)
    )
  })
  
  #Make sure we use the right sort of label (institution or subject) in the title
  ttl <- reactive({
    switch(input$typ,
           inst=input$tbli,
           subj=input$tblb
    )
  })
 
  #Make sure we display the right sort of label (by institution or by subject) in the chart
  lab <- reactive({
    switch(input$typ,
           inst='subject',
           subj='Name.of.Institution'
           )
  })
  
  #Simple test plot
  output$testPlot = renderPlot({
    
    #g=ggplot(pdata) + geom_text(aes(x=X..Satisfied.with.Teaching,y=X..Satisfied.with.Assessment,label=subject,size=Value.added.score.10))
    g=ggplot(pdata()) + geom_text(aes_string(x=input$tblx,y=input$tbly,size=input$tbls, label=lab()  ))
    g=g+labs(title=paste("Guardian University Tables 2014:",ttl()))
    print(g)
  })
  
})

What I hoped to show here was how it’s possible to create a quick visual explorer interface over a dataset using the R shiny library. Many users will be familiar with using wizards to create charts in spreadsheet programmes, but may get stuck when it comes to figuring out how to generate large numbers of charts. As a quick and dirty tool, shiny provides a great environment for knocking up disposable interfaces that provide you with a playground for checking out a wide range of chart data settings from automatically populated list selectors.

With a few more tweaks, we could add in the option to download data by subject or institution, add range selectors to allow us to view only results where a score falls within a particular range, and so on. We can also define new charts and displays (including tabular data displays) to view the data, just slotting them in with very simple UI and server components as used in the original example.
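
By way of example, here's a minimal sketch of the filtering step that a range selector would drive. The filter_by_range() helper and the toy data are hypothetical rather than part of the apps above. In Shiny, a sliderInput() given a two-element value returns the lower and upper bounds as a length-two vector, which could be passed straight in as the limits:

```r
# Hypothetical helper: keep rows where a named score column falls within [lower, upper]
filter_by_range <- function(df, column, lower, upper) {
  keep <- !is.na(df[[column]]) & df[[column]] >= lower & df[[column]] <= upper
  df[keep, , drop = FALSE]
}

# Toy data using one of the gdata column names; values are made up
toy <- data.frame(
  subject = c("Physics", "History", "Law", "Maths"),
  X..Satisfied.with.Teaching = c(88, 91, 85, 79)
)

in_range <- filter_by_range(toy, "X..Satisfied.with.Teaching", 80, 90)
print(in_range)
```

The filtered data frame could equally be written out with write.csv() to provide a simple data download.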

PS this post isn’t necessarily intended to say that we should just be adding to the noise by publishing interactive data explorers that folk don’t know how to use to support “datajournalism” stories or research dissemination (the above apps are way too scruffy for that, and the charts potentially too confusing or cluttered for the uninitiated to make sense of). Rather, I suggest that journalists, researchers etc should feel as if they are in a position to knock up their own data exploration tools as part of the “homework” involved in prepping for a conversation with a data source. The tool building also becomes an extension of the conversation with the data. Complete/complex apps aren’t built in one go. The example described here was built up in baby steps, starting with the data grab and an initial chart in the previous post, moving on to a simple interactive chart at the start of this post, then starting to evolve into a more complex tool through the addition of further features.

If the app gets more complex, eg in response to me wanting to be able to ask more refined questions of the data, or take filtered data dumps from it, (for example, for use in other charting applications, such as datawrapper.de), this just represents an evolution of, or increase in depth of, the conversation I am having with the data and the notes I am taking of it.

Written by Tony Hirst

June 21, 2013 at 10:24 am

Posted in Rstats


More Shiny Goodness – Tinkering With the Ergast Motor Racing Data API

I had a bit of a play with Shiny over the weekend, using the Ergast Motor Racing Data API and the magical Shiny library for R, that makes building interactive, browser based applications around R a breeze.

As this is just a quick heads-up/review post, I’ll largely limit myself to a few screenshots. When I get a chance, I’ll try to do a bit more of a write-up, though this may actually just take the form of more elaborate documentation of the app, both within the code and in the form of explanatory text in the app itself.

If you want to try out the app, you can find an instance here: F1 2012 Laptime Explorer. The code is also available.

Here’s the initial view – the first race of the season is selected as a default and data loaded in. The driver list is for all drivers represented during the season.

f1 2012 shiny ergast explorer

The driver selectors allow us to just display traces for selected drivers.

The Race History chart is a classic results chart. It shows the difference between the race time to date for each driver, by lap, and the average lap time for the winner times the lap number. (As such, this is an offline statistic – it can only be calculated once the winner’s overall average laptime is known.)

race history - classic chart

Variants of the classic Race History chart are possible, for example, using different base line times, but I haven’t implemented any of them – or the necessary UI controls. Yet…
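
For reference, the basic Race History calculation can be sketched in a few lines of base R. This is not the code from the app: the toy laptimes and the column names (driver, lap, laptime) are invented, every driver is assumed to complete every lap, rows are assumed to be ordered by lap within driver, and the sign convention (winner's baseline minus elapsed time) is my assumption:

```r
# Toy laptimes in seconds for two drivers over three laps (made-up values)
laps <- data.frame(
  driver = rep(c("A", "B"), each = 3),
  lap = rep(1:3, times = 2),
  laptime = c(90, 91, 89, 92, 93, 91)
)

# Race time to date for each driver, lap by lap
laps$elapsed <- ave(laps$laptime, laps$driver, FUN = cumsum)

# The winner is the driver with the lowest total race time
totals <- tapply(laps$laptime, laps$driver, sum)
winner <- names(which.min(totals))
winner_avg <- totals[[winner]] / max(laps$lap)

# Baseline (winner's average laptime x lap number) minus elapsed race time
laps$race_history <- winner_avg * laps$lap - laps$elapsed
print(laps)
```

Swapping winner_avg for a different baseline time gives the variants mentioned above.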

The Lap Chart is another classic:

Lap chart - another classic

Annotations for this chart are also supported, describing all drivers whose final status was not “Finished”.

lap chart with annotations

The Lap Evolution chart shows how each driver’s laptime evolved over the course of the race compared with the fastest overall recorded laptime.

Lap evolution

The Personal Lap Evolution chart shows how each driver’s laptime evolved over the course of the race compared with their personal fastest laptime.

Personal lap evolution

The Personal Deltas Chart shows the difference between one laptime and the next for each driver.

Personal deltas
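
The three laptime measures above are simple enough to sketch in base R. Again, this is an illustration rather than the app's own code: the toy data and column names are made up, and I've assumed that "compared with" means a straightforward difference in seconds (a ratio would work just as well):

```r
# Toy laptimes in seconds (made-up values), ordered by lap within driver
laps <- data.frame(
  driver = rep(c("A", "B"), each = 3),
  lap = rep(1:3, times = 2),
  laptime = c(90, 91, 89, 92, 93, 91)
)

# Lap Evolution: each laptime relative to the fastest lap set by anyone
laps$lap_evolution <- laps$laptime - min(laps$laptime)

# Personal Lap Evolution: each laptime relative to that driver's own fastest lap
laps$personal_evolution <- laps$laptime - ave(laps$laptime, laps$driver, FUN = min)

# Personal Deltas: change from one lap to the next (NA on each driver's first lap)
laps$personal_delta <- ave(laps$laptime, laps$driver,
                           FUN = function(x) c(NA, diff(x)))
print(laps)
```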

The Race Summary Chart is a chart of my own design that tries to capture notable features relating to race position – the grid position (blue circle), final classification (red circle), position at the end of the first lap (the + or horizontal bar). The violin plot shows the distribution of how many laps the driver spent in each race position. Where the chart is wide, the driver spent a large number of laps in that position.

race summary
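
The distribution that the violin plot summarises is just a count of laps spent in each position. A minimal sketch, using invented position data and hypothetical column names:

```r
# Toy lap-by-lap race positions for two drivers (made-up values)
positions <- data.frame(
  driver = rep(c("A", "B"), each = 4),
  lap = rep(1:4, times = 2),
  position = c(1, 1, 2, 1, 2, 2, 1, 2)
)

# Number of laps each driver spent in each race position
laps_in_position <- table(positions$driver, positions$position)
print(laps_in_position)
```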

The x-axis ordering pulls out different features about how the race progressed. I need to add in a control that lets the user select different orderings.

Finally, the Fast Lap text scatterplot shows the fastest laptime for each driver and the lap at which they recorded it.

fastlaps
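
Finding the data for the Fast Lap chart amounts to picking out one row per driver. Here's a rough base R sketch over toy data (invented values and column names, not the Ergast field names):

```r
# Toy laptimes in seconds (made-up values)
laps <- data.frame(
  driver = rep(c("A", "B"), each = 3),
  lap = rep(1:3, times = 2),
  laptime = c(90, 91, 89, 91, 93, 92)
)

# For each driver, keep the row holding their fastest laptime
fastest <- do.call(rbind, lapply(split(laps, laps$driver), function(d) {
  d[which.min(d$laptime), ]
}))
print(fastest)
```

Where a driver sets the same fastest time on more than one lap, which.min() keeps the first of them.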

So – that’s a quick review of the app. All in all it took maybe 3 hours getting my head round the data parsing, 2-3 hours figuring what I wanted to do and learning how to do it in Shiny, and a couple of hours doing it/starting to document/annotate it. Next time, it’ll be much quicker…

Written by Tony Hirst

December 4, 2012 at 2:14 pm

Posted in Rstats, Tinkering


Quick Shiny Demo – Exploring NHS Winter Sit Rep Data

Having spent a chunk of the weekend and a piece of yesterday trying to pull NHS Winter sitrep data into some sort of shape in Scraperwiki, (described, in part, here: When Machine Readable Data Still Causes “Issues” – Wrangling Dates…), I couldn’t help myself last night and had a quick go at using RStudio’s Shiny tooling to put together a quick, minimal explorer for it:

For proof of concept, I just pulled in data relating to the Isle of Wight NHS Trust, but it should be possible to build a more generic explorer: Isle of Wight NHS Sit Rep Explorer Demo.

Three files are used to create the app – a script to define the user interface (ui.R), a script to define the server that responds to UI actions and displays the charts (server.R), and a supporting file that creates variables and functions that are globally available to both the server and UI scripts (global.R).

##wightsitrep2/global.R

#Loading in CSV directly from https seems to cause problems but this workaround seems okay
floader=function(fn){
  temporaryFile <- tempfile()
  download.file(fn,destfile=temporaryFile, method="curl")
  read.csv(temporaryFile)
}

#This is the data source - a scraperwiki API call
#It would make sense to abstract this further, eg allowing the creation of the URL based around a passed in select statement
u="https://api.scraperwiki.com/api/1.0/datastore/sqlite?format=csv&name=nhs_sit_reps&query=select%20SHA%2CName%2C%20fromDateStr%2CtoDateStr%2C%20tableName%2CfacetB%2Cvalue%20from%20fulltable%20%20where%20Name%20like%20'%25WIGH%25'"

#Load the data and do a bit of typecasting, just in case...
d=floader(u)
d$fdate=as.Date(d$fromDateStr)
d$tdate=as.Date(d$toDateStr)
d$val=as.integer(d$value)
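
The abstraction suggested in the comment above might look something like the following sketch. The build_query_url() helper is hypothetical; the base URL and scraper name are copied from the hard-coded URL, and the sketch makes no attempt to check whether that endpoint still responds:

```r
# Hypothetical helper: build the datastore URL from a plain SQL select statement
build_query_url <- function(select_statement,
                            scraper = "nhs_sit_reps",
                            base = "https://api.scraperwiki.com/api/1.0/datastore/sqlite") {
  # Percent-encode spaces, commas, quotes and % signs in the query
  query <- utils::URLencode(select_statement, reserved = TRUE, repeated = TRUE)
  paste0(base, "?format=csv&name=", scraper, "&query=", query)
}

u2 <- build_query_url("select Name,value from fulltable")
print(u2)
```

The result could then be passed to floader() in place of the hard-coded u.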

##wightsitrep2/ui.R

library(shiny)

tList=levels(d$tableName)
names(tList) = tList

# Define UI for the sit rep explorer
shinyUI(pageWithSidebar(
  
  
  # Application title
  headerPanel("IW NHS Trust Sit Rep Explorer"),
  
  sidebarPanel(
    #Just a single selector here - which table do you want to view?
    selectInput("tbl", "Report:",tList),
    
    div("This demo provides a crude graphical view over data extracted from",
        a(href='http://transparency.dh.gov.uk/2012/10/26/winter-pressures-daily-situation-reports-2012-13/',
          "NHS Winter pressures daily situation reports"),
        "relating to the Isle of Wight NHS Trust."),
    div("The data is pulled in from a scraped version of the data stored on Scraperwiki",
        a(href="https://scraperwiki.com/scrapers/nhs_sit_reps/","NHS Sit Reps"),".")
    
 ),
  
  #The main panel is where the "results" charts are plotted
  mainPanel(
    plotOutput("testPlot"),
    tableOutput("view")
    
  )
))

##wightsitrep2/server.R

library(shiny)
library(ggplot2)

# Define server logic
shinyServer(function(input, output) {
  
  #Do a simple barchart of data in the selected table.
  #Where there are "subtables", display these using the faceted view
  #(reactivePlot() was the name used in early Shiny releases; renderPlot() replaced it)
  output$testPlot = renderPlot({
    g=ggplot(subset(d,fdate>as.Date('2012-11-01') & tableName==input$tbl))
    g=g+geom_bar(aes(x=fdate,y=val),stat='identity')+facet_wrap(~tableName+facetB)
    g=g+theme(axis.text.x=element_text(angle=-90),legend.position="none")+labs(title="Isle of Wight NHS Trust")
    #g=g+scale_y_discrete(breaks=0:10)
    print(g)
  })
  
  #It would probably make sense to reshape the data presented in this table
  #For example, define columns based on facetB values, so we have one row per date range
  #I also need to sort the table by date
  #(reactiveTable() was the name used in early Shiny releases; renderTable() replaced it)
  output$view = renderTable({
    head(subset(d,tableName==input$tbl,select=c('Name','fromDateStr','toDateStr','tableName','facetB','value')),n=100)
  })
  
})
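
The reshaping and sorting mentioned in the table comments could be sketched with base R's reshape() along the following lines. The toy rows and the facetB values ("Open", "Closed") are invented for illustration, and only the from-date is used as the row identifier:

```r
# Toy long-format data with one row per date and facet (made-up values)
long <- data.frame(
  fromDateStr = c("2012-11-02", "2012-11-02", "2012-11-01", "2012-11-01"),
  facetB = c("Open", "Closed", "Open", "Closed"),
  value = c(3L, 1L, 5L, 0L),
  stringsAsFactors = FALSE
)

# One row per date, one column per facetB value
wide <- reshape(long, idvar = "fromDateStr", timevar = "facetB",
                direction = "wide")

# Sort the table by date
wide <- wide[order(as.Date(wide$fromDateStr)), ]
print(wide)
```

The new columns are named by prefixing each facetB value with "value.", so they may want renaming before display.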

I get the feeling that it shouldn’t be too hard to create quite complex Shiny apps relatively quickly, pulling on things like Scraperwiki as a remote data source. One thing I haven’t tried is to use googleVis components, which would support in the first instance at least a sortable table view… Hmmm…

PS for an extended version of this app, see NHS Winter Situation Reports Shiny Viewer v2

Written by Tony Hirst

November 28, 2012 at 10:32 am

Posted in Data, Infoskills, Rstats

