knitr::include_app("https://mattkmiecik.shinyapps.io/boot-perm-app/", height = "1400px")
Bootstrapping and Permutation Testing: A Shiny App
We have since upgraded this shiny app to a shiny dashboard. Click here to see it!
Purpose
This post provides a brief overview of the difference between bootstrapping and permutation testing, along with a shiny app to visualize each technique. The example statistic used here is a correlation; however, these techniques can be extended to a variety of statistics (e.g., t-test, ANOVA, PCA).
Shiny App
Here’s the shiny app used to illustrate these concepts. Click here to see the app in full screen.
Bootstrapping
When you think of bootstrapping, think confidence intervals. Bootstrapping samples observations with replacement without breaking the relationship between measures (e.g., X and Y). Each bootstrap sample contains the same number of observations as the original data (i.e., the sample size). After sampling with replacement is finished, the statistic of interest, such as a correlation, is computed and stored.
When you think of bootstrapping, think confidence intervals.
The process explained above is then repeated for hundreds or thousands of iterations, resulting in a distribution of values for your statistic of interest. This distribution will be centered about the original statistic that you computed before any resampling occurred. In other words, the mean of these stored values will approximately equal your observed statistic.
As with any distribution, you can calculate the lower and upper bounds that contain a given percentage of the values. This percentage is chosen by the researcher/statistician/hockey analytics enthusiast and defines a confidence interval (95% is a common choice). These bounds are usually reported in square brackets in the format: confidence interval % [lower bound, upper bound]. For example, “There was a positive correlation observed between X and Y, r = .31, 95% CI [.21, .41].”
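To make the procedure concrete, here is a minimal sketch in base R that bootstraps a correlation and computes a percentile confidence interval. The simulated data, variable names, and use of cor() are illustrative assumptions and are not taken from the app below.

set.seed(1)                          # for reproducibility
n     <- 50
x     <- rnorm(n)
y     <- .5 * x + rnorm(n)           # simulated data with a built-in relationship
iters <- 2000                        # number of bootstrap iterations

boot_r <- replicate(iters, {
  i <- sample(seq_len(n), size = n, replace = TRUE)  # resample rows with replacement
  cor(x[i], y[i])                                    # x and y stay paired within each resampled row
})

cor(x, y)                            # observed correlation
quantile(boot_r, c(.025, .975))      # 95% percentile confidence interval

Because rows are resampled as pairs, each bootstrap correlation reflects the same X–Y relationship as the original data, which is why the resulting distribution centers near the observed value.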
Permutation Testing
When you think of permutation testing, think p-values. Permutation testing does not sample observations with replacement, but instead breaks the relationship between measures (e.g., X and Y). This is done by shuffling/randomizing/sampling the observed data points for one variable while keeping the other (or others) intact. In terms of correlation, this would mean that X would be shuffled within the sample while Y retained its original values. After the responses for one variable are randomized, the statistic of interest, such as a correlation, is computed and stored.
When you think of permutation testing, think p-values.
The process explained above is then repeated for hundreds or thousands of iterations, resulting in a distribution of values for your statistic of interest. This distribution will not be centered about the original statistic that you computed before any shuffling occurred, but rather will be centered around the null. In terms of correlation, a null distribution centers about r = 0, meaning no linear relationship between the variables.
In other words, a null distribution is created by shuffling the values in X but not Y, because this shuffling breaks the relationship between X and Y.
A p-value is calculated by first counting the number of permuted statistics that are more extreme than your observed statistic. Put another way: how many times did a value from the null distribution surpass your originally computed statistic (before any shuffling)? Then, divide that count by the number of permutation iterations (i.e., the number of observations in your null distribution).
For example, let’s say I ran a permutation test on a correlation of r = .5: I shuffled X, kept Y, computed their correlation, stored this value, and repeated this 100 times. Out of those 100 iterations, 4 correlations emerged that were greater than .5. Therefore, the p-value for this correlation would be 4/100 = 0.04.
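Here is a minimal sketch of that counting rule in base R. As before, the simulated data and variable names are illustrative assumptions rather than code from the app.

set.seed(1)
n     <- 50
x     <- rnorm(n)
y     <- .5 * x + rnorm(n)           # simulated data with a built-in relationship
iters <- 1000                        # number of permutation iterations

r_obs <- cor(x, y)                   # observed correlation

null_r <- replicate(iters, {
  cor(sample(x), y)                  # shuffle X, keep Y: breaks their pairing
})

p_perm <- sum(null_r >= r_obs) / iters   # proportion of null values at least as large as r_obs
p_perm

For a negative observed correlation, the inequality flips (null_r <= r_obs), which mirrors the sign check in the app’s permutation p-value code below.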
Shiny App Code
Below is the code that runs the shiny app:
# Packages required ----
library(shiny)
library(dplyr)
library(ggplot2)
library(tibble)
library(purrr)
library(broom)
library(MASS)
library(modelr)
library(shinythemes)
library(shinycssloaders)
# Global aesthetics ----

# Main color palette (Flatly bootstrap theme)
flatlyPal <- c('#2C3E50', '#18BC9C', '#94A5A6', '#3498DC', '#F39C13', '#E74C3C')

# Loading spinner options
options(spinner.color = flatlyPal[1], spinner.type = 8)

# Plot constant
plotFinish <- theme(plot.title = element_text(hjust = 0.5, size = 15),
                    text = element_text(size = 15),
                    plot.caption = element_text(hjust = .5)
                    )

# Turning off scientific notation for p-values
options(scipen = 999)
# UI ----
# Define the UI for the application
ui <- fluidPage(theme = shinytheme("flatly"),

  # Title
  h2("Bootstrapping and Permutation Testing"),

  # Author info
  h4("Matthew J. Kmiecik & Ekarin E. Pongpipat"),

  # Blogpost info
  p("See our ",
    a("blog post", href = "https://mattkmiecik.com/post-Bootstrapping-and-Permutation-Testing-Shiny-App.html"),
    "for more information about this shiny app."
    ),

  # Sidebar with slider inputs for the simulation, bootstrap, and permutation settings
  sidebarLayout(
    sidebarPanel(
      sliderInput(inputId = "rUser",
                  label = "Correlation (r): ",
                  min = -1,
                  max = 1,
                  value = .3,
                  step = .1,
                  ticks = FALSE),
      sliderInput(inputId = "sampleN",
                  label = "Sample size (N): ",
                  min = 10,
                  max = 200,
                  value = 50,
                  step = 10,
                  ticks = FALSE),
      sliderInput(inputId = "bootIters",
                  label = "Bootstrap Iterations: ",
                  min = 100,
                  max = 2000,
                  value = 100,
                  step = 100,
                  ticks = FALSE),
      sliderInput(inputId = "ciUser",
                  label = "Bootstrap Confidence Interval: ",
                  min = .80,
                  max = .99,
                  value = .90,
                  step = .01,
                  ticks = FALSE),
      sliderInput(inputId = "permIters",
                  label = "Permutation Iterations: ",
                  min = 100,
                  max = 2000,
                  value = 100,
                  step = 100,
                  ticks = FALSE),
      width = "2"),

    # Plots
    mainPanel(
      flowLayout(
        plotOutput("scatterPlot") %>% withSpinner(),
        plotOutput("bootPlot") %>% withSpinner(),
        plotOutput("permPlot") %>% withSpinner()
      )
    )
  )
)
# Server ----
server <- function(input, output) {

  # Defined variables
  vars <- 2            # Minimum of 2 variables are required for correlation
  mu   <- rep(0, vars) # Means of all the vars are 0

  # Calculates sigma
  re_sigma <- reactive({matrix(input$rUser, nrow = vars, ncol = vars)})

  # Establishes sigma = 1
  re_sigma2 <- reactive({re_sigma() - diag(vars)*(input$rUser) + diag(vars)})

  # Gathers correlations
  re_rawvars <- reactive({as.tibble(mvrnorm(n = input$sampleN,
                                            mu = mu,
                                            Sigma = re_sigma2(),
                                            empirical = T
                                            )
                                    ) %>%
                            rename(x = V1, y = V2)}
                         )

  # Bootstrapping procedure ----
  re_bootResults <- reactive({re_rawvars() %>%
                                broom::bootstrap(input$bootIters) %>%
                                do(tidy(lm(scale(y) ~ scale(x), .))) %>%
                                filter(term != '(Intercept)')})

  # Confidence interval calculations ----
  re_ciUserL <- reactive({(1 - input$ciUser)/2})
  re_ciUserU <- reactive({input$ciUser + re_ciUserL()})
  re_ciObs   <- reactive({quantile(re_bootResults()$estimate,
                                   c(re_ciUserL(), re_ciUserU()))})

  # Permutation testing procedure ----
  # From: https://www.rdocumentation.org/packages/modelr/versions/0.1.1/topics/permute
  re_perms  <- reactive({re_rawvars() %>% permute(input$permIters, y)})
  re_models <- reactive({map(re_perms()[["perm"]],
                             ~ lm(scale(y) ~ scale(x), data = .))})
  re_tidyd  <- reactive({map_df(re_models(), broom::tidy, .id = 'id') %>%
                           filter(term != '(Intercept)')})

  # Calculates p value from permutation testing
  re_permPVal <- reactive({
    if (input$rUser >= 0) {
      (sum(re_tidyd()[["estimate"]] >= input$rUser) + 1)/input$permIters
    } else if (input$rUser < 0) {
      (sum(re_tidyd()[["estimate"]] <= input$rUser) + 1)/input$permIters
    }
  })

  # Scatterplot ----
  output$scatterPlot <- renderPlot({

    # Caption for scatterplot
    corResP      <- round(cor.test(re_rawvars()$x, re_rawvars()$y)$p.value, 3)
    corResPFinal <- ifelse(corResP > .001, corResP, .001)
    pSign        <- ifelse(corResP > .001, '= ', '< ')
    scatRes      <- paste0('r (', input$sampleN - 2, ') = ', input$rUser,
                           ', p ', pSign, corResPFinal)

    # Plot
    ggplot(re_rawvars(), aes(x, y)) +
      geom_smooth(method = 'lm', se = T,
                  color = flatlyPal[6], fill = flatlyPal[3]) +
      geom_point(color = flatlyPal[1], alpha = 2/3) +
      labs(x = 'X', y = 'Y',
           title = 'Scatterplot', caption = scatRes) +
      theme_classic() +
      plotFinish
  })

  # Bootstrap plot ----
  output$bootPlot <- renderPlot({
    ggplot(re_bootResults(), aes(estimate)) +
      geom_histogram(binwidth = .05, fill = flatlyPal[1]) +
      geom_errorbarh(aes(xmin = re_ciObs()[[1]], xmax = re_ciObs()[[2]],
                         x = input$rUser, y = nrow(re_bootResults())/10),
                     height = 0, color = flatlyPal[2], size = 1) +
      geom_point(aes(x = input$rUser, y = nrow(re_bootResults())/10),
                 size = 4, color = flatlyPal[2]) +
      labs(x = 'Correlation', y = 'Frequency',
           title = 'Bootstrapping',
           caption = paste0(input$ciUser*100, '% CI [',
                            round(re_ciObs()[[1]], 2), ', ',
                            round(re_ciObs()[[2]], 2), ']')
           ) +
      theme_classic() +
      plotFinish
  })

  # Permutation plot ----
  output$permPlot <- renderPlot({
    ggplot(re_tidyd(), aes(estimate)) +
      geom_histogram(binwidth = .05, fill = flatlyPal[1]) +
      geom_vline(aes(xintercept = input$rUser),
                 color = flatlyPal[6], linetype = 2, size = .75) +
      labs(x = 'Correlation', y = 'Frequency',
           title = 'Permutation Testing',
           caption = paste0('r = ', input$rUser, ', p = ', round(re_permPVal(), 5))
           ) +
      theme_classic() +
      plotFinish
  })
}

# Application ----
shinyApp(ui = ui, server = server)
References
Readers who would like more theoretical or technical explanations of bootstrapping and permutation testing are encouraged to read:
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1), 1-26.
Ludbrook, J., & Dudley, H. (1998). Why permutation tests are superior to t and F tests in biomedical research. The American Statistician, 52(2), 127-132. doi:10.1080/00031305.1998.10480551
Wright, D. B., London, K., & Field, A. P. (2011). Using bootstrap estimation and the plug-in principle for clinical psychology data. Journal of Experimental Psychopathology, 2(2). doi:10.5127/jep.013611