This post serves as a brief overview of the difference between bootstrapping and permutation testing with a shiny app to visualize their differences. The example statistic we use here is a correlation; however, these techniques can be extended to a variety of statistics (e.g., t-test, ANOVA, PCA).

Here’s the shiny app used to illustrate these concepts. Click here to see the app in full screen.

```
knitr::include_app("http://18.188.158.248:3838/",
height = "1400px")
```

*Note:* We are now hosting this app on Amazon’s EC2 cloud service. Please email me ([email protected]) if the app is down. Thanks!

When you think of boostrapping, think confidence intervals. Bootstrapping **samples observations with replacement** without breaking the relationship between measures (e.g., X and Y). The number of samples is equal to the number of observations (i.e., sample size). After sampling with replacement is finished, the statistic of interest, such as a correlation, is computed and stored.

When you think of boostrapping, think confidence intervals.

This process explained above is then repeated hundreds or thousands of iterations resulting in a distribution of values for your statistic of interest. This distibution will be centered about the original statistical value that you computed before any resampling occured. In other words, the mean of these stored values will equal your observed statistic.

As with any distribution, you can calculate what are the lower bounds and upper bounds of values for a given percentage. This percentage is determined by the researcher/statistician/hockey analyst enthusiast and is called a confidence interval (95% is a common confidence interval). These bounds are usually reported in square brackets in the format: confidence interval % [lowerbound, upper bound]. For example, “There was a positive correlation observed between X and Y, *r* = .31, 95% CI [.21, .41].”

When you think of permutation testing, think of p-values. Permutation testing **does not sample observations with replacement, but instead breaks the relationship between measures (e.g., X and Y).** This is done by shuffling/randomizing/sampling the observed data points for one variable, while keeping the other (or others) intact. In terms of correlation, this would mean that X would be shuffled within the sample, while Y remained the original values. After the responses for one variable are randomized the statistic of interest, such as a correlation, is computed and stored.

When you think of permutation testing, think of p-values.

This process explained above is then repeated hundreds or thousands of iterations resulting in a distribution of values for your statistic of interest. This distibution **will not be centered about the original statistic value that you computed before any shuffling occured, but rather will be centered around the null.** In terms of correlation, a null distribution would center about *r* = 0; meaning no linear relationship between variables.

In other words, a null distribution is created by shuffling the values in X but not Y. This is because the relationship has been broken between X and Y.

A p-value is calculated by first counting the number of statistical values that are more extreme than your observed statistic. Put another way, how many times did the statistical value that emerged from a “null distribution” surpass your original computed statistic (before any shuffling). Then, you take the number of times that the null distribution is more extreme than your original value and divide it by the number of permutation iterations (number of observations in your null distribution).

For example, let’s say I ran a permutation test on a correlation of *r* = .5 and shuffled X, kept Y, computed their correlation, stored this value, and repeated this 100 times. Out of 100 times, there were 4 correlations that emerged that were greater than .5. Therefore, my p-value for this correlation would be 4/100 = 0.04.

Below is the code that runs the shiny app:

```
# Packages required ----
library(shiny)
library(dplyr)
library(ggplot2)
library(tibble)
library(purrr)
library(broom)
library(MASS)
library(modelr)
library(shinythemes)
library(shinycssloaders)
# Global aesthetics ----
# Main color palette (Flatly bootstrap theme)
flatlyPal <- c('#2C3E50', '#18BC9C', '#94A5A6', '#3498DC', '#F39C13', '#E74C3C')
# loading spinner options
options(spinner.color = flatlyPal[1], spinner.type = 8)
# Plot constant
plotFinish <- theme(plot.title = element_text(hjust = 0.5, size = 15),
text = element_text(size = 15),
plot.caption = element_text(hjust = .5)
)
# Turning off scientific notation for p-values
options(scipen = 999)
# UI ----
# Define UI for application that draws a histogram
ui <- fluidPage(theme = shinytheme("flatly"),
# Title
h2("Bootstrapping and Permutation Testing"),
# Author info
h4("Matthew J. Kmiecik & Ekarin E. Pongpipat"),
# Blogpost info
p("See our ",
a("blog post", href="https://mattkmiecik.com/post-Bootstrapping-and-Permutation-Testing-Shiny-App.html"),
"for more information about this shiny app."
),
# Sidebar with a slider input for number of bins
sidebarLayout(
sidebarPanel(
sliderInput(inputId = "rUser",
label = "Correlation (r): ",
min = -1,
max = 1,
value = .3,
step = .1,
ticks = FALSE),
sliderInput(inputId = "sampleN",
label = "Sample size (N): ",
min = 10,
max = 200,
value = 50,
step = 10,
ticks = FALSE),
sliderInput(inputId = "bootIters",
label = "Bootstrap Iterations: ",
min = 100,
max = 2000,
value = 100,
step = 100,
ticks = FALSE),
sliderInput(inputId = "ciUser",
label = "Bootstrap Confidence Interval: ",
min = .80,
max = .99,
value = .90,
step = .01,
ticks = FALSE),
sliderInput(inputId = "permIters",
label = "Permutation Iterations: ",
min = 100,
max = 2000,
value = 100,
step = 100,
ticks = FALSE),
width = "2"),
# Plots
mainPanel(
flowLayout(
plotOutput("scatterPlot") %>% withSpinner(),
plotOutput("bootPlot") %>% withSpinner(),
plotOutput("permPlot") %>% withSpinner()
)
)
)
)
# Server ----
server <- function(input, output) {
# Defined variables
vars <- 2 # Minimum of 2 variables are required for correlation
mu <- rep(0, vars) # Means of all the vars are 0
# Calculates sigma
re_sigma <- reactive({matrix(input$rUser, nrow = vars, ncol = vars)})
# Establihses sigma = 1
re_sigma2 <- reactive({re_sigma() - diag(vars)*(input$rUser) + diag(vars)})
# Gathers correlations
re_rawvars <- reactive({as.tibble(mvrnorm(n = input$sampleN,
mu = mu,
Sigma = re_sigma2(),
empirical = T
)
) %>%
rename(x = V1, y = V2)}
)
# Bootstrapping procedure ----
re_bootResults <- reactive({re_rawvars() %>%
broom::bootstrap(input$bootIters) %>%
do(tidy(lm(scale(y) ~ scale(x), .))) %>%
filter(term != '(Intercept)')})
# Confidence interval calculations ----
re_ciUserL <- reactive({(1-input$ciUser)/2})
re_ciUserU <- reactive({input$ciUser + re_ciUserL()})
re_ciObs <- reactive({quantile(re_bootResults()$estimate,
c(re_ciUserL(), re_ciUserU()))})
# Permutation testing procedure ----
# From: https://www.rdocumentation.org/packages/modelr/versions/0.1.1/topics/permute
re_perms <- reactive({re_rawvars() %>% permute(input$permIters, y)})
re_models <- reactive({map(re_perms()[["perm"]],
~ lm(scale(y) ~ scale(x), data = .))})
re_tidyd <- reactive({map_df(re_models(), broom::tidy, .id = 'id') %>%
filter(term != '(Intercept)')})
# Calculates p value from permutation testing
re_permPVal <- reactive({
if (input$rUser >= 0) {
(sum(re_tidyd()[["estimate"]] >= input$rUser) + 1)/input$permIters
} else if ( input$rUser < 0 ) {
(sum(re_tidyd()[["estimate"]] <= input$rUser) + 1)/input$permIters
}
}
)
# Scatterplot ----
output$scatterPlot <- renderPlot({
# Caption for Scatterplot
corResP <- round(cor.test(re_rawvars()$x, re_rawvars()$y)$p.value, 3)
corResPFinal <- ifelse(corResP > .001, corResP, .001)
pSign <- ifelse(corResP > .001, '= ', '< ')
scatRes <- paste0('r (', input$sampleN-2, ') = ', input$rUser,
', p ', pSign,corResPFinal)
# Plot
ggplot(re_rawvars(), aes(x, y)) +
geom_smooth(method = 'lm', se = T,
color = flatlyPal[6], fill = flatlyPal[3]) +
geom_point(color = flatlyPal[1], alpha = 2/3) +
labs(x = 'X', y = 'Y',
title = 'Scatterplot', caption = scatRes) +
theme_classic() +
plotFinish
})
# Bootstrap Plot ----
output$bootPlot <- renderPlot({
ggplot(re_bootResults(), aes(estimate)) +
geom_histogram(binwidth = .05, fill = flatlyPal[1]) +
geom_errorbarh(aes(xmin = re_ciObs()[[1]], xmax = re_ciObs()[[2]],
x = input$rUser, y = nrow(re_bootResults())/10),
height = 0, color = flatlyPal[2], size = 1) +
geom_point(aes(x = input$rUser, y = nrow(re_bootResults())/10),
size = 4, color = flatlyPal[2]) +
labs(x = 'Correlation', y = 'Frequency',
title = 'Bootstrapping',
caption = paste0(input$ciUser*100, '% CI [',
round(re_ciObs()[[1]], 2), ', ',
round(re_ciObs()[[2]], 2), ']')
) +
theme_classic() +
plotFinish
})
# Permutation plot ----
output$permPlot <- renderPlot({
ggplot(re_tidyd(), aes(estimate)) +
geom_histogram(binwidth = .05, fill = flatlyPal[1]) +
geom_vline(aes(xintercept = input$rUser),
color = flatlyPal[6], linetype = 2, size = .75) +
labs(x = 'Correlation', y = 'Frequency',
title = 'Permutation Testing',
caption = paste0('r = ', input$rUser, ', p = ', round(re_permPVal(), 5))
) +
theme_classic() +
plotFinish
})
}
# Application ----
shinyApp(ui = ui, server = server)
```

Readers who would like more theoretical or technical explanations of boostrapping and permutation testing are encouraged to read:

Efron, B. (1979). Bootstrap methods: Another look at the jacknife.

*The Annals of Statistics, 7*(1), 1-26.Ludbrook, J., & Dudley, H. (1998). Why Permutation Tests are Superior to t and F Tests in Biomedical Research.

*The American Statistician, 52*(2), 127-132. doi:10.1080/00031305.1998.10480551Wright, D. B., London, K., & Field, A. P. (2018). Using Bootstrap Estimation and the Plug-in Principle for Clinical Psychology Data.

*Journal of Experimental Psychopathology, 2*(2). doi:10.5127/jep.013611