This vignette provides a detailed explanation of the statistical methods implemented in the bbssr package. We cover the theoretical foundations of blinded sample size re-estimation (BSSR) and the five exact statistical tests supported by the package.
Traditional clinical trials use fixed sample sizes determined during the planning phase based on:

- Assumed treatment effect size
- Expected response rates in each group
- Desired power and significance level
However, these assumptions are often inaccurate, leading to:

- Underpowered studies when the assumed effect size is too optimistic
- Overpowered studies when the assumed effect size is too conservative
- Resource inefficiency due to incorrect sample size planning
- Ethical concerns about continuing underpowered trials
Blinded Sample Size Re-estimation addresses these issues by:

- Estimating nuisance parameters (such as the pooled response rate) at an interim analysis using blinded data
- Recalculating the required sample size without unblinding treatment assignments
- Preserving the integrity of the trial, since no information about the treatment effect is revealed
Let's define the key parameters:

- \(X_1\) and \(X_2\): the numbers of responders in groups 1 and 2, with \(X_i \sim \text{Binomial}(n_i, p_i)\)
- \(p_1\), \(p_2\): the true response probabilities in each group
- \(n_1\), \(n_2\): the group sample sizes
- \(\alpha\): the one-sided significance level

Throughout, we test \(H_0: p_1 = p_2\) against the one-sided alternative \(H_1: p_1 > p_2\).
The bbssr package implements five exact statistical tests, each with different characteristics and optimal use cases.
Pearson Chi-Squared Test (Test = 'Chisq')
The one-sided Pearson chi-squared test uses the test statistic:
\[Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}\]
where \(\hat{p}_1 = X_1/n_1\) and \(\hat{p}_2 = X_2/n_2\) are the group proportions and \(\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}\) is the pooled proportion.
P-value formula: \[p\text{-value} = P(Z \geq z_{\text{obs}}) = 1 - \Phi(z_{\text{obs}})\]
where \(\Phi(\cdot)\) is the standard normal cumulative distribution function.
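To make the calculation concrete, here is a minimal sketch that computes the Z statistic and its one-sided p-value by hand; the counts x1 = 20 and x2 = 12 are hypothetical illustrative values, not package defaults:

# Minimal sketch: chi-squared Z statistic computed by hand
# (x1, x2, n1, n2 are hypothetical illustrative values)
x1 <- 20; x2 <- 12; n1 <- 30; n2 <- 30
p1_hat <- x1 / n1
p2_hat <- x2 / n2
p_pool <- (x1 + x2) / (n1 + n2)          # pooled proportion
z_obs <- (p1_hat - p2_hat) /
  sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
1 - pnorm(z_obs)                         # one-sided p-value P(Z >= z_obs)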
# Example: Chi-squared test
power_chisq <- BinaryPower(
p1 = 0.6, p2 = 0.4,
N1 = 30, N2 = 30,
alpha = 0.025,
Test = 'Chisq'
)
print(paste("Chi-squared test power:", round(power_chisq, 3)))
#> [1] "Chi-squared test power: 0.349"
Characteristics:

- Good asymptotic properties for large samples
- Computationally efficient
- May be anti-conservative for small samples
Fisher Exact Test (Test = 'Fisher')
The Fisher exact test conditions on the total number of successes and uses the hypergeometric distribution.
P-value formula: \[p\text{-value} = P(X_1 \geq k | X_1 + X_2 = s) = \sum_{i=k}^{\min(n_1,s)} \frac{\binom{n_1}{i}\binom{n_2}{s-i}}{\binom{n_1+n_2}{s}}\]
where \(k\) is the observed number of successes in group 1, and \(s = X_1 + X_2\) is the total number of successes.
The conditional probability mass function is: \[P(X_1 = i | X_1 + X_2 = s) = \frac{\binom{n_1}{i}\binom{n_2}{s-i}}{\binom{n_1+n_2}{s}}\]
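Because this conditional distribution is hypergeometric, the p-value can be evaluated directly with base R's phyper(); a minimal sketch with the same illustrative counts as above:

# Minimal sketch: Fisher exact p-value via the hypergeometric distribution
# (counts are hypothetical illustrative values)
x1 <- 20; x2 <- 12; n1 <- 30; n2 <- 30
s <- x1 + x2                              # total successes (conditioned on)
phyper(x1 - 1, m = n1, n = n2, k = s,
       lower.tail = FALSE)                # P(X1 >= x1 | X1 + X2 = s)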
# Example: Fisher exact test
power_fisher <- BinaryPower(
p1 = 0.6, p2 = 0.4,
N1 = 30, N2 = 30,
alpha = 0.025,
Test = 'Fisher'
)
print(paste("Fisher exact test power:", round(power_fisher, 3)))
#> [1] "Fisher exact test power: 0.257"
Characteristics:

- Exact Type I error control
- Conservative (actual α < nominal α)
- Widely accepted by regulatory agencies
- Conditional test
Fisher Mid-p Test (Test = 'Fisher-midP')
The Fisher mid-p test reduces the conservatism of the Fisher exact test by including only half the probability of the observed outcome.
P-value formula: \[p\text{-value} = P(X_1 > k | X_1 + X_2 = s) + 0.5 \cdot P(X_1 = k | X_1 + X_2 = s)\]
This can be expressed as: \[p\text{-value} = \sum_{i=k+1}^{\min(n_1,s)} \frac{\binom{n_1}{i}\binom{n_2}{s-i}}{\binom{n_1+n_2}{s}} + 0.5 \cdot \frac{\binom{n_1}{k}\binom{n_2}{s-k}}{\binom{n_1+n_2}{s}}\]
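A minimal sketch of the mid-p computation with base R's phyper() and dhyper(), using the same hypothetical counts as before:

# Minimal sketch: Fisher mid-p value (hypothetical counts)
x1 <- 20; x2 <- 12; n1 <- 30; n2 <- 30
s <- x1 + x2
phyper(x1, m = n1, n = n2, k = s, lower.tail = FALSE) +  # P(X1 > x1 | s)
  0.5 * dhyper(x1, m = n1, n = n2, k = s)                # 0.5 * P(X1 = x1 | s)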
# Example: Fisher mid-p test
power_midp <- BinaryPower(
p1 = 0.6, p2 = 0.4,
N1 = 30, N2 = 30,
alpha = 0.025,
Test = 'Fisher-midP'
)
print(paste("Fisher mid-p test power:", round(power_midp, 3)))
#> [1] "Fisher mid-p test power: 0.349"
Characteristics:

- Less conservative than Fisher exact
- Better power properties
- Maintains approximate Type I error control
Z-Pooled Test (Test = 'Z-pool')
This test uses the Z statistic but computes exact p-values by maximizing over all possible values of the nuisance parameter \(\theta\) (the common success probability under the null hypothesis).
P-value formula: \[p\text{-value} = \max_{\theta \in [0,1]} P_{\theta}(Z \geq z_{\text{obs}})\]
where under the null hypothesis \(H_0: p_1 = p_2 = \theta\): \[P_{\theta}(Z \geq z_{\text{obs}}) = \sum_{(x_1,x_2): z(x_1,x_2) \geq z_{\text{obs}}} \binom{n_1}{x_1}\binom{n_2}{x_2}\theta^{x_1+x_2}(1-\theta)^{n_1+n_2-x_1-x_2}\]
The test statistic is: \[z(x_1,x_2) = \frac{\frac{x_1}{n_1} - \frac{x_2}{n_2}}{\sqrt{\frac{x_1+x_2}{n_1 n_2} \cdot \left(1 - \frac{x_1+x_2}{n_1+n_2}\right)}}\]
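The maximization over \(\theta\) can be approximated by enumerating the sample space and searching over a grid of \(\theta\) values. The sketch below illustrates the idea; it is not bbssr's internal implementation, which may use a finer or smarter optimization:

# Minimal sketch: exact Z-pooled p-value via enumeration and a grid search
# over the nuisance parameter theta (illustrative, not bbssr internals)
zpool_pvalue <- function(x1, x2, n1, n2,
                         theta_grid = seq(0.001, 0.999, by = 0.001)) {
  z_stat <- function(a, b) {
    s <- a + b
    denom <- sqrt(s / (n1 * n2) * (1 - s / (n1 + n2)))
    ifelse(denom == 0, -Inf, (a / n1 - b / n2) / denom)
  }
  tab <- expand.grid(a = 0:n1, b = 0:n2)         # all possible outcomes
  rej <- z_stat(tab$a, tab$b) >= z_stat(x1, x2)  # at least as extreme as observed
  # grid search approximates the supremum over theta
  max(vapply(theta_grid, function(th) {
    sum(dbinom(tab$a[rej], n1, th) * dbinom(tab$b[rej], n2, th))
  }, numeric(1)))
}
zpool_pvalue(20, 12, 30, 30)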
# Example: Z-pooled test
power_zpool <- BinaryPower(
p1 = 0.6, p2 = 0.4,
N1 = 30, N2 = 30,
alpha = 0.025,
Test = 'Z-pool'
)
print(paste("Z-pooled test power:", round(power_zpool, 3)))
#> [1] "Z-pooled test power: 0.33"
Characteristics:

- Unconditional test
- Good balance between power and conservatism
- Computationally more intensive than conditional tests
Boschloo Test (Test = 'Boschloo')
The Boschloo test is the most powerful exact unconditional test. It maximizes the p-value over all possible values of the nuisance parameter, but uses the Fisher exact p-value as the test statistic.
P-value formula: \[p\text{-value} = \max_{\theta \in [0,1]} P_{\theta}(p_{\text{Fisher}}(X_1, X_2) \leq p_{\text{Fisher,obs}})\]
where \(p_{\text{Fisher}}(x_1, x_2)\) is the Fisher exact p-value for the observation \((x_1, x_2)\): \[p_{\text{Fisher}}(x_1, x_2) = P(X_1 \geq x_1 | X_1 + X_2 = x_1 + x_2)\]
Under the null hypothesis \(H_0: p_1 = p_2 = \theta\): \[P_{\theta}(p_{\text{Fisher}}(X_1, X_2) \leq p_{\text{Fisher,obs}}) = \sum_{\substack{(x_1,x_2): \\ p_{\text{Fisher}}(x_1,x_2) \leq p_{\text{Fisher,obs}}}} \binom{n_1}{x_1}\binom{n_2}{x_2}\theta^{x_1+x_2}(1-\theta)^{n_1+n_2-x_1-x_2}\]
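The same enumeration idea applies, now ordering outcomes by their Fisher p-values; a minimal sketch (again a grid-search approximation, not the package's internal code):

# Minimal sketch: Boschloo p-value, ordering outcomes by Fisher p-values
# (illustrative grid-search approximation, not bbssr internals)
boschloo_pvalue <- function(x1, x2, n1, n2,
                            theta_grid = seq(0.001, 0.999, by = 0.001)) {
  fisher_p <- function(a, b)                     # P(X1 >= a | X1 + X2 = a + b)
    phyper(a - 1, m = n1, n = n2, k = a + b, lower.tail = FALSE)
  tab <- expand.grid(a = 0:n1, b = 0:n2)
  rej <- fisher_p(tab$a, tab$b) <= fisher_p(x1, x2)
  max(vapply(theta_grid, function(th) {
    sum(dbinom(tab$a[rej], n1, th) * dbinom(tab$b[rej], n2, th))
  }, numeric(1)))
}
boschloo_pvalue(20, 12, 30, 30)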
# Example: Boschloo test
power_boschloo <- BinaryPower(
p1 = 0.6, p2 = 0.4,
N1 = 30, N2 = 30,
alpha = 0.025,
Test = 'Boschloo'
)
print(paste("Boschloo test power:", round(power_boschloo, 3)))
#> [1] "Boschloo test power: 0.33"
Characteristics:

- Most powerful exact unconditional test
- Maintains exact Type I error control
- Computationally intensive
- Optimal choice when computational resources allow
The key insight from comparing the five tests is that the unconditional exact tests (Z-pool and Boschloo) recover most of the power lost to the conservatism of the conditional Fisher exact test while retaining exact Type I error control, whereas the chi-squared and mid-p tests achieve slightly higher power at the cost of only approximate control:
# Compare all five tests
tests <- c('Chisq', 'Fisher', 'Fisher-midP', 'Z-pool', 'Boschloo')
powers <- sapply(tests, function(test) {
BinaryPower(p1 = 0.6, p2 = 0.4, N1 = 30, N2 = 30, alpha = 0.025, Test = test)
})
comparison_df <- data.frame(
Test = tests,
Power = round(powers, 4),
Type = c("Asymptotic", "Conditional", "Conditional", "Unconditional", "Unconditional"),
Conservatism = c("Moderate", "High", "Moderate", "Moderate", "Low")
)
print(comparison_df)
#> Test Power Type Conservatism
#> Chisq Chisq 0.3494 Asymptotic Moderate
#> Fisher Fisher 0.2571 Conditional High
#> Fisher-midP Fisher-midP 0.3493 Conditional Moderate
#> Z-pool Z-pool 0.3298 Unconditional Moderate
#> Boschloo Boschloo 0.3298 Unconditional Low
Restricted Design (restricted = TRUE)
In the restricted design, the final sample size must be at least the originally planned sample size:
\(N_{\text{final}} \geq N_{\text{planned}}\)
This approach is conservative and ensures that the study duration doesn’t exceed the originally planned timeline.
Unrestricted Design (restricted = FALSE)
The unrestricted design allows both increases and decreases in sample size based on the interim data:
\(N_{\text{final}} = \max(N_{\text{interim}}, N_{\text{recalculated}})\)
This provides maximum flexibility but may extend or shorten the study duration.
Weighted Approach (weighted = TRUE)
The weighted approach uses a weighted average across all possible interim scenarios:
\(N_{\text{final}} = \max\left(N_{\text{scenario}}, \sum_{h} w_h \cdot N_h\right)\)
where \(w_h\) are weights based on the probability of each interim scenario.
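The following schematic contrasts the three rules with hypothetical numbers; it mirrors the formulas above rather than bbssr's internal code:

# Schematic comparison of the three re-estimation rules (hypothetical numbers)
n_planned <- 48   # originally planned sample size
n_interim <- 24   # already enrolled at the interim analysis
n_reest   <- 40   # size recalculated from blinded interim data
max(n_planned, n_reest)       # restricted: never below the original plan
max(n_interim, n_reest)       # unrestricted: may fall below the plan
w   <- c(0.3, 0.5, 0.2)       # hypothetical interim-scenario weights
n_h <- c(36, 40, 52)          # hypothetical per-scenario sample sizes
max(n_reest, sum(w * n_h))    # weighted: guard with the weighted average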
# Detailed BSSR example with different approaches
bssr_results_list <- list()
# Restricted approach
bssr_results_list[["Restricted"]] <- BinaryPowerBSSR(
asmd.p1 = 0.45, asmd.p2 = 0.09,
p = seq(0.1, 0.9, by = 0.1),
Delta.A = 0.36, Delta.T = 0.36,
N1 = 24, N2 = 24, omega = 0.5, r = 1,
alpha = 0.025, tar.power = 0.8,
Test = 'Z-pool',
restricted = TRUE, weighted = FALSE
) %>% mutate(approach = "Restricted")
# Unrestricted approach
bssr_results_list[["Unrestricted"]] <- BinaryPowerBSSR(
asmd.p1 = 0.45, asmd.p2 = 0.09,
p = seq(0.1, 0.9, by = 0.1),
Delta.A = 0.36, Delta.T = 0.36,
N1 = 24, N2 = 24, omega = 0.5, r = 1,
alpha = 0.025, tar.power = 0.8,
Test = 'Z-pool',
restricted = FALSE, weighted = FALSE
) %>% mutate(approach = "Unrestricted")
# Weighted approach
bssr_results_list[["Weighted"]] <- BinaryPowerBSSR(
asmd.p1 = 0.45, asmd.p2 = 0.09,
p = seq(0.1, 0.9, by = 0.1),
Delta.A = 0.36, Delta.T = 0.36,
N1 = 24, N2 = 24, omega = 0.5, r = 1,
alpha = 0.025, tar.power = 0.8,
Test = 'Z-pool',
restricted = FALSE, weighted = TRUE
) %>% mutate(approach = "Weighted")
# Combine results
bssr_results <- do.call(rbind, bssr_results_list)
# Summary statistics
bssr_summary <- bssr_results %>%
group_by(approach) %>%
summarise(
mean_power_bssr = mean(power.BSSR),
mean_power_trad = mean(power.TRAD),
min_power_bssr = min(power.BSSR),
max_power_bssr = max(power.BSSR),
.groups = 'drop'
)
print(bssr_summary)
#> # A tibble: 3 × 5
#> approach mean_power_bssr mean_power_trad min_power_bssr max_power_bssr
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Restricted 0.837 0.791 0.786 0.932
#> 2 Unrestricted 0.805 0.791 0.771 0.873
#> 3 Weighted 0.830 0.791 0.786 0.921
# Create comprehensive power comparison with vertical layout
power_data <- bssr_results %>%
select(approach, p, power.BSSR, power.TRAD) %>%
pivot_longer(
cols = c(power.BSSR, power.TRAD),
names_to = "design_type",
values_to = "power"
) %>%
mutate(
design_type = case_when(
design_type == "power.BSSR" ~ "BSSR",
design_type == "power.TRAD" ~ "Traditional"
),
approach = factor(approach, levels = c("Restricted", "Unrestricted", "Weighted"))
)
ggplot(power_data, aes(x = p, y = power, color = design_type)) +
geom_line(linewidth = 1.2) +
facet_wrap(~approach, ncol = 1, scales = "free_y") + # Vertical layout
geom_hline(yintercept = 0.8, linetype = "dashed", color = "gray") +
scale_color_manual(
values = c("BSSR" = "#1F78B4", "Traditional" = "#E31A1C"),
name = "Design Type"
) +
scale_x_continuous(
breaks = seq(0.2, 0.8, by = 0.2),
labels = c("0.2", "0.4", "0.6", "0.8")
) +
scale_y_continuous(
breaks = seq(0.7, 1.0, by = 0.1),
labels = c("0.7", "0.8", "0.9", "1.0")
) +
labs(
x = "Pooled Response Rate (θ)",
y = "Power",
title = "Power Comparison: Traditional vs BSSR Designs",
subtitle = "Horizontal dashed line shows target power = 0.8"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 14, hjust = 0.5, margin = margin(b = 5)),
plot.subtitle = element_text(size = 11, hjust = 0.5, margin = margin(b = 15)),
strip.text = element_text(size = 12, face = "bold", margin = margin(t = 8, b = 8)),
strip.background = element_rect(fill = "gray95", color = "gray80"),
legend.position = "bottom",
legend.title = element_text(size = 11),
legend.text = element_text(size = 10),
legend.margin = margin(t = 10),
axis.title.x = element_text(size = 11, margin = margin(t = 10)),
axis.title.y = element_text(size = 11, margin = margin(r = 10)),
axis.text = element_text(size = 9),
panel.grid.minor = element_blank(),
panel.grid.major = element_line(color = "gray92", linewidth = 0.5),
plot.margin = margin(t = 10, r = 10, b = 10, l = 10)
)
| Scenario | Recommended Test | Rationale |
|---|---|---|
| Small samples (n < 30 per group) | Boschloo | Most powerful exact test |
| Moderate samples (30-100 per group) | Z-pool | Good balance of power and computation |
| Large samples (n > 100 per group) | Chisq | Asymptotically optimal, fast |
| Regulatory submission | Fisher | Widely accepted, conservative |
| Exploratory analysis | Fisher-midP | Less conservative than Fisher |
| Priority | Recommended Approach | Rationale |
|---|---|---|
| Timeline certainty | Restricted | Guarantees study doesn't extend |
| Statistical efficiency | Unrestricted | Optimal sample size adaptation |
| Robust performance | Weighted | Consistent across scenarios |
# Sample size planning example
planning_scenarios <- expand.grid(
p1 = c(0.4, 0.5, 0.6),
p2 = c(0.2, 0.3),
test = c('Fisher', 'Z-pool', 'Boschloo')
) %>%
filter(p1 > p2)
# Calculate sample sizes for each scenario
sample_size_results <- list()
for(i in 1:nrow(planning_scenarios)) {
result <- BinarySampleSize(
p1 = planning_scenarios$p1[i],
p2 = planning_scenarios$p2[i],
r = 1,
alpha = 0.025,
tar.power = 0.8,
Test = planning_scenarios$test[i]
)
sample_size_results[[i]] <- result
}
# Combine results
final_results <- do.call(rbind, sample_size_results)
final_results <- final_results[, c("p1", "p2", "Test", "N1", "N2", "N", "Power")]
print(final_results)
#> p1 p2 Test N1 N2 N Power
#> 1 0.4 0.2 Fisher 90 90 180 0.8016798
#> 2 0.5 0.2 Fisher 44 44 88 0.8020894
#> 3 0.6 0.2 Fisher 27 27 54 0.8024322
#> 4 0.4 0.3 Fisher 375 375 750 0.8010219
#> 5 0.5 0.3 Fisher 102 102 204 0.8061477
#> 6 0.6 0.3 Fisher 48 48 96 0.8004594
#> 7 0.4 0.2 Z-pool 84 84 168 0.8035668
#> 8 0.5 0.2 Z-pool 40 40 80 0.8096513
#> 9 0.6 0.2 Z-pool 23 23 46 0.8088250
#> 10 0.4 0.3 Z-pool 359 359 718 0.8001135
#> 11 0.5 0.3 Z-pool 95 95 190 0.8007528
#> 12 0.6 0.3 Z-pool 44 44 88 0.8010988
#> 13 0.4 0.2 Boschloo 84 84 168 0.8023435
#> 14 0.5 0.2 Boschloo 40 40 80 0.8096508
#> 15 0.6 0.2 Boschloo 23 23 46 0.8088248
#> 16 0.4 0.3 Boschloo 360 360 720 0.8004597
#> 17 0.5 0.3 Boschloo 95 95 190 0.8007528
#> 18 0.6 0.3 Boschloo 44 44 88 0.8010988
The exact tests in bbssr (Fisher, Z-pool, and Boschloo) maintain exact Type I error control:
# Demonstrate Type I error control under null hypothesis
null_powers <- sapply(c('Fisher', 'Z-pool', 'Boschloo'), function(test) {
BinaryPower(p1 = 0.3, p2 = 0.3, N1 = 30, N2 = 30, alpha = 0.025, Test = test)
})
names(null_powers) <- c('Fisher', 'Z-pool', 'Boschloo')
print("Type I error rates under null hypothesis:")
#> [1] "Type I error rates under null hypothesis:"
print(round(null_powers, 4))
#> Fisher Z-pool Boschloo
#> 0.0131 0.0208 0.0183
All values should be ≤ 0.025, confirming exact Type I error control.
For regulatory submissions, document:

1. Rationale for BSSR: Why adaptive design is appropriate
2. Test selection: Justification for chosen statistical test
3. Design approach: Restricted vs unrestricted rationale
4. Simulation studies: Demonstrate operating characteristics
5. Implementation plan: Detailed interim analysis procedures
# Compare different allocation ratios
ratios <- c(1, 2, 3)
ratio_results <- sapply(ratios, function(r) {
result <- BinarySampleSize(
p1 = 0.5, p2 = 0.3, r = r,
alpha = 0.025, tar.power = 0.8,
Test = 'Boschloo'
)
c(N1 = result$N1, N2 = result$N2, N_total = result$N)
})
colnames(ratio_results) <- paste0("r=", ratios)
print("Sample sizes for different allocation ratios:")
#> [1] "Sample sizes for different allocation ratios:"
print(ratio_results)
#> r=1 r=2 r=3
#> N1 95 142 189
#> N2 95 71 63
#> N_total 190 213 252
# Sensitivity analysis for key parameters
sensitivity_data <- expand.grid(
omega = c(0.3, 0.5, 0.7),
alpha = c(0.01, 0.025, 0.05)
) %>%
rowwise() %>%
mutate(
avg_power = mean(BinaryPowerBSSR(
asmd.p1 = 0.45, asmd.p2 = 0.09,
p = seq(0.2, 0.8, by = 0.1),
Delta.A = 0.36, Delta.T = 0.36,
N1 = 24, N2 = 24, omega = omega, r = 1,
alpha = alpha, tar.power = 0.8,
Test = 'Z-pool',
restricted = FALSE, weighted = FALSE
)$power.BSSR)
)
print("Sensitivity analysis results:")
#> [1] "Sensitivity analysis results:"
print(sensitivity_data)
#> # A tibble: 9 × 3
#> # Rowwise:
#> omega alpha avg_power
#> <dbl> <dbl> <dbl>
#> 1 0.3 0.01 0.796
#> 2 0.5 0.01 0.798
#> 3 0.7 0.01 0.802
#> 4 0.3 0.025 0.796
#> 5 0.5 0.025 0.805
#> 6 0.7 0.025 0.811
#> 7 0.3 0.05 0.807
#> 8 0.5 0.05 0.814
#> 9 0.7 0.05 0.823
The bbssr package provides a comprehensive toolkit for implementing blinded sample size re-estimation in clinical trials with binary endpoints. The choice of statistical test and design approach should be based on:

- Expected sample size (exact unconditional tests such as Boschloo for small trials; chi-squared for large ones)
- Regulatory requirements (Fisher's exact test remains the most widely accepted)
- Available computational resources (unconditional tests are more intensive)
- Study priorities (timeline certainty favors the restricted design, statistical efficiency the unrestricted design, and robust performance the weighted approach)
All methods maintain exact statistical validity while providing the flexibility needed for efficient clinical trial conduct.