This vignette provides a detailed explanation of the statistical methods implemented in the bbssr package. We cover the theoretical foundations of blinded sample size re-estimation (BSSR) and the five exact statistical tests supported by the package.
Traditional clinical trials use fixed sample sizes determined during the planning phase based on:

- Assumed treatment effect size
- Expected response rates in each group
- Desired power and significance level
However, these assumptions are often inaccurate, leading to:

- Underpowered studies when the assumed effect size is too optimistic
- Overpowered studies when the assumed effect size is too conservative
- Resource inefficiency due to incorrect sample size planning
- Ethical concerns about continuing underpowered trials
Blinded Sample Size Re-estimation addresses these issues by:

- Estimating nuisance parameters (such as the pooled response rate) at an interim analysis using blinded data
- Recalculating the required sample size without unblinding treatment assignments
- Preserving the integrity of the trial, since no information about the treatment effect is revealed
Let's define the key parameters:

- \(X_1\) and \(X_2\): the numbers of responders in groups 1 and 2, with \(X_i \sim \text{Binomial}(n_i, p_i)\)
- \(p_1\), \(p_2\): the true response probabilities in each group
- \(n_1\), \(n_2\): the group sample sizes
- \(\alpha\): the one-sided significance level

Throughout, we test \(H_0: p_1 = p_2\) against the one-sided alternative \(H_1: p_1 > p_2\).
The bbssr package implements five exact statistical tests, each with different characteristics and optimal use cases.
Pearson Chi-Squared Test (Test = 'Chisq')
The one-sided Pearson chi-squared test uses the test statistic:
\[Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}\]
where \(\hat{p}_1 = X_1/n_1\) and \(\hat{p}_2 = X_2/n_2\) are the group proportions and \(\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}\) is the pooled proportion.
P-value formula: \[p\text{-value} = P(Z \geq z_{\text{obs}}) = 1 - \Phi(z_{\text{obs}})\]
where \(\Phi(\cdot)\) is the standard normal cumulative distribution function.
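To make the calculation concrete, here is a minimal sketch that computes the Z statistic and its one-sided p-value by hand; the counts x1 = 20 and x2 = 12 are hypothetical illustrative values, not package defaults:

# Minimal sketch: chi-squared Z statistic computed by hand
# (x1, x2, n1, n2 are hypothetical illustrative values)
x1 <- 20; x2 <- 12; n1 <- 30; n2 <- 30
p1_hat <- x1 / n1
p2_hat <- x2 / n2
p_pool <- (x1 + x2) / (n1 + n2)          # pooled proportion
z_obs <- (p1_hat - p2_hat) /
  sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
1 - pnorm(z_obs)                         # one-sided p-value P(Z >= z_obs)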
# Example: Chi-squared test
power_chisq <- BinaryPower(
p1 = 0.6, p2 = 0.4,
N1 = 30, N2 = 30,
alpha = 0.025,
Test = 'Chisq'
)
print(paste("Chi-squared test power:", round(power_chisq, 3)))
#> [1] "Chi-squared test power: 0.349"
Characteristics:

- Good asymptotic properties for large samples
- Computationally efficient
- May be anti-conservative for small samples
Fisher Exact Test (Test = 'Fisher')
The Fisher exact test conditions on the total number of successes and uses the hypergeometric distribution.
P-value formula: \[p\text{-value} = P(X_1 \geq k | X_1 + X_2 = s) = \sum_{i=k}^{\min(n_1,s)} \frac{\binom{n_1}{i}\binom{n_2}{s-i}}{\binom{n_1+n_2}{s}}\]
where \(k\) is the observed number of successes in group 1, and \(s = X_1 + X_2\) is the total number of successes.
The conditional probability mass function is: \[P(X_1 = i | X_1 + X_2 = s) = \frac{\binom{n_1}{i}\binom{n_2}{s-i}}{\binom{n_1+n_2}{s}}\]
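Because this conditional distribution is hypergeometric, the p-value can be evaluated directly with base R's phyper(); a minimal sketch with the same illustrative counts as above:

# Minimal sketch: Fisher exact p-value via the hypergeometric distribution
# (counts are hypothetical illustrative values)
x1 <- 20; x2 <- 12; n1 <- 30; n2 <- 30
s <- x1 + x2                              # total successes (conditioned on)
phyper(x1 - 1, m = n1, n = n2, k = s,
       lower.tail = FALSE)                # P(X1 >= x1 | X1 + X2 = s)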
# Example: Fisher exact test
power_fisher <- BinaryPower(
p1 = 0.6, p2 = 0.4,
N1 = 30, N2 = 30,
alpha = 0.025,
Test = 'Fisher'
)
print(paste("Fisher exact test power:", round(power_fisher, 3)))
#> [1] "Fisher exact test power: 0.257"
Characteristics:

- Exact Type I error control
- Conservative (actual α < nominal α)
- Widely accepted by regulatory agencies
- Conditional test
Fisher Mid-p Test (Test = 'Fisher-midP')
The Fisher mid-p test reduces the conservatism of the Fisher exact test by including only half the probability of the observed outcome.
P-value formula: \[p\text{-value} = P(X_1 > k | X_1 + X_2 = s) + 0.5 \cdot P(X_1 = k | X_1 + X_2 = s)\]
This can be expressed as: \[p\text{-value} = \sum_{i=k+1}^{\min(n_1,s)} \frac{\binom{n_1}{i}\binom{n_2}{s-i}}{\binom{n_1+n_2}{s}} + 0.5 \cdot \frac{\binom{n_1}{k}\binom{n_2}{s-k}}{\binom{n_1+n_2}{s}}\]
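A minimal sketch of the mid-p computation with base R's phyper() and dhyper(), using the same hypothetical counts as before:

# Minimal sketch: Fisher mid-p value (hypothetical counts)
x1 <- 20; x2 <- 12; n1 <- 30; n2 <- 30
s <- x1 + x2
phyper(x1, m = n1, n = n2, k = s, lower.tail = FALSE) +  # P(X1 > x1 | s)
  0.5 * dhyper(x1, m = n1, n = n2, k = s)                # 0.5 * P(X1 = x1 | s)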
# Example: Fisher mid-p test
power_midp <- BinaryPower(
p1 = 0.6, p2 = 0.4,
N1 = 30, N2 = 30,
alpha = 0.025,
Test = 'Fisher-midP'
)
print(paste("Fisher mid-p test power:", round(power_midp, 3)))
#> [1] "Fisher mid-p test power: 0.349"
Characteristics:

- Less conservative than Fisher exact
- Better power properties
- Maintains approximate Type I error control
Z-Pooled Test (Test = 'Z-pool')
This test uses the Z statistic but computes exact p-values by maximizing over all possible values of the nuisance parameter \(\theta\) (the common success probability under the null hypothesis).
P-value formula: \[p\text{-value} = \max_{\theta \in [0,1]} P_{\theta}(Z \geq z_{\text{obs}})\]
where under the null hypothesis \(H_0: p_1 = p_2 = \theta\): \[P_{\theta}(Z \geq z_{\text{obs}}) = \sum_{(x_1,x_2): z(x_1,x_2) \geq z_{\text{obs}}} \binom{n_1}{x_1}\binom{n_2}{x_2}\theta^{x_1+x_2}(1-\theta)^{n_1+n_2-x_1-x_2}\]
The test statistic is: \[z(x_1,x_2) = \frac{\frac{x_1}{n_1} - \frac{x_2}{n_2}}{\sqrt{\frac{x_1+x_2}{n_1 n_2} \cdot \left(1 - \frac{x_1+x_2}{n_1+n_2}\right)}}\]
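The maximization over \(\theta\) can be approximated by enumerating the sample space and searching over a grid of \(\theta\) values. The sketch below illustrates the idea; it is not bbssr's internal implementation, which may use a finer or smarter optimization:

# Minimal sketch: exact Z-pooled p-value via enumeration and a grid search
# over the nuisance parameter theta (illustrative, not bbssr internals)
zpool_pvalue <- function(x1, x2, n1, n2,
                         theta_grid = seq(0.001, 0.999, by = 0.001)) {
  z_stat <- function(a, b) {
    s <- a + b
    denom <- sqrt(s / (n1 * n2) * (1 - s / (n1 + n2)))
    ifelse(denom == 0, -Inf, (a / n1 - b / n2) / denom)
  }
  tab <- expand.grid(a = 0:n1, b = 0:n2)         # all possible outcomes
  rej <- z_stat(tab$a, tab$b) >= z_stat(x1, x2)  # at least as extreme as observed
  # grid search approximates the supremum over theta
  max(vapply(theta_grid, function(th) {
    sum(dbinom(tab$a[rej], n1, th) * dbinom(tab$b[rej], n2, th))
  }, numeric(1)))
}
zpool_pvalue(20, 12, 30, 30)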
# Example: Z-pooled test
power_zpool <- BinaryPower(
p1 = 0.6, p2 = 0.4,
N1 = 30, N2 = 30,
alpha = 0.025,
Test = 'Z-pool'
)
print(paste("Z-pooled test power:", round(power_zpool, 3)))
#> [1] "Z-pooled test power: 0.33"
Characteristics:

- Unconditional test
- Good balance between power and conservatism
- Computationally more intensive than conditional tests
Boschloo Test (Test = 'Boschloo')
The Boschloo test is the most powerful exact unconditional test. It maximizes the p-value over all possible values of the nuisance parameter, but uses the Fisher exact p-value as the test statistic.
P-value formula: \[p\text{-value} = \max_{\theta \in [0,1]} P_{\theta}(p_{\text{Fisher}}(X_1, X_2) \leq p_{\text{Fisher,obs}})\]
where \(p_{\text{Fisher}}(x_1, x_2)\) is the Fisher exact p-value for the observation \((x_1, x_2)\): \[p_{\text{Fisher}}(x_1, x_2) = P(X_1 \geq x_1 | X_1 + X_2 = x_1 + x_2)\]
Under the null hypothesis \(H_0: p_1 = p_2 = \theta\): \[P_{\theta}(p_{\text{Fisher}}(X_1, X_2) \leq p_{\text{Fisher,obs}}) = \sum_{\substack{(x_1,x_2): \\ p_{\text{Fisher}}(x_1,x_2) \leq p_{\text{Fisher,obs}}}} \binom{n_1}{x_1}\binom{n_2}{x_2}\theta^{x_1+x_2}(1-\theta)^{n_1+n_2-x_1-x_2}\]
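The same enumeration idea applies, now ordering outcomes by their Fisher p-values; a minimal sketch (again a grid-search approximation, not the package's internal code):

# Minimal sketch: Boschloo p-value, ordering outcomes by Fisher p-values
# (illustrative grid-search approximation, not bbssr internals)
boschloo_pvalue <- function(x1, x2, n1, n2,
                            theta_grid = seq(0.001, 0.999, by = 0.001)) {
  fisher_p <- function(a, b)                     # P(X1 >= a | X1 + X2 = a + b)
    phyper(a - 1, m = n1, n = n2, k = a + b, lower.tail = FALSE)
  tab <- expand.grid(a = 0:n1, b = 0:n2)
  rej <- fisher_p(tab$a, tab$b) <= fisher_p(x1, x2)
  max(vapply(theta_grid, function(th) {
    sum(dbinom(tab$a[rej], n1, th) * dbinom(tab$b[rej], n2, th))
  }, numeric(1)))
}
boschloo_pvalue(20, 12, 30, 30)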
# Example: Boschloo test
power_boschloo <- BinaryPower(
p1 = 0.6, p2 = 0.4,
N1 = 30, N2 = 30,
alpha = 0.025,
Test = 'Boschloo'
)
print(paste("Boschloo test power:", round(power_boschloo, 3)))
#> [1] "Boschloo test power: 0.33"
Characteristics:

- Most powerful exact unconditional test
- Maintains exact Type I error control
- Computationally intensive
- Optimal choice when computational resources allow
The key insight from comparing the five tests is that the unconditional exact tests (Z-pool and Boschloo) recover most of the power lost to the conservatism of the conditional Fisher exact test while retaining exact Type I error control, whereas the chi-squared and mid-p tests achieve slightly higher power at the cost of only approximate control:
# Compare all five tests
tests <- c('Chisq', 'Fisher', 'Fisher-midP', 'Z-pool', 'Boschloo')
powers <- sapply(tests, function(test) {
BinaryPower(p1 = 0.6, p2 = 0.4, N1 = 30, N2 = 30, alpha = 0.025, Test = test)
})
comparison_df <- data.frame(
Test = tests,
Power = round(powers, 4),
Type = c("Asymptotic", "Conditional", "Conditional", "Unconditional", "Unconditional"),
Conservatism = c("Moderate", "High", "Moderate", "Moderate", "Low")
)
print(comparison_df)
#> Test Power Type Conservatism
#> Chisq Chisq 0.3494 Asymptotic Moderate
#> Fisher Fisher 0.2571 Conditional High
#> Fisher-midP Fisher-midP 0.3493 Conditional Moderate
#> Z-pool Z-pool 0.3298 Unconditional Moderate
#> Boschloo Boschloo 0.3298 Unconditional Low
Restricted Design (restricted = TRUE)
In the restricted design, the final sample size must be at least the originally planned sample size:
\(N_{\text{final}} \geq N_{\text{planned}}\)
This approach is conservative and ensures that the study duration doesn’t exceed the originally planned timeline.
Unrestricted Design (restricted = FALSE)
The unrestricted design allows both increases and decreases in sample size based on the interim data:
\(N_{\text{final}} = \max(N_{\text{interim}}, N_{\text{recalculated}})\)
This provides maximum flexibility but may extend or shorten the study duration.
Weighted Approach (weighted = TRUE)
The weighted approach uses a weighted average across all possible interim scenarios:
\(N_{\text{final}} = \max\left(N_{\text{scenario}}, \sum_{h} w_h \cdot N_h\right)\)
where \(w_h\) are weights based on the probability of each interim scenario.
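The following schematic contrasts the three rules with hypothetical numbers; it mirrors the formulas above rather than bbssr's internal code:

# Schematic comparison of the three re-estimation rules (hypothetical numbers)
n_planned <- 48   # originally planned sample size
n_interim <- 24   # already enrolled at the interim analysis
n_reest   <- 40   # size recalculated from blinded interim data
max(n_planned, n_reest)       # restricted: never below the original plan
max(n_interim, n_reest)       # unrestricted: may fall below the plan
w   <- c(0.3, 0.5, 0.2)       # hypothetical interim-scenario weights
n_h <- c(36, 40, 52)          # hypothetical per-scenario sample sizes
max(n_reest, sum(w * n_h))    # weighted: guard with the weighted average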
# Detailed BSSR example with different approaches
bssr_results_list <- list()
# Restricted approach
bssr_results_list[["Restricted"]] <- BinaryPowerBSSR(
asmd.p1 = 0.45, asmd.p2 = 0.09,
p = seq(0.1, 0.9, by = 0.1),
Delta.A = 0.36, Delta.T = 0.36,
N1 = 24, N2 = 24, omega = 0.5, r = 1,
alpha = 0.025, tar.power = 0.8,
Test = 'Z-pool',
restricted = TRUE, weighted = FALSE
) %>% mutate(approach = "Restricted")
# Unrestricted approach
bssr_results_list[["Unrestricted"]] <- BinaryPowerBSSR(
asmd.p1 = 0.45, asmd.p2 = 0.09,
p = seq(0.1, 0.9, by = 0.1),
Delta.A = 0.36, Delta.T = 0.36,
N1 = 24, N2 = 24, omega = 0.5, r = 1,
alpha = 0.025, tar.power = 0.8,
Test = 'Z-pool',
restricted = FALSE, weighted = FALSE
) %>% mutate(approach = "Unrestricted")
# Weighted approach
bssr_results_list[["Weighted"]] <- BinaryPowerBSSR(
asmd.p1 = 0.45, asmd.p2 = 0.09,
p = seq(0.1, 0.9, by = 0.1),
Delta.A = 0.36, Delta.T = 0.36,
N1 = 24, N2 = 24, omega = 0.5, r = 1,
alpha = 0.025, tar.power = 0.8,
Test = 'Z-pool',
restricted = FALSE, weighted = TRUE
) %>% mutate(approach = "Weighted")
# Combine results
bssr_results <- do.call(rbind, bssr_results_list)
# Summary statistics
bssr_summary <- bssr_results %>%
group_by(approach) %>%
summarise(
mean_power_bssr = mean(power.BSSR),
mean_power_trad = mean(power.TRAD),
min_power_bssr = min(power.BSSR),
max_power_bssr = max(power.BSSR),
.groups = 'drop'
)
print(bssr_summary)
#> # A tibble: 3 × 5
#> approach mean_power_bssr mean_power_trad min_power_bssr max_power_bssr
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Restricted 0.837 0.791 0.786 0.932
#> 2 Unrestricted 0.805 0.791 0.771 0.873
#> 3 Weighted 0.830 0.791 0.786 0.921
# Create comprehensive power comparison with vertical layout
power_data <- bssr_results %>%
select(approach, p, power.BSSR, power.TRAD) %>%
pivot_longer(
cols = c(power.BSSR, power.TRAD),
names_to = "design_type",
values_to = "power"
) %>%
mutate(
design_type = case_when(
design_type == "power.BSSR" ~ "BSSR",
design_type == "power.TRAD" ~ "Traditional"
),
approach = factor(approach, levels = c("Restricted", "Unrestricted", "Weighted"))
)
ggplot(power_data, aes(x = p, y = power, color = design_type)) +
geom_line(linewidth = 1.2) +
facet_wrap(~approach, ncol = 1, scales = "free_y") + # Vertical layout
geom_hline(yintercept = 0.8, linetype = "dashed", color = "gray") +
scale_color_manual(
values = c("BSSR" = "#1F78B4", "Traditional" = "#E31A1C"),
name = "Design Type"
) +
scale_x_continuous(
breaks = seq(0.2, 0.8, by = 0.2),
labels = c("0.2", "0.4", "0.6", "0.8")
) +
scale_y_continuous(
breaks = seq(0.7, 1.0, by = 0.1),
labels = c("0.7", "0.8", "0.9", "1.0")
) +
labs(
x = "Pooled Response Rate (θ)",
y = "Power",
title = "Power Comparison: Traditional vs BSSR Designs",
subtitle = "Horizontal dashed line shows target power = 0.8"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 14, hjust = 0.5, margin = margin(b = 5)),
plot.subtitle = element_text(size = 11, hjust = 0.5, margin = margin(b = 15)),
strip.text = element_text(size = 12, face = "bold", margin = margin(t = 8, b = 8)),
strip.background = element_rect(fill = "gray95", color = "gray80"),
legend.position = "bottom",
legend.title = element_text(size = 11),
legend.text = element_text(size = 10),
legend.margin = margin(t = 10),
axis.title.x = element_text(size = 11, margin = margin(t = 10)),
axis.title.y = element_text(size = 11, margin = margin(r = 10)),
axis.text = element_text(size = 9),
panel.grid.minor = element_blank(),
panel.grid.major = element_line(color = "gray92", linewidth = 0.5),
plot.margin = margin(t = 10, r = 10, b = 10, l = 10)
)
| Scenario | Recommended Test | Rationale |
|---|---|---|
| Small samples (n < 30 per group) | Boschloo | Most powerful exact test |
| Moderate samples (30-100 per group) | Z-pool | Good balance of power and computation |
| Large samples (n > 100 per group) | Chisq | Asymptotically optimal, fast |
| Regulatory submission | Fisher | Widely accepted, conservative |
| Exploratory analysis | Fisher-midP | Less conservative than Fisher |
| Priority | Recommended Approach | Rationale |
|---|---|---|
| Timeline certainty | Restricted | Guarantees study doesn't extend |
| Statistical efficiency | Unrestricted | Optimal sample size adaptation |
| Robust performance | Weighted | Consistent across scenarios |
# Sample size planning example
planning_scenarios <- expand.grid(
p1 = c(0.4, 0.5, 0.6),
p2 = c(0.2, 0.3),
test = c('Fisher', 'Z-pool', 'Boschloo')
) %>%
filter(p1 > p2)
# Calculate sample sizes for each scenario
sample_size_results <- list()
for(i in 1:nrow(planning_scenarios)) {
result <- BinarySampleSize(
p1 = planning_scenarios$p1[i],
p2 = planning_scenarios$p2[i],
r = 1,
alpha = 0.025,
tar.power = 0.8,
Test = planning_scenarios$test[i]
)
sample_size_results[[i]] <- result
}
# Combine results
final_results <- do.call(rbind, sample_size_results)
final_results <- final_results[, c("p1", "p2", "Test", "N1", "N2", "N", "Power")]
print(final_results)
#> p1 p2 Test N1 N2 N Power
#> 1 0.4 0.2 Fisher 90 90 180 0.8016798
#> 2 0.5 0.2 Fisher 44 44 88 0.8020894
#> 3 0.6 0.2 Fisher 27 27 54 0.8024322
#> 4 0.4 0.3 Fisher 375 375 750 0.8010219
#> 5 0.5 0.3 Fisher 102 102 204 0.8061477
#> 6 0.6 0.3 Fisher 48 48 96 0.8004594
#> 7 0.4 0.2 Z-pool 84 84 168 0.8035668
#> 8 0.5 0.2 Z-pool 40 40 80 0.8096513
#> 9 0.6 0.2 Z-pool 23 23 46 0.8088250
#> 10 0.4 0.3 Z-pool 359 359 718 0.8001135
#> 11 0.5 0.3 Z-pool 95 95 190 0.8007528
#> 12 0.6 0.3 Z-pool 44 44 88 0.8010988
#> 13 0.4 0.2 Boschloo 84 84 168 0.8023435
#> 14 0.5 0.2 Boschloo 40 40 80 0.8096508
#> 15 0.6 0.2 Boschloo 23 23 46 0.8088248
#> 16 0.4 0.3 Boschloo 360 360 720 0.8004597
#> 17 0.5 0.3 Boschloo 95 95 190 0.8007528
#> 18 0.6 0.3 Boschloo 44 44 88 0.8010988
The exact tests in bbssr (Fisher, Z-pool, and Boschloo) maintain exact Type I error control:
# Demonstrate Type I error control under null hypothesis
null_powers <- sapply(c('Fisher', 'Z-pool', 'Boschloo'), function(test) {
BinaryPower(p1 = 0.3, p2 = 0.3, N1 = 30, N2 = 30, alpha = 0.025, Test = test)
})
names(null_powers) <- c('Fisher', 'Z-pool', 'Boschloo')
print("Type I error rates under null hypothesis:")
#> [1] "Type I error rates under null hypothesis:"
print(round(null_powers, 4))
#> Fisher Z-pool Boschloo
#> 0.0131 0.0208 0.0183
All values should be ≤ 0.025, confirming exact Type I error control.
For regulatory submissions, document:

1. Rationale for BSSR: Why adaptive design is appropriate
2. Test selection: Justification for chosen statistical test
3. Design approach: Restricted vs unrestricted rationale
4. Simulation studies: Demonstrate operating characteristics
5. Implementation plan: Detailed interim analysis procedures
# Compare different allocation ratios
ratios <- c(1, 2, 3)
ratio_results <- sapply(ratios, function(r) {
result <- BinarySampleSize(
p1 = 0.5, p2 = 0.3, r = r,
alpha = 0.025, tar.power = 0.8,
Test = 'Boschloo'
)
c(N1 = result$N1, N2 = result$N2, N_total = result$N)
})
colnames(ratio_results) <- paste0("r=", ratios)
print("Sample sizes for different allocation ratios:")
#> [1] "Sample sizes for different allocation ratios:"
print(ratio_results)
#> r=1 r=2 r=3
#> N1 95 142 189
#> N2 95 71 63
#> N_total 190 213 252
# Sensitivity analysis for key parameters
sensitivity_data <- expand.grid(
omega = c(0.3, 0.5, 0.7),
alpha = c(0.01, 0.025, 0.05)
) %>%
rowwise() %>%
mutate(
avg_power = mean(BinaryPowerBSSR(
asmd.p1 = 0.45, asmd.p2 = 0.09,
p = seq(0.2, 0.8, by = 0.1),
Delta.A = 0.36, Delta.T = 0.36,
N1 = 24, N2 = 24, omega = omega, r = 1,
alpha = alpha, tar.power = 0.8,
Test = 'Z-pool',
restricted = FALSE, weighted = FALSE
)$power.BSSR)
)
print("Sensitivity analysis results:")
#> [1] "Sensitivity analysis results:"
print(sensitivity_data)
#> # A tibble: 9 × 3
#> # Rowwise:
#> omega alpha avg_power
#> <dbl> <dbl> <dbl>
#> 1 0.3 0.01 0.796
#> 2 0.5 0.01 0.798
#> 3 0.7 0.01 0.802
#> 4 0.3 0.025 0.796
#> 5 0.5 0.025 0.805
#> 6 0.7 0.025 0.811
#> 7 0.3 0.05 0.807
#> 8 0.5 0.05 0.814
#> 9 0.7 0.05 0.823
The bbssr package provides a comprehensive toolkit for implementing blinded sample size re-estimation in clinical trials with binary endpoints. The choice of statistical test and design approach should be based on:

- Expected sample size (exact unconditional tests such as Boschloo for small trials; chi-squared for large ones)
- Regulatory requirements (Fisher's exact test remains the most widely accepted)
- Available computational resources (unconditional tests are more intensive)
- Study priorities (timeline certainty favors the restricted design, statistical efficiency the unrestricted design, and robust performance the weighted approach)
All methods maintain exact statistical validity while providing the flexibility needed for efficient clinical trial conduct.