Chapter 5: Power in TOST

In equivalence testing, the sample size must be large enough to conclude equivalence reliably. A well-powered study minimizes the risk of failing to demonstrate equivalence when the products are in fact equivalent (a Type II error), a risk that grows when data are insufficient.

In this chapter, we show how to calculate the power of a TOST (two one-sided tests) analysis and check whether a given sample size reaches the commonly used target of 80% power.

The example uses the simulated potency assay data introduced in Chapter 3.


Power Calculation Using the TOSTER Package

We use the TOSTER package for power estimation because it accepts equivalence margins on the raw scale, expressed as absolute differences (here, ±1.5 times the reference SD). This avoids converting ratio-based limits such as 80–125% to the log scale, as is done in bioequivalence testing.
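As a small worked example of the raw-scale margin: it is simply 1.5 times the reference SD, in the assay's own units, whereas ratio limits of 80–125% only become symmetric bounds after taking logs. The sketch regenerates the reference group with the same seed used in the simulation below (the reference group is drawn first, so its values match); the variable name margin_raw is an illustrative choice.

```r
# Reference group from the chapter's simulation (same seed, drawn first)
set.seed(123)
group_ref <- rnorm(30, mean = 100, sd = 2)

# Raw-scale margin: 1.5 reference SDs, in the assay's own units
margin_raw <- 1.5 * sd(group_ref)
c(lower = -margin_raw, upper = margin_raw)

# For comparison: ratio limits of 80-125% are symmetric only on the log scale
log(c(lower = 0.80, upper = 1.25))  # about -0.223 and +0.223
```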

The equivalence decision in this context is based on two criteria:

  • The 90% confidence interval of the mean difference must fall entirely within the equivalence margin.
  • The calculated power must reach a pre-specified threshold (typically 80%).

We begin by simulating data from a reference and a test group:

set.seed(123)
group_ref <- rnorm(30, mean = 100, sd = 2)  # Reference product
group_test <- rnorm(20, mean = 99, sd = 2) # Test product
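The first criterion can be checked directly on these simulated groups with base R's t.test(). The sketch below assumes a pooled-variance (Student) t-test and a margin of 1.5 reference SDs; it computes the 90% confidence interval for the difference (test minus reference) and, separately, the two one-sided tests at α = 0.05. The two routes should always agree, because the 90% interval lies inside (−m, m) exactly when both one-sided tests reject.

```r
# Same simulated data as above
set.seed(123)
group_ref  <- rnorm(30, mean = 100, sd = 2)  # Reference product
group_test <- rnorm(20, mean = 99,  sd = 2)  # Test product

margin_raw <- 1.5 * sd(group_ref)  # equivalence margin: +/- 1.5 reference SDs

# 90% two-sided CI for mean(test) - mean(reference), pooled variance
ci90 <- t.test(group_test, group_ref, var.equal = TRUE,
               conf.level = 0.90)$conf.int

# Two one-sided tests at alpha = 0.05 against the two margins
p_lower <- t.test(group_test, group_ref, var.equal = TRUE,
                  mu = -margin_raw, alternative = "greater")$p.value
p_upper <- t.test(group_test, group_ref, var.equal = TRUE,
                  mu = margin_raw, alternative = "less")$p.value

# Both decision rules side by side
ci_inside   <- ci90[1] > -margin_raw && ci90[2] < margin_raw
tost_reject <- max(p_lower, p_upper) < 0.05
c(ci_inside = ci_inside, tost_reject = tost_reject)
```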

Custom Power Estimation Function

The function below estimates TOST power, reusing key logic from our custom implementation tost_auto_var_equal():

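  • Variance equality is checked with Levene’s test (car::leveneTest()), unless var.equal is supplied.
  • The reference (RLD, reference listed drug) sample size is capped at 1.5× the test group size.
  • The per-group sample size used for power is the harmonic mean of the test size and the capped reference size.
  • A pooled SD is used when variances are assumed equal; otherwise the SD is the square root of the average of the two variances.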
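With its default settings, car::leveneTest() centers each group at its median (the Brown–Forsythe variant). For readers who want to see what the variance check computes, here is a base-R sketch of that statistic: a one-way ANOVA on the absolute deviations of each observation from its own group median. The helper name levene_median_p() is hypothetical; its p-value should match car::leveneTest() with center = median.

```r
# Hypothetical helper: median-centered Levene (Brown-Forsythe) test in base R
levene_median_p <- function(group1, group2) {
  value <- c(group1, group2)
  group <- factor(rep(c("g1", "g2"), times = c(length(group1), length(group2))))
  # Absolute deviation of each observation from its group median
  abs_dev <- abs(value - ave(value, group, FUN = median))
  # One-way ANOVA on the absolute deviations; first row is the group effect
  fit <- anova(lm(abs_dev ~ group))
  fit[["Pr(>F)"]][1]
}

set.seed(123)
group_ref  <- rnorm(30, mean = 100, sd = 2)
group_test <- rnorm(20, mean = 99,  sd = 2)
levene_median_p(group_test, group_ref)
```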

library(TOSTER)  # power_t_TOST()
library(car)     # leveneTest()
library(tibble)  # tibble()

# group1: test product; group2: reference (RLD) product, which also sets the margin
power_tost_from_data <- function(group1, group2,
                                 margin_sd = 1.5,
                                 alpha = 0.05,
                                 var.equal = NULL) {
  n1 <- length(group1)
  n2 <- length(group2)

  mean1 <- mean(group1)
  mean2 <- mean(group2)
  var1 <- var(group1)
  var2 <- var(group2)
  delta <- mean1 - mean2  # raw difference

  # Variance equality check
  if (is.null(var.equal)) {
    df_test <- data.frame(value = c(group1, group2),
                          group = factor(rep(c("g1", "g2"), times = c(n1, n2))))
    p_var <- leveneTest(value ~ group, data = df_test)[1, "Pr(>F)"]
    var_equal <- p_var > 0.05
  } else {
    var_equal <- var.equal
    p_var <- NA
  }

  # Margin in raw units (absolute difference)
  ref_sd <- sd(group2)
  eqb_raw <- margin_sd * ref_sd  # ±1.5 SD (raw scale)

  # Adjust RLD sample size (FDA guidance)
  n2_adj <- min(1.5 * n1, n2)

  # Harmonic mean sample size for power estimation
  n_adj <- 2 / (1 / n1 + 1 / n2_adj)

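  # SD used for the power calculation
  if (var_equal) {
    # Pooled SD, weighting each variance by its degrees of freedom
    pooled_sd <- sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
  } else {
    # Square root of the unweighted average variance (no Welch adjustment)
    pooled_sd <- sqrt((var1 + var2) / 2)
  }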

  # Power estimation via TOSTER on raw scale
  power_result <- power_t_TOST(
    n = n_adj,
    delta = delta,
    sd = pooled_sd,
    eqb = eqb_raw,
    alpha = alpha,
    type = "two.sample"
  )

  power_value <- power_result$power  # extract numeric

  tibble(
    mean_diff = round(delta, 4),
    pooled_sd = round(pooled_sd, 4),
    margin_raw = round(eqb_raw, 4),
    n1 = n1,
    n2_original = length(group2),
    n2_adj = round(n2_adj, 1),
    n_adj = round(n_adj, 1),
    var_equal = var_equal,
    variance_test_p = round(p_var, 4),
    variance_assumption = if (var_equal) {
      "Equal variance assumed (Levene’s test p > 0.05)"
    } else {
      "Unequal variance assumed (Levene’s test p <= 0.05; averaged SD used, no Welch df adjustment)"
    },
    power = round(power_value, 4)
  )
}
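The sample-size adjustment inside the function is plain arithmetic. The sketch below reproduces it for the chapter's design (20 test and 30 reference units), where the cap does not bind and the harmonic mean gives n_adj = 24. It also shows why a single per-group n is a reasonable input: two balanced groups of size n_adj have the same standard-error factor, sqrt(1/n1 + 1/n2_adj), as the unbalanced design, although the degrees of freedom differ slightly (46 versus 48). Variable names mirror those in power_tost_from_data().

```r
# Chapter design: 20 test units (group1) and 30 reference units (group2)
n1 <- 20
n2 <- 30

# Cap the reference (RLD) size at 1.5x the test size
n2_adj <- min(1.5 * n1, n2)

# Harmonic mean of the two sizes, used as the per-group n for power
n_adj <- 2 / (1 / n1 + 1 / n2_adj)

# A balanced design with n_adj per group has the same standard-error
# factor as the (capped) unbalanced design
se_factor_unbalanced <- sqrt(1 / n1 + 1 / n2_adj)
se_factor_balanced   <- sqrt(2 / n_adj)

c(n2_adj = n2_adj, n_adj = n_adj)
all.equal(se_factor_unbalanced, se_factor_balanced)  # TRUE

# With a 40-unit reference group the cap would bind
min(1.5 * 20, 40)  # 30
```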

We now apply the function to our simulated data:

power_result <- power_tost_from_data(group_test, group_ref)
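A simulation offers an independent check on the analytic power returned by power_t_TOST(). The sketch below uses a hypothetical helper, sim_power_tost() (not part of TOSTER), that draws many balanced studies with the observed difference and pooled SD, applies the confidence-interval form of TOST to each, and reports the fraction that declare equivalence. It assumes normal data, equal variances and the harmonic-mean n from the function, so we would expect its estimate to land close to the reported power, up to Monte Carlo error.

```r
# Hypothetical helper: Monte Carlo estimate of TOST power for two groups of
# equal size n, assuming normal data with a common SD
sim_power_tost <- function(n, delta, sd, eqb, alpha = 0.05, nsim = 5000, seed = 1) {
  set.seed(seed)
  n <- round(n)                       # simulation needs whole units
  t_crit <- qt(1 - alpha, df = 2 * n - 2)
  # Each column is one simulated study
  x <- matrix(rnorm(n * nsim, mean = delta, sd = sd), nrow = n)  # test
  y <- matrix(rnorm(n * nsim, mean = 0,     sd = sd), nrow = n)  # reference
  d_hat <- colMeans(x) - colMeans(y)
  # Pooled SD; with equal n it is the root of the average variance
  s_pooled <- sqrt((apply(x, 2, var) + apply(y, 2, var)) / 2)
  se <- s_pooled * sqrt(2 / n)
  # Equivalence when the (1 - 2 * alpha) CI lies inside (-eqb, eqb)
  mean(d_hat - t_crit * se > -eqb & d_hat + t_crit * se < eqb)
}

# Inputs computed as in power_tost_from_data() (equal-variance path)
set.seed(123)
group_ref  <- rnorm(30, mean = 100, sd = 2)
group_test <- rnorm(20, mean = 99,  sd = 2)
n1 <- length(group_test)
n2 <- length(group_ref)
delta     <- mean(group_test) - mean(group_ref)
pooled_sd <- sqrt(((n1 - 1) * var(group_test) + (n2 - 1) * var(group_ref)) /
                    (n1 + n2 - 2))
eqb_raw   <- 1.5 * sd(group_ref)
n_adj     <- 2 / (1 / n1 + 1 / min(1.5 * n1, n2))

sim_power_tost(n_adj, delta, pooled_sd, eqb_raw)
```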

The helper function below prints a readable summary of the result:

# Print a short summary of the tibble returned by power_tost_from_data()
summarize_power_result <- function(res) {
  cat("=== Power Evaluation Summary ===\n")
  cat("Power:", round(res$power, 3), "\n")
  cat("Difference (delta):", round(res$mean_diff, 3), "\n")
  cat("Equivalence margin: ±", round(res$margin_raw, 3), "\n")
  cat("Variance assumption:", res$variance_assumption, "\n")
  if (res$power >= 0.8) {
    cat("Conclusion: The current sample size provides sufficient power (≥80%) to assess equivalence.\n")
  } else {
    cat("Warning: Power is below 80%. Consider increasing the sample size to ensure reliable equivalence testing.\n")
  }
}

Output Example

summarize_power_result(power_result)
## === Power Evaluation Summary ===
## Power: 0.996 
## Difference (delta): -0.593 
## Equivalence margin: ± 2.943 
## Variance assumption: Equal variance assumed (Levene’s test p > 0.05) 
## Conclusion: The current sample size provides sufficient power (≥80%) to assess equivalence.
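When the summary reports power below 80%, the natural next step is to find the smallest per-group sample size that reaches the target. The sketch below does this with a base-R approximation rather than TOSTER: each one-sided t statistic follows a noncentral t distribution, and the two rejection probabilities are combined with the lower bound P(A and B) ≥ P(A) + P(B) − 1, which is tight when power is high. Both helpers, approx_power_tost() and min_n_tost(), are hypothetical; they assume equal group sizes and a common SD, and the inputs in the usage line are illustrative values, not the chapter's data.

```r
# Hypothetical helper: approximate TOST power for two groups of size n each
approx_power_tost <- function(n, delta, sd, eqb, alpha = 0.05) {
  df <- 2 * n - 2
  se <- sd * sqrt(2 / n)
  t_crit <- qt(1 - alpha, df)
  # P(reject upper null) minus P(fail to reject lower null)
  p <- pt(-t_crit, df, ncp = (delta - eqb) / se) -
       pt(t_crit,  df, ncp = (delta + eqb) / se)
  max(p, 0)  # the bound can go negative when power is near zero
}

# Hypothetical helper: smallest per-group n whose approximate power meets target
min_n_tost <- function(delta, sd, eqb, alpha = 0.05, target = 0.80, n_max = 1000) {
  for (n in 2:n_max) {
    if (approx_power_tost(n, delta, sd, eqb, alpha) >= target) return(n)
  }
  NA_integer_  # target not reached within n_max
}

# Illustrative inputs: true difference 0.5, SD 1, margin +/- 1
min_n_tost(delta = 0.5, sd = 1, eqb = 1)
```

For a final design, the power at the chosen n could then be confirmed with power_t_TOST() or a simulation.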

Conclusion

Based on the observed data:

  • The observed mean difference (test minus reference) is -0.593, and its 90% confidence interval lies entirely within the equivalence margin of ±2.943 (1.5 × the reference SD).
  • The calculated power is 99.6%, well above the 80% threshold.
  • Equal variances were assumed because Levene’s test was not significant (p > 0.05).

Overall, the current sample size provides sufficient statistical power to assess equivalence.