Chapter 5: Power in TOST

In equivalence testing, the sample size must be large enough to conclude equivalence reliably. A well-powered study minimizes the risk of failing to demonstrate equivalence, simply because there is too little data, when the products are in fact equivalent.

In this chapter, we demonstrate how to calculate the power of a TOST analysis and estimate whether a given sample size is adequate to achieve the commonly accepted threshold of 80% power.
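To make the notion of power concrete, the sketch below estimates it by simulation: it repeatedly draws two groups, runs two one-sided t-tests against the margins, and reports the share of simulated studies that conclude equivalence. The function name `simulate_tost_power` and all parameter values are illustrative assumptions, not part of the analysis in this chapter.

```r
# Monte Carlo sketch: power = share of simulated studies that conclude equivalence.
# The function name and every input value here are illustrative assumptions.
simulate_tost_power <- function(n_per_group, true_diff, sd, margin,
                                alpha = 0.05, n_sim = 2000) {
  equivalent <- replicate(n_sim, {
    x <- rnorm(n_per_group, mean = true_diff, sd = sd)  # test group
    y <- rnorm(n_per_group, mean = 0, sd = sd)          # reference group
    # Two one-sided t-tests against the lower and upper margins
    p_lower <- t.test(x, y, mu = -margin, alternative = "greater",
                      var.equal = TRUE)$p.value
    p_upper <- t.test(x, y, mu = margin, alternative = "less",
                      var.equal = TRUE)$p.value
    # Equivalence is concluded only if both one-sided tests reject
    max(p_lower, p_upper) < alpha
  })
  mean(equivalent)
}

set.seed(1)
power_small <- simulate_tost_power(n_per_group = 5,  true_diff = 0, sd = 2, margin = 3)
power_large <- simulate_tost_power(n_per_group = 25, true_diff = 0, sd = 2, margin = 3)
c(n5 = power_small, n25 = power_large)
```

With the true difference held at zero, the larger sample should conclude equivalence far more often than the smaller one, which is the behavior a power calculation quantifies.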

The example is based on simulated potency assay data introduced earlier in Chapter 3.

Power Calculation Using the `TOSTER` Package

We use the TOSTER package for power estimation because it supports equivalence margins defined on the raw scale as absolute differences, such as ±1.5 SD of the reference group. This avoids converting to the log-transformed margins (e.g., 80–125%) used in bioequivalence testing.

The equivalence decision in this context is based on two criteria:

The 90% confidence interval of the mean difference must fall entirely within the equivalence margin.
The calculated power must exceed a pre-specified threshold (typically ≥80%).

We begin by simulating data from a reference and a test group:

set.seed(123)
group_ref <- rnorm(30, mean = 100, sd = 2)  # Reference product
group_test <- rnorm(20, mean = 99, sd = 2) # Test product
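The first criterion can be checked with base R alone. The sketch below recreates the simulated groups so that it runs on its own, then tests whether the 90% confidence interval of the mean difference lies inside a ±1.5 SD margin. Using `var.equal = TRUE` and taking the margin from the reference group's SD are assumptions made for this illustration.

```r
# Recreate the simulated data so this block is self-contained
set.seed(123)
group_ref  <- rnorm(30, mean = 100, sd = 2)  # Reference product
group_test <- rnorm(20, mean = 99,  sd = 2)  # Test product

# Raw-scale margin: 1.5 x the SD of the reference group (assumption)
margin <- 1.5 * sd(group_ref)

# 90% CI of the mean difference (test minus reference), equal variances assumed
ci90 <- t.test(group_test, group_ref, var.equal = TRUE,
               conf.level = 0.90)$conf.int

# Criterion 1: the whole interval must sit inside (-margin, +margin)
ci_within_margin <- ci90[1] > -margin && ci90[2] < margin
ci_within_margin
```

A 90% interval is used because it corresponds to two one-sided tests at α = 0.05 each.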

Custom Power Estimation Function

The function below estimates the power of TOST while retaining key logic from our custom implementation tost_auto_var_equal():

Variance equality is checked via Levene’s test.
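The reference (RLD, reference listed drug) sample size is capped at 1.5× the test group size.
A pooled SD is used when the variances are assumed equal; otherwise the square root of the average of the two variances is used as a conservative alternative.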
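Written out, the quantities behind these rules take the following form. Here group 1 is the test group and group 2 the reference group; the subscript notation is introduced only for this summary.

```latex
\[
n_{2,\mathrm{adj}} = \min(1.5\,n_1,\; n_2),
\qquad
n_{\mathrm{adj}} = \frac{2}{\dfrac{1}{n_1} + \dfrac{1}{n_{2,\mathrm{adj}}}}
\]
\[
s_{\mathrm{equal}} = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}},
\qquad
s_{\mathrm{unequal}} = \sqrt{\frac{s_1^2 + s_2^2}{2}}
\]
```

For the simulated data, n_1 = 20 and n_2 = 30, so the cap is 1.5 × 20 = 30, the reference size is left unchanged, and the harmonic mean is 2 / (1/20 + 1/30) = 24 per group.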

library(TOSTER)
library(car)

## Loading required package: carData

library(tibble)

power_tost_from_data <- function(group1, group2,
                                 margin_sd = 1.5,
                                 alpha = 0.05,
                                 var.equal = NULL) {
  n1 <- length(group1)
  n2 <- length(group2)

  mean1 <- mean(group1)
  mean2 <- mean(group2)
  var1 <- var(group1)
  var2 <- var(group2)
  delta <- mean1 - mean2  # raw difference

  # Variance equality check
  if (is.null(var.equal)) {
    df_test <- data.frame(value = c(group1, group2),
                          group = factor(rep(c("g1", "g2"), times = c(n1, n2))))
    p_var <- leveneTest(value ~ group, data = df_test)[1, "Pr(>F)"]
    var_equal <- p_var > 0.05
  } else {
    var_equal <- var.equal
    p_var <- NA
  }

  # Margin in raw units (absolute difference); group2 is treated as the reference
  ref_sd <- sd(group2)
  eqb_raw <- margin_sd * ref_sd  # ±margin_sd reference SDs (raw scale)

  # Adjust RLD sample size (FDA guidance)
  n2_adj <- min(1.5 * n1, n2)

  # Harmonic mean sample size for power estimation
  n_adj <- 2 / (1 / n1 + 1 / n2_adj)

  # Pooled SD for power calculation
  if (var_equal) {
    pooled_sd <- sqrt(((n1 - 1)*var1 + (n2 - 1)*var2) / (n1 + n2 - 2))
  } else {
    pooled_sd <- sqrt((var1 + var2) / 2)  # conservative
  }

  # Power estimation via TOSTER on raw scale
  power_result <- power_t_TOST(
    n = n_adj,
    delta = delta,
    sd = pooled_sd,
    eqb = eqb_raw,
    alpha = alpha,
    type = "two.sample"
  )

  power_value <- power_result$power  # extract numeric

  tibble(
    mean_diff = round(delta, 4),
    pooled_sd = round(pooled_sd, 4),
    margin_raw = round(eqb_raw, 4),
    n1 = n1,
    n2_original = length(group2),
    n2_adj = round(n2_adj, 1),
    n_adj = round(n_adj, 1),
    var_equal = var_equal,
    variance_test_p = round(p_var, 4),
    variance_assumption = if (var_equal) {
      "Equal variance assumed (Levene’s test p > 0.05)"
    } else {
      "Unequal variance assumed (Levene’s test p <= 0.05, conservative SD used)"
    },
    power = round(power_value, 4)
  )
}

We now apply the function to our simulated data, passing the test group as group1 and the reference group as group2:

power_result <- power_tost_from_data(group_test, group_ref)
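As a rough cross-check on the reported power, a normal approximation can be computed in base R. This is a sketch under stated assumptions: `approx_tost_power` is a hypothetical helper, the inputs are rounded illustrative values, and the approximation is not the exact method TOSTER uses, so small discrepancies are expected.

```r
# Normal-approximation sketch of TOST power for two equal-sized groups.
# Hypothetical helper for illustration; not the exact TOSTER calculation.
approx_tost_power <- function(n_per_group, delta, sd, margin, alpha = 0.05) {
  se <- sd * sqrt(2 / n_per_group)   # standard error of the mean difference
  z  <- qnorm(1 - alpha)             # one-sided critical value
  power <- pnorm((margin - delta) / se - z) +
    pnorm((margin + delta) / se - z) - 1
  max(power, 0)                      # the approximation can dip below zero
}

# Illustrative inputs (rounded values chosen as assumptions)
power_at_24 <- approx_tost_power(n_per_group = 24, delta = -0.6, sd = 2, margin = 3)

# Smallest per-group n that reaches 80% approximate power for the same inputs
n_grid <- 2:200
power_grid <- vapply(n_grid, approx_tost_power, numeric(1),
                     delta = -0.6, sd = 2, margin = 3)
n_needed <- n_grid[which(power_grid >= 0.80)[1]]
```

Scanning a grid of sample sizes in this way gives a quick sense of how far the current design sits above or below the 80% threshold before running the full calculation.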

We can summarize the results using a helper function below:

summarize_power_result <- function(power_tost) {
  # Use the argument rather than a global object so the helper works on any result
  cat("=== Power evaluation Summary ===\n")
  cat("Power:", round(power_tost$power, 3), "\n")
  cat("Difference (delta):", round(power_tost$mean_diff, 3), "\n")
  cat("Equivalence margin: ±", round(power_tost$margin_raw, 3), "\n")
  cat("Variance assumption:", power_tost$variance_assumption, "\n")
  if (power_tost$power >= 0.8) {
    cat("Conclusion: The current sample size provides sufficient power (≥80%) to assess equivalence.\n")
  } else {
    cat("Warning: Power is below 80%. Consider increasing the sample size to ensure reliable equivalence testing.\n")
  }
}

Output Example

summarize_power_result(power_result)

## === Power evaluation Summary ===
## Power: 0.996 
## Difference (delta): -0.593 
## Equivalence margin: ± 2.943 
## Variance assumption: Equal variance assumed (Levene’s test p > 0.05) 
## Conclusion: The current sample size provides sufficient power (≥80%) to assess equivalence.

Conclusion

Based on the observed data:

The observed mean difference is –0.593, and the 90% confidence interval lies entirely within the ±1.5 SD margin (±2.943 in raw units).
The calculated power is 99.6%, well above the 80% threshold.
Variance equality was assumed based on Levene’s test.

Conclusion: The current sample size provides sufficient statistical power to assess equivalence with high confidence.
