Statistics | Advanced | Resource guide

Multiple testing and false discovery rate

An advanced guide explaining why repeated hypothesis testing increases false positives, how family-wise error and false discovery rate differ, and how to report multiple-testing corrections.

Structure

Problem, intuition, method, working, limitations and discussion.

Best for

Students preparing for coursework, analysis, interpretation or revision.

Use with

Learning Hub lessons, tutoring sessions or dissertation planning.

Problem

When students run one statistical test, the chance of a false positive is controlled by the chosen significance level. But when many tests are run, the probability that at least one result appears significant purely by chance increases. This is a major issue in omics studies and in dissertations with many outcomes, subgroup analyses, questionnaire items or exploratory analyses. Multiple testing corrections help control false positive findings, but they are often misunderstood or ignored.

Students run many tests and report only those with p < 0.05.
The risk of false positives increases as the number of tests increases.
Exploratory analyses are presented as confirmatory findings.
Bonferroni correction is used mechanically without explanation.
False discovery rate is misunderstood as the probability that a specific result is false.
Adjusted p-values are reported without describing the correction method.
Multiple outcomes and subgroup analyses are not planned in advance.
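
The growth in false positive risk can be sketched numerically. This minimal Python example assumes the tests are independent and that every null hypothesis is true; the function name familywise_error_rate is illustrative and does not come from any library.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent
    tests at level alpha, assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Illustrative numbers of tests, not taken from any real study.
for n_tests in (1, 5, 20, 100):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:>3} tests: chance of at least one false positive = {rate:.2f}")
```

With correlated tests the figures are usually lower, but in general the risk still grows with the number of tests.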

Intuition

A single p-value threshold is easy to interpret when there is one main test. With many tests, random noise has many chances to look significant. Multiple testing methods adjust the thresholds or the p-values so that an overall error rate, rather than the error rate of each individual test, is held at a chosen level. Family-wise error rate methods limit the chance of making any false positive at all. False discovery rate methods control the expected proportion of false discoveries among the results called significant.

More tests mean more opportunities for chance findings.
Family-wise error rate focuses on avoiding even one false positive.
Bonferroni correction is simple but can be conservative.
False discovery rate allows some false positives while controlling their expected proportion.
FDR is often useful in high-dimensional settings such as genomics.
Correction choice should match whether the analysis is confirmatory or exploratory.
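
The difference between the two error rates can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any real study: p-values for true null hypotheses are drawn as uniform random numbers, and p-values for real effects are drawn as u**6, a hypothetical choice that simply makes them tend to be small. The function name and all settings are illustrative.

```python
import random


def simulate_error_rates(n_null=15, n_effect=5, alpha=0.05,
                         n_studies=5000, seed=1):
    """Estimate FWER and FDR for uncorrected testing by simulation.

    Assumptions (illustrative only): p-values for true nulls are uniform on
    (0, 1); p-values for real effects are drawn as u**6 so they tend to be small.
    """
    rng = random.Random(seed)
    studies_with_false_positive = 0
    total_false_discovery_proportion = 0.0

    for _ in range(n_studies):
        false_pos = sum(rng.random() < alpha for _ in range(n_null))
        true_pos = sum(rng.random() ** 6 < alpha for _ in range(n_effect))
        discoveries = false_pos + true_pos

        if false_pos > 0:
            studies_with_false_positive += 1
        if discoveries > 0:
            # Share of this study's significant results that are false.
            total_false_discovery_proportion += false_pos / discoveries

    fwer = studies_with_false_positive / n_studies   # any false positive
    fdr = total_false_discovery_proportion / n_studies  # mean false share
    return fwer, fdr


fwer, fdr = simulate_error_rates()
print(f"Estimated FWER (any false positive): {fwer:.2f}")
print(f"Estimated FDR (mean share of false discoveries): {fdr:.2f}")
```

The family-wise error rate counts a study as a failure if it contains even one false positive, whereas the false discovery rate averages the share of false positives among the significant results, so the second figure can never exceed the first.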

Method

A good multiple-testing strategy begins with separating primary analyses from secondary or exploratory analyses. The primary outcome should be identified before testing. If many hypotheses are tested, the analyst should decide whether family-wise error rate or false discovery rate control is more appropriate. The correction method should be named, justified and reported clearly.

Step 1: Identify the primary hypothesis or primary outcome.
Step 2: Separate confirmatory analyses from exploratory analyses.
Step 3: Count the family of tests that require correction.
Step 4: Decide whether strict control of the chance of any false positive (the family-wise error rate) is needed.
Step 5: Consider Bonferroni or related methods for strict control.
Step 6: Consider false discovery rate methods for large exploratory test sets.
Step 7: Report both raw and adjusted p-values where helpful.
Step 8: Interpret corrected results cautiously.
Step 9: Avoid selecting only significant results for reporting.
Step 10: Discuss multiplicity as a limitation when relevant.
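
Steps 5 and 7 can be sketched in a few lines of Python. The example below computes Bonferroni-adjusted p-values and, as one example of a related method chosen here for illustration, Holm step-down adjusted p-values, then prints them next to the raw values. The p-values and function names are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    n_tests = len(p_values)
    order = sorted(range(n_tests), key=lambda i: p_values[i])  # smallest first
    adjusted = [0.0] * n_tests
    running_max = 0.0
    for rank, index in enumerate(order):
        candidate = min(1.0, (n_tests - rank) * p_values[index])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[index] = running_max
    return adjusted


# Hypothetical raw p-values for four planned secondary outcomes.
raw = [0.010, 0.040, 0.030, 0.005]
bonferroni = bonferroni_adjust(raw)
holm = holm_adjust(raw)

print("raw     Bonferroni  Holm")
for p, b, h in zip(raw, bonferroni, holm):
    print(f"{p:<7.3f} {b:<11.3f} {h:.3f}")
```

Both methods control the family-wise error rate. Holm's adjusted values are never larger than Bonferroni's, which is why it is often suggested as a less conservative alternative.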

Working

Suppose a student tests whether a teaching intervention affects 20 different exam outcomes. If each test uses a 0.05 significance threshold, about one significant finding is expected by chance even if the intervention has no effect at all (20 × 0.05 = 1). A Bonferroni approach would divide the significance level by the number of tests, making the threshold more stringent. In a genomics study with thousands of tests, controlling the false discovery rate is often more practical than trying to avoid every possible false positive.

One test at 5% significance has a controlled type I error for that test.
Twenty independent tests at 5%, with every null hypothesis true, give a 1 − 0.95^20 ≈ 64% chance of at least one false positive.
Bonferroni threshold for 20 tests is 0.05 / 20 = 0.0025.
Bonferroni reduces false positives but may reduce power.
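FDR methods such as the Benjamini-Hochberg procedure rank the p-values and compare each one with a threshold that grows with its rank, which controls the expected proportion of false discoveries.
In omics, FDR-adjusted p-values, often called q-values, are commonly reported.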
Findings after correction are usually more credible than uncorrected exploratory findings.
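
The comparison above can be worked through in code. This sketch uses the Benjamini-Hochberg procedure as the FDR method, which is an assumption made here because the guide does not name a specific procedure, and applies it alongside the Bonferroni threshold to 20 hypothetical p-values standing in for the 20 exam outcomes. The p-values are invented for illustration and are not real data.

```python
def benjamini_hochberg_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    n_tests = len(p_values)
    # Walk from the largest p-value down so a running minimum keeps
    # the adjusted values monotone.
    order = sorted(range(n_tests), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n_tests
    running_min = 1.0
    for position, index in enumerate(order):
        rank = n_tests - position  # rank of this p-value in ascending order
        candidate = p_values[index] * n_tests / rank
        running_min = min(running_min, candidate)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values for 20 exam outcomes (illustrative only).
p_values = [0.0004, 0.0019, 0.0060, 0.0120, 0.0210, 0.0340, 0.0480,
            0.0700, 0.1100, 0.1500, 0.2100, 0.2800, 0.3500, 0.4300,
            0.5200, 0.6100, 0.7000, 0.7900, 0.8800, 0.9600]
alpha = 0.05

bonferroni_threshold = alpha / len(p_values)  # 0.05 / 20 = 0.0025
bh_adjusted = benjamini_hochberg_adjust(p_values)

n_uncorrected = sum(p < alpha for p in p_values)
n_bonferroni = sum(p < bonferroni_threshold for p in p_values)
n_bh = sum(q <= alpha for q in bh_adjusted)

print(f"Significant without correction: {n_uncorrected}")
print(f"Significant after Bonferroni:   {n_bonferroni}")
print(f"Significant at 5% FDR (BH):     {n_bh}")
```

With these illustrative p-values, seven results pass the uncorrected threshold, two pass the Bonferroni threshold and three pass the Benjamini-Hochberg criterion, which shows how FDR control typically sits between no correction and strict family-wise control.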

Limitations

Multiple-testing correction does not solve all problems. If the hypotheses are poorly chosen, the data are biased or the study design is weak, correction alone cannot make results meaningful. Strict correction can also increase false negatives, especially in small studies. The correction method should be chosen thoughtfully rather than automatically.

Bonferroni can be overly conservative when tests are correlated.
FDR allows some false discoveries by design.
Correction cannot fix bias or confounding.
Correction cannot rescue a poorly planned analysis.
Small studies may lose power after correction.
Defining the family of tests can be subjective.
Exploratory findings may still need replication.
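
The loss of power after strict correction can be made concrete with a rough calculation. The sketch below assumes a two-sided z-test and a hypothetical standardised effect (the expected z statistic) of 2.8, chosen only because it gives roughly 80% power at the uncorrected 5% level. It uses NormalDist from the Python standard library; the function name is illustrative.

```python
from statistics import NormalDist


def two_sided_z_power(expected_z: float, alpha: float) -> float:
    """Approximate power of a two-sided z-test at level alpha when the test
    statistic is normally distributed with mean expected_z and unit variance."""
    standard_normal = NormalDist()
    critical = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    upper_tail = 1.0 - standard_normal.cdf(critical - expected_z)
    lower_tail = standard_normal.cdf(-critical - expected_z)
    return upper_tail + lower_tail


expected_z = 2.8  # hypothetical effect size, an assumption for illustration
power_uncorrected = two_sided_z_power(expected_z, alpha=0.05)
power_bonferroni = two_sided_z_power(expected_z, alpha=0.05 / 20)

print(f"Power at alpha = 0.05:   {power_uncorrected:.2f}")
print(f"Power at alpha = 0.0025: {power_bonferroni:.2f}")
```

Under these assumptions, power falls from roughly 80% to roughly 40% once the Bonferroni threshold for 20 tests is applied, so a real effect of this size would be missed more often than it is detected.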

Discussion

A strong report should make clear whether analyses were planned or exploratory. If multiple tests were performed, the report should say how multiplicity was handled. Students should avoid presenting uncorrected significant findings as definitive when many tests were performed. In exploratory settings, findings that survive correction are best described as signals requiring confirmation.

State the number or family of tests considered.
Name the correction method used.
Report adjusted p-values where appropriate.
Distinguish primary from exploratory analyses.
Avoid cherry-picking significant results.
Discuss reduced power if strict correction was used.
Recommend replication for exploratory discoveries.

Practical checklist

Before you apply this topic

Have you identified the primary hypothesis?
Have you separated primary and exploratory analyses?
Have you counted the relevant family of tests?
Have you considered whether correction is needed?
Have you chosen an appropriate correction method?
Have you reported the correction method clearly?
Have you avoided cherry-picking significant results?
Have you interpreted adjusted p-values correctly?
Have you discussed false positives and false negatives?
Have you avoided treating exploratory results as confirmatory?
Have you considered replication?
Have you discussed multiplicity as a limitation?

Common mistakes

What to avoid

Running many tests and reporting only significant results.
Ignoring multiple testing entirely.
Using Bonferroni without explaining what was corrected.
Grouping unrelated tests into a single family for correction without thought.
Misinterpreting FDR as the probability that one finding is false.
Calling exploratory findings definitive.
Failing to report results that are non-significant after correction.
Using multiple testing correction to hide poor analysis planning.
Ignoring loss of power after strict correction.
Not distinguishing raw and adjusted p-values.

How this connects to learning

Use the guide as a bridge between theory and application.

A resource guide should not replace a full course or live teaching session. Instead, it helps you organise your thinking. Use it to identify what you understand, what feels unclear, and what questions you should ask before applying a method to real data.

Before a lesson

Read the intuition and problem sections to prepare.

During analysis

Use the method and checklist to guide decisions.

When writing

Use limitations and discussion to improve interpretation.

Related guides

Continue with related topics.

Understanding p-values, confidence intervals and effect sizes

Common mistakes in dissertation data analysis

Sample size, power and precision explained

Introduction to causal inference and DAGs

RNA-seq and differential expression analysis
