Statistics · Foundation · Resource guide

How to choose the correct statistical test

A detailed guide for students deciding between t-tests, ANOVA, chi-square tests, correlation, regression, logistic regression and non-parametric methods.

Structure

Problem, intuition, method, working, limitations and discussion.

Best for

Students preparing for coursework, analysis, interpretation or revision.

Use with

Learning Hub lessons, tutoring sessions or dissertation planning.

Problem

Choosing the correct statistical test is one of the most common difficulties students face in statistics, dissertation analysis and research projects. The mistake usually happens because students start by asking, 'Which test should I use?' before clearly defining the research question, outcome variable, explanatory variable and study design. A statistical test should not be chosen because it is familiar or because software makes it easy. It should be chosen because it matches the question and the structure of the data.

The research question is too vague.
The outcome variable is not clearly identified.
The explanatory variable is confused with the outcome.
The number of groups is ignored.
Independent, paired and repeated observations are mixed up.
A method is chosen before checking assumptions.
The test gives a p-value but does not answer the real research question.

Intuition

A statistical test is a tool for answering a specific type of question. Different questions require different tools. If the question is about comparing average values between groups, we need a method for comparing means. If the question is about association between two categorical variables, we need a method for comparing proportions or frequencies. If the question is about modelling an outcome while adjusting for other variables, we need regression. The intuition is simple: first understand the question, then match the method to the data structure.

A numerical outcome usually leads to methods involving means, differences, correlation or regression.
A binary outcome often leads to proportions, risk comparisons or logistic regression.
A categorical outcome may require chi-square tests or categorical modelling.
A time-to-event outcome may require survival analysis methods.
One predictor gives a simpler analysis; several predictors often require regression.
Paired data require methods that recognise the same person or unit is measured more than once.
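The point about paired data can be checked numerically. The Python sketch below assumes NumPy and SciPy are available and uses made-up before-and-after scores for five students, not real data. It shows that a paired t-test is equivalent to a one-sample t-test on the within-student differences. It also shows that analysing the same numbers as two independent groups can miss a small but consistent improvement.

```python
import numpy as np
from scipy import stats

# Made-up exam scores for the same five students before and after a
# teaching intervention (illustrative numbers, not real data).
before = np.array([55.0, 61.0, 48.0, 70.0, 66.0])
after = np.array([58.0, 65.0, 50.0, 74.0, 67.0])

# Paired t-test: uses each student's own change.
paired = stats.ttest_rel(after, before)

# The same test written as a one-sample t-test on the differences.
one_sample = stats.ttest_1samp(after - before, 0.0)

# Wrong for this design: treats the two columns as unrelated groups.
independent = stats.ttest_ind(after, before)

print(f"paired:      t = {paired.statistic:.2f}, p = {paired.pvalue:.4f}")
print(f"one-sample:  t = {one_sample.statistic:.2f}, p = {one_sample.pvalue:.4f}")
print(f"independent: t = {independent.statistic:.2f}, p = {independent.pvalue:.4f}")
```

With these numbers every student improves by a few marks, so the paired test detects the change. In the independent-samples version, the spread between students swamps it.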

Method

A good method-selection process follows a sequence. First, write the research question in plain language. Second, identify the outcome variable. Third, identify the explanatory variable or comparison group. Fourth, understand the study design. Fifth, check whether the data structure and assumptions match the method. This approach prevents arbitrary test selection and makes the analysis easier to justify in a report or dissertation.

Step 1: Write the research question clearly.
Step 2: Identify the outcome variable.
Step 3: Classify the outcome as numerical, binary, categorical, ordinal or time-to-event.
Step 4: Identify the explanatory variable, grouping variable or predictor.
Step 5: Decide whether observations are independent, paired, clustered or repeated.
Step 6: Count the number of groups or predictors.
Step 7: Check assumptions such as approximate normality, equal variance, independence and sample size.
Step 8: Choose the method that answers the question most directly.
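Steps 2 to 6 can be sketched as a small lookup in Python. The function name suggest_test and its argument values are hypothetical and were chosen for this illustration. The mapping covers only the combinations listed in the Working section, and the result is a starting point that still has to pass the assumption checks in Step 7.

```python
def suggest_test(
    outcome: str,
    predictor: str = "groups",
    design: str = "independent",
    n_groups: int = 2,
    n_predictors: int = 1,
) -> str:
    """Hypothetical helper: map answers from Steps 2 to 6 to a starting-point method.

    outcome:   "numerical", "binary", "categorical" or "time-to-event"
    predictor: "groups" (categorical exposure) or "numerical"
    design:    "independent" or "paired"
    """
    if outcome == "time-to-event":
        return "Kaplan-Meier curves, log-rank test or Cox regression"

    if outcome == "binary" and n_predictors > 1 and design == "independent":
        return "logistic regression"

    if (outcome in {"binary", "categorical"} and predictor == "groups"
            and design == "independent" and n_predictors == 1):
        return "chi-square test (Fisher's exact test if expected counts are small)"

    if outcome == "numerical":
        if n_predictors > 1 and design == "independent":
            return "multiple linear regression"
        if predictor == "numerical" and design == "independent":
            return "correlation or simple linear regression"
        if design == "paired" and n_groups == 2:
            return "paired t-test"
        if design == "independent" and n_groups == 2:
            return "independent samples t-test"
        if design == "independent" and n_groups >= 3:
            return "one-way ANOVA"

    # Ordinal outcomes, clustered or repeated designs and other cases
    # need more careful thought than a lookup can provide.
    raise ValueError("combination not covered by this sketch")


print(suggest_test("numerical", n_groups=3))       # one-way ANOVA
print(suggest_test("numerical", design="paired"))  # paired t-test
```

The function raises an error for combinations it does not cover instead of guessing. This reflects the point that a decision table cannot replace statistical judgement.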

Working

Suppose a student wants to compare mean exam scores between two independent teaching groups. The outcome is numerical, the grouping variable has two independent categories and the aim is to compare means. An independent samples t-test may be appropriate if its assumptions are reasonable. If the same students are measured before and after a teaching intervention, the observations are paired, so a paired t-test is more appropriate. If there are three teaching groups, one-way ANOVA is usually more suitable than running several t-tests, because each additional test adds another chance of a false-positive result (a type I error).
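The teaching-group scenario maps onto standard SciPy functions. This sketch assumes NumPy and SciPy and uses invented exam scores. It runs an independent samples t-test and a one-way ANOVA, with the Mann-Whitney U and Kruskal-Wallis tests as rank-based counterparts.

```python
import numpy as np
from scipy import stats

# Invented exam scores for three independent teaching groups (illustration only).
group_a = np.array([62.0, 68.0, 71.0, 65.0, 70.0, 66.0])
group_b = np.array([72.0, 75.0, 69.0, 78.0, 74.0, 77.0])
group_c = np.array([64.0, 67.0, 63.0, 70.0, 66.0, 68.0])

# Two independent groups: independent samples (Student's) t-test.
# Passing equal_var=False would give Welch's t-test, which drops the
# equal-variance assumption.
t_res = stats.ttest_ind(group_a, group_b)

# Three independent groups: one-way ANOVA as a single overall test,
# instead of three separate pairwise t-tests.
f_res = stats.f_oneway(group_a, group_b, group_c)

# Rank-based (non-parametric) counterparts, worth considering when
# samples are small and normality is doubtful.
u_res = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
h_res = stats.kruskal(group_a, group_b, group_c)

print(f"t-test:         t = {t_res.statistic:.2f}, p = {t_res.pvalue:.4f}")
print(f"ANOVA:          F = {f_res.statistic:.2f}, p = {f_res.pvalue:.4f}")
print(f"Mann-Whitney:   U = {u_res.statistic:.1f}, p = {u_res.pvalue:.4f}")
print(f"Kruskal-Wallis: H = {h_res.statistic:.2f}, p = {h_res.pvalue:.4f}")
```

A significant ANOVA only indicates that at least one group mean differs. Follow-up comparisons, such as Tukey's HSD, are needed to identify which groups differ.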

Numerical outcome plus two independent groups: independent samples t-test.
Numerical outcome plus before-and-after measurements on the same people: paired t-test.
Numerical outcome plus three or more independent groups: one-way ANOVA.
Numerical outcome plus one numerical predictor: correlation or simple linear regression.
Numerical outcome plus several predictors: multiple linear regression.
Categorical outcome plus categorical exposure: chi-square test, or Fisher's exact test when expected cell counts are small.
Binary outcome plus several predictors: logistic regression.
Time-to-event outcome: Kaplan-Meier curves, log-rank test or Cox regression.
Non-normal numerical outcome with small samples: consider non-parametric alternatives such as the Mann-Whitney U test, Wilcoxon signed-rank test or Kruskal-Wallis test.
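For a categorical outcome and a categorical exposure, the choice between the chi-square test and Fisher's exact test usually depends on the expected cell counts. The sketch below assumes NumPy and SciPy. compare_two_by_two is a hypothetical helper, and the counts are invented. The "any expected count below 5" cut-off is a common rule of thumb rather than a fixed law.

```python
import numpy as np
from scipy import stats


def compare_two_by_two(table):
    """Hypothetical helper: test association in a 2x2 table of counts.

    Rule of thumb used here: if any expected count is below 5,
    report Fisher's exact test instead of the chi-square test.
    """
    table = np.asarray(table)
    # For 2x2 tables SciPy applies Yates' continuity correction by default.
    chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
    if (expected < 5).any():
        _, p_fisher = stats.fisher_exact(table)
        return "Fisher's exact test", p_fisher, expected
    return "chi-square test", p_chi2, expected


# Invented counts: rows = exposed / not exposed, columns = outcome yes / no.
small = [[3, 7], [1, 9]]
large = [[30, 20], [20, 30]]

for counts in (small, large):
    method, p, expected = compare_two_by_two(counts)
    print(method, round(p, 3), "smallest expected count:", expected.min())
```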

Limitations

Decision rules are helpful, but they are not a substitute for statistical judgement. Real datasets are often messy. They may contain missing values, outliers, small samples, repeated measurements, clustering, confounding or non-linear relationships. A simple test may be mathematically valid but still too limited for the research question. For example, a t-test may compare two groups, but it cannot adjust for age, sex, baseline differences or other covariates. In that case, regression may be more appropriate.
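This limitation of a two-group comparison can be illustrated with toy data in which older students score higher and one teaching group happens to be older. The sketch assumes only NumPy and fits the regression by ordinary least squares with numpy.linalg.lstsq. The data are constructed, noise-free numbers, so the adjusted group effect is zero by design. A real analysis would use regression software that also reports standard errors and confidence intervals.

```python
import numpy as np

# Constructed, noise-free toy data: score depends on age only,
# and the students in group 1 happen to be older.
age = np.array([20, 22, 24, 26, 30, 32, 34, 36], dtype=float)
group = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # 0 = method A, 1 = method B
score = 40.0 + 1.0 * age

# Unadjusted comparison: difference in group means (what a t-test compares).
unadjusted = score[group == 1].mean() - score[group == 0].mean()

# Adjusted comparison: least-squares fit of score ~ intercept + group + age.
X = np.column_stack([np.ones_like(age), group, age])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
adjusted = coef[1]

print(f"unadjusted group difference: {unadjusted:.2f}")
print(f"group effect adjusted for age: {adjusted:.2f}")
```

The raw means differ by 10 marks, but once age is in the model the group coefficient is essentially zero, so the apparent teaching effect was entirely due to age.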

A decision table cannot check assumptions for you.
With small samples, normality is hard to check and departures from it matter more, so normality-based methods can be unreliable.
Outliers can strongly affect means, correlations and regression models.
Repeated measurements need methods that account for within-person dependence.
Confounding may require adjusted regression rather than simple comparison.
A significant p-value does not prove a clinically or practically important effect.
A non-significant p-value does not prove there is no association.
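The effect of a single outlier on a correlation is easy to demonstrate. This sketch assumes NumPy and SciPy and uses constructed data: ten points on a perfect straight line, plus one extreme point.

```python
import numpy as np
from scipy import stats

# Constructed data: ten points on a perfect increasing line.
x = np.arange(1.0, 11.0)
y = x.copy()

# The same data with one extreme point added.
x_out = np.append(x, 11.0)
y_out = np.append(y, -50.0)

r_clean, _ = stats.pearsonr(x, y)
r_out, _ = stats.pearsonr(x_out, y_out)
rho_out, _ = stats.spearmanr(x_out, y_out)

print(f"Pearson r without outlier: {r_clean:.2f}")
print(f"Pearson r with outlier:    {r_out:.2f}")
print(f"Spearman rho with outlier: {rho_out:.2f}")
```

With the extra point, Pearson's r changes from 1 to about -0.35, reversing sign. Spearman's rank correlation also falls, to 0.5, but stays positive because it uses only the ordering of the values. Plotting the data remains the first check.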

Discussion

The best analysis starts before any software is opened. A student should be able to explain the research question, outcome, comparison, data structure and reason for choosing the method. This is especially important in dissertations and research reports, where the method must be justified. Instead of writing, 'A t-test was used,' a stronger explanation is, 'An independent samples t-test was used because the outcome was numerical and the aim was to compare mean scores between two independent groups.'

Start with the research question, not the software menu.
Explain why the selected method matches the data structure.
Report effect sizes and confidence intervals, not only p-values.
Mention assumptions and whether they were checked.
Discuss limitations honestly.
Use regression when adjustment, prediction or multiple predictors are required.
Use simpler tests only when they genuinely answer the question.
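Reporting an effect size and a confidence interval alongside the p-value takes only a few lines. The sketch assumes NumPy and SciPy. mean_difference_summary is a hypothetical helper that computes the difference in means, Cohen's d with a pooled standard deviation, and a t-based confidence interval. Like the Student's t-test, the interval assumes roughly equal variances. The scores are invented.

```python
import numpy as np
from scipy import stats


def mean_difference_summary(a, b, confidence=0.95):
    """Hypothetical helper: difference in means (b - a), Cohen's d,
    a pooled-variance confidence interval and the Student's t-test p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    diff = b.mean() - a.mean()
    # Pooled standard deviation (assumes roughly equal variances).
    sp = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2))
    d = diff / sp  # Cohen's d
    se = sp * np.sqrt(1 / n1 + 1 / n2)
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n1 + n2 - 2)
    ci = (diff - t_crit * se, diff + t_crit * se)
    p = stats.ttest_ind(b, a).pvalue
    return diff, d, ci, p


scores_a = [60, 62, 64, 66, 68]
scores_b = [65, 67, 69, 71, 73]
diff, d, (low, high), p = mean_difference_summary(scores_a, scores_b)
print(f"difference = {diff:.1f}, d = {d:.2f}, 95% CI = ({low:.1f}, {high:.1f}), p = {p:.3f}")
```

For these scores, the difference is 5 marks with a 95% confidence interval of roughly 0.4 to 9.6. The interval excludes zero, which matches p < 0.05, but its width shows how imprecise an estimate from five students per group is.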

Practical checklist

Before you apply this topic

Can you state the research question in one clear sentence?
Have you identified the outcome variable?
Is the outcome numerical, binary, categorical, ordinal or time-to-event?
Have you identified the explanatory variable or grouping variable?
Are the observations independent, paired, clustered or repeated?
How many groups are being compared?
Are you comparing means, proportions, associations, predictions or time-to-event outcomes?
Do you need adjustment for confounders or covariates?
Are assumptions such as independence, normality and equal variance reasonable?
Can you justify the method in words?
Will you report effect sizes and confidence intervals?
Does the method answer the actual research question?

Common mistakes

What to avoid

Choosing a test before defining the research question.
Choosing a method only because it appears in SPSS, R or Python.
Using several t-tests instead of ANOVA for more than two groups.
Ignoring paired or repeated measurements.
Treating categorical codes as continuous numbers.
Using correlation when regression is needed for prediction or adjustment.
Using a chi-square test when expected cell counts are too small.
Reporting only p-values without effect sizes or confidence intervals.
Ignoring confounding variables.
Changing the analysis plan after searching for significant results.
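The problem with running several t-tests can be quantified. If each test uses alpha = 0.05, the chance of at least one false positive across all pairwise comparisons grows quickly with the number of groups. The sketch below assumes the tests are independent and every null hypothesis is true. Pairwise tests actually share data, so treat the figures as an approximation. familywise_error is a hypothetical name.

```python
from math import comb


def familywise_error(n_groups: int, alpha: float = 0.05) -> float:
    """Approximate chance of at least one false positive across all
    pairwise t-tests, assuming independent tests and true null hypotheses."""
    n_tests = comb(n_groups, 2)  # number of pairwise comparisons
    return 1 - (1 - alpha) ** n_tests


for g in (2, 3, 4, 5):
    print(g, "groups:", comb(g, 2), "t-tests, family-wise error about", round(familywise_error(g), 3))
```

With three groups, the three pairwise tests give roughly a 14% chance of at least one false positive instead of 5%. With five groups (ten tests), the figure is about 40%. One-way ANOVA avoids this by testing all groups at once.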

How this connects to learning

Use the guide as a bridge between theory and application.

A resource guide should not replace a full course or live teaching session. Instead, it helps you organise your thinking. Use it to identify what you understand, what feels unclear, and what questions you should ask before applying a method to real data.

Before a lesson

Read the intuition and problem sections to prepare.

During analysis

Use the method and checklist to guide decisions.

When writing

Use limitations and discussion to improve interpretation.

Related guides

Continue with related topics.

Understanding p-values, confidence intervals and effect sizes

Choosing between correlation and regression

Linear regression assumptions and diagnostics

Logistic regression explained for health and social science students

Non-parametric tests: when and how to use them

How to report regression results in a dissertation
