Statistics · Foundation · Resource guide

Chi-square tests, Fisher's exact test and categorical data

A detailed guide to analysing categorical data, including contingency tables, chi-square tests, Fisher's exact test, expected counts, proportions and interpretation.

Structure

Problem, intuition, method, working, limitations and discussion.

Best for

Students preparing for coursework, analysis, interpretation or revision.

Use with

Learning Hub lessons, tutoring sessions or dissertation planning.

Problem

Categorical data are common in student projects, health research and social science. Examples include sex, treatment group, disease status, smoking status, pass or fail, and response categories. Students often analyse categorical data incorrectly by treating category codes as numerical values or by applying tests without checking expected cell counts. Chi-square tests and Fisher's exact test are important tools, but they must be used with the right data structure.

Category codes are treated as real numbers.
Percentages are reported without denominators.
Chi-square tests are used when expected counts are too small.
Fisher's exact test is used without understanding why.
Students confuse row percentages and column percentages.
Association is interpreted as causation.
Effect size measures such as risk difference or odds ratio are ignored.
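To see why category codes should not be treated as measurements, the short Python sketch below uses hypothetical smoking-status responses (the codes, labels and variable names are illustrative assumptions, not data from a study). The "average code" changes when the arbitrary labels are swapped, while the category counts do not.

```python
from collections import Counter

# Hypothetical responses coded 1 = never, 2 = former, 3 = current smoker.
codes = [1, 1, 2, 3, 3, 3, 1, 2]
labels = {1: "never", 2: "former", 3: "current"}

# An equally valid but different coding: 1 = current, 2 = never, 3 = former.
recode = {1: 2, 2: 3, 3: 1}
recoded = [recode[c] for c in codes]
relabels = {1: "current", 2: "never", 3: "former"}


def mean(values):
    return sum(values) / len(values)


# The "average code" depends only on the arbitrary labelling.
print(mean(codes), mean(recoded))  # 2.0 1.875

# The counts per category are identical under both codings.
counts = Counter(labels[c] for c in codes)
counts_recoded = Counter(relabels[c] for c in recoded)
print(counts == counts_recoded)  # True
```

The counts are the meaningful summary here; the mean of the codes is an artefact of how the categories happened to be numbered.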

Intuition

Categorical data analysis often begins with a contingency table. The table shows how observations are distributed across combinations of categories. The chi-square test asks whether the observed counts differ from the counts expected under independence by more than chance variation would explain. Fisher's exact test is useful when sample sizes are small and the usual chi-square approximation may not be reliable, because it calculates the probability of the observed table exactly rather than relying on a large-sample approximation.

Rows and columns represent categories.
Cells contain counts, not means.
Percentages help interpretation but depend on the denominator.
Expected counts describe what would be expected under independence: row total × column total ÷ grand total.
Chi-square tests compare observed and expected counts.
Fisher's exact test is useful for small samples or sparse tables.
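The ideas above can be sketched in a few lines of Python. This is a minimal illustration, assuming a table stored as a list of rows of counts; the function names and the smoking table are hypothetical. It computes expected counts under independence, the Pearson chi-square statistic, and a p-value that is valid only for a 2 by 2 table (1 degree of freedom).

```python
import math


def expected_counts(table):
    """Expected count for each cell under independence:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Pearson statistic: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def p_value_df1(statistic):
    """Upper-tail p-value for a chi-square distribution with 1 degree of freedom
    (2 by 2 tables only), using P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical table: rows = smokers, non-smokers; columns = disease yes, no.
table = [[30, 70], [15, 85]]
print(expected_counts(table))  # [[22.5, 77.5], [22.5, 77.5]]
stat = pearson_chi_square(table)
print(round(stat, 3), p_value_df1(stat))
```

This sketch applies no continuity correction. Software output may differ: for example, `scipy.stats.chi2_contingency` applies Yates' correction to 2 by 2 tables by default. Larger tables have (rows − 1) × (columns − 1) degrees of freedom and need a general chi-square distribution function.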

Method

The analysis should start with a clear table. Students should decide which variable defines rows, which defines columns and which percentages are meaningful for the research question. They should inspect observed and expected counts before choosing the test. If expected counts are adequate, the chi-square test may be suitable. A common rule of thumb is that every expected count is at least 5 in a 2 by 2 table, or, in larger tables, that no expected count is below 1 and no more than 20% are below 5. If counts are sparse, Fisher's exact test may be more appropriate.
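The expected-count check can be written as a small function. This is a sketch of the rule of thumb described above, not a universal standard; the function name `chi_square_is_adequate` and the example tables are hypothetical.

```python
def expected_counts(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_is_adequate(table):
    """Rule-of-thumb check on expected counts before using the chi-square test.

    2 by 2 table: every expected count should be at least 5.
    Larger tables (Cochran's guideline): no expected count below 1 and
    no more than 20% of expected counts below 5.
    """
    cells = [e for row in expected_counts(table) for e in row]
    small = sum(1 for e in cells if e < 5)
    if len(table) == 2 and len(table[0]) == 2:
        return small == 0
    return min(cells) >= 1 and small / len(cells) <= 0.2


print(chi_square_is_adequate([[30, 70], [15, 85]]))  # True
print(chi_square_is_adequate([[3, 7], [9, 1]]))      # False: two expected counts are 4
```

Note that the check uses expected counts, not observed counts: a table can contain a small observed cell and still have adequate expected counts.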

Step 1: Identify the two categorical variables.
Step 2: Create a contingency table of counts.
Step 3: Decide whether row percentages or column percentages answer the question.
Step 4: Check expected cell counts.
Step 5: Use the chi-square test when expected counts are adequate.
Step 6: Use Fisher's exact test for small or sparse tables.
Step 7: Report counts and percentages, not only p-values.
Step 8: Consider effect measures such as risk difference, risk ratio or odds ratio.
Step 9: Interpret association in context.
Step 10: Avoid causal language unless the design supports it.
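For Step 6, the calculation behind Fisher's exact test for a 2 by 2 table can be sketched with the standard library. This is a minimal illustration, assuming the table is given as four counts; the function name `fisher_exact_2x2` is hypothetical. With the row and column totals fixed, the top-left cell follows a hypergeometric distribution, and the two-sided p-value adds up the probabilities of every table that is no more likely than the observed one.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    possible = range(max(0, col1 - row2), min(row1, col1) + 1)
    # A small relative tolerance keeps tables tied with the observed one
    # from being dropped because of floating-point rounding.
    return sum(p for p in map(prob, possible) if p <= p_observed * (1 + 1e-7))


# Sparse hypothetical table where the chi-square approximation is doubtful.
print(fisher_exact_2x2(3, 1, 1, 3))  # 34/70, about 0.486
```

In practice, `scipy.stats.fisher_exact` performs this calculation for 2 by 2 tables, with a two-sided alternative by default.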

Working

Suppose a student studies whether smoking status is associated with disease status. A 2 by 2 table shows smokers and non-smokers by disease status (yes or no). The chi-square test assesses whether disease status is independent of smoking status. If some expected counts are very small, Fisher's exact test may be used instead. The result should be reported with counts, percentages and an effect measure where possible.

Rows: smoking status.
Columns: disease status.
Cells: number of people in each combination.
Row percentages answer: within each smoking group, what proportion had disease?
Column percentages answer: within each disease group, what proportion were smokers?
The chi-square p-value summarises the evidence against independence.
A risk ratio or odds ratio may describe the size of the association.
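The difference between row and column percentages, and the common effect measures, can be shown with one small function. This is a sketch using hypothetical counts (30 of 100 smokers and 15 of 100 non-smokers with the disease); the function name and dictionary keys are illustrative.

```python
def describe_2x2(a, b, c, d):
    """Percentages and effect measures for a 2 by 2 table laid out as:

                    disease yes   disease no
        smoker           a             b
        non-smoker       c             d

    Assumes no zero cells (otherwise the ratios are undefined).
    """
    risk_smokers = a / (a + b)        # row percentage: disease among smokers
    risk_non_smokers = c / (c + d)    # row percentage: disease among non-smokers
    return {
        "disease among smokers": risk_smokers,
        "disease among non-smokers": risk_non_smokers,
        "smokers among those with disease": a / (a + c),      # column percentage
        "smokers among those without disease": b / (b + d),   # column percentage
        "risk difference": risk_smokers - risk_non_smokers,
        "risk ratio": risk_smokers / risk_non_smokers,
        "odds ratio": (a * d) / (b * c),
    }


# Hypothetical counts for illustration only.
for name, value in describe_2x2(30, 70, 15, 85).items():
    print(f"{name}: {value:.3f}")
```

Here the row percentages (30% versus 15%) answer the smoking question directly, while the column percentages describe who is in each disease group. The odds ratio (about 2.43) is further from 1 than the risk ratio (2.0) because the outcome is not rare.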

Limitations

Chi-square and Fisher's exact tests identify evidence of association, but they do not measure the size of the association by themselves. They also do not adjust for confounders. If age, sex or another variable may confound the relationship, logistic regression or stratified analysis may be needed. Large samples can make tiny differences statistically significant.

The tests do not prove causation.
They do not adjust for confounding.
They do not directly estimate effect size.
Very large samples can detect trivial differences.
Very small samples may have low power.
Sparse tables can make interpretation unstable.
Incorrect percentages can mislead readers.
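The point about very large samples can be demonstrated directly. When the proportions in each group stay fixed, the Pearson chi-square statistic grows in proportion to the sample size, so the same small difference eventually becomes statistically significant. This sketch uses hypothetical groups with the outcome in 51% versus 49% of people; the function name is illustrative and no continuity correction is applied.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for [[a, b], [c, d]] via the 2 by 2 shortcut
    n(ad - bc)^2 / (product of the four margins), with its df = 1 p-value."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


# Same 51% versus 49% split, increasing group size.
results = []
for per_group in (100, 10_000, 100_000):
    yes1 = 51 * per_group // 100
    yes2 = 49 * per_group // 100
    stat, p = chi_square_2x2(yes1, per_group - yes1, yes2, per_group - yes2)
    results.append((per_group, stat, p))
    print(f"{per_group} per group: chi-square = {stat:.2f}, p = {p:.3g}")
```

The statistic rises from 0.08 to 8 to 80 while the risk difference stays at 2 percentage points, which is why an effect measure is needed alongside the p-value.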

Discussion

A strong report should show the table clearly, describe the relevant percentages and explain the test result in plain language. The p-value should not replace interpretation. Students should state whether there is evidence of association and, where possible, describe the magnitude of the association using an appropriate effect measure.

Report counts and percentages together.
Use the correct denominator for percentages.
State whether chi-square or Fisher's exact test was used.
Explain why Fisher's exact test was needed if used.
Report an effect measure for 2 by 2 tables where appropriate.
Avoid saying the exposure caused the outcome unless justified.
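One way to keep counts, denominators, the test used and an effect measure together is to build the results sentence from the numbers. This is a hypothetical helper sketch; the p-value and risk ratio passed in are placeholders for illustration, not results from a real study.

```python
def format_count(events, total):
    """Count with its denominator and percentage, e.g. '30/100 (30.0%)'."""
    return f"{events}/{total} ({100 * events / total:.1f}%)"


def format_p(p):
    """p-value to three decimals, or 'p < 0.001' when it is very small."""
    return "p < 0.001" if p < 0.001 else f"p = {p:.3f}"


def report_2x2(exposed, unexposed, test_name, p, effect_name, effect):
    """One results sentence; `exposed` and `unexposed` are (events, total) pairs."""
    return (
        f"Disease occurred in {format_count(*exposed)} smokers and "
        f"{format_count(*unexposed)} non-smokers ({test_name}, {format_p(p)}; "
        f"{effect_name} {effect:.2f})."
    )


# Placeholder numbers for illustration only.
print(report_2x2((30, 100), (15, 100), "Pearson chi-square test", 0.011, "risk ratio", 2.0))
```

The sentence still needs interpretation in context, and its wording describes an association rather than a cause.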

Practical checklist

Before you apply this topic

Are both variables categorical?
Have you created a contingency table?
Have you shown counts?
Have you chosen row or column percentages correctly?
Have you checked expected counts?
Is the chi-square test appropriate?
Is Fisher's exact test needed?
Have you reported the p-value with interpretation?
Have you considered an effect measure?
Have you avoided causal overclaiming?

Common mistakes

What to avoid

Treating category codes as numerical measurements.
Reporting percentages without counts.
Using the wrong denominator.
Using the chi-square test when expected counts are very small.
Defaulting to Fisher's exact test for large tables where the chi-square test is adequate.
Reporting only the p-value.
Ignoring effect size.
Ignoring confounding.
Interpreting association as causation.
Using bar charts that hide denominators.

How this connects to learning

Use the guide as a bridge between theory and application.

A resource guide should not replace a full course or live teaching session. Instead, it helps you organise your thinking. Use it to identify what you understand, what feels unclear, and what questions you should ask before applying a method to real data.

Before a lesson

Read the intuition and problem sections to prepare.

During analysis

Use the method and checklist to guide decisions.

When writing

Use limitations and discussion to improve interpretation.

Related guides

Continue with related topics.

How to choose the correct statistical test

Risk ratios, odds ratios and rates in epidemiology

Logistic regression explained for health and social science students

Understanding p-values, confidence intervals and effect sizes

Common mistakes in dissertation data analysis
