Research methods | Intermediate | Resource guide

Common mistakes in dissertation data analysis

A guide to the most common statistical, methodological and reporting mistakes students make in dissertation data analysis, with practical ways to avoid them.

Structure

Problem, intuition, method, working, limitations and discussion.

Best for

Students preparing for coursework, analysis, interpretation or revision.

Use with

Learning Hub lessons, tutoring sessions or dissertation planning.

Problem

Many dissertation analyses fail not because the student lacks effort, but because the analysis is not planned carefully. Students often rush from dataset to software output without first defining the research question, identifying variables, checking assumptions or deciding how results should be interpreted. This leads to weak method choice, unclear reporting, overinterpretation of p-values and conclusions that do not match the evidence.

The research question is too broad or vague.
The outcome variable is not clearly defined.
Variables are analysed before their meaning and coding are understood.
The statistical test is chosen from habit rather than from the question.
Data cleaning decisions are undocumented.
Missing data are ignored or handled inconsistently.
Regression models include too many or too few variables.
Results are reported as software output instead of interpreted findings.

Intuition

A dissertation analysis should behave like a structured argument. The research question creates the need for the analysis. The variables provide the evidence. The method connects the evidence to the question. The results show what the data suggest. The discussion explains what the findings mean and what they do not mean. When any part of this chain is weak, the whole analysis becomes difficult to defend.

A good analysis starts with a precise research question.
A good dataset must be understood before it is analysed.
A good method should match the outcome, predictors and study design.
A good result section explains estimates, uncertainty and interpretation.
A good discussion separates evidence from speculation.
A good dissertation is reproducible enough that another person can follow the decisions.

Method

A strong dissertation workflow should be planned before formal analysis begins. Students should create a small statistical analysis plan, even if the dissertation does not formally require one. This plan should define the question, outcome, exposures, covariates, descriptive summaries, main analysis, assumption checks, missing data approach and reporting style. The aim is to reduce ad hoc decisions made in response to the results as they appear.

Step 1: Write the main research question in one clear sentence.
Step 2: Define the primary outcome variable and its type.
Step 3: Define the main exposure, group or predictor.
Step 4: Identify possible confounders or covariates using subject knowledge.
Step 5: Prepare and document the cleaned analysis dataset.
Step 6: Produce descriptive statistics before modelling.
Step 7: Choose the main statistical method based on the outcome and question.
Step 8: Check assumptions or diagnostic issues relevant to the method.
Step 9: Report estimates, confidence intervals and p-values carefully.
Step 10: Discuss limitations, bias, missing data and generalisability.
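
One way to keep such a plan honest is to write it down in a form that can be checked. The sketch below is only an illustration: the `AnalysisPlan` class, its field names and the example values are hypothetical, chosen to mirror the steps above, and are not a required format.

```python
from dataclasses import dataclass, field, fields


@dataclass
class AnalysisPlan:
    """Hypothetical container for a small statistical analysis plan."""

    research_question: str = ""
    primary_outcome: str = ""
    outcome_type: str = ""  # e.g. "continuous", "binary", "count"
    main_predictor: str = ""
    covariates: list[str] = field(default_factory=list)
    main_method: str = ""
    assumption_checks: list[str] = field(default_factory=list)
    missing_data_approach: str = ""

    def unfinished_items(self) -> list[str]:
        """Return the names of plan items that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Example values are invented for illustration only.
plan = AnalysisPlan(
    research_question=(
        "Is weekly physical activity associated with depression score "
        "among university students?"
    ),
    primary_outcome="depression_score",
    outcome_type="continuous",
    main_predictor="physical_activity_hours",
    covariates=["age", "sex", "sleep", "baseline_health"],
    main_method="linear regression",
)

# Lists the decisions that have not been made yet.
print(plan.unfinished_items())
```

Here the plan still lacks assumption checks and a missing data approach, so those two items are reported as unfinished before any modelling starts.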

Working

Suppose a student is studying whether physical activity is associated with depression score among university students. A weak analysis might simply run many tests until something is significant. A stronger analysis defines depression score as the outcome and physical activity as the main predictor, identifies age, sex, sleep and baseline health as possible covariates, checks distributions, chooses linear regression if its assumptions are reasonable, and reports the adjusted association with its uncertainty.

Weak approach: test every variable against every other variable and report only significant findings.
Better approach: define one main outcome and one main predictor before analysis.
Weak approach: copy p-values from software without explaining effect sizes.
Better approach: interpret coefficients, mean differences, odds ratios or correlations in context.
Weak approach: delete missing values without explanation.
Better approach: report how much data are missing and how missingness was handled.
Weak approach: claim causation from cross-sectional observational data.
Better approach: describe associations and discuss causal limitations honestly.
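
To make "interpret coefficients in context" concrete, here is a minimal sketch of a simple linear regression reported as an estimate with an interval rather than a bare p-value. The data are invented toy values, the model is unadjusted (no covariates), and the interval uses a normal critical value as a simplification; a real dissertation analysis would normally use a statistics package, a t-based interval and the adjusted model.

```python
import math
from statistics import NormalDist, mean

# Hypothetical toy data, invented purely for illustration:
# weekly hours of physical activity and a depression score per student.
activity_hours = [0, 1, 2, 2, 3, 4, 5, 5, 6, 8]
depression_score = [18, 17, 15, 16, 14, 12, 11, 12, 9, 7]


def simple_linear_regression(x, y, confidence=0.95):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns the slope, the intercept and an approximate confidence
    interval for the slope. The interval uses a normal critical value,
    which is a simplification; a t-based interval would be wider in
    small samples.
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual variance with n - 2 degrees of freedom.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    residual_variance = sum(r ** 2 for r in residuals) / (n - 2)
    slope_se = math.sqrt(residual_variance / sxx)

    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return slope, intercept, (slope - z * slope_se, slope + z * slope_se)


slope, intercept, (ci_low, ci_high) = simple_linear_regression(
    activity_hours, depression_score
)
print(
    f"Each extra hour of activity is associated with a change of "
    f"{slope:.2f} points in depression score "
    f"(approx. 95% CI {ci_low:.2f} to {ci_high:.2f})."
)
```

The printed sentence states direction, size and uncertainty together, and uses "associated with" rather than causal language, which is the wording a cross-sectional design supports.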

Limitations

Even a well-planned dissertation analysis has limitations. Student projects often use small samples, secondary datasets, short timelines, imperfect measures and limited scope for advanced sensitivity analysis. The aim is not to pretend the analysis is perfect. The aim is to make the work transparent, coherent and statistically defensible.

Small samples can reduce precision and statistical power.
Secondary datasets may not contain ideal variables.
Measurement error can weaken interpretation.
Cross-sectional studies usually cannot establish temporality.
Unmeasured confounding may remain even after adjustment.
Multiple testing without correction increases the chance of false positive findings.
Time constraints may limit the number of sensitivity analyses.
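
The multiple testing point can be illustrated with a short calculation. This sketch assumes the tests are independent and that every null hypothesis is true, which is a simplification; the function names are illustrative, and Bonferroni is shown only as one simple, conservative correction.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


for n_tests in (1, 5, 20):
    rate = familywise_error_rate(n_tests)
    threshold = bonferroni_threshold(n_tests)
    print(
        f"{n_tests} tests: chance of at least one false positive "
        f"{rate:.2f}, Bonferroni threshold {threshold:.4f}"
    )
```

Under these assumptions, running 20 tests at the 0.05 level gives roughly a 64% chance of at least one false positive, which is why an unplanned search across many variables is hard to defend.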

Discussion

The discussion section should not repeat the results mechanically. It should explain what the findings mean, how they relate to the research question, whether they are consistent with existing literature and what limitations affect interpretation. Strong dissertation writing avoids exaggerated claims and shows awareness of uncertainty.

Connect each main result back to the research question.
Explain the size and direction of effects, not only statistical significance.
Discuss whether findings are practically or clinically meaningful.
Acknowledge missing data, bias and limitations clearly.
Avoid claiming proof when the design only supports association.
Explain how future studies could improve the evidence.

Practical checklist

Before you apply this topic

Is the research question specific enough?
Have you clearly defined the primary outcome?
Have you identified the main exposure or predictor?
Have you checked all variable coding?
Have you documented cleaning decisions?
Have you produced descriptive statistics first?
Have you justified your statistical method?
Have you checked assumptions or diagnostics?
Have you reported effect sizes or estimates?
Have you included confidence intervals?
Have you explained missing data handling?
Have you avoided overclaiming causality?
Have you linked findings back to the dissertation question?

Common mistakes

What to avoid

Starting analysis before defining the research question.
Choosing tests based only on software menus.
Using several tests without a clear analysis plan.
Ignoring variable coding and hidden missing values.
Reporting p-values without effect sizes.
Treating non-significant findings as proof of no association.
Overadjusting regression models with unnecessary variables.
Ignoring confounding in observational data.
Making causal claims from weak study designs.
Writing the discussion as if limitations do not matter.
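
Hidden missing values are easy to overlook when a dataset stores them as numeric codes. The following sketch counts missing values per variable before any analysis. The records, the variable names and the sentinel codes -99 and 999 are all hypothetical; the real codes must come from the dataset's own codebook.

```python
# Hypothetical sentinel codes; check the codebook for the real ones.
MISSING_CODES = {-99, 999}

# Invented records for illustration only.
records = [
    {"age": 21, "sleep_hours": 7, "depression_score": 12},
    {"age": 999, "sleep_hours": 6, "depression_score": 15},
    {"age": 23, "sleep_hours": None, "depression_score": -99},
    {"age": 20, "sleep_hours": 8, "depression_score": 9},
]


def is_missing(value):
    """Treat None and the sentinel codes as missing."""
    return value is None or value in MISSING_CODES


def missingness_report(rows):
    """Count missing values per variable across all rows."""
    variables = rows[0].keys()
    return {
        var: sum(is_missing(row[var]) for row in rows) for var in variables
    }


report = missingness_report(records)
for variable, n_missing in report.items():
    share = n_missing / len(records)
    print(f"{variable}: {n_missing}/{len(records)} missing ({share:.0%})")
```

Without this check, an age of 999 or a depression score of -99 would be treated as a real value and distort means, plots and regression estimates.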

How this connects to learning

Use the guide as a bridge between theory and application.

A resource guide should not replace a full course or live teaching session. Instead, it helps you organise your thinking. Use it to identify what you understand, what feels unclear, and what questions you should ask before applying a method to real data.

Before a lesson

Read the intuition and problem sections to prepare.

During analysis

Use the method and checklist to guide decisions.

When writing

Use limitations and discussion to improve interpretation.

Related guides

Continue with related topics.

How to prepare your data before analysis

How to choose the correct statistical test

Understanding p-values, confidence intervals and effect sizes

How to report regression results in a dissertation

Missing data: deletion, imputation and reporting
