Introduction to causal inference and DAGs
An advanced guide introducing causal questions, counterfactual thinking, directed acyclic graphs, confounding, colliders, mediators and why causal inference is more than regression adjustment.
Structure
Problem, intuition, method, working, limitations and discussion.
Best for
Students preparing for coursework, analysis, interpretation or revision.
Use with
Learning Hub lessons, tutoring sessions or dissertation planning.
Resource guide
Problem
Students often use regression models to make causal-sounding claims without first asking whether the study design and assumptions support a causal interpretation. A model may show an association, but association alone does not prove that changing an exposure would change an outcome. Causal inference provides a framework for asking clearer questions about cause and effect. Directed acyclic graphs, or DAGs, help students think visually about relationships between variables before deciding what to adjust for.
- Regression coefficients are often interpreted causally without justification.
- Students adjust for many variables without knowing whether they are confounders, mediators or colliders.
- Adjusting for the wrong variable, such as a collider or a mediator, can introduce bias rather than remove it.
- Temporal order is ignored when variables are measured at different stages.
- Causal questions are confused with prediction questions.
- Confounding is treated as a purely statistical problem instead of a design and reasoning problem.
- DAGs are often misunderstood as decorative diagrams rather than analytical tools.
Intuition
Causal inference asks what would happen to the same population under different exposure conditions. For example, instead of asking whether smokers have higher disease risk, a causal question asks what would happen to disease risk if the same people smoked versus did not smoke. Because we cannot observe both realities for the same person at the same time, causal inference depends on study design, assumptions and careful control of bias.
- Association asks whether two variables move together.
- Causation asks whether changing one variable would change another.
- A confounder is a common cause of the exposure and outcome.
- A mediator lies on the pathway between exposure and outcome.
- A collider is a common effect of two variables. It should usually not be adjusted for, because conditioning on it can create a spurious association between its causes.
- A DAG helps decide which variables should and should not be adjusted for.
- Good causal thinking begins before model fitting.
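The difference between association and causation can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the data are simulated, the variable names and probabilities are hypothetical, and the true effect of the exposure on the outcome is set to zero by construction. A binary confounder raises both the chance of exposure and the level of the outcome, so the crude comparison shows a difference (about 1.2 in expectation with these settings) even though nothing causal is happening. Comparing within levels of the confounder removes it.

```python
import random
from statistics import mean


def simulate_confounding(n=20000, seed=1):
    """Simulate (confounder, exposure, outcome) rows with NO effect of exposure on outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                    # hypothetical binary confounder
        x = rng.random() < (0.8 if z else 0.2)    # exposure is more likely when z is present
        y = 2.0 * z + rng.gauss(0, 1)             # outcome depends on z only, not on x
        rows.append((z, x, y))
    return rows


def mean_difference(rows):
    """Mean outcome in the exposed minus mean outcome in the unexposed."""
    exposed = [y for _, x, y in rows if x]
    unexposed = [y for _, x, y in rows if not x]
    return mean(exposed) - mean(unexposed)


def stratified_difference(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    total = 0.0
    for level in (False, True):
        stratum = [row for row in rows if row[0] == level]
        total += mean_difference(stratum) * len(stratum) / len(rows)
    return total


rows = simulate_confounding()
crude = mean_difference(rows)          # association, distorted by the confounder
adjusted = stratified_difference(rows) # comparison within levels of the confounder
print(f"crude difference:    {crude:.2f}")
print(f"adjusted difference: {adjusted:.2f}")
```

Stratification works here only because the single confounder is known and measured, which is an assumption built into the simulation and rarely guaranteed in real data.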
Method
A causal analysis should begin with a precise causal question. The exposure, outcome, target population, time order and target effect should be defined before modelling. A DAG can then be used to represent assumptions about how variables relate to one another. The adjustment set should be chosen to block backdoor paths, meaning non-causal paths between exposure and outcome that begin with an arrow pointing into the exposure. It should also avoid mediators and colliders unless the question specifically requires adjusting for them.
- Step 1: Define the causal question clearly.
- Step 2: Identify the exposure or intervention of interest.
- Step 3: Identify the outcome and relevant time period.
- Step 4: Define the target population.
- Step 5: Draw a DAG based on subject knowledge, not p-values.
- Step 6: Identify confounding paths between exposure and outcome.
- Step 7: Choose an adjustment set that blocks backdoor paths.
- Step 8: Avoid adjusting for colliders.
- Step 9: Avoid adjusting for mediators when estimating the total effect.
- Step 10: Interpret model estimates only within the limits of the assumptions.
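Steps 5 to 9 can be sketched in code. The example below is a minimal illustration, not a replacement for dedicated DAG software: the graph is a hypothetical five-node DAG, the node names are placeholders, and the function names (`backdoor_paths`, `is_blocked`, `satisfies_backdoor`) are invented for this sketch. It stores the DAG as a dictionary of parent-to-children edges, lists the backdoor paths, and checks whether a candidate adjustment set blocks them all without including anything caused by the exposure.

```python
def descendants(dag, node):
    """All nodes reachable from `node` by following arrows forward."""
    seen, stack = set(), [node]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def build_parents(dag):
    """Invert the parent -> children mapping into child -> parents."""
    parents = {node: set() for node in dag}
    for node, children in dag.items():
        for child in children:
            parents.setdefault(child, set()).add(node)
    return parents


def backdoor_paths(dag, exposure, outcome):
    """Paths from exposure to outcome whose first edge points INTO the exposure."""
    parents = build_parents(dag)
    paths = []

    def walk(path):
        current = path[-1]
        if current == outcome:
            paths.append(path)
            return
        neighbours = set(dag.get(current, ())) | parents.get(current, set())
        for nxt in sorted(neighbours):
            if nxt not in path:
                walk(path + [nxt])

    for parent in sorted(parents.get(exposure, set())):
        walk([exposure, parent])
    return paths


def is_blocked(dag, path, adjusted):
    """A path is blocked by an adjusted non-collider or an unadjusted collider."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        is_collider = node in dag.get(prev, ()) and node in dag.get(nxt, ())
        if is_collider:
            if node not in adjusted and not (descendants(dag, node) & adjusted):
                return True
        elif node in adjusted:
            return True
    return False


def satisfies_backdoor(dag, exposure, outcome, adjusted):
    """Check the backdoor criterion for a candidate adjustment set."""
    adjusted = set(adjusted)
    if adjusted & descendants(dag, exposure):
        return False  # the set contains something caused by the exposure
    return all(is_blocked(dag, p, adjusted) for p in backdoor_paths(dag, exposure, outcome))


# Hypothetical DAG: parent -> list of children.
dag = {
    "confounder": ["exposure", "outcome"],
    "exposure": ["mediator", "collider"],
    "mediator": ["outcome"],
    "outcome": ["collider"],
    "collider": [],
}

print(backdoor_paths(dag, "exposure", "outcome"))
print(satisfies_backdoor(dag, "exposure", "outcome", set()))                       # False
print(satisfies_backdoor(dag, "exposure", "outcome", {"confounder"}))              # True
print(satisfies_backdoor(dag, "exposure", "outcome", {"confounder", "mediator"}))  # False
```

The result is only as good as the graph it is given. The code checks an adjustment set against the assumptions encoded in the DAG; it cannot tell you whether those assumptions are right.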
Working
Suppose a study asks whether maternal smoking during pregnancy affects offspring birthweight. A simple comparison of birthweight between smokers and non-smokers may be confounded by maternal age, socioeconomic position, nutrition and healthcare access. A DAG can help identify common causes of smoking and birthweight. The analysis should adjust for appropriate pre-exposure confounders, while avoiding variables that lie after smoking on the causal pathway if the aim is the total effect.
- Exposure: maternal smoking during pregnancy.
- Outcome: offspring birthweight.
- Potential confounders: maternal age, socioeconomic position and pre-pregnancy health.
- Potential mediators: mechanisms of foetal growth restriction that follow smoking exposure.
- Do not choose adjustment variables only because they are statistically significant.
- Do not adjust for variables measured after the exposure without understanding their role.
- Use the DAG to explain why variables were included or excluded.
- Report causal conclusions cautiously if the data are observational.
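One way to document why variables were included or excluded is to write the assumed DAG down and label each variable's role. The sketch below is a simplified illustration: the edges are assumptions made for this example rather than established facts, the node names and the `classify_roles` helper are hypothetical, and the classification rule is a heuristic that would need checking against a full backdoor analysis in a more complicated graph.

```python
def descendants(dag, start, skip=None):
    """Nodes reachable from `start` along arrows, never passing through `skip`."""
    seen, stack = set(), [start]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child != skip and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def classify_roles(dag, exposure, outcome):
    """Label each remaining node as confounder, mediator, collider or other."""
    nodes = set(dag) | {c for children in dag.values() for c in children}
    after_exposure = descendants(dag, exposure)
    roles = {}
    for node in sorted(nodes - {exposure, outcome}):
        causes_exposure = exposure in descendants(dag, node)
        # Does the node reach the outcome by a route that avoids the exposure?
        causes_outcome = outcome in descendants(dag, node, skip=exposure)
        if node in after_exposure and outcome in descendants(dag, node):
            roles[node] = "mediator"     # on a pathway from exposure to outcome
        elif node in after_exposure and node in descendants(dag, outcome):
            roles[node] = "collider"     # common effect of exposure and outcome
        elif causes_exposure and causes_outcome:
            roles[node] = "confounder"   # common cause of exposure and outcome
        else:
            roles[node] = "other"
    return roles


# Assumed DAG for the maternal smoking example (parent -> children).
dag = {
    "maternal_age": ["smoking", "birthweight"],
    "socioeconomic_position": ["smoking", "birthweight"],
    "pre_pregnancy_health": ["smoking", "birthweight"],
    "smoking": ["foetal_growth_restriction", "birthweight"],
    "foetal_growth_restriction": ["birthweight"],
    "birthweight": [],
}

roles = classify_roles(dag, "smoking", "birthweight")
adjustment_set = [node for node, role in roles.items() if role == "confounder"]

for node, role in roles.items():
    print(f"{node}: {role}")
print("Adjust for (total effect):", adjustment_set)
```

Under these assumed edges the three pre-exposure variables are labelled as confounders and the growth restriction node as a mediator, so it is left out when the total effect is the target.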
Limitations
DAGs are useful because they make assumptions visible, but they do not prove that the assumptions are correct. A DAG can be wrong if important variables are missing or relationships are misunderstood. Causal inference also cannot fully fix poor measurement, selection bias, missing data or unmeasured confounding. Statistical adjustment is only one part of causal reasoning.
- DAGs depend on subject-matter assumptions.
- Unmeasured confounding can remain even after adjustment.
- Measurement error can bias causal estimates.
- Selection bias can arise from conditioning on colliders.
- Temporal ambiguity weakens causal interpretation.
- Observational studies require more caution than randomised trials.
- A well-drawn DAG does not replace sensitivity analysis or careful study design.
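The point about selection bias and colliders can be made concrete with simulated data. In the sketch below, which uses hypothetical variables and an arbitrary selection threshold, two variables are generated independently, and a third variable is caused by both. Restricting the sample to high values of that common effect, as happens when only certain people enter a study, makes the two independent variables appear negatively related.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient, written out to avoid extra dependencies."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_selection(n=20000, seed=7, threshold=1.0):
    """Return (everyone, selected) lists of (x, y) pairs; x and y are independent."""
    rng = random.Random(seed)
    everyone, selected = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rng.gauss(0, 1)                # generated independently of x
        c = x + y + rng.gauss(0, 1)        # collider: caused by both x and y
        everyone.append((x, y))
        if c > threshold:                  # selection conditions on the collider
            selected.append((x, y))
    return everyone, selected


everyone, selected = simulate_selection()
r_all = pearson(*zip(*everyone))
r_selected = pearson(*zip(*selected))
print(f"correlation in full sample:     {r_all:.2f}")
print(f"correlation in selected sample: {r_selected:.2f}")
```

No amount of covariate adjustment within the selected sample undoes this on its own, which is why selection needs to be considered at the design stage.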
Discussion
A strong causal discussion should separate what the data show from what the assumptions allow us to claim. Students should explain the target causal question, the adjustment strategy and the main limitations. Causal language should be used carefully. If the study design is observational and unmeasured confounding is plausible, the conclusion should be framed as evidence consistent with a possible causal relationship rather than proof.
- State whether the aim is causal, predictive or descriptive.
- Explain the exposure, outcome and target effect.
- Use the DAG to justify covariate adjustment.
- Discuss unmeasured confounding clearly.
- Avoid saying regression adjustment proves causality.
- Distinguish total effects from direct effects.
- Explain how future study designs could strengthen causal evidence.
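The distinction between total and direct effects can also be shown with a short simulation. This is a sketch under strong assumptions: the data are simulated, all relationships are linear with no interaction, the exposure is generated at random so there is no confounding, and the coefficients (1.0 for the direct path, 0.5 and 2.0 along the mediated path) are arbitrary. Regressing the outcome on the exposure alone recovers the total effect, here 1.0 + 0.5 × 2.0 = 2.0 in expectation, while also adjusting for the mediator recovers only the direct effect of 1.0.

```python
import random
from statistics import mean


def slope(xs, ys):
    """Least-squares slope from regressing ys on xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def residuals(xs, ys):
    """Residuals of ys after removing the linear part explained by xs."""
    b = slope(xs, ys)
    a = mean(ys) - b * mean(xs)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


rng = random.Random(3)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]                        # exposure, assigned at random
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]                   # mediator, caused by exposure
y = [1.0 * xi + 2.0 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]  # outcome

# Total effect: do not adjust for the mediator.
total_effect = slope(x, y)

# Direct effect: adjust for the mediator by residualising both x and y on m,
# which gives the same coefficient as a regression of y on x and m together.
direct_effect = slope(residuals(m, x), residuals(m, y))

print(f"total effect estimate:  {total_effect:.2f}")
print(f"direct effect estimate: {direct_effect:.2f}")
```

In real data, adjusting for a mediator identifies a direct effect only if the mediator and outcome share no unmeasured common causes, so the two estimates should be reported with the question each one answers.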
Practical checklist
Before you apply this topic
- Have you stated a causal question?
- Have you defined the exposure?
- Have you defined the outcome?
- Have you considered time order?
- Have you identified the target population?
- Have you drawn or described a DAG?
- Have you identified potential confounders?
- Have you avoided adjusting for colliders?
- Have you avoided adjusting for mediators when estimating total effects?
- Have you justified your adjustment set?
- Have you discussed unmeasured confounding?
- Have you avoided unsupported causal claims?
Common mistakes
What to avoid
- Assuming regression automatically gives causal effects.
- Adjusting for every available variable.
- Choosing covariates using p-values alone.
- Adjusting for mediators when estimating total effects.
- Adjusting for colliders and introducing bias.
- Drawing DAGs after analysis only to justify results.
- Ignoring time order between exposure, confounders and outcome.
- Confusing prediction accuracy with causal validity.
- Claiming causation from cross-sectional data without caution.
- Ignoring unmeasured confounding.
How this connects to learning
Use the guide as a bridge between theory and application.
A resource guide should not replace a full course or live teaching session. Instead, it helps you organise your thinking. Use it to identify what you understand, what feels unclear, and what questions you should ask before applying a method to real data.
Before a lesson
Read the intuition and problem sections to prepare.
During analysis
Use the method and checklist to guide decisions.
When writing
Use limitations and discussion to improve interpretation.
