Choosing between correlation and regression
A detailed guide helping students understand when to use correlation, when to use regression, and why the research question matters more than the software menu.
Structure
Problem, intuition, method, working, limitations and discussion.
Best for
Students preparing for coursework, analysis, interpretation or revision.
Use with
Learning Hub lessons, tutoring sessions or dissertation planning.
Resource guide
Problem
Students often confuse correlation and regression because both are used to study relationships between variables. The confusion becomes worse when both methods give a number, a p-value and sometimes a graph with a line. However, correlation and regression answer different questions. Correlation describes the strength and direction of association between two variables. Regression describes how an outcome changes with one or more predictors. Choosing the wrong one can lead to weak interpretation, poor reporting and an analysis that does not answer the actual research question.
- Correlation is often used when regression is actually needed.
- Students may report a correlation coefficient without explaining the research question.
- Correlation treats both variables symmetrically, while regression separates outcome and predictor.
- Regression can adjust for additional variables, but simple correlation cannot.
- Correlation does not provide a prediction equation.
- A strong correlation does not imply causation.
- A weak correlation can hide non-linear patterns or subgroup differences.
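The symmetry point above can be checked numerically. The following is a minimal sketch using made-up study-hours and exam-score data (the numbers and the helper names `pearson_r` and `ols_slope` are illustrative assumptions, not taken from any real dataset or library). It shows that swapping the variables leaves the correlation unchanged but gives a different regression slope.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ols_slope(predictor, outcome):
    """Least-squares slope from regressing `outcome` on `predictor`."""
    mp, mo = sum(predictor) / len(predictor), sum(outcome) / len(outcome)
    spo = sum((p - mp) * (o - mo) for p, o in zip(predictor, outcome))
    spp = sum((p - mp) ** 2 for p in predictor)
    return spo / spp

# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

r_xy = pearson_r(hours, scores)   # correlation of hours with scores
r_yx = pearson_r(scores, hours)   # same value: correlation is symmetric

slope_score_on_hours = ols_slope(hours, scores)  # marks per extra hour
slope_hours_on_score = ols_slope(scores, hours)  # hours per extra mark

print(r_xy, r_yx)
print(slope_score_on_hours, slope_hours_on_score)
```

The two slopes differ because regression needs you to say which variable is the outcome, whereas correlation does not.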
Intuition
Correlation asks whether two numerical variables tend to move together. Regression asks how much an outcome is expected to change when a predictor changes. In correlation, the question is usually: are these two measurements associated? In regression, the question is usually: can we explain, estimate, predict or adjust an outcome using one or more predictors? This distinction matters in dissertations, health research and social science analysis, because it determines which numbers are reported and how they are worded.
- Use correlation when both variables are numerical and you only want to describe association.
- Use regression when one variable is clearly the outcome.
- Use regression when you want an equation or estimated change.
- Use regression when you need to adjust for age, sex, baseline score or other covariates.
- Use Pearson correlation only when the relationship is approximately linear, and be cautious when it is not.
- Plot the data before using either method, because a single number can hide the pattern.
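To see why a plot should come first, here is a small sketch with constructed data in which one variable is completely determined by the other, yet the Pearson correlation is zero. The data and the helper name `pearson_r` are illustrative assumptions.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Constructed example: y is an exact U-shaped function of x.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(r)  # zero linear association, despite a perfect curved relationship
```

A scatterplot of these points would show the curve immediately, while the coefficient alone would suggest there is no relationship at all.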
Method
A reliable decision process begins with the research question. If the question asks whether two numerical variables are associated, correlation may be enough. If the question asks whether one variable predicts, explains or is associated with an outcome after adjustment, regression is usually more appropriate. The statistical method should follow the structure of the question, not the other way round.
- Step 1: Write the research question in one sentence.
- Step 2: Identify whether there is a clear outcome variable.
- Step 3: Check whether both variables are numerical.
- Step 4: Draw a scatterplot before calculating anything.
- Step 5: Ask whether you only need strength and direction of association.
- Step 6: Ask whether you need an estimated change in the outcome.
- Step 7: Ask whether you need adjustment for confounders or covariates.
- Step 8: Choose correlation for simple association and regression for modelling an outcome.
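The steps above can be sketched as a small decision helper. This is only one possible reading of the process, not a rule that replaces judgment: the function name `suggest_method` and its yes/no arguments are hypothetical, and a real decision also depends on the scatterplot from Step 4.

```python
def suggest_method(both_numerical, has_clear_outcome,
                   needs_estimated_change, needs_adjustment):
    """Return a rough suggestion based on answers to Steps 2, 3, 5, 6 and 7.

    All arguments are booleans answering the corresponding question.
    """
    if needs_adjustment:
        # Step 7: adjusting for confounders or covariates needs a model
        # with several predictors.
        return "multiple regression"
    if has_clear_outcome or needs_estimated_change:
        # Steps 2 and 6: a named outcome or an estimated change per unit.
        return "regression"
    if both_numerical:
        # Steps 3 and 5: only strength and direction of association.
        return "correlation"
    return "consider other methods"

# Example: two numerical variables, no outcome, no adjustment needed.
print(suggest_method(both_numerical=True, has_clear_outcome=False,
                     needs_estimated_change=False, needs_adjustment=False))
```

The order of the checks reflects the guide's priority: the need for adjustment or an outcome-focused question overrides the simpler description of association.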
Working
Suppose a student has data on study hours and exam score. If the question is simply whether students who study more tend to score higher, correlation may be useful. If the question is how many extra marks are expected for each additional hour of study, regression is better. If the student also wants to adjust for previous exam score, attendance and teaching group, multiple regression is required.
- Question: Are study hours and exam scores associated? Use correlation.
- Question: How much does exam score change per extra study hour? Use simple linear regression.
- Question: Does study time predict exam score after adjusting for attendance? Use multiple linear regression.
- Question: Can exam score be predicted from several student characteristics? Use regression.
- Question: Are blood pressure and age related? Correlation may describe the association.
- Question: How does blood pressure change with age after adjusting for BMI and smoking? Use regression.
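For the second question above (marks per extra study hour), a simple linear regression can be fitted by hand. The sketch below uses invented data and a hypothetical helper `fit_line`; in practice a statistics package would normally be used and would also report confidence intervals.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical data, for illustration only.
hours = [2, 4, 6, 8, 10]
scores = [50, 58, 61, 70, 76]

intercept, slope = fit_line(hours, scores)

# Expected score for a student who studies 7 hours, according to the fitted line.
predicted_at_7 = intercept + slope * 7

print(f"score = {intercept:.1f} + {slope:.1f} * hours")
print(f"predicted score at 7 hours: {predicted_at_7:.1f}")
```

The slope is the quantity the regression question asks for: the expected change in exam score for each additional hour of study, in marks per hour. A correlation coefficient computed from the same data would not give this.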
Limitations
Neither correlation nor regression automatically proves causation. Both methods can be distorted by outliers, non-linear patterns, measurement error, restricted ranges and confounding. Simple correlation is especially limited because it cannot adjust for other variables. Regression is more flexible, but it requires stronger modelling decisions and assumptions.
- Correlation does not distinguish outcome from predictor.
- Simple correlation cannot adjust for confounding variables.
- Pearson correlation can be misleading when the relationship is non-linear.
- Outliers can strongly influence both correlation and regression.
- Regression coefficients can be misinterpreted as causal effects.
- Regression assumptions should be checked before reporting results.
- A statistically significant association may still be practically small.
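The influence of outliers can be illustrated with a small constructed example. The data and the helper name `pearson_r` are assumptions made for this sketch: five points with a weak negative association, plus one extreme point that is far from the rest.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Constructed data: no clear upward trend among the first five points.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 3, 1]

r_without = pearson_r(x, y)

# Add a single extreme observation far from the others.
r_with = pearson_r(x + [20], y + [20])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

One added point changes the coefficient from weakly negative to strongly positive, which is why outliers should be checked on a scatterplot before any coefficient is reported.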
Discussion
Good interpretation should explain why the chosen method matches the question. Correlation is useful for describing simple numerical association. Regression is more useful when the analysis has a clear outcome, a need for prediction, an estimated change or adjustment for other variables. In academic writing, students should avoid saying that correlation proves an effect or that regression automatically shows causation.
- Use correlation language for association, not prediction.
- Use regression language for expected change, modelling or adjustment.
- Report the estimate, confidence interval and p-value where appropriate.
- Show a scatterplot when studying relationships between numerical variables.
- Explain whether the analysis is descriptive, predictive or explanatory.
- Avoid causal claims unless the study design and assumptions support them.
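The difference in language has a simple algebraic basis. For a simple linear regression of $y$ on $x$, written here with the assumed notation $\hat{y} = b_0 + b_1 x$, sample standard deviations $s_x$ and $s_y$, and Pearson correlation $r$, the least-squares slope satisfies:

```latex
b_1 = r \, \frac{s_y}{s_x}
```

The coefficient $r$ has no units and lies between $-1$ and $1$, so it supports statements about strength and direction of association. The slope $b_1$ is in units of the outcome per unit of the predictor, so it supports statements about expected change.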
Practical checklist
Before you apply this topic
- Have you written the research question clearly?
- Are both variables numerical?
- Is one variable clearly the outcome?
- Do you only need strength and direction of association?
- Do you need an estimated change in the outcome?
- Do you need prediction?
- Do you need adjustment for confounders or covariates?
- Have you drawn a scatterplot?
- Is the relationship approximately linear?
- Have you checked for outliers?
- Have you avoided causal language unless justified?
- Can you explain why correlation or regression is more appropriate?
Common mistakes
What to avoid
- Using correlation when the research question has a clear outcome.
- Using correlation when adjustment for confounding is needed.
- Interpreting correlation as causation.
- Ignoring scatterplots before calculating correlation.
- Reporting only the p-value and not the correlation coefficient.
- Treating regression as automatically causal.
- Using regression without checking assumptions.
- Using correlation for categorical variables without considering other methods.
- Ignoring outliers that drive the association.
- Confusing statistical significance with meaningful association.
How this connects to learning
Use the guide as a bridge between theory and application.
A resource guide should not replace a full course or live teaching session. Instead, it helps you organise your thinking. Use it to identify what you understand, what feels unclear, and what questions you should ask before applying a method to real data.
Before a lesson
Read the intuition and problem sections to prepare.
During analysis
Use the method and checklist to guide decisions.
When writing
Use limitations and discussion to improve interpretation.