Biostatistics · Intermediate · Resource guide

Logistic regression explained for health and social science students

A detailed guide to logistic regression for binary outcomes, including odds, odds ratios, interpretation, adjustment, limitations and common reporting mistakes.

Structure

Problem, intuition, method, working, limitations and discussion.

Best for

Students preparing for coursework, analysis, interpretation or revision.

Use with

Learning Hub lessons, tutoring sessions or dissertation planning.

01

Resource guide

Problem

Many students try to use linear regression for every outcome, even when the outcome has only two categories, such as disease yes/no, pass/fail, smoker/non-smoker or readmitted/not readmitted. Linear regression is not designed for binary outcomes: predicted values can fall below 0 or above 1, and the errors cannot be normally distributed with constant variance because the outcome takes only two values. Logistic regression solves this by modelling the probability of an event on the log-odds scale, but students often struggle to interpret odds ratios correctly.

  • Binary outcomes require different modelling from numerical outcomes.
  • Linear regression can produce impossible probabilities below 0 or above 1.
  • Students often confuse odds with probability.
  • Odds ratios are often interpreted as risk ratios, which can be misleading.
  • Adjusted odds ratios are often reported without explaining covariates.
  • Logistic regression is sometimes treated as automatically causal.
  • Model performance and interpretation are often mixed up.
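The "impossible probabilities" point can be checked with a very small example. The sketch below uses made-up toy data (not from any study) and fits a straight line by ordinary least squares using the textbook closed-form formulas. The variable names are illustrative only.

```python
import math

# Hypothetical toy data: x could be a severity score, y is the binary event (1 = yes).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Ordinary least squares for a single predictor (closed-form slope and intercept).
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Fitted values from the straight line, read as "probabilities".
linear_fitted = [intercept + slope * xi for xi in x]
print("lowest fitted value:", round(linear_fitted[0], 3))
print("highest fitted value:", round(linear_fitted[-1], 3))


def inv_logit(z):
    """Convert a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Any value on the log-odds scale maps to a probability strictly between 0 and 1.
bounded = [inv_logit(z) for z in (-5.0, 0.0, 5.0)]
print("inverse-logit values:", [round(p, 3) for p in bounded])
```

With these toy data the straight line gives a fitted value below 0 at the smallest x and above 1 at the largest x, while the inverse-logit values stay inside the 0 to 1 range.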
02

Intuition

Logistic regression is used when the outcome is binary. Instead of modelling the outcome directly as a straight line, it models the log-odds of the event. This keeps predicted probabilities between 0 and 1. The method is widely used in health research, epidemiology, psychology, education and social science because many important outcomes are binary.

  • The outcome has two categories, often coded 0 and 1.
  • The model estimates the probability of the event.
  • The logit link connects predictors to probability through log-odds.
  • Exponentiated coefficients are odds ratios.
  • An odds ratio above 1 suggests higher odds of the event.
  • An odds ratio below 1 suggests lower odds of the event.
  • Adjustment allows comparison while holding other variables constant.
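The relationship between probability, odds and log-odds described above can be written as three short functions. This is a minimal sketch; the function names are chosen for illustration and are not taken from any particular package.

```python
import math


def odds(p):
    """Odds of an event with probability p (0 < p < 1)."""
    return p / (1.0 - p)


def logit(p):
    """Log-odds: the scale on which logistic regression is linear."""
    return math.log(odds(p))


def inv_logit(z):
    """Back-transform a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


for p in (0.1, 0.5, 0.8):
    print(f"probability={p:.2f}  odds={odds(p):.3f}  log-odds={logit(p):.3f}")
```

A probability of 0.5 corresponds to odds of 1 and log-odds of 0, which is why an odds ratio of 1 (a coefficient of 0) means no difference in odds.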
03

Method

A logistic regression analysis begins by defining the binary event. State clearly which category is coded as 1, because every coefficient and odds ratio describes the odds of that category. Predictors are then selected based on the research question, theory or study design. The model estimates coefficients on the log-odds scale, which are usually converted into odds ratios for reporting. Interpretation should focus on direction, magnitude, uncertainty and context.

  • Step 1: Define the binary outcome clearly.
  • Step 2: Decide which category is the event coded as 1.
  • Step 3: Identify the main exposure or predictor.
  • Step 4: Decide which covariates need adjustment.
  • Step 5: Fit the logistic regression model.
  • Step 6: Convert coefficients to odds ratios if reporting to applied audiences.
  • Step 7: Report confidence intervals alongside p-values, not p-values alone.
  • Step 8: Check whether the model answers an explanatory, predictive or descriptive question.
  • Step 9: Discuss limitations such as confounding, sample size and rare outcomes.
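Steps 5 and 6 can be sketched without any statistics package. The example below is an illustration only: the counts are hypothetical, the helper `fit_logistic` is written for this guide, and it handles just one predictor using Newton-Raphson iterations. For real analyses you would normally use established software, which also provides standard errors and diagnostics.

```python
import math


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson and return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = inv_logit(b0 + b1 * xi)
            residual = yi - p
            weight = p * (1.0 - p)
            g0 += residual
            g1 += residual * xi
            h00 += weight
            h01 += weight * xi
            h11 += weight * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: 100 exposed patients (30 events), 100 unexposed (15 events).
x = [1] * 100 + [0] * 100
y = [1] * 30 + [0] * 70 + [1] * 15 + [0] * 85

b0, b1 = fit_logistic(x, y)
print("coefficient (log-odds scale):", round(b1, 4))
print("odds ratio:", round(math.exp(b1), 4))
```

With a single binary predictor, the fitted odds ratio should match the one calculated directly from the 2x2 table, (30 × 85) / (70 × 15), which is a useful sanity check on the model output.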
04

Working

Suppose a health dataset studies whether patients are readmitted to hospital within 30 days. The outcome is binary: readmitted yes or no. A logistic regression model can estimate whether age, treatment group, disease severity and previous admissions are associated with the odds of readmission. If the odds ratio for previous admissions is 1.50, each additional previous admission is associated with 50% higher odds of readmission (the odds are multiplied by 1.5), assuming the variable is coded as a count and other variables in the model are held constant. This is a statement about odds, not a 50% increase in the probability of readmission.

  • Outcome: readmitted within 30 days, coded 1 for yes and 0 for no.
  • Predictors: treatment group, age, severity score and previous admission history.
  • Coefficient: estimated on the log-odds scale.
  • Odds ratio: obtained by exponentiating the coefficient.
  • OR = 1 means no difference in odds.
  • OR > 1 means higher odds of the event.
  • OR < 1 means lower odds of the event.
  • Adjusted OR means the estimate accounts for other variables in the model.
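To see what an odds ratio of 1.50 means on the probability scale, it helps to pick a starting probability and work through the arithmetic. In the sketch below the baseline probability of 0.20 is an assumption made purely for illustration, as is the function name.

```python
import math

odds_ratio = 1.50                    # value from the worked example above
coefficient = math.log(odds_ratio)   # the same effect on the log-odds scale


def shifted_probability(p0, odds_ratio, steps=1):
    """Probability after multiplying the baseline odds by odds_ratio, `steps` times."""
    baseline_odds = p0 / (1.0 - p0)
    new_odds = baseline_odds * odds_ratio ** steps
    return new_odds / (1.0 + new_odds)


baseline = 0.20  # hypothetical readmission probability with no previous admissions
one_more = shifted_probability(baseline, odds_ratio, steps=1)
two_more = shifted_probability(baseline, odds_ratio, steps=2)

print("coefficient:", round(coefficient, 3))
print("one previous admission:", round(one_more, 3))
print("two previous admissions:", round(two_more, 3))
```

Under this assumed baseline, the probability moves from 0.20 to about 0.27 for one previous admission, which is a smaller relative change than the 50% increase in odds.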
05

Limitations

Logistic regression is powerful but often misinterpreted. Odds ratios are not the same as risk ratios, especially when the outcome is common. The model also depends on correct specification, sufficient sample size and sensible covariate choice. In observational data, adjustment reduces some confounding but does not automatically prove causality.

  • Odds ratios lie further from 1 than the corresponding risk ratios when the outcome is common, so reading them as risk ratios overstates the association.
  • Small samples or rare events can produce unstable estimates.
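  • Separation can occur when a predictor perfectly predicts the outcome; the maximum likelihood estimate then does not exist, and software typically reports very large coefficients and standard errors.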
  • Important omitted confounders can bias results.
  • Including too many predictors can overfit the model.
  • Linearity in the log-odds may need checking for numerical predictors.
  • Predicted probabilities may be easier to communicate than odds ratios.
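The gap between odds ratios and risk ratios can be made concrete by converting one into the other for different baseline risks. This is a sketch using the standard algebraic relationship between the two measures; the baseline risks are hypothetical and the function name is illustrative.

```python
def risk_ratio_from_odds_ratio(odds_ratio, baseline_risk):
    """Risk ratio implied by an odds ratio, given the risk in the reference group."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    exposed_odds = baseline_odds * odds_ratio
    exposed_risk = exposed_odds / (1.0 + exposed_odds)
    return exposed_risk / baseline_risk


# The same odds ratio of 2 at three hypothetical baseline risks.
for baseline_risk in (0.01, 0.10, 0.50):
    rr = risk_ratio_from_odds_ratio(2.0, baseline_risk)
    print(f"baseline risk={baseline_risk:.2f}  OR=2.00  implied RR={rr:.2f}")
```

When the outcome is rare (1%), the implied risk ratio is close to 2. When the outcome is common (50%), the same odds ratio corresponds to a risk ratio of about 1.33, so describing it as "twice the risk" would be wrong.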
06

Discussion

A good logistic regression report should define the event, explain the model purpose, report odds ratios with confidence intervals and interpret results in plain language. Students should be careful not to say that an odds ratio is the same as a percentage increase in probability. They should also distinguish between association, prediction and causation.

  • State how the binary outcome was coded.
  • Report odds ratios with 95% confidence intervals.
  • Interpret odds ratios as odds, not probabilities.
  • Explain adjustment variables clearly.
  • Avoid causal claims unless the design supports them.
  • Consider presenting predicted probabilities for clearer communication.
  • Discuss sample size, rare events and possible confounding.
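Reporting an odds ratio with a 95% confidence interval usually means exponentiating the coefficient and the ends of its interval on the log-odds scale. The sketch below assumes a Wald-type interval (coefficient ± 1.96 standard errors); the coefficient and standard error are hypothetical values standing in for model output, and the function name is illustrative.

```python
import math


def report_odds_ratio(coefficient, standard_error, z=1.96):
    """Format an odds ratio with a Wald-type 95% confidence interval."""
    odds_ratio = math.exp(coefficient)
    lower = math.exp(coefficient - z * standard_error)
    upper = math.exp(coefficient + z * standard_error)
    return f"OR {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})"


# Hypothetical model output for "previous admissions" on the log-odds scale.
coefficient = 0.405
standard_error = 0.12

report = report_odds_ratio(coefficient, standard_error)
print(report)
```

The interval is calculated on the log-odds scale and then exponentiated, so it is not symmetric around the odds ratio. An interval that excludes 1 corresponds to an association that differs from "no difference in odds" at the matching significance level.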

Practical checklist

Before you apply this topic

  • Is the outcome binary?
  • Have you clearly defined the event coded as 1?
  • Have you identified the main predictor or exposure?
  • Have you justified adjustment variables?
  • Have you checked whether sample size is adequate?
  • Have you reported odds ratios with confidence intervals?
  • Have you avoided interpreting odds ratios as risk ratios?
  • Have you considered predicted probabilities for communication?
  • Have you checked for sparse data or separation?
  • Have you discussed confounding and limitations?
  • Have you linked interpretation back to the research question?
  • Have you avoided claiming causation without justification?

Common mistakes

What to avoid

  • Using linear regression for a binary outcome.
  • Failing to say which category is coded as the event.
  • Interpreting odds ratios as probability ratios.
  • Saying an odds ratio of 2 means the probability doubled.
  • Ignoring confidence intervals.
  • Adding covariates without a reason.
  • Treating adjusted association as proof of causation.
  • Using too many predictors for a small dataset.
  • Ignoring rare outcome problems.
  • Reporting model output without explaining meaning in context.

How this connects to learning

Use the guide as a bridge between theory and application.

A resource guide should not replace a full course or live teaching session. Instead, it helps you organise your thinking. Use it to identify what you understand, what feels unclear, and what questions you should ask before applying a method to real data.

Before a lesson

Read the intuition and problem sections to prepare.

During analysis

Use the method and checklist to guide decisions.

When writing

Use limitations and discussion to improve interpretation.
