Biostatistics · Advanced · Resource guide

Longitudinal data analysis

An advanced guide to repeated measurements over time, within-person correlation, change, trajectories, time effects, mixed models, missing follow-up and careful interpretation.

Structure

Problem, intuition, method, working, limitations and discussion.

Best for

Students preparing for coursework, analysis, interpretation or revision.

Use with

Learning Hub lessons, tutoring sessions or dissertation planning.

Problem

Longitudinal data arise when the same person, patient, student, sample or unit is measured repeatedly over time. These data are richer than a single cross-sectional measurement because they allow researchers to study change, progression, recovery, deterioration and treatment response. However, they are also more complex because repeated observations from the same individual are correlated. If students analyse longitudinal data as if every row is independent, standard errors, confidence intervals and p-values can become misleading.

Repeated measurements from the same individual are not independent.
Students often compare only baseline and final measurements, wasting information.
Time is sometimes treated as a simple label rather than a meaningful variable.
Dropout and missing follow-up can bias conclusions.
Average change can hide different individual trajectories.
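Ordinary regression that ignores repeated measures typically underestimates uncertainty for between-person comparisons and can overestimate it for within-person effects such as change over time.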
Graphs of individual trajectories are often skipped before modelling.
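
A rough sketch shows why treating repeated rows as independent misleads. It assumes a between-person comparison, the same number of measurements m per person and a common correlation (ICC) between any two measurements from the same person. Under those assumptions the variance of a mean is inflated by the design effect 1 + (m - 1) × ICC. The function names and the numbers are illustrative and do not come from a real study.

```python
def design_effect(m: int, icc: float) -> float:
    """Variance inflation for a mean when each person gives m equally correlated measurements."""
    return 1 + (m - 1) * icc


def effective_sample_size(n_people: int, m: int, icc: float) -> float:
    """Independent observations carrying the same information as n_people * m correlated rows."""
    return n_people * m / design_effect(m, icc)


# Hypothetical study: 100 people, 4 visits each, within-person correlation 0.5.
rows = 100 * 4
n_eff = effective_sample_size(n_people=100, m=4, icc=0.5)
print(f"{rows} rows behave like about {n_eff:.0f} independent observations")
# prints: 400 rows behave like about 160 independent observations
```

This inflation applies to between-person comparisons. For within-person change, positive correlation usually makes estimates more precise than an independence analysis suggests, so naive standard errors can be wrong in either direction.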

Intuition

Longitudinal analysis asks how outcomes change over time and whether that change differs by exposure, treatment, group or individual characteristics. The key idea is that each individual has a trajectory. Some people start high, some start low, some improve, some decline and some remain stable. A good analysis should respect both the average pattern and the within-person structure of the data.

The same individual contributes multiple observations.
Measurements within the same individual tend to be more similar than measurements from different individuals.
Time can be continuous, categorical or visit-based.
Baseline differences and rates of change are different concepts.
A treatment may affect baseline level, rate of change or both.
Missing follow-up is important because people may drop out for reasons related to the outcome.
Longitudinal plots often reveal patterns that tables hide.

Method

A longitudinal workflow begins by identifying the unit of observation, the individual identifier and the time variable. The analyst should describe how many observations each individual has, how follow-up varies and how missingness occurs over time. Simple summaries and trajectory plots should come before modelling. For formal analysis, mixed-effects models, generalised estimating equations or repeated-measures approaches may be used depending on the research question.

Step 1: Identify the individual or cluster ID.
Step 2: Identify what one row represents: a person-visit, sample-timepoint or repeated measurement.
Step 3: Define the outcome measured repeatedly over time.
Step 4: Define the time variable and its scale.
Step 5: Summarise the number of observations per individual.
Step 6: Plot average trajectories and individual trajectories.
Step 7: Check whether missing follow-up is present and whether it is systematic.
Step 8: Choose a model that accounts for within-person correlation.
Step 9: Consider whether time should be linear, categorical or non-linear.
Step 10: Interpret group, time and group-by-time effects carefully.
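
The descriptive steps (5 to 7) can be sketched with the Python standard library before any model is fitted. The example assumes long-format data with one row per person-visit and a missing score recorded as `None`; the data and variable names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical person-visit rows: (participant id, visit month, score or None if missing).
rows = [
    ("p1", 0, 22.0), ("p1", 3, 18.0), ("p1", 6, 15.0), ("p1", 12, 11.0),
    ("p2", 0, 19.0), ("p2", 3, 20.0), ("p2", 6, None), ("p2", 12, None),
    ("p3", 0, 25.0), ("p3", 3, 21.0), ("p3", 6, 20.0), ("p3", 12, None),
]

observed = [(pid, t, y) for pid, t, y in rows if y is not None]

# Step 5: number of observed measurements per individual.
obs_per_person = Counter(pid for pid, _, _ in observed)

# Steps 6 and 7: observed mean and number of contributors at each time point.
scores_by_time = defaultdict(list)
for _, t, y in observed:
    scores_by_time[t].append(y)

n_people = len({pid for pid, _, _ in rows})
print("Observations per person:", dict(obs_per_person))
for t in sorted(scores_by_time):
    scores = scores_by_time[t]
    print(f"month {t:>2}: n = {len(scores)}/{n_people}, mean = {mean(scores):.1f}")
```

Printing the number of contributors next to each mean matters: when later time points rest on fewer people, a change in the observed mean mixes real change with a change in who is still being measured.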

Working

Suppose a study measures depression score at baseline, 3 months, 6 months and 12 months after an intervention. A weak analysis might compare only baseline and 12-month scores using a simple t-test. A stronger analysis uses all available time points, accounts for repeated measures within individuals and estimates whether the average change over time differs between intervention and control groups.

Outcome: depression score measured repeatedly.
ID variable: participant identifier.
Time variable: baseline, 3 months, 6 months and 12 months.
Main exposure: intervention group.
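A time effect estimates average change over follow-up; when a group-by-time interaction is included, it is the change in the reference group.
A group effect estimates the average difference between groups; with an interaction in the model and time coded as zero at baseline, it is the difference at baseline.
A group-by-time interaction estimates whether change over time differs by group.
A random intercept allows participants to have different overall levels (different baseline levels when time is coded as zero at baseline).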
A random slope allows participants to have different rates of change over time.
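
One way to write down the model these points describe is a linear mixed model with a random intercept and a random slope. The notation is an assumption made for illustration and is not taken from the guide: y_ij is the depression score of participant i at visit j, t_ij is months since baseline, and g_i is 1 for intervention and 0 for control.

```latex
y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 g_i + \beta_3 \, t_{ij} g_i
         + b_{0i} + b_{1i} t_{ij} + \varepsilon_{ij},
\qquad
(b_{0i}, b_{1i})^\top \sim N(\mathbf{0}, \mathbf{G}), \quad
\varepsilon_{ij} \sim N(0, \sigma^2)
```

Under this coding, \beta_1 is the average monthly change in the control group, \beta_2 is the group difference at baseline, and \beta_3 is the difference in monthly change between groups. The terms b_{0i} and b_{1i} are participant-specific deviations in level and slope. The sketch treats time as linear; with only four scheduled visits, replacing \beta_1 t_{ij} with visit indicators is an equally reasonable choice.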

Limitations

Longitudinal analysis can still be biased if follow-up is incomplete, measurement times vary, important confounders change over time or the model does not represent the trajectory well. A sophisticated model does not remove the need to understand the design. Students should also be careful when interpreting change because regression to the mean, natural recovery and selective dropout can all affect results.

Dropout can bias results if related to the outcome.
Irregular follow-up times may require careful modelling.
Assuming linear change may be unrealistic.
Time-varying confounding can complicate causal interpretation.
With only a few repeated observations per person, non-linear trajectories and random slopes are difficult to estimate reliably.
Different individuals may have different patterns of change.
A statistically significant time effect may not be clinically meaningful.
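
A simple first check on dropout is to compare the baseline scores of people who completed follow-up with those who did not. The sketch below uses hypothetical data and a hypothetical field name, `completed_12m`. It can flag dropout that is related to observed values, but it cannot rule out dropout related to outcomes that were never measured.

```python
from statistics import mean

# Hypothetical data: baseline score and whether the 12-month visit was completed.
participants = [
    {"id": "p1", "baseline": 24.0, "completed_12m": False},
    {"id": "p2", "baseline": 12.0, "completed_12m": True},
    {"id": "p3", "baseline": 26.0, "completed_12m": False},
    {"id": "p4", "baseline": 15.0, "completed_12m": True},
    {"id": "p5", "baseline": 14.0, "completed_12m": True},
    {"id": "p6", "baseline": 19.0, "completed_12m": True},
]


def baseline_by_completion(participants):
    """Compare baseline scores of completers and dropouts.

    Returns {label: (n, mean baseline)}; the mean is None for an empty group.
    """
    summary = {}
    for label, flag in (("completers", True), ("dropouts", False)):
        values = [p["baseline"] for p in participants if p["completed_12m"] is flag]
        summary[label] = (len(values), mean(values) if values else None)
    return summary


for label, (n, avg) in baseline_by_completion(participants).items():
    shown = "n/a" if avg is None else f"{avg:.1f}"
    print(f"{label}: n = {n}, mean baseline = {shown}")
```

In this made-up example the dropouts had much higher baseline scores (25.0 against 15.0), so an analysis restricted to completers would describe a less severely affected group than the one originally enrolled.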

Discussion

A strong longitudinal report should describe the follow-up schedule, the number of measurements, missing follow-up, the modelling approach and the interpretation of change over time. It should also explain whether the analysis estimates average population change, individual variation or treatment differences in trajectories. Conclusions should acknowledge missing data and the assumptions used to handle repeated measurements.

State how many time points were measured.
Report how many participants contributed data at each time point.
Describe missing follow-up and dropout.
Show plots of change over time where possible.
Explain whether time was modelled as continuous or categorical.
Interpret group-by-time interactions in plain language.
Discuss whether the observed change is practically meaningful.
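
A group-by-time interaction is easier to explain in plain language once the coefficients are turned into model-implied means for each group at each time point. The sketch below assumes a linear model with time in months and a 0/1 group indicator. The coefficient values are hypothetical and chosen only to show the arithmetic.

```python
def predicted_mean(months: float, intervention: bool, coef: dict) -> float:
    """Model-implied mean score from a linear group-by-time model.

    coef keys: intercept, time (change per month in control),
    group (intervention minus control at month 0),
    group_by_time (extra change per month in the intervention group).
    """
    g = 1 if intervention else 0
    return (coef["intercept"] + coef["time"] * months
            + coef["group"] * g + coef["group_by_time"] * months * g)


# Hypothetical coefficients, for illustration only.
coef = {"intercept": 20.0, "time": -0.25, "group": 0.5, "group_by_time": -0.5}

for months in (0, 3, 6, 12):
    control = predicted_mean(months, False, coef)
    treated = predicted_mean(months, True, coef)
    print(f"month {months:>2}: control {control:.2f}, intervention {treated:.2f}, "
          f"difference {treated - control:+.2f}")
```

With these numbers a plain-language summary would read: scores fell by 0.25 points per month in the control group and by 0.75 points per month in the intervention group, so the gap between groups moved from +0.5 at baseline to -5.5 at 12 months.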

Practical checklist

Before you apply this topic

Have you identified the participant or cluster ID?
Have you defined the repeated outcome?
Have you defined the time variable?
Have you checked how many measurements each person has?
Have you plotted individual and average trajectories?
Have you checked missing follow-up?
Have you chosen a method that accounts for repeated measurements?
Have you justified how time is modelled?
Have you interpreted time effects carefully?
Have you interpreted group-by-time effects correctly?
Have you discussed dropout and missing data?
Have you avoided treating repeated rows as independent?

Common mistakes

What to avoid

Using ordinary regression without accounting for repeated measures.
Comparing only first and last measurements when intermediate data exist.
Ignoring missing follow-up.
Assuming all individuals change in the same way.
Treating time as categorical when a trend is more appropriate, or the reverse.
Interpreting a group effect without considering the time coding.
Ignoring baseline imbalance.
Using too complex a model for a small dataset.
Failing to plot trajectories.
Making causal claims without considering time-varying confounding.

How this connects to learning

Use the guide as a bridge between theory and application.

A resource guide should not replace a full course or live teaching session. Instead, it helps you organise your thinking. Use it to identify what you understand, what feels unclear, and what questions you should ask before applying a method to real data.

Before a lesson

Read the intuition and problem sections to prepare.

During analysis

Use the method and checklist to guide decisions.

When writing

Use limitations and discussion to improve interpretation.

Related guides

Continue with related topics.

Introduction to mixed-effects models

Missing data: deletion, imputation and reporting

Confounding, mediation and effect modification

Linear regression assumptions and diagnostics

How to report regression results in a dissertation
