Applied case studies

Machine Learning in Biostatistics Case Studies

Each module will include one applied case study. These case studies connect the lessons to realistic medical machine learning workflows: prediction question, data structure, modelling, validation, threshold interpretation, reporting and limitations.

Case-study aim

Turn lessons into report-ready medical ML workflows.

The case studies are designed to help students move from model output to careful biostatistical interpretation, limitations and transparent conclusions.

Planned case studies: 5
Available now: 1
Course modules: 5
Format: R + report

One case study per module

Each module ends with a case study that turns the lesson concepts into an applied medical ML workflow.

Report-style learning

Students do more than run code: they learn to explain results, limitations and clinical meaning.

R-based reproducibility

Each case study is designed to include a downloadable R script, generated figures and a structured interpretation.

Clinical caution

Every case study reinforces leakage checks, validation, calibration, thresholds and responsible claims.

Case study pathway

Five applied projects, one for each course module.

The case studies are designed to grow with the course. The first case study uses the Module 1 foundation workflow. Later case studies will introduce supervised learning, validation, calibration, modern models and final applied reporting.

Case study 01 (Module 1, available)

Foundations of Machine Learning in Biostatistics

Diabetes risk prediction workflow

A full introductory case study showing how to define a prediction question, check predictors, split data, fit a model, evaluate performance and report limitations.

Clinical prediction question
Predictor timing and leakage checks
Train/test split
AUC, Brier score, sensitivity and specificity
Threshold trade-off interpretation
Report-style conclusion
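
The workflow steps listed above can be sketched in base R. This is a minimal illustration under stated assumptions, not the course's downloadable script: the data are simulated, the variable names (`age`, `bmi`, `diabetes`), the coefficients, the 70/30 split and the 0.5 threshold are all hypothetical choices made for this example.

```r
# Minimal sketch with SIMULATED data; names and coefficients are hypothetical.
set.seed(2024)
n <- 500
dat <- data.frame(
  age = rnorm(n, mean = 50, sd = 12),
  bmi = rnorm(n, mean = 28, sd = 5)
)
# Simulated binary outcome: risk rises with age and BMI (illustrative only).
lin_pred <- -8 + 0.06 * dat$age + 0.15 * dat$bmi
dat$diabetes <- rbinom(n, size = 1, prob = plogis(lin_pred))

# Train/test split (70/30), done before any model fitting.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit a logistic regression on the training data only.
fit <- glm(diabetes ~ age + bmi, data = train, family = binomial())

# Predicted probabilities for the held-out test data.
p_hat <- predict(fit, newdata = test, type = "response")
y <- test$diabetes

# AUC via the rank (Mann-Whitney) formulation; ties get average ranks.
auc <- function(y, p) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Brier score: mean squared difference between predicted risk and outcome.
brier <- mean((p_hat - y)^2)

# Sensitivity and specificity at an arbitrary illustrative threshold.
threshold   <- 0.5
pred_pos    <- p_hat >= threshold
sensitivity <- sum(pred_pos & y == 1) / sum(y == 1)
specificity <- sum(!pred_pos & y == 0) / sum(y == 0)

round(c(AUC = auc(y, p_hat), Brier = brier,
        Sensitivity = sensitivity, Specificity = specificity), 3)
```

Changing `threshold` and re-running the last few lines shows the trade-off directly: a lower threshold tends to raise sensitivity at the cost of specificity, which is the kind of result a report-style conclusion would need to interpret.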

Case study 02 (Module 2, planned)

Supervised Learning for Clinical and Health Data

Clinical classification with supervised learning

A supervised learning case study comparing logistic regression, k-nearest neighbours and decision trees for a clinical binary outcome.

Regression as prediction
Logistic classification
Distance-based learning
Decision tree interpretation
Pipeline thinking
Clinical comparison of simple models

Case study 03 (Module 3, planned)

Model Evaluation, Validation and Performance

Validation, calibration and decision thresholds

A performance-focused case study using resampling, ROC/AUC, calibration, sensitivity, specificity and clinical threshold analysis.

Cross-validation
Bootstrap validation
ROC and AUC
Calibration plots
Clinical usefulness
Decision threshold reporting

Case study 04 (Module 4, planned)

Regularisation, Ensembles and Modern Prediction Models

Modern prediction models for health data

A modern ML case study comparing regularised regression, random forests and gradient boosting while avoiding model chasing, that is, switching models in pursuit of small performance gains that have not been properly validated.

Ridge and lasso
Random forests
Gradient boosting
Hyperparameter tuning
Model comparison
Responsible performance claims

Case study 05 (Module 5, planned)

Applied Biostatistical ML Case Studies

Final applied medical ML report

A capstone case study bringing together prediction modelling, validation, missing data, imbalance, fairness, interpretation and final reporting.

End-to-end applied workflow
Missing data checks
Class imbalance
Fairness and subgroup performance
Transparent reporting
Final applied R project

Current progress

Case Study 1 is available. Four more will be added as the course develops.

The case-study plan mirrors the five-module course structure: one applied project per module. This gives students repeated practice in turning ML outputs into careful biostatistical interpretation.