Machine Learning in Biostatistics

Learn clinical machine learning in R, with an emphasis on validation and responsible interpretation.

An applied course for students learning machine learning in biostatistics, medical statistics and health data science. Module pages are open for preview. Lesson 1.1 is available now. All remaining lessons are waitlist-only until July 2026.

Current access policy

Module overview pages remain open so students can see the full structure. Only Lesson 1.1 is available for full study. All other lessons direct students to the waitlist until the full course is released in July 2026.

Learning design

Built around scripts, outputs and interpretation.

Advanced conversational lectures with clinical prediction examples

Detailed notes connecting statistics, modelling and medical interpretation

Browser-based R coding labs inside the lesson

Downloadable R scripts and shared datasets

Script outputs interpreted directly inside the report section

Interactive labs for thresholds, risk, validation and model behaviour

Quizzes, reporting guidance and applied cautions

By the end

Students should be able to use ML responsibly in health data.

Define prediction problems clearly in health data

Separate prediction, explanation and causation

Build baseline clinical prediction models in R

Interpret model outputs and performance metrics

Understand overfitting, leakage and validation

Explain accuracy, sensitivity, specificity, ROC, AUC and calibration

Compare models responsibly without overclaiming

Prepare for applied biostatistics, health data science and clinical ML work
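
To make one of the metrics above concrete, the sketch below computes AUC in base R as the probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it. The outcome and risk vectors are invented for illustration and are not course data, and the variable names are assumptions.

```r
# Hypothetical observed outcomes (1 = event) and predicted risks
observed       <- c(1, 0, 1, 1, 0, 0, 1, 0, 0, 1)
predicted_risk <- c(0.81, 0.22, 0.64, 0.35, 0.48, 0.10, 0.92, 0.55, 0.30, 0.71)

n_events    <- sum(observed == 1)
n_nonevents <- sum(observed == 0)

# Rank-based (Mann-Whitney) form of the AUC:
# sum the ranks of the event group, subtract the minimum possible rank sum,
# then divide by the number of event/non-event pairs
risk_ranks <- rank(predicted_risk)
auc <- (sum(risk_ranks[observed == 1]) - n_events * (n_events + 1) / 2) /
  (n_events * n_nonevents)

print(auc)  # 0.92 for these values
```

An AUC of 0.92 here means that in 23 of the 25 event/non-event pairs the patient with the event was given the higher risk. It describes discrimination only and says nothing about calibration.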

Course workflow

Every lesson follows the same applied learning loop.

Students do not just run models. They learn how to define the prediction problem, run the R workflow, interpret the output, write a report and state limitations.

Step 1: Clinical question

Each lesson begins with the health-data question: what outcome is predicted, for whom, using which variables and at what time?

Step 2: R script

Students run a guided R script in the browser and can download the full reproducible version for local study.

Step 3: Outputs

The script generates dataset summaries, model results, prediction tables, confusion matrices and performance metrics.

Step 4: Interpretation

The lesson explains what each output means statistically and clinically, and where caution is needed.

Step 5: Report

Students learn how to write a responsible analysis paragraph from the model output.

Step 6: Cautions

Every lesson highlights leakage, overfitting, causal overclaiming, validation limits and clinical usefulness.
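
As a rough illustration of the kind of output described in Step 3, the base R sketch below builds a confusion matrix from predicted risks and derives sensitivity, specificity and accuracy. The data, the 0.5 threshold and all variable names are assumptions made for this example; they are not taken from the course scripts.

```r
# Hypothetical observed outcomes (1 = event) and predicted risks
observed  <- c(1, 0, 1, 1, 0, 0, 1, 0, 0, 1)
predicted <- c(0.81, 0.22, 0.64, 0.35, 0.48, 0.10, 0.92, 0.55, 0.30, 0.71)

# Classify as "event" when the predicted risk reaches the threshold
threshold       <- 0.5
predicted_class <- as.integer(predicted >= threshold)

# Fixing the factor levels keeps the table 2 x 2 even if a class is never predicted
confusion <- table(
  Predicted = factor(predicted_class, levels = c(0, 1)),
  Observed  = factor(observed, levels = c(0, 1))
)
print(confusion)

tp <- confusion["1", "1"]  # events correctly flagged
tn <- confusion["0", "0"]  # non-events correctly cleared
fp <- confusion["1", "0"]  # non-events wrongly flagged
fn <- confusion["0", "1"]  # events missed

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
accuracy    <- (tp + tn) / sum(confusion)

print(c(sensitivity = sensitivity, specificity = specificity, accuracy = accuracy))
```

All three metrics depend on the chosen threshold, so a report should state the threshold alongside the numbers.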

Module pages

All module pages are open for preview.

Students can explore the full ML course structure now. Each module page shows the learning pathway, case-study direction, lesson sequence and applied skills.

Module 01 (open)

Foundations of Machine Learning in Biostatistics

5 lessons

Prediction thinking, explanation, causation, learning types, training/testing, overfitting and the full biostatistical ML workflow.

Topics: Prediction, Causal caution, Training/testing, Workflow

Open module →

Module 02 (open)

Supervised Learning for Clinical Health Data

5 lessons

Regression prediction, logistic classification, K-nearest neighbours, decision trees and model pipelines for clinical datasets.

Topics: Regression, Classification, KNN, Decision trees

Open module →

Module 03 (open)

Model Evaluation, Validation and Performance

5 lessons

Train/test splitting, resampling, sensitivity, specificity, ROC, AUC, calibration, decision curves, leakage and reproducibility.

Topics: Validation, ROC/AUC, Calibration, Leakage

Open module →

Module 04 (open)

Regularisation, Ensembles and Modern Prediction Models

5 lessons

Ridge, lasso, elastic net, random forests, gradient boosting, support vector machines and responsible model comparison.

Topics: Regularisation, Forests, Boosting, Comparison

Open module →

Module 05 (open)

Applied Biostatistical ML Case Studies

5 lessons

Clinical risk prediction, survival prediction, high-dimensional omics, missing data, imbalance, fairness and a final applied R project.

Topics: Risk prediction, Survival ML, Omics, Fairness

Open module →

Lesson access

Lesson 1.1 is available now. All remaining lessons open in July 2026.

Locked lessons currently send students to the waitlist. Visitors can still see the full curriculum, while the complete R-based lessons are held back for the July 2026 release.

Module 1

1.1 What is machine learning in biostatistics? (open now)
1.2 Prediction, explanation and causal thinking (locked until July 2026)
1.3 Types of learning in medical data (locked until July 2026)
1.4 Training, testing, overfitting and generalisation (locked until July 2026)
1.5 Biostatistical workflow for machine learning projects (locked until July 2026)

Module 2

2.1 Regression as a prediction model (locked until July 2026)
2.2 Logistic regression as a classifier (locked until July 2026)
2.3 K-nearest neighbours and distance-based learning (locked until July 2026)
2.4 Decision trees and rule-based prediction (locked until July 2026)
2.5 Model pipelines for clinical datasets (locked until July 2026)

Module 3

3.1 Train/test split and resampling (locked until July 2026)
3.2 Classification metrics, sensitivity, specificity, ROC and AUC (locked until July 2026)
3.3 Calibration, clinical usefulness and decision curves (locked until July 2026)
3.4 Cross-validation and bootstrap validation (locked until July 2026)
3.5 Bias, leakage and reproducibility in health ML (locked until July 2026)

Module 4

4.1 Ridge, lasso and elastic net (locked until July 2026)
4.2 Random forests (locked until July 2026)
4.3 Gradient boosting (locked until July 2026)
4.4 Support vector machines and flexible boundaries (locked until July 2026)
4.5 Comparing models responsibly (locked until July 2026)

Module 5

5.1 Clinical risk prediction case study (locked until July 2026)
5.2 Survival prediction and censored outcomes (locked until July 2026)
5.3 High-dimensional omics and feature selection (locked until July 2026)
5.4 Missing data, imbalance and fairness (locked until July 2026)
5.5 Final applied ML project in R (locked until July 2026)

Join the waitlist

Get access updates when the full ML course opens in July 2026.

The course is being redesigned lesson by lesson with advanced R scripts, browser coding labs, downloadable files, visual model outputs and clinical interpretation reports.

Module pages stay open.

Lesson 1.1 remains available.

All remaining lessons open in July 2026.

Waitlist visitors can request early access or release updates.

Recommended start

Begin with the open foundation lesson.

Lesson 1.1 introduces machine learning as a biostatistical prediction workflow: clinical question, outcome, predictors, training/testing, R output, interpretation and responsible reporting.
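
The workflow in that summary can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data are simulated, the predictors (age, sbp), the 70/30 split and the choice of logistic regression are invented for the example, and none of it is taken from the Lesson 1.1 script.

```r
set.seed(2026)

# Simulated cohort (hypothetical): outcome = event, predictors = age and systolic BP
n <- 400
patients <- data.frame(
  age = round(rnorm(n, mean = 60, sd = 10)),
  sbp = round(rnorm(n, mean = 135, sd = 15))
)
true_logit     <- -9 + 0.08 * patients$age + 0.025 * patients$sbp
patients$event <- rbinom(n, size = 1, prob = plogis(true_logit))

# Training/testing: hold out 30% of patients before any model fitting
train_rows <- sample(seq_len(n), size = round(0.7 * n))
train <- patients[train_rows, ]
test  <- patients[-train_rows, ]

# Baseline prediction model fitted on the training data only
fit <- glm(event ~ age + sbp, data = train, family = binomial())
print(round(coef(fit), 3))

# Predicted risks for patients the model has not seen
test$risk <- predict(fit, newdata = test, type = "response")

# Brier score: mean squared difference between predicted risk and outcome
brier <- mean((test$risk - test$event)^2)
print(brier)
```

The coefficients describe association in this sample, not causal effects, and the Brier score from a single split is a noisy estimate of performance in new patients.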

Open Lesson 1.1 →