Module 1 · Machine Learning in Biostatistics

Foundations of machine learning in biostatistics.

This module builds the judgement needed before algorithms. You will learn how clinical prediction questions become machine learning workflows, why validation matters, how R output should be interpreted, and why medical ML reports must avoid causal overclaiming.


5 lessons · 1 open now · R browser lab · Full release July 2026

What this module builds

Safe clinical prediction thinking.

Prediction before algorithms

Students learn to define the clinical prediction question before choosing a model. The module begins with outcome, predictors, target population and prediction timing.

Responsible interpretation

The module separates prediction, explanation and causation so students do not overclaim what an ML model can prove.

Validation awareness

Training, testing, overfitting, leakage and generalisation are treated as core biostatistical ideas, not technical afterthoughts.

R output to report

Lesson 1.1 introduces the course style: run an R script, inspect the output, interpret the results and write a cautious report.

Clinical ML judgement

Students learn why model accuracy alone is not enough for health-data decisions and why clinical usefulness must be considered.

By the end

Students should understand the modelling workflow.

Define machine learning as a prediction workflow in biostatistics.

Identify outcomes, predictors, target population and prediction timing.

Explain why prediction is not the same as causation.

Recognise supervised, unsupervised and semi-supervised learning tasks.

Explain overfitting, data leakage and poor generalisation.

Interpret first model outputs from R scripts.

Write cautious conclusions from prediction-model results.

Module pathway

From clinical question to cautious report.

The first module teaches the thinking structure that every later model will follow: define the question, choose valid predictors, fit a model, validate it, interpret the output and report responsibly.

Step 1 · Question

What clinical or health-data outcome do we want to predict?

Step 2 · Outcome

What is the response variable, and how is it measured?

Step 3 · Predictors

Which variables are available at the time prediction is made?

Step 4 · Model

Which prediction rule is fitted, and what does it output?

Step 5 · Validation

Does the model work on observations not used to train it?

Step 6 · Report

What can we honestly conclude, and what caveats must we state?
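The six steps above can be sketched in a few lines of code. The course labs use R; the block below is written in Python purely as a self-contained illustration. Everything in it is an assumption rather than course material: the simulated biomarker, the assumed risk model, the 300/100 split and the two toy prediction rules. It is meant to show why Step 5 matters: a rule that memorises the training patients scores perfectly on them and, because the simulated outcome is noisy, typically does worse on patients it has not seen.

```python
import math
import random

random.seed(2026)  # fixed seed so the sketch is reproducible


def simulate_patient():
    """Return (biomarker, outcome) for one hypothetical patient."""
    biomarker = random.gauss(0, 1)
    risk = 1 / (1 + math.exp(-1.5 * biomarker))   # assumed true risk model
    outcome = 1 if random.random() < risk else 0  # noisy binary outcome
    return biomarker, outcome


# Steps 1-3: a binary outcome and one predictor known at prediction time.
patients = [simulate_patient() for _ in range(400)]
train, test = patients[:300], patients[300:]  # Step 5 needs held-out patients


# Step 4, rule A: memorise the training set (1-nearest-neighbour).
def memorising_rule(biomarker):
    nearest = min(train, key=lambda row: abs(row[0] - biomarker))
    return nearest[1]


# Step 4, rule B: a simple fixed threshold on the biomarker.
def threshold_rule(biomarker):
    return 1 if biomarker > 0 else 0


def accuracy(rule, rows):
    """Share of patients whose predicted outcome matches the observed one."""
    return sum(rule(x) == y for x, y in rows) / len(rows)


results = {
    "memorising, train": accuracy(memorising_rule, train),
    "memorising, test": accuracy(memorising_rule, test),
    "threshold, train": accuracy(threshold_rule, train),
    "threshold, test": accuracy(threshold_rule, test),
}

# Step 6: report training and test performance side by side,
# never the training figure alone.
for label, value in results.items():
    print(f"{label}: {value:.2f}")
```

Accuracy is used here only for brevity. As the module stresses, accuracy alone is not enough for health-data decisions, and nothing in this comparison says anything about what causes the outcome.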

Lesson design

The ML course is script-led and interpretation-led.

Conversational lecture with clinical prediction examples

Detailed notes connecting statistics, ML and interpretation

Browser-based R coding lab

Downloadable R script and shared dataset

Output guide explaining script results

Report section translating output into interpretation

Quiz and applied checks

Current release state

One full lesson is open as the preview.

Lesson 1.1 demonstrates the final course format: lecture, detailed notes, interactive lab, browser R console, script output guide, report section and quiz. The remaining lessons are locked while they are being redesigned in the same style.


Module lessons

Study the lessons in order.

Lesson 1.1 is available now. Lessons 1.2–1.5 route to the waitlist until the full release in July 2026.

Lesson 1.1 · Open now · 90 min · Prediction mindset

What is machine learning in biostatistics?

Understand machine learning as a biostatistical prediction workflow using clinical questions, outcomes, predictors, R output, validation and responsible reporting.

Prediction workflow · Browser R lab · Output interpretation

Lesson 1.2 · Locked until July 2026 · 90–100 min · Interpretation discipline

Prediction, explanation and causal thinking

Separate prediction models from explanatory and causal models, and learn how to avoid unsafe causal claims in medical machine learning reports.

Prediction vs causation · Confounding · Reporting caution

Lesson 1.3 · Locked until July 2026 · 90–100 min · Learning structure

Types of learning in medical data

Classify supervised, unsupervised and semi-supervised learning problems using outcomes, labels, predictors and clinical data structure.

Supervised learning · Unsupervised learning · Clinical labels

Lesson 1.4 · Locked until July 2026 · 100–110 min · Validation thinking

Training, testing, overfitting and generalisation

Understand train/test splitting, overfitting, generalisation to unseen patients, data leakage and why training performance can be misleading.

Train/test split · Overfitting · Leakage

Lesson 1.5 · Locked until July 2026 · 100–120 min · Applied workflow

Biostatistical workflow for machine learning projects

Bring the module together through clinical question design, predictor timing, validation, thresholds, interpretation and transparent reporting.

Clinical workflow · Thresholds · Model reporting

Join the waitlist

Get access updates when the full module opens in July 2026.

Lessons 1.2–1.5 are currently locked while they are being redesigned with R labs, downloadable scripts, visual outputs, interpretation reports and applied clinical examples.

Lesson 1.1 remains available.

Lessons 1.2–1.5 open in July 2026.

The full module will follow the same structure as Lesson 1.1.

Waitlist visitors can request early access or release updates.


Recommended start

Begin with the open ML foundation lesson.

Lesson 1.1 introduces the full learning style for this course: clinical question, R script, browser output, interpretation, report writing, quiz and responsible modelling caution.
