
Module 1

Foundations of Machine Learning in Biostatistics

This module builds the core language of medical machine learning: prediction, explanation, causal caution, learning types, validation, overfitting, leakage and responsible biostatistical reporting.

Module aim

Build judgement before algorithms.

The purpose of this module is to help students understand which claims a medical prediction model can and cannot support before they move on to supervised learning methods.

Lessons: 5

Coding labs: R

Module status: Complete

Prediction focus: Clinical

Prediction thinking

Students learn to define a prediction question before choosing an algorithm.

Interpretation discipline

The module separates prediction, explanation and causation so that predictive associations are not reported as causal effects.

Validation awareness

Training, testing, overfitting, generalisation and leakage are treated as core ideas.

Module lessons

Study the lessons in order.

Each lesson adds one layer of judgement: what ML means in health data, how to avoid causal overclaiming, how to classify learning tasks, how to validate honestly and how to report a complete biostatistical ML workflow.

Learning route

Finish this module before moving into supervised learning.

Module 2 assumes that students understand prediction questions, predictor timing, validation logic, overfitting, leakage and the difference between prediction, explanation and causation.


Case study route

Apply Module 1 ideas to diabetes risk prediction.

After completing the foundation lessons, use the case study to see how prediction, validation, thresholds and reporting appear in a health-data example.
