Reproducible analysis with R Markdown or Quarto
An advanced guide to reproducible statistical analysis using literate programming, project structure, versioned scripts, dynamic reports, transparent decisions and reliable workflows.
Structure
Problem, intuition, method, working, limitations and discussion.
Best for
Students preparing for coursework, analysis, interpretation or revision.
Use with
Learning Hub lessons, tutoring sessions or dissertation planning.
Resource guide
Problem
Many student and research projects become difficult to trust because the analysis is not reproducible. Results are copied manually from software into Word documents, figures are saved with unclear names, cleaning decisions are forgotten and different versions of the dataset produce different answers. Reproducible analysis aims to make the workflow transparent enough that another person, or the same analyst returning months later, can understand how the results were produced.
- Tables and figures are copied manually from software output.
- Dataset versions are unclear.
- Cleaning decisions are not recorded.
- Analysis scripts are scattered across folders.
- Results cannot be recreated from raw data.
- Small manual edits create inconsistencies between tables, figures and text.
- Supervisors or collaborators cannot easily audit the workflow.
Intuition
Reproducible analysis is about connecting data, code, explanation and output. R Markdown and Quarto allow students to write narrative text and analysis code in the same document. When the document is rendered, the code runs and produces updated tables, figures and results. This reduces manual copying and makes the analysis easier to check, revise and defend.
- The raw data should remain unchanged.
- Cleaning steps should be written as code or clearly documented.
- Tables and figures should be generated from the analysis dataset.
- Narrative text should explain why each step was done.
- Dynamic reports reduce copy-and-paste errors.
- A clear project folder structure makes collaboration easier.
- Reproducibility is a habit, not a final decoration.
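Quarto can run Python code cells as well as R, so the idea of generating the numbers in the text from the analysis dataset can be sketched in Python. This is a minimal illustration under stated assumptions: `describe_sample` and the `score` field are hypothetical names, not part of R Markdown or Quarto. In a report, a code cell would call the function so the sentence is rebuilt on every render instead of being typed by hand.

```python
from statistics import mean, stdev


def describe_sample(rows, field="score"):
    """Build a results sentence directly from the data, so no number is typed by hand."""
    # Keep only rows where the (hypothetical) field is present and not missing
    values = [row[field] for row in rows if row.get(field) is not None]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two non-missing values to report a standard deviation")
    return (
        f"The analysis sample contained {n} respondents "
        f"(mean {field} = {mean(values):.2f}, SD = {stdev(values):.2f})."
    )


rows = [{"score": 4}, {"score": 5}, {"score": 3}, {"score": None}]
print(describe_sample(rows))
# The analysis sample contained 3 respondents (mean score = 4.00, SD = 1.00).
```

If a cleaning fix later changes the data, the sentence changes with it on the next render, so the text cannot drift out of step with the tables.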
Method
A reproducible workflow begins with a clean project structure. Raw data, processed data, scripts, outputs and reports should be separated. The analysis should move from raw data to cleaned data to results through documented steps. R Markdown or Quarto can then be used to combine explanation, code, output and interpretation in one report. Version control can further track changes over time.
- Step 1: Create a project folder with clear subfolders.
- Step 2: Keep raw data unchanged.
- Step 3: Store cleaned data separately from raw data.
- Step 4: Write scripts for data cleaning, analysis and figures.
- Step 5: Use clear file names, adding dates only when they are helpful.
- Step 6: Create a dynamic report using R Markdown or Quarto.
- Step 7: Generate tables and figures from code rather than manual copying.
- Step 8: Record package versions or the computational environment where needed.
- Step 9: Use version control for important projects.
- Step 10: Render the report from start to finish before submission.
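Steps 1 to 4 can be set up once with a short script. The sketch below assumes one possible layout (`data/raw`, `data/processed`, `scripts`, `outputs`, `reports`); the folder names and the `create_project` helper are illustrative, not a required convention.

```python
from pathlib import Path

# Hypothetical folder layout following Steps 1-4; rename to suit your project.
SUBFOLDERS = [
    "data/raw",
    "data/processed",
    "scripts",
    "outputs/tables",
    "outputs/figures",
    "reports",
]

README_TEXT = """# Project title

data/raw/        original data exports (never edited)
data/processed/  cleaned, analysis-ready data created by scripts
scripts/         cleaning, analysis and plotting code
outputs/         tables and figures generated by code
reports/         R Markdown or Quarto report files
"""


def create_project(root):
    """Create the folder skeleton and a README without overwriting existing files."""
    root = Path(root)
    created = []
    for sub in SUBFOLDERS:
        folder = root / sub
        folder.mkdir(parents=True, exist_ok=True)  # existing folders are left alone
        created.append(folder)
    readme = root / "README.md"
    if not readme.exists():  # never overwrite notes that have already been written
        readme.write_text(README_TEXT, encoding="utf-8")
    return created
```

Running `create_project("my_dissertation")` at the start of a project creates the skeleton. Running it again is harmless, because existing folders are kept and an existing README is not overwritten.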
Working
Suppose a dissertation uses survey data. A reproducible workflow stores the original survey export in a raw-data folder and uses a cleaning script to recode variables and create an analysis dataset. An analysis script or Quarto document then fits the models and generates tables directly from the results. If a coding error is found later, the student can fix the cleaning step and regenerate the entire report consistently.
- Raw data folder: contains the untouched original dataset.
- Processed data folder: contains cleaned analysis-ready data.
- Scripts folder: contains cleaning, analysis and plotting scripts.
- Outputs folder: contains generated tables and figures.
- Report file: explains the analysis and produces dynamic output.
- README file: explains the project structure.
- Version control: records changes over time.
- Final render: checks that the full analysis runs from beginning to end.
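A cleaning script in the spirit of the survey example might look like the sketch below. It assumes a hypothetical raw export with `id` and `satisfaction` columns, where satisfaction is stored as text labels; the recoding map and the `clean_survey` helper are illustrative assumptions, not a standard scale or tool. The script only reads from the raw file and writes a new file for the processed folder, so the original export is never modified.

```python
import csv
from pathlib import Path

# Hypothetical recoding rule: text labels -> 1-5 numeric scale.
# Keeping it in code means the decision is documented and can be audited.
SATISFACTION_CODES = {
    "very dissatisfied": 1,
    "dissatisfied": 2,
    "neutral": 3,
    "satisfied": 4,
    "very satisfied": 5,
}


def clean_survey(raw_path, processed_path):
    """Read the untouched raw export, recode it, and write an analysis-ready copy."""
    raw_path, processed_path = Path(raw_path), Path(processed_path)
    if raw_path.resolve() == processed_path.resolve():
        raise ValueError("refusing to overwrite the raw data file")
    with raw_path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    cleaned = []
    for row in rows:
        label = row["satisfaction"].strip().lower()
        # Unknown or blank labels become missing rather than being guessed
        cleaned.append({"id": row["id"], "satisfaction": SATISFACTION_CODES.get(label, "")})
    processed_path.parent.mkdir(parents=True, exist_ok=True)
    with processed_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "satisfaction"])
        writer.writeheader()
        writer.writerows(cleaned)
    return len(cleaned)
```

Because the recoding map lives in the script, a later correction (for example, a mislabelled category) is made once in code and the processed file is regenerated, rather than edited by hand.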
Limitations
Reproducibility does not guarantee that the analysis is statistically correct. A perfectly reproducible wrong analysis is still wrong. Reproducible workflows also require discipline, time and organisation. Sensitive data may not be shareable, and some manual decisions may still need written documentation. The goal is not perfection but transparency and reliability.
- Reproducible code can still contain statistical mistakes.
- Sensitive datasets may require restricted access.
- Large data files may be difficult to track with version control.
- Complex projects need documentation to remain understandable.
- Package updates can change results if environments are not controlled.
- Beginners may need time to learn project organisation.
- Reproducibility should support thinking, not replace it.
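One way to act on the point about package updates is to save the software environment alongside the results. The Python sketch below uses the standard-library `importlib.metadata` module; the `record_environment` helper and the output file name are hypothetical. In an R project, `sessionInfo()` or a lockfile from the renv package serves a similar purpose.

```python
import json
import platform
import sys
from importlib import metadata


def record_environment(packages, out_path):
    """Write the Python version, platform and installed package versions to a JSON file."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(info, f, indent=2, sort_keys=True)
    return info
```

Calling, for example, `record_environment(["pandas", "statsmodels"], "outputs/environment.json")` at the end of a render stores the versions that produced that set of results; packages that are not installed are recorded as `null`.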
Discussion
A strong project report should make the analysis traceable. Even if the raw data cannot be shared, the structure, code logic, variable definitions and decisions should be clear. For dissertations, reproducibility helps students answer supervisor questions, correct mistakes, update results and defend methodological choices. For research, it improves transparency, collaboration and credibility.
- Explain where data came from and how they were cleaned.
- Describe major recoding and exclusion decisions.
- Use dynamic tables and figures where possible.
- Avoid manually editing final numbers in the report.
- Keep a record of assumptions and limitations.
- Render the full report before submission.
- Treat reproducibility as part of academic integrity.
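Exclusion decisions are easier to describe when the code that applies them also counts what they remove. The sketch below assumes hypothetical `age` and `complete` fields and invented toy records; the `apply_exclusions` helper is illustrative only. The resulting log supplies the numbers for a participant-flow description directly from the data.

```python
def apply_exclusions(rows, rules):
    """Apply named exclusion rules in order and log how many rows each removes.

    rules is a list of (description, keep_function) pairs; keep_function
    returns True for rows that should stay in the analysis sample.
    """
    log = [("Raw data", len(rows), 0)]
    for description, keep in rules:
        kept = [row for row in rows if keep(row)]
        log.append((description, len(kept), len(rows) - len(kept)))
        rows = kept
    return rows, log


# Hypothetical rules and toy data, for illustration only
rules = [
    ("Excluded respondents under 18", lambda r: r["age"] >= 18),
    ("Excluded incomplete questionnaires", lambda r: r["complete"]),
]
data = [
    {"age": 17, "complete": True},
    {"age": 25, "complete": False},
    {"age": 30, "complete": True},
    {"age": 42, "complete": True},
]

sample, log = apply_exclusions(data, rules)
for step, remaining, removed in log:
    print(f"{step}: {remaining} remaining ({removed} removed)")
# Raw data: 4 remaining (0 removed)
# Excluded respondents under 18: 3 remaining (1 removed)
# Excluded incomplete questionnaires: 2 remaining (1 removed)
```

The order of the rules matters: each count refers to the sample left after the previous step, which is how exclusions are usually reported.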
Practical checklist
Before you apply this topic
- Have you kept raw data unchanged?
- Have you separated raw and cleaned data?
- Have you documented cleaning decisions?
- Have you used clear folder names?
- Have you generated tables and figures from code?
- Have you avoided manual copy-and-paste where possible?
- Have you created a dynamic report?
- Have you checked that the report renders from start to finish?
- Have you recorded package or software details where needed?
- Have you used version control for important changes?
- Have you written a README or project notes?
- Can another person understand how results were produced?
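The first checklist question, whether raw data have been kept unchanged, can be checked rather than assumed. A minimal sketch, assuming the raw files sit in one folder: record a SHA-256 checksum for each file once, then compare before each render. The function names and the manifest file are hypothetical, and the manifest should be stored outside the raw-data folder so that it is not checksummed itself.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(raw_dir, manifest_path):
    """Record a checksum for every file in the raw-data folder.

    Keep manifest_path outside raw_dir, otherwise the manifest checksums itself.
    """
    raw_dir = Path(raw_dir)
    manifest = {
        str(p.relative_to(raw_dir)): file_sha256(p)
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest


def changed_raw_files(raw_dir, manifest_path):
    """List recorded raw files that are now missing or whose contents differ."""
    raw_dir = Path(raw_dir)
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    changed = []
    for name, expected in manifest.items():
        path = raw_dir / name
        if not path.is_file() or file_sha256(path) != expected:
            changed.append(name)
    return changed
```

The check only covers files listed in the manifest; a newly added raw export needs a fresh manifest, ideally with a note explaining why it was added.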
Common mistakes
What to avoid
- Overwriting the raw dataset.
- Saving many unclear file versions such as final, final2 and final_real.
- Copying numbers manually into a report.
- Editing figures manually after export.
- Not documenting recoding decisions.
- Mixing raw data, scripts and outputs in one folder.
- Running code chunks out of order and trusting stale results.
- Submitting a report without rendering it from start to finish.
- Ignoring package versions in long projects.
- Thinking reproducibility is only needed for professional researchers.
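Running chunks out of order or forgetting to re-run a script leaves outputs that no longer match the data. A full render is the real fix; as a rough extra check, the sketch below flags outputs that are missing or older than their newest input, the same rule build tools such as make use. The `stale_outputs` helper is hypothetical, and modification times are only a heuristic: they show that a file was touched, not that its content changed, and code changes are caught only if the scripts are listed as inputs.

```python
from pathlib import Path


def stale_outputs(inputs, outputs):
    """Return outputs that are missing or older than the newest input file.

    If any input changed after an output was written, that output may be out of date.
    """
    if not inputs:
        raise ValueError("at least one input file is required")
    newest_input = max(Path(p).stat().st_mtime for p in inputs)
    stale = []
    for out in outputs:
        out = Path(out)
        if not out.exists() or out.stat().st_mtime < newest_input:
            stale.append(str(out))
    return stale
```

For example, passing the processed dataset and analysis scripts as `inputs` and the exported tables and figures as `outputs` lists the files that should be regenerated before submission.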
How this connects to learning
Use the guide as a bridge between theory and application.
A resource guide should not replace a full course or live teaching session. Instead, it helps you organise your thinking. Use it to identify what you understand, what feels unclear, and what questions you should ask before applying a method to real data.
Before a lesson
Read the intuition and problem sections to prepare.
During analysis
Use the method and checklist to guide decisions.
When writing
Use limitations and discussion to improve interpretation.