Computational Reproducibility

UKRN Training Workshop

Clement Lee (Mathematics, Statistics & Physics)

2025-10-22 (Wed)

House rules

Outline

  1. Introduction, RStudio

  2. Reproducible documents with R Markdown

  3. Practical

  4. Advanced stuff

 

Why coding?

What tools do you use?

  • Statistical analysis: Python, MS Excel?

  • Writing presentations: MS Powerpoint?

  • Writing papers: MS Word, Overleaf?

 

We can do it all with one unified approach (in R)

  • Statistical analysis: Python, MS Excel?

  • Writing presentations: MS Powerpoint?

  • Writing papers: MS Word, Overleaf?

 

  • There are some caveats

  • Showing you what’s possible

  • Particularly useful if you have a lot of numbers / tables / figures from your analysis

Scope

  • Can others run your code and get identical results?

  • Can reproducibility be done in an efficient manner?

 

Not covered

  • Can others perform the same (non-computational) experiment themselves, and reach the same conclusion with broadly the same results?

  • Is my hypothesis valid? Are my methods / models useful?

  • Python

Why not Python?

R

RStudio

So far so good?

The case for computational reproducibility

  • Coding is only the first step

  • Reproducibility not guaranteed in the whole process

 

The old-school way

What if your analysis or data changes?

Another scenario

  • Wanting to revert to a previous version of the analysis, but the scripts have already been updated

  • Save a new set of scripts every time

  • Eventually it becomes difficult to find the right version again

  • Autosave & track changes help, but still not perfect

 

Side track: Literate programming

Enter R Markdown

Installation

Chunk options

Originally

The script (.Rmd)

```{r}


plot(cars)
```
The figure above shows the distance
against speed in the cars data set.

 

The output (.pdf / .html)

plot(cars)

The figure above shows the distance against speed in the cars data set.

Hide the code

The script (.Rmd)

```{r}
#| echo: FALSE

plot(cars)
```
The figure above shows the distance
against speed in the cars data set.

 

The output (.pdf / .html)

The figure above shows the distance against speed in the cars data set.

Make the plot smaller

The script (.Rmd)

```{r}
#| echo: FALSE
#| out.width: "70%"
plot(cars)
```
The figure above shows the distance
against speed in the cars data set.

 

The output (.pdf / .html)

The figure above shows the distance against speed in the cars data set.

Proper figure caption

The script (.Rmd)

```{r}
#| echo: FALSE
#| fig.cap: "Distance against speed in cars data set."
plot(cars)
```

 

The output (.pdf / .html)
Distance against speed in cars data set.


Don’t evaluate the code

The script (.Rmd)

```{r}
#| eval: FALSE

plot(cars)
```

Also useful when showing code that doesn’t work

 

The output (.pdf / .html)

plot(cars)

Hide the plots (but still evaluate)

The script (.Rmd)

```{r}
#| fig.show: "hide"

plot(cars)
```

 

The output (.pdf / .html)

plot(cars)

Hide the code & results (but still evaluate)

The script (.Rmd)

```{r}
#| include: FALSE

plot(cars)
```

 

The output (.pdf / .html)

More chunk options
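The options shown so far are set one chunk at a time. A minimal sketch of setting defaults for the whole document instead, assuming the {knitr} package is installed (it is what R Markdown uses to run chunks). The values chosen are only illustrative, and `fig.align` is an extra option not used in the examples above.

```r
# Typically placed in a first "setup" chunk that itself uses include: FALSE
library(knitr)

# Defaults applied to every later chunk unless that chunk overrides them
opts_chunk$set(
  echo = FALSE,         # hide the code everywhere
  out.width = "70%",    # shrink all figures
  fig.align = "center"  # illustrative extra option
)

# Inspect what is currently set
print(opts_chunk$get("echo"))
print(opts_chunk$get("out.width"))
```

A chunk that needs different behaviour can still set its own option, e.g. `#| echo: TRUE`, which takes precedence over the default.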

“But I use LaTeX …”

$$ \frac{\pi}{4} = 1-\frac{1}{3}+\frac{1}{5}-\frac{1}{7} + \cdots = \sum_{k=0}^{\infty}\frac{(-1)^k}{2k+1} $$

\[ \frac{\pi}{4} = 1-\frac{1}{3}+\frac{1}{5}-\frac{1}{7} + \cdots = \sum_{k=0}^{\infty}\frac{(-1)^k}{2k+1} \]
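Because maths and code live in the same document, a formula like the one above can be checked numerically right next to it. A small sketch in base R; the number of terms is an arbitrary choice for illustration.

```r
# Partial sum of the series for pi / 4 shown above
n_terms <- 100000                # arbitrary, for illustration
k <- 0:(n_terms - 1)
partial_sum <- sum((-1)^k / (2 * k + 1))

# Compare 4 * (partial sum) with R's built-in constant pi
approx_pi <- 4 * partial_sum
abs_error <- abs(approx_pi - pi)
print(c(approx = approx_pi, error = abs_error))
```

The series converges slowly, so the error shrinks only in proportion to one over the number of terms.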

“But I use other languages …”

The script (.Rmd)

```{python}
import math
round(2 * math.pi, 4)
```

 

The output (.pdf / .html)

## 6.2832

Other features (if there’s time)

Your turn – Practical

Goals

  • Think about which part of your analysis can be made reproducible

  • Convert an existing script, or create a new analysis

  • Generate the output in multiple formats
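For the last goal, one possible route besides the Knit button is calling `rmarkdown::render()` with more than one output format. A minimal sketch, assuming {rmarkdown} and pandoc are available (both come with RStudio). The file name is hypothetical, and Word is used here only because it needs no LaTeX; `"pdf_document"` could be swapped in once LaTeX is installed.

```r
library(rmarkdown)

# Build a tiny .Rmd in a temporary folder so the example is self-contained
fence <- strrep("`", 3)
rmd_file <- file.path(tempdir(), "cars-report.Rmd")  # hypothetical file name
writeLines(c(
  "---",
  "title: \"Cars\"",
  "---",
  "",
  paste0(fence, "{r}"),
  "plot(cars)",
  fence,
  "",
  "The figure above shows the distance against speed in the cars data set."
), rmd_file)

# One source file, several outputs
formats <- c("html_document", "word_document")

# Rendering needs pandoc, so only attempt it when pandoc can be found
if (pandoc_available()) {
  outputs <- render(rmd_file, output_format = formats, quiet = TRUE)
  print(basename(outputs))
}
```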

 

Troubleshooting

  • No laptop? Use a university computer

  • Don’t know R? Some notes to get you started

  • No data? Consider the following:

  • Excel spreadsheet (.xlsx, .xls)?

    # do this once only
    install.packages("readxl")
    # do the following every time
    library(readxl)
    mydata <- read_xlsx("<filename>.xlsx")
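If no data of your own is at hand, one option is a data set that ships with R, such as the `cars` data used in the earlier examples. A short sketch using base R only:

```r
# cars is built in, so nothing needs to be downloaded or installed
str(cars)
summary(cars)

n_obs <- nrow(cars)          # number of observations
var_names <- names(cars)     # the two variables in the data set

# Names of a few other built-in data sets
other_sets <- head(data(package = "datasets")$results[, "Item"])
print(other_sets)
```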

Advanced 1: Quarto

Advanced 2: Bookdown – R Markdown upgrade

A collection of .Rmd files

  • index.Rmd
  • 01-multivariate-data.Rmd
  • 02-pca.Rmd
  • 03-cluster.Rmd
  • 04-regression.Rmd
  • 05-regularisation.Rmd
  • 06-classification.Rmd
  • 07-matrix.Rmd
  • 08-factorisation.Rmd
  • 09-references.Rmd

 

index.Rmd

---
title: "MAS8383 Statistical Learning Methodology"
author: "Clement Lee"
date: Semester 1, 2023/2024  
output: bookdown::gitbook
documentclass: book
papersize: a4
geometry:
- margin=1in
fontsize: 12pt
bibliography: [references.bib]
biblio-style: apalike
link-citations: yes
---

library(bookdown)
render_book("index.Rmd",
            output_format = c("bookdown::gitbook", "bookdown::pdf_book"))
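The numeric prefixes in the file names matter: by default bookdown merges the .Rmd files in file-name order, puts index.Rmd first, and skips files whose names start with an underscore. The sketch below only mimics that ordering in base R to make it visible; it is not bookdown's own code, and `_draft-notes.Rmd` is a hypothetical file added for illustration.

```r
# File names as they might be listed, in no particular order
rmd_files <- c("03-cluster.Rmd", "index.Rmd", "09-references.Rmd",
               "01-multivariate-data.Rmd", "_draft-notes.Rmd", "02-pca.Rmd")

# Drop files starting with an underscore
keep <- rmd_files[!startsWith(rmd_files, "_")]

# index.Rmd first, then the rest sorted by name
chapter_order <- c(
  intersect("index.Rmd", keep),
  sort(setdiff(keep, "index.Rmd"), method = "radix")
)
print(chapter_order)
```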

Advanced 2.5: Writing a whole thesis!

Some blog posts / guides

An ongoing project

 

Advanced 3: Writing journal articles!

# Run this once
install.packages("rticles")
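Once {rticles} is installed, a new article can be started from a journal template in RStudio via File > New File > R Markdown > From Template, or from the console. A hedged sketch of the console route: the template name `"elsevier"` and the folder name are only examples, and `rticles::journals()` lists the templates actually available in your installed version.

```r
template_name <- "elsevier"                    # example template, an assumption
paper_dir <- file.path(tempdir(), "my-paper")  # hypothetical location

# Only run if the packages are installed and the folder does not exist yet
if (requireNamespace("rticles", quietly = TRUE) &&
    requireNamespace("rmarkdown", quietly = TRUE) &&
    !dir.exists(paper_dir)) {
  # Names of the available journal templates
  print(head(rticles::journals()))

  # Create a folder containing the template .Rmd and its supporting files
  rmarkdown::draft(
    file = paper_dir,
    template = template_name,
    package = "rticles",
    create_dir = TRUE,
    edit = FALSE
  )
  print(list.files(paper_dir))
}
```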

Advanced 3 (cont’d)

  • What if a journal has reproducibility requirements?

    • Just give them the Rmd (and data)!
  • No need to give multiple scripts with convoluted instructions

  • Related question: Can somebody reproduce your results on their computer?

    • Absolute paths vs relative paths
    • Package dependencies, environments
  • Scope for pre-submission checks within the university?

 

Advanced 4: Ensuring package versions

  • Package version on your machine \(\neq\) that on mine

  • Changes between versions might break the code

  • Goal: A clean environment for full reproducibility
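To see which versions are in play on a given machine, base R can report them directly. This is only a quick manual check and not a substitute for {renv}; the packages listed are ones that ship with R, chosen so the sketch runs anywhere.

```r
# Packages to inspect (all ship with R; replace with the ones your analysis uses)
pkgs <- c("stats", "utils", "graphics")

# Installed version of each package, as text
versions <- vapply(pkgs, function(p) as.character(packageVersion(p)), character(1))
print(versions)

# The R version itself is part of the environment too
r_version <- R.version.string
print(r_version)
```

Running the same lines on two machines and comparing the output is a quick way to spot mismatches.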

# Run this once
install.packages("renv")

 

  1. initialise {renv} in your project

    • Creates a local project library under ./renv/library

    • Creates a renv.lock file listing all package versions

    • Activates automatic isolation of your environment when you work in that folder

    library(renv)
    renv::init()

Advanced 4 (cont’d)

  2. install packages inside the {renv} environment

    • Ensures the packages & their versions go into your project’s local library
    renv::install("readxl")
  3. write & knit R Markdown as usual

    • Local library & locked package versions are used automatically because {renv} is active in the project

 

  4. snapshot your environment when done

    • Updates / creates renv.lock file with precise versions of all packages used
    renv::snapshot()
  5. reproduce elsewhere and/or in the future

    • Installs the versions recorded in renv.lock

    • All the other machine needs is the Rmd and lockfile

    # On another machine
    library(renv)
    renv::init() # or, to install directly from renv.lock:
    # renv::restore()

Summary 1 – scaffolding

  • Standalone outputs via R Markdown

    • R Markdown comes with RStudio
  • Thesis / lecture notes

  • Journal article templates

  • Python content within

  • Standalone output via Quarto

 

  • LaTeX & packages required for pdf outputs

    • Installation outside R, or
    • R package {tinytex}
    # Run once
    install.packages("tinytex")
    library(tinytex)
    install_tinytex() # installs LaTeX (within R!)
    tlmgr_install() # installs LaTeX packages
  • Full reproducibility with package versions
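Before knitting to pdf for the first time, it can help to check whether a LaTeX installation is visible to R at all. A rough sketch using base R only; it looks for `pdflatex` on the PATH, which is an assumption about how LaTeX is set up on your machine.

```r
# Path to pdflatex, or an empty string if it is not found on the PATH
latex_path <- unname(Sys.which("pdflatex"))
has_latex <- nzchar(latex_path)

if (has_latex) {
  message("pdflatex found at: ", latex_path)
} else {
  message("No pdflatex on the PATH; consider tinytex::install_tinytex()")
}
```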

Summary 2 – practical tips

  • First generate the output without altering the template

  • Can the output be generated and results updated if the data / analysis changes?

  • Can the output be generated on someone else’s computer?

    • Don’t include commands like install.packages()
    • Don’t use absolute paths (see right)
  • Don’t include interactive commands e.g. View()

  • pdf issues can usually be solved with the use of the {tinytex} package
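On the tip about absolute paths: an absolute path such as `C:/Users/me/project/data/cars.csv` only exists on one machine, whereas a path relative to the project folder travels with the project. A self-contained sketch; the folder and file names are hypothetical, and a temporary folder stands in for the project.

```r
# Set up a stand-in project folder with a data sub-folder
project <- file.path(tempdir(), "my-analysis")   # hypothetical project folder
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(cars, file.path(project, "data", "cars.csv"), row.names = FALSE)

# Work from inside the project, as happens when knitting an .Rmd stored there
old_wd <- setwd(project)

# Relative path: works wherever the project folder is copied to
rel_path <- file.path("data", "cars.csv")
mydata <- read.csv(rel_path)

# Go back to where we were
setwd(old_wd)

print(dim(mydata))
```

`file.path()` also takes care of the path separator, so the same line works on Windows, macOS and Linux.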

 

Summary 3 – bigger picture

Cons

  • Initial cost (time) to convert \(\qquad\rightarrow\)

  • Not as flexible as LaTeX \(\qquad\rightarrow\)

  • Collaboration trickier \(\qquad\rightarrow\)

 

Pros

Lastly

  • Coding is the first step of computational reproducibility

  • Computational reproducibility is the first step of:

    • Open research
    • Useful analysis / method / model
  • If you find it useful, spread the word & let us know!

 

https://xkcd.com/2054/
