Clement Lee (Mathematics, Statistics & Physics)
2025-03-13 (Thu)
Help yourselves to the refreshments
No fire alarm test is planned, as far as I know
Ask questions at any point
Introduction, RStudio
Reproducible documents with R Markdown
Practical
Advanced stuff
Today’s slides: https://bit.ly/ukrn-comp-repro
Further training:
https://bit.ly/literate-programming
Converting from LaTeX:
https://bit.ly/bookdown2024
Future-proof your analysis
for others to reproduce
including your future self
ChatGPT & others make it easier to start
Also the stepping stone for machine learning and AI
but don’t start with something completely unfamiliar
garbage in, garbage out
Version control, CI/CD, and beyond
Statistical analysis: Python, MS Excel?
Writing presentations: MS PowerPoint?
Writing papers: MS Word, Overleaf?
There are some caveats
Showing you what’s possible
Particularly useful if you have a lot of numbers / tables / figures from your analysis
Can others run your code and get identical results?
Can reproducibility be done in an efficient manner?
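For the first question, one common reason that reruns differ is random number generation. A minimal sketch, assuming the analysis involves simulation; the helper name `simulate_once` and the seed value are illustrative and not from the talk:

```r
# Hypothetical helper: fix the seed, then draw the random numbers
simulate_once <- function(seed = 2025) {
  set.seed(seed)
  rnorm(5)
}

first_run <- simulate_once()
second_run <- simulate_once()

# Same seed, same draws: this comparison is TRUE
identical(first_run, second_run)
```

Without the call to `set.seed()`, the two runs would almost surely give different numbers.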
Not covered
Can others perform the same (non-computational) experiment themselves, and reach the same conclusion with broadly the same results?
Is my hypothesis valid? Are my methods / models useful?
Python
Python is great for big data pipelines
R is better in some other aspects
tidyverse
Understand which one is better / more suited for your tasks
Free & open source
Download from https://cran.r-project.org/
Anyone can contribute:
The best interface for using R
Download from https://posit.co/download/rstudio-desktop/
Readily available on a university computer
Let’s have a look
Coding is only the first step
Reproducibility not guaranteed in the whole process
Write some scripts with all the code for your analysis
Save the required plots as some image files
Save the numbers / tables as screenshots, or copy the numbers manually to MS Word / Overleaf
Edit the text and generate the output (slides, paper, webpage)
Edit the code in the scripts on your computer
Update the results (plots, numbers, tables) in the plain text files
Repeat if needed
Sooner or later: likely mismatch between reported results and those from running the scripts
Wanting to revert to previous version of the analysis, but the scripts have already been updated
Save a new set of scripts every time
Eventually it becomes difficult to find the right version again
Autosave & track changes help, but still not perfect
I shall pass around the two books
Imagine how painstaking it was to copy all the code, results and plots to the text editor software
What if the code could be run automatically, the results generated on the fly?
Just R + Markdown
Markdown is a markup language like LaTeX, but lighter
After writing the script, some commands / button clicks create a pdf
Comes with RStudio
Need a bit more within RStudio in order to get pdf to work
Let’s see how it works
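The live demo is not captured in these notes. As a rough sketch of what a minimal R Markdown file looks like, the code below writes one to a temporary folder. The title, text, and file name are made up for illustration; rendering assumes the {rmarkdown} package and pandoc (bundled with RStudio) are available.

```r
# Three backticks, built here so this snippet can sit inside a fenced block
fence <- strrep("`", 3)

# Lines of a minimal .Rmd file (contents are illustrative)
rmd_lines <- c(
  "---",
  "title: \"A minimal example\"",
  "output: html_document",
  "---",
  "",
  "Some text written in Markdown.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

rmd_file <- file.path(tempdir(), "minimal.Rmd")
writeLines(rmd_lines, rmd_file)

# Equivalent to clicking "Knit" in RStudio:
# rmarkdown::render(rmd_file)
```

The header between the `---` lines sets the output type, the plain text is Markdown, and the part between the backtick fences is R code that is run when the document is generated.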
What if I don’t want to show the code?
Is it possible to embed figure captions?
The script (.Rmd)
The output (.pdf / .html)
The figure above shows the distance against speed in the cars data set.
The script (.Rmd)
Distance against speed in cars data set.
The script (.Rmd)
Also useful when showing code that doesn’t work
The output (.pdf / .html)
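The scripts shown on these slides did not survive in the notes. A sketch of a chunk that would answer the two questions above, hiding the code and attaching the caption: the lines below are the body of an R chunk, with options written as `#|` comments (this assumes knitr 1.35 or later; the same options can instead go in the chunk header as `{r, echo=FALSE, fig.cap="..."}`).

```r
#| echo: false
#| fig.cap: "Distance against speed in cars data set."
plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Distance")
```

Similarly, `eval=FALSE` shows the code without running it, which is one way to display code that does not work.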
Caching
Code decoration
Side-by-side plots
…
Full list here: https://yihui.org/knitr/options/
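As a sketch of how some of these options can be set for the whole document, the code below would typically go in a setup chunk near the top of the .Rmd. The particular values are illustrative choices, not recommendations from the talk.

```r
library(knitr)

# Defaults applied to every chunk that follows
opts_chunk$set(
  cache = TRUE,       # reuse the results of chunks that have not changed
  fig.show = "hold",  # hold all plots until the end of the chunk
  out.width = "50%"   # each plot takes half the width, so two sit side by side
)
```

Any single chunk can still override these defaults in its own header.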
Output in multiple formats
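One way to get several formats from the same file is to pass a vector of format names to `rmarkdown::render()`. In this sketch the file name `analysis.Rmd` is hypothetical, and the pdf output additionally assumes a working LaTeX installation.

```r
# Hypothetical file name; replace with your own .Rmd
rmd_file <- "analysis.Rmd"

# Formats to produce from the same source
formats <- c("html_document", "pdf_document", "word_document")

# Only render if the file and the package are actually there
if (file.exists(rmd_file) && requireNamespace("rmarkdown", quietly = TRUE)) {
  rmarkdown::render(rmd_file, output_format = formats)
}
```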
Not all is lost
\[ \frac{\pi}{4} = 1-\frac{1}{3}+\frac{1}{5}-\frac{1}{7} + \cdots = \sum_{k=0}^{\infty}\frac{(-1)^k}{2k+1} \]
Conversion from a clunky language (LaTeX) to a light-weight one (Markdown)
Presentation on conversion: https://bit.ly/bookdown2024
Supports Python, SQL, Julia, etc.
{reticulate} package for Python
The script (.Rmd)
The output (.pdf / .html)
## 6.2832
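The chunk that produced this output is missing from the notes. A guess at what it may have looked like, assuming it was a Python chunk that printed twice pi to four decimal places; the R code below only assembles and prints the chunk text, and running the chunk itself needs the {reticulate} package and a Python installation.

```r
# Three backticks, built here so this snippet can sit inside a fenced block
fence <- strrep("`", 3)

# Assumed contents of the Python chunk (not taken from the original slides)
python_chunk <- c(
  paste0(fence, "{python}"),
  "import math",
  "print(round(2 * math.pi, 4))",
  fence
)

cat(python_chunk, sep = "\n")
```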
\(\checkmark\) Inline code (\(\rightarrow\) no need to hard-code results)
\(\checkmark\) External figures (locally or on the web)
\(\checkmark\) Tables
\(\checkmark\) Referencing figures & tables
\(\checkmark\) Citations via BibTeX
\(\checkmark\) Use LaTeX packages
\(\checkmark\) Resolving LaTeX issues within RStudio
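A small sketch of the first and third items in this list, using built-in data sets. The variable name `mean_speed` and the choice of columns are illustrative.

```r
# Compute a number once in a chunk ...
mean_speed <- round(mean(cars$speed), 1)

# ... then refer to it in the text of the .Rmd, instead of typing it in:
#   The mean speed is `r mean_speed` mph.

# A table generated from the data rather than copied by hand
knitr::kable(head(mtcars[, 1:4]))
```

If the data change, both the number in the sentence and the table update the next time the document is generated.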
Goals
Think about which part of your analysis can be made reproducible
Convert an existing script, or create a new analysis
Generate the output in multiple formats
Troubleshooting
No laptop? Use a university computer
Don’t know R? Some notes to get you started
No data? Consider the following:
mtcars (a built-in data set)
Excel spreadsheet (.xlsx, .xls)?
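Either option takes a line or two of R. In this sketch the file name `my_data.xlsx` is hypothetical, and reading it assumes the {readxl} package is installed.

```r
# Built-in data: available in every R session
head(mtcars)

# Your own spreadsheet (hypothetical file name)
xlsx_file <- "my_data.xlsx"
if (file.exists(xlsx_file) && requireNamespace("readxl", quietly = TRUE)) {
  my_data <- readxl::read_excel(xlsx_file)
  head(my_data)
}
```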
A collection of .Rmd files
Some blog posts / guides
https://tysonbarrett.com/jekyll/update/2018/02/11/r_dissertation/
https://cran.r-project.org/web/packages/iheiddown/vignettes/thesis.html
https://ourcodingclub.github.io/tutorials/rmarkdown-dissertation/
An ongoing project
Templates for different institutions
Not yet for Newcastle University
{rticles} package provides templates – again, you can contribute
Examples at https://pkgs.rstudio.com/rticles/articles/examples.html
There might be more issues as the output is usually pdf
Let’s see how it works
What if a journal has reproducibility requirements?
No need to give multiple scripts with convoluted instructions
Related question: Can somebody reproduce your results on their computer?
Scope for pre-submission checks within the university?
Standalone outputs via R Markdown
Thesis / lecture notes
Journal article templates
Python content within
Standalone output via Quarto
LaTeX & packages required for pdf outputs
First generate the output without altering the template
Can the output be generated and results updated if the data / analysis changes?
Can the output be generated on someone else’s computer?
install.packages()
Don’t include interactive commands e.g. View()
pdf issues can usually be solved with the use of the {tinytex} package
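A sketch of a check to run before sharing a document: list the packages it needs and see which are missing on the current computer. The package list and the helper name `missing_packages` are illustrative.

```r
# Packages this (hypothetical) analysis needs
needed <- c("rmarkdown", "knitr", "tinytex")

# Which of them cannot be loaded on this computer?
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}
missing_packages(needed)

# Run these once in the console, not inside the .Rmd:
# install.packages(missing_packages(needed))
# tinytex::install_tinytex()  # a small LaTeX distribution for pdf output
```

Keeping the installation commands out of the .Rmd means the document does not try to reinstall packages every time it is generated.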
Cons \(\rightarrow\) Pros
Initial cost (time) to convert \(\rightarrow\) The cost to switch increases over time
Not as flexible as LaTeX \(\rightarrow\) Sufficiently developed over the years
Collaboration trickier \(\rightarrow\) Possible with RStudio Cloud, or through version control
Coding is the first step of computational reproducibility
Computational reproducibility is the first step of:
If you find it useful, spread the word & let us know!
Email: clement.lee@newcastle.ac.uk
I will send the link to the slides (or the Rmd?)
Questions?