I used a lot of different tools and strategies to make this paper easily reproducible at different levels. There's Docker container for the highest level of reproducibility, and package versions are managed with renv. The data used in the paper is hosted on Zenodo to avoid long queue times when downloading from the Climate Data Store and future-proof for when it goes away and checksumed before using it.
The method is trained on the data that were available, but it is meant to be re-trainable as soon as new data are published. It would be great to be really sure that even someone else will be able to do it. In case we receive any feedback, we would be really happy to improve our Github repository so as to make the reproduction easier!
I suggested a few papers last year. I’m hoping that we’ve improved our reproducibility with this one, this year. We’ve done our best to package it up both in Docker and as an R package. I’d be curious to know what the best way to reproduce it is found to be. Working through vignettes or spinning up a Docker instance. Which is the preferred method?
I tried hard to make this paper as reproducible as possible, but as techniques and dependencies become more complex, it is hard to make it 100% clear. Any form of feedback is more than welcome.
I guess it could be a cool learning experience. The paper is written with knitr, uses a seed, is part of the R package it describes, was openly written using version control (SVN, R-Forge) and is available in an open access journal (@up_jors).
It uses the drake R package that should make reproducibility of R projects much easier (just run make.R and you're done). However, it does depend on very specific package versions, which are provided by the accompanying docker image.
This paper is reproduced weekly in a docker container on continuous integration, but it is also set up to work via local installs as well. It would be interesting to see if it's reproducible with a human operator who knows nothing of the project or toolchain.