I used a lot of different tools and strategies to make this paper easily reproducible at different levels. There's Docker container for the highest level of reproducibility, and package versions are managed with renv. The data used in the paper is hosted on Zenodo to avoid long queue times when downloading from the Climate Data Store and future-proof for when it goes away and checksumed before using it.
The code and data are both on GitHub. The paper has been published in Wellcome Open Research and has been replicated by multiple other authors.
The method is trained on the data that were available, but it is meant to be re-trainable as soon as new data are published. It would be great to be really sure that even someone else will be able to do it. In case we receive any feedback, we would be really happy to improve our Github repository so as to make the reproduction easier!
This paper presents a fine example of high-throughput computational materials screening studies, mainly focusing on the carbon nanoclusters of different sizes. In the paper, a set of diverse empirical and machine-learned interatomic potentials, which are commonly used to simulate carbonaceous materials, is benchmarked against the higher-level density functional theory (DFT) data, using a range of diverse structural features as the comparison criteria. Trying to reproduce the data presented here (even if you only consider a subset of the interaction potentials) will help you devise an understanding as to how you could approach a high-throughput structure prediction problem. Even though we concentrate here on isolated/finite nanoclusters, AIRSS (and other similar approaches like USPEX, CALYPSO, GMIN, etc.,) can also be used to predict crystal structures of different class of materials with applications in energy storage, catalysis, hydrogen storage, and so on.
Metadata annotation is key to reproducibility in sequencing experiments. Reproducing this research using the scripts provided will also show the current level of annotation in years since 2015 when the paper was published.
I suggested a few papers last year. I’m hoping that we’ve improved our reproducibility with this one, this year. We’ve done our best to package it up both in Docker and as an R package. I’d be curious to know what the best way to reproduce it is found to be. Working through vignettes or spinning up a Docker instance. Which is the preferred method?
It uses the drake R package that should make reproducibility of R projects much easier (just run make.R and you're done). However, it does depend on very specific package versions, which are provided by the accompanying docker image.
This paper is reproduced weekly in a docker container on continuous integration, but it is also set up to work via local installs as well. It would be interesting to see if it's reproducible with a human operator who knows nothing of the project or toolchain.