We invested a lot of work to make the analyses from the paper reproducible and we are very curious how the documentation could be improved and if people run into any problems.
I used a lot of different tools and strategies to make this paper easily reproducible at different levels. There's Docker container for the highest level of reproducibility, and package versions are managed with renv. The data used in the paper is hosted on Zenodo to avoid long queue times when downloading from the Climate Data Store and future-proof for when it goes away and checksumed before using it.
The direct numerical simulations (DNS) for this paper were conducted using Basilisk (http://basilisk.fr/). As Basilisk is a free software program written in C, it can be readily installed on any Linux machine, and it should be straightforward to then run the driver code to re-produce the DNS from this paper. Given this, the numerical solutions presented in this paper are a result of many high-fidelity simulations, which each took approximately 24 CPU hours running between 4 to 8 cores. Hence the difficulty in reproducing the results should mainly be in the amount of computational resources it would take, so HPC resources will be required. The DNS in this paper were used to validate the presented analytical solutions, as well as extend the results to a longer timescale. Reproducing these numerical results will build confidence in these results, ensuring that they are independent of the system architecture they were produced on.
This paper presents a fine example of high-throughput computational materials screening studies, mainly focusing on the carbon nanoclusters of different sizes. In the paper, a set of diverse empirical and machine-learned interatomic potentials, which are commonly used to simulate carbonaceous materials, is benchmarked against the higher-level density functional theory (DFT) data, using a range of diverse structural features as the comparison criteria. Trying to reproduce the data presented here (even if you only consider a subset of the interaction potentials) will help you devise an understanding as to how you could approach a high-throughput structure prediction problem. Even though we concentrate here on isolated/finite nanoclusters, AIRSS (and other similar approaches like USPEX, CALYPSO, GMIN, etc.,) can also be used to predict crystal structures of different class of materials with applications in energy storage, catalysis, hydrogen storage, and so on.
I suggested a few papers last year. I’m hoping that we’ve improved our reproducibility with this one, this year. We’ve done our best to package it up both in Docker and as an R package. I’d be curious to know what the best way to reproduce it is found to be. Working through vignettes or spinning up a Docker instance. Which is the preferred method?
This paper shows a fun and interesting simulation result. I find it (of course) very important that our results are reproducible. In this paper, however, we did not include the exact code for these specific simulations, but the results should be reproducible using the code of our previous paper in PLOS Computational Biology (Van Oers, Rens et al. https://doi.org/10.1371/journal.pcbi.1003774). I am genuinely curious to see if there is sufficient information for the Biophys J paper or if we should have done better. Other people have already successfully built upon the 2014 (PLOS) paper using our code; see e.g., https://journals.aps.org/pre/abstract/10.1103/PhysRevE.97.012408 and https://doi.org/10.1101/701037).
It uses the drake R package that should make reproducibility of R projects much easier (just run make.R and you're done). However, it does depend on very specific package versions, which are provided by the accompanying docker image.
This paper is reproduced weekly in a docker container on continuous integration, but it is also set up to work via local installs as well. It would be interesting to see if it's reproducible with a human operator who knows nothing of the project or toolchain.