I hope that the evaluation framework introduced in the paper will be adopted by other researchers working on mutational signatures.
This paper is fully reproducible: we provide the protocol that the different modelers used, the data produced by these models, the observed data, and the code to run the analysis behind the paper's results, figures, and text. I have not come across any other paper in forestry that is as fully reproducible as ours, so it may be a rare example in this field and, hopefully, a motivation for others to follow. Please note that we do not provide the models that were used to run the simulations, since running them was our form of data collection; we do, however, provide the data resulting from these simulations.
The method is trained on the data that were available at the time, but it is meant to be retrained as soon as new data are published. It would be great to know for certain that someone else will be able to do this. If we receive any feedback, we will be happy to improve our GitHub repository to make reproduction easier!
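A minimal sketch of what we mean by "re-trainable" might look like the following, assuming a tabular dataset; the names here (`load_training_data`, `retrain`, a `data/` directory of CSV files, a linear model) are illustrative placeholders, not the actual implementation in the repository:

```python
# Sketch of a retraining entry point: all names and the data layout
# are hypothetical, chosen only to illustrate the intended workflow.
from pathlib import Path

import pandas as pd
from sklearn.linear_model import LinearRegression


def load_training_data(data_dir: Path) -> pd.DataFrame:
    """Concatenate every published dataset dropped into data/."""
    frames = [pd.read_csv(f) for f in sorted(data_dir.glob("*.csv"))]
    return pd.concat(frames, ignore_index=True)


def retrain(data_dir: Path = Path("data")) -> LinearRegression:
    """Refit the model from scratch on all currently available data."""
    df = load_training_data(data_dir)
    X, y = df.drop(columns="target"), df["target"]
    return LinearRegression().fit(X, y)


# Incorporating a newly published dataset is then a one-step update:
# copy new_study.csv into data/ and call retrain() again.
```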
Even though the approach in the paper focuses on a specific measurement (clumped isotopes) and on optimizing which and how many standards we use, I hope that the problem is general enough that the insights can translate to any kind of measurement that relies on machine calibration. I've committed to writing a literate program (plain text interspersed with code chunks) to explain what is going on and to walk through the simulations one step at a time. I really hope that it is understandable to future collaborators and to scientists in my field, but the code has not been reviewed internally, and I did not receive any feedback on it from the reviewers either. I would love to find out whether what I consider "reproducible code" is actually reproducible, and to learn what I can improve for future projects!
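For readers who have not seen the format, a literate program interleaves explanatory prose with executable chunks. The sketch below illustrates the style only, using a toy straight-line calibration against two standards of known value; the variable names and numbers are assumptions for illustration, not the paper's actual analysis:

```python
# %% [markdown]
# ## Calibrating a toy measurement against standards
# We simulate an instrument with a hidden gain and offset, measure two
# standards of known value, fit a line, and correct an unknown sample.

# %%
import numpy as np

rng = np.random.default_rng(42)
true_gain, true_offset = 1.05, -0.3          # hidden machine behaviour

def measure(true_value, n=10):
    """Noisy instrument readings of a sample with a known true value."""
    return true_gain * true_value + true_offset + rng.normal(0, 0.02, n)

# %% [markdown]
# Measure the standards, then fit the raw-to-true calibration line.

# %%
standards = {0.0: measure(0.0), 1.0: measure(1.0)}   # reference materials
x = np.concatenate(list(standards.values()))          # raw readings
y = np.repeat(list(standards.keys()),                 # known true values
              [len(v) for v in standards.values()])
slope, intercept = np.polyfit(x, y, deg=1)

# %%
sample_raw = measure(0.42)                            # an unknown sample
print(slope * sample_raw.mean() + intercept)          # calibrated estimate
```

The questions the paper asks (which standards, and how many replicates of each) would then amount to varying the entries of `standards` in a simulation like this and comparing the resulting calibration errors.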