I used a lot of different tools and strategies to make this paper easily reproducible at different levels. There's Docker container for the highest level of reproducibility, and package versions are managed with renv. The data used in the paper is hosted on Zenodo to avoid long queue times when downloading from the Climate Data Store and future-proof for when it goes away and checksumed before using it.
Popular descriptors for machine learning potentials such as the Behler-Parinello atom centred symmetry functions (ACSF) or the Smooth Overlap of Interatomic Potentials (SOAP) are widely used but so far not much attention has been paid to optimising how many descriptor components need to be included to give good results.