The code and data are both on GitHub. The paper has been published in Wellcome Open Research and has been replicated by multiple other authors.
Popular descriptors for machine learning potentials, such as the Behler-Parrinello atom-centred symmetry functions (ACSF) or the Smooth Overlap of Atomic Positions (SOAP), are widely used, but so far little attention has been paid to optimising how many descriptor components need to be included to give good results.
Metadata annotation is key to reproducibility in sequencing experiments. Reproducing this research using the scripts provided will also show the level of annotation in the years since 2015, when the paper was published.
The paper, code, and data were published 4 years ago; will they still work? I always try to release the data and code needed to reproduce my papers, but I seldom receive feedback. It would be useful to have comments from a team of reproducers, in order to improve sharing for future research (I have already switched from MATLAB to Python).
I tried hard to make this paper as reproducible as possible, but as techniques and dependencies become more complex, it is hard to make everything 100% clear. Any form of feedback is more than welcome.