I hope that the evaluation framework introduced in the paper can become used by other researchers working on mutational signatures.
The methods are widely applicable to other DNA sequence clustering problems. Someone may obtain contradicting results with a new algorithm. In such a case, rerunning our scripts on the same or new data may help elucidate the source of the differences between the results.
The method is trained on the data that were available, but it is meant to be re-trainable as soon as new data are published. It would be great to be really sure that even someone else will be able to do it. In case we receive any feedback, we would be really happy to improve our Github repository so as to make the reproduction easier!
Metadata annotation is key to reproducibility in sequencing experiments. Reproducing this research using the scripts provided will also show the current level of annotation in years since 2015 when the paper was published.
If all went right, the analysis should be fully reproducible without the need to make any adjustments. The paper aims to find optimal locations for new parkruns, but we were not 100% sure how 'optimal' should be defined. We provide a few examples, but the code was meant to be flexible enough to allow potential decision makers to specify their own, alternative objectives. The spatial data set is also quite interesting and fun to play around with. Cave: The full analysis takes a while to run (~30+ min) and might require >= 8gb ram.