HPC ReproHack @ Warwick

      March 21, 2022, 10 a.m. - March 31, 2022, 1 p.m. (Europe/London)

   PS1.28 Physical Sciences Building, University of Warwick, Coventry, CV4 7AL, United Kingdom

Hosted by: University of Warwick


Event submitted by: annakrystalli on Jan. 5, 2022, 2:25 p.m.

Event Description

Powerful computational tools and methods are becoming ubiquitous in academic research. However, this increase in computational power and complexity brings a greater responsibility to ensure the robustness and reliability of research outputs. Reproducibility, the ability to reproduce reported results from their underlying data, computer code and reported methodology, is the minimum requirement for assessing such robustness.

To promote reproducibility and give researchers an opportunity to engage with it in practice, we’ve developed ReproHacks: one-day reproducibility hackathons where participants attempt to reproduce research from published code and data, usually on their own laptops. ReproHacks also give researchers an opportunity to help others learn from their work by submitting their papers, code and data for reproduction and review. However, the traditional format, while appropriate for the time budget and level of interest of most researchers, rules out examining the reproducibility of computationally intensive research.

To address this, and with support from the EPSRC, we have partnered with the University of Warwick to develop the first High Performance Computing ReproHack event format. The aim of this extended event is for participants to reproduce computationally intensive published research from the associated code and data on the Sulis Tier 2 HPC system, and to feed back their experiences to the authors as well as to the group of participants. In addition to practical experience of general research reproducibility, participants will gain a better understanding of the particulars of reproducible computational environments on HPC systems.

While the current event will be piloted with University of Warwick CDT students, we aim for the format developed and materials produced to form a prototype for future HPC ReproHack events.

Paper Submissions

We invite authors who would like feedback on the reproducibility of their computationally intensive research to submit details of their paper, code and data for reproduction and review. You can submit your paper on the ReproHack Hub. To ensure your paper is linked to this event, please associate it with the event during submission. If you would like to make your work available for future HPC ReproHacks, we recommend including an HPC tag. Please see our Author Guidelines for more information.

Event Format

To accommodate the additional involvement, support and execution time the challenge may require, the event will run over 11 days, beginning with a launch session on the 21st of March 2022 and ending with a closing event on the 31st of March 2022, with drop-in support sessions held in between.

Launch Day

During the launch we will have a welcome and introduction to the event, followed by a morning of talks and training sessions to help prepare participants for reproducing papers. In the afternoon, participants will form teams around the papers they wish to tackle and begin attempting to reproduce the work on the Sulis HPC system with support from academic mentors and Research Software Engineers (RSEs).

Drop-In Support Sessions

While participants will be free to work on their papers in their own time over the following 10 days, four 2-hour drop-in sessions will also be held, where participants who are stuck on any aspect of the challenge can get support from their CDT mentors and RSEs.

Closing Event

At the closing celebratory event, participants will regroup to present their experiences and share lessons learnt. Lunch will also be provided. The closing event will be held at Space 30 of the Radcliffe Conference Centre.

Directions: from the main entrance, follow the corridor to the left and then the signs for Space 30 (on the map, Space 30 is shown as MR9).

Folk are welcome to arrive early and help themselves to coffee and nibbles in the main lounge, but we’ll also have our own coffee area immediately adjacent to the conference room.

Agenda

Date         Time           Event
21st March   10:00 - 10:15  Anna Krystalli: Welcome and introduction to the event
             10:15 - 10:40  Martin Callaghan: HPC in the age of Data Science
             10:40 - 11:30  Twin Karmakharm: Reproducibility on HPC
             11:30 - 11:45  COFFEE
             11:45 - 12:10  Heather Ratcliffe: Introduction to Sulis
             12:10 - 12:35  Mike Croucher: Reproducible MATLAB on HPC
             12:35 - 12:55  Mozhgan Kabiri Chimeh: NVIDIA AI & HPC Containers
             13:00 - 14:00  LUNCH
             14:00 - 14:20  Anna Krystalli: Introduction to ReproHacking
             14:20 - 17:00  Initial ReproHack session
22nd March   14:00 - 16:00  Drop-in support session
24th March   14:00 - 16:00  Drop-in support session
28th March   14:00 - 16:00  Drop-in support session
30th March   14:00 - 16:00  Drop-in support session
31st March   10:00 - 13:00  Closing event

Associated papers

  • PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals

    Authors: Henry Charlesworth and Giovanni Montana
    DOI:  None
    Submitted by gmontana74      
      Mean reproducibility score:   10.0/10   |   Number of reviews:   1
    Why should we attempt to reproduce this paper?

    This paper proposes a probabilistic planner that can solve goal-conditioned tasks such as complex continuous control problems. The approach reaches state-of-the-art performance compared with current deep reinforcement learning algorithms. However, the method relies on an ensemble of deep generative models and is computationally intensive. It would be interesting to reproduce the results presented in this paper on its robotic manipulation and navigation problems, as these are very challenging problems that current reinforcement learning methods cannot easily solve (and when they do, they require a significantly larger number of experiences). Can the results be reproduced out of the box with the provided code?

  • Highly efficient conversion of laser energy to hard X-rays in high intensity laser-solid simulations

    Authors: S. Morris, A. Robinson, C. Ridgers
    DOI:  10.1063/5.0055398
    Submitted by Stuart_Morris      

    Why should we attempt to reproduce this paper?

    There are many applications for multi-MeV X-rays. Their penetrative properties make them good for scanning dense objects in industry, and their ionising properties can destroy tumours in radiotherapy. Their energies are also comparable to those of nuclear transitions, so they can trigger nuclear reactions that break down nuclear waste into medical isotopes, or reveal smuggled nuclear materials for port security. Laser-driven X-ray generation offers a compact and efficient way to create a bright X-ray source without having to construct a large synchrotron. To fully exploit this capability, work on optimising target design and understanding the underlying X-ray generation mechanisms is essential. The hybrid-PIC code is uniquely placed to model the full interaction, so its ease of use and reproducibility are crucial to the development of this field.

  • Thermodynamics of stacking disorder in ice nuclei

    Authors: David Quigley
    DOI:  10.1063/1.4896376
    Submitted by dquigley      
      Mean reproducibility score:   3.0/10   |   Number of reviews:   1
    Why should we attempt to reproduce this paper?

    The results of this paper have been used in multiple subsequent studies as a benchmark against which other methods of performing the same calculation have been tested. Other groups have argued that the results suffer from finite-size effects, in particular the calculations on mixtures of cubic and hexagonal ice. Should there be time during the event, participants could check this by performing calculations on larger unit cells. Each individual calculation should converge adequately within 96 hours, making it amenable to an HPC ReproHack. On modern HPC hardware, many such calculations could be run concurrently on a single node.

  • New Insight into the Stability of CaCO3 Surfaces and Nanoparticles via Molecular Simulation

    Authors: A. Matthew Bano, P. Mark Rodger, and David Quigley
    DOI:  10.1021/la501409j
    Submitted by dquigley      

    Why should we attempt to reproduce this paper?

    The negative surface enthalpies in figure 5 are surprising. At least one group has attempted to reproduce these using a different code and obtained positive enthalpies. This was attributed to the inability of that code to relax the three simulation cell vectors independently, resulting in an unphysical water density. This demonstrates how sensitive such results can be to the particular implementation of simulation algorithms in different codes. Similarly, the force field used is now very popular, and its functional form and full set of parameters can be found in the literature. However, differences in how simulation codes implement truncation, electrostatics, etc. can lead to significant differences in results such as these. It would be a valuable exercise to establish whether exactly the same force field as that used here can be reproduced from its specification in the literature alone. The interfacial energies of interest should be reproducible with simulations on modest numbers of processors (a few dozen), with run times of 1-2 days each. Each surface is an independent calculation, so these can be run concurrently during the ReproHack.

  • Encapsulated Nanowires: Boosting Electronic Transport in Carbon Nanotubes

    Authors: Andrij Vasylenko, Jamie Wynn, Paulo Medeiros, Andrew J Morris, Jeremy Sloan, David Quigley
    DOI:  10.1103/PhysRevB.95.121408
    Submitted by dquigley      
      Mean reproducibility score:   5.0/10   |   Number of reviews:   2
    Why should we attempt to reproduce this paper?

    DFT calculations are in principle reproducible between different codes, but differences can arise from poorly chosen convergence tolerances, inappropriate use of pseudopotentials and other numerical considerations. An independent validation of the key quantities needed to compute the electrical conductivity would be valuable. In this case we have published our input files for calculating the four quantities needed to parametrise the transport simulations from which we compute the electrical conductivity: the electronic band structure, phonon dispersions, electron-phonon coupling constants and third derivatives of the force constants. Each of these is more sensitive to convergence tolerances than the last, and it is the final quantity on which the conclusions of the paper critically depend. Reference output data is provided for comparison at the data URL below. We note that the pristine CNT results (dark red line) in figure 3 are an independent reproduction of earlier work, so we are confident that the Boltzmann transport simulations are reproducible. To our knowledge, the DFT-calculated inputs to these simulations (in the case of Be encapsulation) have not been independently reproduced.

  • Sensitivity and dimensionality of atomic environment representations used for machine learning interatomic potentials

    Authors: Berk Onat, Christoph Ortner and James Kermode
    DOI:  10.1063/5.0016005
    Submitted by jameskermode      

    Why should we attempt to reproduce this paper?

    Popular descriptors for machine learning potentials, such as the Behler-Parrinello atom-centred symmetry functions (ACSF) and the Smooth Overlap of Atomic Positions (SOAP), are widely used, but so far little attention has been paid to optimising how many descriptor components need to be included to give good results.

  • Synergistic coupling in ab initio-machine learning simulations of dislocations

    Authors: Petr Grigorev, Alexandra M. Goryaeva, Mihai-Cosmin Marinica, James R. Kermode, Thomas D. Swinburne
    DOI:  https://arxiv.org/abs/2111.11262
    Submitted by jameskermode      

    Why should we attempt to reproduce this paper?

    Systematically improvable machine learning potentials could have a significant impact on the range of properties that can be modelled, but the toolchain associated with using them presents a barrier to entry for new users. Attempting to reproduce some of our results will help us improve the accessibility of the approach.

  • Molecular Dynamics of Solids at Constant Pressure and Stress Using Anisotropic Stochastic Cell Rescaling

    Authors: Vittorio Del Tatto, Paolo Raiteri, Mattia Bernetti, Giovanni Bussi
    DOI:  10.3390/app12031139
    Submitted by giovannibussi      

    Why should we attempt to reproduce this paper?

    We do care about reproducibility. If we receive any feedback, we would be really happy to improve our GitHub repository to make reproduction easier!

  • Automatic learning of hydrogen-bond fixes in an AMBER RNA force field

    Authors: Thorben Fröhlking, Vojtěch Mlýnský, Michal Janeček, Petra Kührová, Miroslav Krepl, Pavel Banáš, Jiří Šponer, Giovanni Bussi
    DOI:  None
    Submitted by giovannibussi      

    Why should we attempt to reproduce this paper?

    We do care about reproducibility. If we receive any feedback, we would be really happy to improve our GitHub repository and/or submitted manuscript to make reproduction easier!

  • Accelerating the prediction of large carbon clusters via structure search: Evaluation of machine-learning and classical potentials

    Authors: Bora Karasulu, Jean-Marc Leyssale, Patrick Rowe, Cedric Weber, Carla de Tomas
    DOI:  10.1016/j.carbon.2022.01.031
    Submitted by bkarasulu    
      Mean reproducibility score:   2.0/10   |   Number of reviews:   1
    Why should we attempt to reproduce this paper?

    This paper presents a fine example of a high-throughput computational materials screening study, focusing mainly on carbon nanoclusters of different sizes. In the paper, a diverse set of empirical and machine-learned interatomic potentials commonly used to simulate carbonaceous materials is benchmarked against higher-level density functional theory (DFT) data, using a range of structural features as the comparison criteria. Trying to reproduce the data presented here (even if you consider only a subset of the interaction potentials) will help you develop an understanding of how to approach a high-throughput structure prediction problem. Although we concentrate here on isolated/finite nanoclusters, AIRSS (and similar approaches such as USPEX, CALYPSO and GMIN) can also be used to predict the crystal structures of different classes of materials, with applications in energy storage, catalysis, hydrogen storage and so on.

  • Machine learning a model for RNA structure prediction

    Authors: Nicola Calonaci, Alisha Jones, Francesca Cuturello, Michael Sattler, Giovanni Bussi
    DOI:  10.1093/nargab/lqaa090
    Submitted by giovannibussi      

    Why should we attempt to reproduce this paper?

    The method is trained on the data that were available, but it is meant to be re-trainable as soon as new data are published. It would be great to be sure that someone else will also be able to do this. If we receive any feedback, we would be really happy to improve our GitHub repository to make reproduction easier!

  • Droplet impact onto a spring-supported plate: analysis and simulations

    Authors: Michael J. Negus, Matthew R. Moore, James M. Oliver, Radu Cimpeanu
    DOI:  https://doi.org/10.1007/s10665-021-10107-5
    Submitted by MNegus      
      Mean reproducibility score:   8.0/10   |   Number of reviews:   1
    Why should we attempt to reproduce this paper?

    The direct numerical simulations (DNS) for this paper were conducted using Basilisk (http://basilisk.fr/). As Basilisk is a free software program written in C, it can be readily installed on any Linux machine, and it should then be straightforward to run the driver code to reproduce the DNS from this paper. However, the numerical solutions presented in this paper are the result of many high-fidelity simulations, each of which took approximately 24 CPU hours running on 4 to 8 cores. Hence the main difficulty in reproducing the results should be the amount of computational resources required, so HPC resources will be needed. The DNS in this paper were used to validate the presented analytical solutions, as well as to extend the results to longer timescales. Reproducing these numerical results will build confidence in them and confirm that they are independent of the system architecture on which they were produced.

  • Near-100 MeV protons via a laser-driven transparency-enhanced hybrid acceleration scheme

    Authors: A. Higginson, R. J. Gray, M. King, R. J. Dance, S. D. R. Williamson, N. M. H. Butler, R. Wilson, R. Capdessus, C. Armstrong, J. S. Green, S. J. Hawkes, P. Martin, W. Q. Wei, S. R. Mirfayzi, X. H. Yuan, S. Kar, M. Borghesi, R. J. Clarke, D. Neely & P. McKenna
    DOI:  https://doi.org/10.1038/s41467-018-03063-9
    Submitted by mking  

    Why should we attempt to reproduce this paper?

    This work is well cited in the field of laser-driven ion acceleration and provides a good study of the interaction between multiple laser-driven ion acceleration mechanisms, along with the impact of relativistically induced transparency.