Software engineering to sustain a high-performance computing scientific application: QMCPACK

We provide an overview of the software engineering efforts and their impact in QMCPACK, a production-level ab initio Quantum Monte Carlo open-source code targeting high-performance computing (HPC) systems. The aspects covered are: (i) strategic expansion of continuous integration (CI), targeting CPUs with GitHub Actions runners, and NVIDIA and AMD GPUs used in pre-exascale systems with self-hosted hardware; (ii) incremental reduction of memory leaks using sanitizers; (iii) incorporation of Docker containers for CI and reproducibility; and (iv) refactoring efforts to improve maintainability, test coverage, and memory lifetime management. We quantify the value of these improvements by providing metrics to illustrate the shift towards a predictive, rather than reactive, sustainable maintenance approach. Our goal in documenting the impact of these efforts on QMCPACK is to contribute to the body of knowledge on the importance of research software engineering (RSE) for the sustainability of community HPC codes and scientific discovery at scale.
