Preparation and optimization of a diverse workload for a large-scale heterogeneous system

Productivity from day one on supercomputers that leverage new technologies requires significant preparation. An institution that procures a novel system architecture often lacks sufficient institutional knowledge and skills to prepare for it. Thus, the "Center of Excellence" (CoE) concept has emerged to prepare for systems such as Summit and Sierra, currently the top two systems in the TOP500. This paper documents CoE experiences that prepared a workload of diverse applications and math libraries for a heterogeneous system. We describe our approach to this preparation, including our management and execution strategies, and detail our experiences with and reasons for using different programming approaches. Our early science and performance results show that the project enabled significant early seismic science with up to a 14X throughput increase over Cori. In addition to our successes, we discuss our challenges and failures so others may benefit from our experience.

Bronis R. de Supinski | John A. Gunnels | Yoonho Park | Matthew P. LeGendre | Todd Gamblin | Xinyu Que | Peter D. Barnes | Guojing Cong | Robert D. Falgout | Ulrike Meier Yang | D. A. Beckingsale | I-Feng W. Kuo | Peng Wang | Kevin O'Brien | Kathryn M. O'Brien | Tong Chen | José R. Brunheroto | Brian Van Essen | Giacomo Domeniconi | Ruipeng Li | Claudia Misale | Thomas Epperly | Carol S. Woodward | Jaime H. Moreno | Lu Wang | Daniel A. White | Martin Schulz | Shelby Lockhart | Sara Kokkila Schumacher | Xiaohua Zhang | Shiv Sundram | James N. Glosli | Carlos H. A. Costa | Naoya Maruyama | Pei-Hung Lin | Ian Karlin | David F. Richards | Eun Kyung Lee | James M. Brase | James C. Sexton | Tzanio V. Kolev | David Böhme | Steve H. Langer | Chris Ward | Bert Still | David J. Gardner | Aaron Fisher | Levi Barnes | Slaven Peles | Bob Anderson | Kathleen Shoga | Christopher Young | Phil Regier | Jamie A. Bramwell | Johann Dahm | Alexey Voronin | Barry Chen | Ramesh Pankajakshan | Björn Sjögreen | Max Katz | Hui-Fang Wen | Jarom Nelson | David Appelhans | Roger Pearce | Bob Walkup | Sorin Bastea | Jonathan Wong | Rob Neely | Robert Blake | Hai Le | Charway R. Cooper | Tony Degroot | Kathleen McCandless | Rao Nimmakayala | Steve Rennich | Howard Scott | Guillaume Thomas-Collignon | Cyril Zeller | Edward Zywicz
