Characterizing Parallel Scientific Applications on Commodity Clusters: An Empirical Study of a Tapered Fat-Tree

Understanding the characteristics and requirements of applications that run on commodity clusters is key to properly configuring current machines and, more importantly, to procuring future systems effectively. Few current studies, however, characterize realistic workloads, which limits the ability of HPC practitioners and researchers to design solutions that will have an impact on real systems. We present a systematic study that characterizes applications with an emphasis on their communication requirements. It includes an analysis of cluster utilization data, the identification of a representative set of applications from a U.S. Department of Energy laboratory, and a characterization of those applications' communication behavior. The driver for this work is understanding application sensitivity to a tapered fat-tree network. These results provided key insights into the procurement of our next-generation commodity systems, and we believe this investigation offers valuable input to the HPC community on workload characterization and requirements at a large supercomputing center.
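
As a rough illustration of what "tapering" means in this context, the sketch below (Python, using hypothetical link counts and bandwidths, not measurements from the study) computes the ratio of a fat-tree edge switch's uplink bandwidth to the aggregate injection bandwidth of the nodes beneath it; a ratio below 1.0 indicates a tapered network.

    # Minimal sketch of how a fat-tree taper is commonly quantified: the ratio of
    # bandwidth leaving an edge switch toward the core to the bandwidth its nodes
    # can inject. All values below are hypothetical placeholders.

    def taper_ratio(nodes_per_switch, injection_gbps, uplinks, uplink_gbps):
        """Return the global-to-injection bandwidth ratio for one edge switch."""
        injected = nodes_per_switch * injection_gbps   # aggregate node injection bandwidth
        uplink = uplinks * uplink_gbps                 # aggregate bandwidth toward the core
        return uplink / injected

    # Example: 24 nodes at 100 Gb/s behind 12 uplinks at 100 Gb/s -> 2:1 taper (ratio 0.5)
    print(taper_ratio(nodes_per_switch=24, injection_gbps=100, uplinks=12, uplink_gbps=100))

A full fat-tree has a ratio of 1.0; tapering trades global bandwidth for cost, which is why application sensitivity to the taper is the central question of the study.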
