Scaling to a million cores and beyond: Using light-weight simulation to understand the challenges ahead on the road to exascale

As supercomputers scale to 1000 PFlop/s over the next decade, it is crucial for high-performance computing (HPC) hardware/software co-design to investigate how parallel applications perform at scale on future architectures and how different architecture choices affect that performance. This paper summarizes recent efforts in designing and implementing a novel HPC hardware/software co-design toolkit. The presented Extreme-scale Simulator (xSim) permits running an HPC application in a controlled environment with millions of concurrent execution threads while observing its performance in a simulated extreme-scale HPC system using architectural models and virtual timing. This paper demonstrates the capabilities and usefulness of the xSim performance investigation toolkit, such as its scalability to 2^27 simulated Message Passing Interface (MPI) ranks on 960 real processor cores, the capability to evaluate the performance of different MPI collective communication algorithms, and the ability to evaluate the performance of a basic Monte Carlo application with different architectural parameters.

Simulation of different future high-performance computing architectures at scale.
Demonstrates scalability to 134,217,728 simulated MPI ranks on 960 real cores.
Evaluates MPI collective communication performance on 2,097,152 simulated ranks.
Estimates the performance of a Monte Carlo solver on 16,777,216 simulated ranks.
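The rank counts quoted above are all powers of two, and dividing them by the 960 real cores gives a rough sense of how heavily each core is oversubscribed. The following is a minimal sketch of that arithmetic only; it is not part of xSim, the helper name `describe` is hypothetical, and it assumes the simulated ranks are spread evenly across the real cores, which the abstract does not state.

```python
REAL_CORES = 960  # real processor cores reported in the abstract

# Simulated MPI rank counts quoted in the abstract and its highlights.
SIMULATED_RANKS = {
    "scalability run": 134_217_728,
    "MPI collectives": 2_097_152,
    "Monte Carlo solver": 16_777_216,
}


def describe(ranks, cores=REAL_CORES):
    """Return (exponent, ranks_per_core) for a power-of-two rank count.

    Hypothetical helper for illustration; raises ValueError if `ranks`
    is not an exact power of two.
    """
    exponent = ranks.bit_length() - 1
    if ranks <= 0 or (1 << exponent) != ranks:
        raise ValueError(f"{ranks} is not a power of two")
    # Assumption: ranks are divided evenly across the real cores.
    return exponent, ranks / cores


if __name__ == "__main__":
    for label, ranks in SIMULATED_RANKS.items():
        exponent, per_core = describe(ranks)
        print(f"{label}: 2^{exponent} ranks, about {per_core:,.0f} per real core")
```

Under the even-distribution assumption, the largest run corresponds to well over a hundred thousand simulated ranks per real core, which is why the abstract stresses a light-weight simulation approach.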
