Using Discrete Event Simulation for Programming Model Exploration at Extreme-Scale: Macroscale Components for the Structural Simulation Toolkit (SST)

Discrete event simulation provides a powerful mechanism for designing and testing new extreme-scale programming models for high-performance computing. Rather than debugging, running, and waiting for results on an actual system, designers can first iterate in a simulator. This is particularly useful when test beds cannot be used, for example when exploring hardware or scales that do not yet exist or are inaccessible. Here we detail the macroscale components of the Structural Simulation Toolkit (SST). Instead of depending on trace replay or state machines, the simulator is architected to execute real code on real software stacks. Its user-space threading framework allows massive scales to be simulated even on small clusters. The link between the discrete event core and the threading framework allows performance metrics such as call graphs to be collected from a simulated run. Through the SST macroscale components, performance analysis via simulation can thus become an important phase in the design of extreme-scale programming models and runtime systems.
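
To make the coupling between a discrete event core and a threading layer concrete, the following is a minimal sketch and not SST code. It assumes Python generators as a stand-in for user-space threads, and the names `Simulator`, `rank_main`, and `demo` are hypothetical. Each simulated rank runs ordinary application logic and hands control back to the event core whenever it would block, so many ranks can share one host process.

```python
import heapq
from itertools import count


class Simulator:
    """Minimal discrete event core: a time-ordered queue of callbacks."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._seq = count()  # tie-breaker so events at equal times stay FIFO

    def schedule(self, delay, callback):
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), callback))

    def spawn(self, app):
        """Start a generator as a simulated rank (stand-in for a user-space thread)."""
        self.schedule(0.0, lambda: self._resume(app))

    def _resume(self, app):
        try:
            delay = next(app)  # run application code until its next blocking call
        except StopIteration:
            return  # the rank has finished
        self.schedule(delay, lambda: self._resume(app))

    def run(self):
        """Process events in timestamp order; return the final simulated time."""
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()
        return self.now


def rank_main(sim, rank, log):
    """Hypothetical application skeleton: compute, then send, for two steps."""
    for step in range(2):
        yield 1.0 * (rank + 1)  # assumed compute time, larger for higher ranks
        yield 0.5               # assumed message latency
        log.append((sim.now, rank, step))


def demo():
    sim = Simulator()
    log = []
    for rank in range(3):
        sim.spawn(rank_main(sim, rank, log))
    end = sim.run()
    return end, log
```

Calling `demo()` returns the simulated completion time together with a time-ordered log of `(time, rank, step)` entries. Because every context switch passes through the event core, a hook placed in `_resume` could attribute simulated time to the code that was running, which is the kind of linkage the abstract credits for collecting call graphs.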
