Visual Data-Analytics of Large-Scale Parallel Discrete-Event Simulations

Parallel discrete-event simulation (PDES) is an important tool in the codesign of extreme-scale systems because it provides a cost-effective way to evaluate designs of high-performance computing systems. Optimistic synchronization algorithms for PDES, such as Time Warp, allow events to be processed without global synchronization among the processing elements; a rollback mechanism undoes events that were processed out of timestamp order. Although optimistic synchronization protocols allow large-scale PDES to scale, simulations must be tuned to reduce the number of rollbacks and improve runtime. Doing so requires insight into the factors that affect rollback behavior and simulation performance. We developed a tool for ROSS model developers that provides detailed metrics on the performance of their large-scale optimistic simulations at varying levels of simulation granularity. Model developers can use this information to tune simulation parameters in order to achieve better runtime and fewer rollbacks. In this work, we instrument the ROSS optimistic PDES framework to gather detailed statistics about the simulation engine. We have also developed an interactive visualization interface that uses the data collected by the ROSS instrumentation to help users understand the underlying behavior of the simulation engine. The interface connects real time to virtual time in the simulation and provides the ability to view simulation data at different granularities. We demonstrate the usefulness of our framework by performing a visual analysis of the dragonfly network topology model provided by the CODES simulation framework built on top of ROSS. To collect accurate data about simulation performance, the instrumentation must keep its own overhead low. We therefore perform a scaling study that compares instrumented ROSS simulations with their noninstrumented counterparts to determine the amount of perturbation at different simulation scales.
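
The rollback behavior described above can be illustrated with a minimal Python sketch of a single logical process that counts the kind of statistics an instrumented engine might report. This is not ROSS code: the class name, the counters, the state-saving approach, and the toy event handler are all assumptions chosen for illustration, and the sketch omits anti-messages and global virtual time.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalProcess:
    """Hypothetical single logical process with state-saving rollback."""
    state: int = 0
    lvt: float = 0.0  # local virtual time
    # History of executed events: (timestamp, value, state before execution)
    processed: list = field(default_factory=list)
    events_processed: int = 0    # every execution, including re-executions
    events_rolled_back: int = 0  # executions that were undone
    rollbacks: int = 0           # number of straggler-triggered rollbacks

    def _execute(self, ts, value):
        # Save the prior state so this event can be undone later.
        self.processed.append((ts, value, self.state))
        # Toy handler whose result depends on event order.
        self.state = self.state * 2 + value
        self.lvt = ts
        self.events_processed += 1

    def receive(self, ts, value):
        redo = []
        if ts < self.lvt:
            # Straggler: undo every event with a later timestamp.
            self.rollbacks += 1
            while self.processed and self.processed[-1][0] > ts:
                old_ts, old_value, state_before = self.processed.pop()
                self.state = state_before
                self.events_rolled_back += 1
                redo.append((old_ts, old_value))
            self.lvt = self.processed[-1][0] if self.processed else 0.0
        self._execute(ts, value)
        # Re-execute the undone events in timestamp order.
        for old_ts, old_value in sorted(redo):
            self._execute(old_ts, old_value)


lp = LogicalProcess()
# The event at virtual time 2.0 arrives after the one at 3.0.
for ts, value in [(1.0, 1), (3.0, 1), (2.0, 0)]:
    lp.receive(ts, value)
print(lp.state, lp.rollbacks, lp.events_rolled_back, lp.events_processed)
# prints: 5 1 1 4
```

The final state matches what in-order processing would produce, at the cost of one extra event execution. Counters like these, collected per processing element and over time, are the sort of data that parameter tuning would aim to improve.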
