GVT algorithms and discrete event dynamics on 129K+ processor cores

Parallel discrete event simulation (PDES) represents a class of codes that are challenging to scale to large numbers of processors because of tight global timestamp ordering and fine-grained event execution. One of the critical factors in scaling PDES is the efficiency of the underlying global virtual time (GVT) algorithm, which is needed both for the correctness of parallel execution and for the speed of progress. Although many GVT algorithms have been proposed previously, few have been designed for scalable asynchronous execution, and none has been customized to exploit one-sided communication. Moreover, the detailed performance effects of actual GVT algorithm implementations on large platforms are unknown. Here, three major GVT algorithms intended for scalable execution on high-performance systems are studied: (1) a synchronous GVT algorithm that affords ease of implementation, (2) an asynchronous GVT algorithm that is more complex to implement but can relieve blocking latencies, and (3) a variant of the asynchronous GVT algorithm, proposed and studied for the first time here, that exploits one-sided communication in extant supercomputing platforms. Performance results are presented for implementations of these algorithms on up to 129,024 cores of a Cray XT5 system, exercised over a range of parameters: optimistic and conservative synchronization, fine- to medium-grained event computation, synthetic and non-synthetic applications, and different lookahead values. Rates of tens of billions of events executed per second are registered, exceeding the speeds of any known PDES engine and showing asynchronous GVT algorithms to outperform state-of-the-art synchronous GVT algorithms. Detailed PDES-specific runtime metrics are presented to further the understanding of tightly coupled discrete event dynamics on massively parallel platforms.
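To make the quantity being computed concrete, the following is a minimal sketch of the standard definition of GVT: the minimum over all processors of the earliest unprocessed local event timestamp and the timestamps of any messages still in transit. It is not the implementation studied here. The `Processor` class, its fields, and the function names are hypothetical, and the single-process `min` stands in for what would be a blocking global reduction in a synchronous algorithm on a real machine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Processor:
    """Hypothetical stand-in for one simulation processor (illustrative only)."""
    # Timestamp of the earliest unprocessed event held locally.
    local_virtual_time: float
    # Timestamps of messages this processor has sent that are not yet received.
    in_transit: List[float] = field(default_factory=list)


def local_minimum(p: Processor) -> float:
    """Lowest timestamp this processor could still contribute to the simulation."""
    return min([p.local_virtual_time, *p.in_transit])


def synchronous_gvt(processors: List[Processor]) -> float:
    """Global minimum of the local minima.

    Assumption: every processor is paused at a barrier while this runs, so the
    values read form a consistent snapshot. On a distributed machine this step
    would be a blocking min-reduction across all processors.
    """
    return min(local_minimum(p) for p in processors)


procs = [
    Processor(local_virtual_time=12.0, in_transit=[9.5]),
    Processor(local_virtual_time=10.0),
    Processor(local_virtual_time=15.0, in_transit=[11.0, 20.0]),
]
gvt = synchronous_gvt(procs)
print(gvt)
```

In this toy input the in-transit message with timestamp 9.5 bounds GVT below every processor's local clock, which is why a GVT algorithm must account for transient messages. An asynchronous algorithm aims to reach the same bound without pausing every processor at a barrier.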
