Discrete Event Execution with One-Sided and Two-Sided GVT Algorithms on 216,000 Processor Cores

Global Virtual Time (GVT) computation is a key determinant of the efficiency and runtime dynamics of Parallel Discrete Event Simulations (PDES), especially on large-scale parallel platforms. Here, three execution modes of a generalized GVT computation algorithm are studied on high-performance parallel computing systems: (1) a synchronous GVT algorithm that is easy to implement, (2) an asynchronous GVT algorithm that is more complex to implement but can relieve blocking latencies, and (3) a variant of the asynchronous GVT algorithm that exploits one-sided communication on extant supercomputing platforms. Performance results are presented for implementations of these algorithms on up to 216,000 cores of a Cray XT5 system, exercised over a range of parameters: optimistic and conservative synchronization, fine- to medium-grained event computation, synthetic and nonsynthetic applications, and different lookahead values. Detailed PDES-specific runtime metrics are presented to further the understanding of tightly coupled discrete event dynamics on massively parallel platforms.
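
To make concrete the quantity that all three modes compute, the following is a minimal single-process sketch of the standard GVT definition: the minimum timestamp over all unprocessed events and all messages still in transit. It is not the algorithm studied in this work. The `Processor` class, its fields, and `synchronous_gvt` are hypothetical names introduced for illustration, and the plain `min` over a Python list stands in for the blocking global reduction that a synchronous mode would perform across cores.

```python
from dataclasses import dataclass, field
from typing import List

INFINITY = float("inf")


@dataclass
class Processor:
    """Hypothetical stand-in for one simulation process (an assumption)."""
    # Timestamps of events queued locally but not yet processed.
    pending: List[float]
    # Timestamps of messages this process has sent that are not yet received.
    in_flight: List[float] = field(default_factory=list)

    def local_min(self) -> float:
        # Smallest timestamp this process can still influence;
        # infinity when it has no pending work at all.
        return min(self.pending + self.in_flight, default=INFINITY)


def synchronous_gvt(procs: List[Processor]) -> float:
    """Global minimum of the local minima.

    In a real synchronous mode every process would block in a collective
    min-reduction; here a plain min over the list plays that role.
    """
    return min((p.local_min() for p in procs), default=INFINITY)


if __name__ == "__main__":
    procs = [
        Processor(pending=[5.0, 7.0]),
        Processor(pending=[9.0], in_flight=[3.5]),
        Processor(pending=[]),
    ]
    print(synchronous_gvt(procs))
```

In this sketch the in-transit message with timestamp 3.5 bounds the result even though every queued event is later, which is why a GVT algorithm must account for transient messages and not only for local event queues.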
