Probabilistic checkpointing in Time Warp parallel simulation

In the Time Warp (TW) protocol, the system state must be checkpointed to support the rollback operation. Checkpointing more frequently increases the state saving cost, while checkpointing less frequently increases the coast forward effort, because a large number of already-executed events must be re-executed after a rollback. This paper proposes a probabilistic approach to checkpointing. We derive the rollback probability and compute the expected coast forward effort incurred if a state is not saved. To reduce implementation overhead, the rollback probability and coast forward cost are precomputed and made available at runtime as a lookup table. Based on the derived expectation, a state vector is saved only if the expected coast forward effort is larger than the state saving cost; otherwise, the save is skipped. Our experiments show that the cost model reduces the simulation elapsed time by close to 30% compared with both saving the system state after each event execution and saving the system state at a predefined interval.
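The decision rule described above can be illustrated with a minimal sketch. It is not the paper's derivation: the names `CostTable`, `build_table` and `should_checkpoint` are hypothetical, and the table is filled using two simplifying assumptions (a constant rollback probability and a coast forward cost that grows linearly with the number of events since the last checkpoint) that stand in for the derived quantities.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CostTable:
    """Hypothetical precomputed lookup table, indexed by the number of
    events executed since the last checkpoint."""
    rollback_prob: Tuple[float, ...]
    coast_forward_cost: Tuple[float, ...]


def build_table(p_rollback: float, event_cost: float, max_gap: int) -> CostTable:
    # ASSUMPTION: constant rollback probability and linear coast forward
    # cost. The paper derives these quantities; this is only a placeholder.
    probs = tuple(p_rollback for _ in range(max_gap + 1))
    costs = tuple(k * event_cost for k in range(max_gap + 1))
    return CostTable(probs, costs)


def should_checkpoint(table: CostTable, events_since_checkpoint: int,
                      state_saving_cost: float) -> bool:
    # Clamp the index so gaps beyond the table reuse the last entry.
    i = min(events_since_checkpoint, len(table.rollback_prob) - 1)
    expected_coast_forward = table.rollback_prob[i] * table.coast_forward_cost[i]
    # Save only when skipping the save is expected to cost more.
    return expected_coast_forward > state_saving_cost


table = build_table(p_rollback=0.1, event_cost=2.0, max_gap=10)
decisions = [should_checkpoint(table, k, state_saving_cost=1.0) for k in (1, 6)]
```

Because the table is built once before the run, the per-event decision is a single lookup, one multiplication and one comparison, which matches the stated goal of keeping runtime overhead low.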
