Combining periodic and probabilistic checkpointing in optimistic simulation

This paper presents a checkpointing scheme for optimistic simulation that mixes periodic and probabilistic checkpointing. The probabilistic part, based on statistical data collected during the simulation, aims to record as checkpoints those states of a logical process that have a high probability of being restored after a rollback, so that those states are immediately available. The periodic part prevents performance degradation due to the cost of state reconstruction (coasting forward) whenever the collected statistics do not allow states that are highly likely to be restored to be identified. Hence, the scheme can be seen as a general solution to the checkpointing problem in optimistic simulation. A performance comparison with previous solutions is carried out through a simulation study of a store-and-forward communication network with a two-dimensional torus topology.
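
The interplay between the two parts can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say how restore probabilities are estimated or how the two criteria are combined, so the sketch assumes that an estimate of the restore probability is already available for each state and simply checkpoints when that estimate reaches a threshold, or when a maximum number of events has passed since the last checkpoint. The names `MixedCheckpointPolicy`, `max_interval`, `threshold`, and `checkpoint_positions` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MixedCheckpointPolicy:
    """Illustrative decision rule; all names and defaults are assumptions."""

    max_interval: int = 8    # periodic bound on events between two checkpoints
    threshold: float = 0.5   # probabilistic trigger on the restore estimate
    since_last: int = 0      # events seen since the last checkpoint

    def should_checkpoint(self, restore_probability: float) -> bool:
        """Return True if the current state should be saved.

        restore_probability is an externally supplied estimate (assumed,
        not specified by the abstract) that this state will be the target
        of a rollback.
        """
        self.since_last += 1
        probabilistic = restore_probability >= self.threshold
        periodic = self.since_last >= self.max_interval
        if probabilistic or periodic:
            self.since_last = 0
            return True
        return False


def checkpoint_positions(probabilities, policy):
    """Indices of the states that the policy would save, in order."""
    return [i for i, p in enumerate(probabilities) if policy.should_checkpoint(p)]


if __name__ == "__main__":
    policy = MixedCheckpointPolicy(max_interval=3, threshold=0.5)
    estimates = [0.1, 0.9, 0.1, 0.1, 0.1, 0.1, 0.8]
    print(checkpoint_positions(estimates, policy))
```

With these inputs, the states at indices 1 and 6 are saved because their estimates reach the threshold, while index 4 is saved only because three events have passed since the previous checkpoint. The periodic bound thus caps the coasting-forward distance when the estimates give no clear signal.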
