Optimizing incremental state-saving and restoration

Computer simulation is a valuable tool for the design and analysis of complex systems. However, simulating large complex systems, such as telecommunication, traffic, manufacturing, combat, training, ecological, engineering, and computer systems, can require massive resources in terms of processing time and memory. Executing such discrete event simulations in parallel offers the potential to reduce this processing time substantially. Optimistic synchronization has been proposed to exploit the parallelism inherent in these systems and has been shown to be capable of impressive speedup through parallel execution. Two serious problems with optimistic methods have yet to be adequately resolved. The first is saving, or logging, model state information during forward execution so that rollback to a previous state can be accomplished. The second is solving the first in a way that is transparent to the programmer, i.e., one that does not substantially complicate model development. This thesis examines these two related key problems: efficient state logging and transparency. A compiler-based design for an incremental state-saving and restoration mechanism is presented that addresses both. The design was evaluated using a real-world telecommunication simulation developed over an 18-month period by an eight-person team with little experience in parallel discrete event simulation. For this telecommunication benchmark, state-logging overheads were shown to be less than 15% of the forward computation cost, and the mechanism is completely transparent to the simulation programmer. It is further argued that the compiler-based design not only solves the transparency problem of logging state in optimistic parallel discrete event simulation, but is also applicable to more general computations that require support for rollback, including editors, fault-tolerant systems, transaction-based systems, playback debuggers, versioning systems, and logic programming systems.
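As a rough illustration of what incremental state saving involves, here is a minimal Python sketch. It is not the thesis's compiler-based mechanism: it assumes model state lives in instance attributes and intercepts assignments at run time through `__setattr__`, whereas the design described above instruments assignments at compile time. Before each assignment, the previous value (or the fact that the attribute did not yet exist) is appended to an undo log tagged with the current simulation time. A rollback to time `t` then walks the log backwards and restores only the values changed after `t`, instead of copying the whole state. The names `StateSavingObject`, `advance_to`, `rollback`, and the `Switch` model are hypothetical.

```python
from typing import Any, Tuple

# One undo-log entry: (simulation time, attribute name, existed before?, previous value)
LogEntry = Tuple[float, str, bool, Any]


class StateSavingObject:
    """Hypothetical base class whose attribute assignments are logged incrementally.

    Only attribute (re)binding is intercepted; in-place mutation of a list or
    dict stored in an attribute is NOT logged by this sketch.
    """

    def __init__(self) -> None:
        # Bookkeeping fields bypass the logging __setattr__ below.
        object.__setattr__(self, "_log", [])  # list of LogEntry, oldest first
        object.__setattr__(self, "_now", 0.0)

    def advance_to(self, t: float) -> None:
        """Set the simulation time of the event about to be processed."""
        if t < self._now:
            raise ValueError("simulation time must not decrease; use rollback()")
        object.__setattr__(self, "_now", t)

    def __setattr__(self, name: str, value: Any) -> None:
        # Save the old value (or its absence) before overwriting it.
        existed = name in self.__dict__
        self._log.append((self._now, name, existed, self.__dict__.get(name)))
        object.__setattr__(self, name, value)

    def rollback(self, t: float) -> None:
        """Undo, newest first, every assignment made at a time later than t."""
        while self._log and self._log[-1][0] > t:
            _, name, existed, old = self._log.pop()
            if existed:
                object.__setattr__(self, name, old)
            else:
                object.__delattr__(self, name)
        object.__setattr__(self, "_now", t)


class Switch(StateSavingObject):
    """Hypothetical model component; it contains no explicit state-saving calls."""

    def __init__(self) -> None:
        super().__init__()
        self.queued = 0


sw = Switch()
sw.advance_to(1.0)
sw.queued = 3
sw.advance_to(2.0)
sw.queued = 7
sw.dropped = 1
sw.rollback(1.0)               # e.g. a straggler message stamped 1.0 arrives
print(sw.queued)               # 3
print(hasattr(sw, "dropped"))  # False
```

Because `Switch` contains no save or restore calls, the logging stays out of the model author's way, in the spirit of the transparency goal above, and forward execution pays one log append per assignment rather than a full state copy. In-place mutation of containers (for example `self.items.append(x)`) escapes this run-time sketch, which is one reason a compiler that sees every store is attractive.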