Fault-Tolerant Distributed Simulation: A Position Paper

In distributed simulation, a simulation system is executed on multiple computing nodes that cooperate by exchanging messages. Whatever the reasons for distributing a simulation in the first place (e.g., performance), its execution depends on the proper functioning of all processing nodes and of the underlying network: the failure of a single node or link can disrupt the entire run. Depending on the level of reliability a simulation system requires, integrating fault-tolerance mechanisms is therefore crucial. So far, however, there has been little work on fault tolerance in distributed simulation. This paper summarizes the existing work and points out possible research topics in this area.
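Why the dependence on every node and on the network matters can be made concrete with a small series-availability calculation. The sketch below is illustrative only. It assumes that component failures are independent and that the simulation needs all components at once, so the availability of the whole run is the product of the component availabilities. The function name `series_availability` and the 99% figures are hypothetical. Under these assumptions, 50 nodes plus a network, each available 99% of the time, give a combined availability of only about 60%.

```python
import math
from typing import Sequence


def series_availability(component_availabilities: Sequence[float]) -> float:
    """Probability that every component is up at the same time.

    Assumption (hypothetical model): failures are independent and the
    simulation needs *all* components (a series system), so the result
    is the product of the individual availabilities.
    """
    for a in component_availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability {a} is not in [0, 1]")
    return math.prod(component_availabilities)


if __name__ == "__main__":
    # Hypothetical figures: each node and the network are up 99% of the time.
    node_availability = 0.99
    network_availability = 0.99
    for num_nodes in (1, 10, 50):
        components = [node_availability] * num_nodes + [network_availability]
        print(num_nodes, round(series_availability(components), 3))
```

Adding nodes for performance lowers the chance that the run survives without interruption, which is the gap that fault-tolerance mechanisms are meant to close.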