A framework for fault-tolerance in HLA-based distributed simulations

The widespread use of simulation in future military systems depends, among other factors, on the degree of reuse and the availability of simulation models. Simulation support in such systems must also cope with software and hardware failures. Research on fault-tolerant distributed simulation, especially in the context of the High Level Architecture (HLA), has been sparse, and the HLA standard itself does not cover fault tolerance extensively. This paper describes a framework, the Distributed Resource Management System (DRMS), for robust execution of federations. The implementation of the framework is based on Web services and Semantic Web technology; it provides fundamental services and a consistent mechanism for describing the resources managed by the environment. To evaluate the proposed framework, we developed a federation that uses the time-warp mechanism for synchronization. We describe our approach to fault tolerance and give an example that illustrates how DRMS behaves when it encounters faulty federates.
