Research on independent and dynamic fault-tolerant and migration technology for cloud simulation resources

To address the fault-tolerance and migration problems of simulation grid applications, the state of the art in related research is first reviewed. Then, using a reflective software analysis and modeling method, an independent and dynamic fault-tolerance and migration model for simulation resources is proposed. Furthermore, results on several related key technologies are presented in detail: to monitor resources, a load model is defined and independent error forecasting is achieved; the consistency of simulation time is ensured by coordinating the advance of federate state management with distributed simulation time management; and the automatic saving and restoring of state is accomplished. These results have been applied to the development of the fault-tolerance and migration service for COSIM-CSP1.0v and have been validated in several typical applications. A multidisciplinary, distributed, collaborative simulation application for an undercarriage virtual prototype is presented as an example. Finally, conclusions are given.
