Quality-of-service-aware fault tolerance for grid-enabled applications

This study first reviews how grid-enabled applications can be made fault tolerant. It outlines existing methods, implemented either in the grid application/middleware or in a Generalized Multi-Protocol Label Switching (GMPLS)-based network. It then shows the advantages of integrating application/middleware fault-tolerant schemes, such as service replication, with GMPLS network-layer fault-tolerant schemes, such as path restoration. Such an integrated scheme can provide flexible quality-of-service (QoS)-aware fault tolerance while minimizing the computational and network resources required. Finally, the proposed integrated scheme is implemented in a Video-on-Demand (VoD) application and validated experimentally.
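The trade-off described above, meeting a QoS recovery requirement while using as few computational and network resources as possible, can be illustrated with a small selection rule. The sketch below is a hypothetical model, not the authors' algorithm. It assumes that each candidate recovery action (for example, GMPLS path restoration toward the same server, or switching to a service replica) has an estimated recovery time and abstract network and compute costs. It then picks the cheapest feasible action that meets a recovery-time bound. The names `RecoveryOption` and `choose_recovery`, the cost fields, and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RecoveryOption:
    """A candidate recovery action after a failure (hypothetical model, not from the paper)."""
    name: str
    recovery_time_ms: float  # estimated service interruption if this option is used
    network_cost: float      # e.g., bandwidth units reserved on the restored path
    compute_cost: float      # e.g., extra server resources (0 if the same server is reused)
    feasible: bool = True    # False when the option cannot work (e.g., its server has failed)


def choose_recovery(options: List[RecoveryOption], max_recovery_ms: float) -> Optional[RecoveryOption]:
    """Pick the cheapest feasible option whose recovery time meets the QoS bound.

    Ties on total cost are broken in favor of the faster option.
    Returns None if no option satisfies the bound.
    """
    candidates = [
        o for o in options
        if o.feasible and o.recovery_time_ms <= max_recovery_ms
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda o: (o.network_cost + o.compute_cost, o.recovery_time_ms))


# Illustrative numbers only: a link failure leaves the original server reachable via a new path.
link_failure = [
    RecoveryOption("gmpls_path_restoration", 300.0, network_cost=1.0, compute_cost=0.0),
    RecoveryOption("switch_to_service_replica", 100.0, network_cost=1.0, compute_cost=1.0),
]
print(choose_recovery(link_failure, 500.0).name)  # gmpls_path_restoration (cheaper, meets loose bound)
print(choose_recovery(link_failure, 150.0).name)  # switch_to_service_replica (only one fast enough)
print(choose_recovery(link_failure, 50.0))        # None (no option meets the bound)
```

Under this assumed model, a loose QoS bound lets the network layer alone recover the service at the lowest resource cost. A tight bound, or a server failure (where restoring a path to the same server is infeasible), forces the application-layer option. This is one way to read "flexible QoS-aware fault tolerance" as a per-failure choice between layers.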
