WS-Membership - Failure Management in a Web-Services World

An important factor in the successful deployment of federated, web-services-based business activities will be the ability to guarantee reliable distributed operation and execution. Failure management is essential for any reliable distributed operation, but especially so in the target areas of web services, where activities can be constructed out of services located at different enterprises and accessed over heterogeneous network topologies. This paper describes WS-Membership, a coordination service that provides a generic web-service interface for tracking registered web services and for providing membership monitoring information. A prototype membership service based on epidemic protocol techniques has been implemented and is described in detail. The specification and implementation have been developed in the context of the Huygens project, which focuses on globally scalable distributed systems based on web-service technologies.
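To make the epidemic approach concrete, here is a minimal Python sketch of gossip-style heartbeat failure detection in the manner of van Renesse et al. It illustrates the general technique under stated assumptions and is not the WS-Membership protocol or interface: the names `GossipNode`, `register`, `beat`, `merge`, `suspected`, `gossip_round`, and the `t_fail` threshold are all hypothetical, time is a simulated clock passed in by the caller, and messages are direct method calls rather than web-service invocations. Each member increments its own heartbeat counter and periodically pushes its table of counters to a random peer; the receiver keeps the higher counter, and a member whose counter has not advanced for longer than `t_fail` is reported as suspected.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Entry:
    heartbeat: int      # highest heartbeat counter seen for this member
    last_update: float  # local (simulated) time at which that counter last rose


@dataclass
class GossipNode:
    """Hypothetical gossip-style membership node; not the WS-Membership API."""
    node_id: str
    t_fail: float  # assumed silence threshold before a member is suspected
    table: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.table[self.node_id] = Entry(0, 0.0)

    def register(self, member_id: str, now: float) -> None:
        # Start tracking a registered member before any gossip about it arrives.
        self.table.setdefault(member_id, Entry(0, now))

    def beat(self, now: float) -> None:
        # Increment our own counter; a rising counter is the proof of liveness.
        me = self.table[self.node_id]
        me.heartbeat += 1
        me.last_update = now

    def digest(self) -> dict:
        # Message pushed to a peer: member id -> heartbeat counter.
        return {m: e.heartbeat for m, e in self.table.items()}

    def merge(self, digest: dict, now: float) -> None:
        # Keep the higher counter; only an increase refreshes the timestamp.
        for member, hb in digest.items():
            entry = self.table.get(member)
            if entry is None:
                self.table[member] = Entry(hb, now)
            elif hb > entry.heartbeat:
                entry.heartbeat = hb
                entry.last_update = now

    def suspected(self, now: float) -> set:
        # Members whose counter has not advanced within t_fail time units.
        return {m for m, e in self.table.items()
                if m != self.node_id and now - e.last_update > self.t_fail}


def gossip_round(nodes: list, alive: set, now: float, rng: random.Random) -> None:
    # Every live node beats, then pushes its digest to one random peer;
    # digests sent to crashed nodes are simply lost.
    for n in nodes:
        if n.node_id in alive:
            n.beat(now)
    for n in nodes:
        if n.node_id in alive:
            peer = rng.choice([p for p in nodes if p is not n])
            if peer.node_id in alive:
                peer.merge(n.digest(), now)


if __name__ == "__main__":
    rng = random.Random(1)
    nodes = [GossipNode(f"svc{i}", t_fail=5.0) for i in range(4)]
    for n in nodes:
        for m in nodes:
            n.register(m.node_id, now=0.0)
    alive = {n.node_id for n in nodes}
    for t in range(1, 30):
        if t == 10:
            alive.discard("svc3")  # simulate a crash of svc3
        gossip_round(nodes, alive, float(t), rng)
    print("svc0 suspects:", nodes[0].suspected(29.0))
```

Because only an increase in a counter refreshes the local timestamp, stale gossip cannot keep a crashed member looking alive. In the demo, `svc3` stops beating at round 10 and should, with high probability, appear in `svc0`'s suspect set by round 29; with so few nodes and random peer choice, an occasional false suspicion of a live member is also possible. The trade-off is that each message carries the whole table, so message size grows linearly with membership, and `t_fail` must be tuned against the gossip rate to balance detection time against false suspicions.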
