Information Propagation on the phi Failure Detector

It is widely recognized that distributed systems would benefit greatly from a generic failure detection service. Several issues, however, must be addressed before such a service can actually be implemented. In this paper, we highlight the issue of propagating information on failures in the phi failure detector for large-scale systems. Traditionally, failure detection systems provide information on suspects to every process. However, this approach is not efficient in large-scale systems. We consider a notification system that propagates information on suspicions using content-based filtering.
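As a rough illustration of content-based filtering of suspicion information, the sketch below delivers a suspicion event only to subscribers whose predicate matches its content, rather than to every process. It is a minimal sketch and not the paper's protocol: the names `Suspicion`, `SuspicionNotifier`, `subscribe`, and `publish`, the process identifiers, and the threshold values are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Suspicion:
    """Hypothetical event: a suspicion level (phi) about one monitored process."""
    monitored: str
    phi: float


# A content-based filter is a predicate over the event's content.
Filter = Callable[[Suspicion], bool]


class SuspicionNotifier:
    """Hypothetical notifier that forwards events only to matching subscribers."""

    def __init__(self) -> None:
        self._filters: Dict[str, Filter] = {}

    def subscribe(self, subscriber: str, predicate: Filter) -> None:
        # Register (or replace) the filter for this subscriber.
        self._filters[subscriber] = predicate

    def publish(self, event: Suspicion) -> List[str]:
        # Return the subscribers whose filter accepts the event,
        # in subscription order.
        return [name for name, accepts in self._filters.items() if accepts(event)]


notifier = SuspicionNotifier()
# p1 only cares about process q once the suspicion level is high (assumed threshold).
notifier.subscribe("p1", lambda e: e.monitored == "q" and e.phi >= 8.0)
# p2 wants early warnings about any process (assumed threshold).
notifier.subscribe("p2", lambda e: e.phi >= 1.0)

low = notifier.publish(Suspicion("q", 3.0))
high = notifier.publish(Suspicion("q", 9.0))
```

In this sketch a low suspicion level about `q` reaches only `p2`, while a high one reaches both subscribers, so each process receives only the suspicions it asked for.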
