Information Propagation on the phi Failure Detector

It is widely recognized that distributed systems would greatly benefit from the availability of a generic failure detection service. There are, however, several issues that must be addressed before such a service can actually be implemented. In this paper, we highlight the issue of propagating failure information in the phi failure detector for large-scale systems. Traditionally, failure detection systems provide information on suspected processes to every process. However, this is not efficient in large-scale systems. We therefore consider a notification system that propagates suspicion information using content-based filtering.
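The abstract contrasts broadcasting every suspicion to every process with a notification service that filters suspicions by their content. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. It computes the suspicion level φ = -log10(P_later(t - T_last)) as in the φ accrual failure detector, assuming heartbeat inter-arrival times are normally distributed, and then hands that value to a toy content-based broker that delivers a notification only to subscribers whose filter accepts it. The names `phi`, `SuspicionBroker`, `subscribe`, `publish`, the event fields `process` and `phi`, the thresholds, and the σ floor are all hypothetical.

```python
import math
import statistics
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# A suspicion notification, e.g. {"process": "p1", "phi": 3.2} (hypothetical format)
Event = Dict[str, Any]
# Content-based filter: a predicate over the notification's content
Filter = Callable[[Event], bool]


def phi(now: float, last_heartbeat: float, intervals: List[float]) -> float:
    """Suspicion level phi = -log10(P_later(now - last_heartbeat)).

    Assumes heartbeat inter-arrival times are normally distributed, with
    mean and standard deviation estimated from the sampled intervals.
    """
    mu = statistics.mean(intervals)
    # Hypothetical floor so perfectly regular samples do not divide by zero
    sigma = max(statistics.stdev(intervals), 1e-3)
    elapsed = now - last_heartbeat
    # P_later(t) = 1 - F(t) for N(mu, sigma^2), written with erfc for tail precision
    p_later = 0.5 * math.erfc((elapsed - mu) / (sigma * math.sqrt(2)))
    # Clamp so log10 never receives 0 when erfc underflows
    return -math.log10(max(p_later, 1e-300))


@dataclass
class SuspicionBroker:
    """Hypothetical broker: forwards a suspicion only to matching subscribers."""
    filters: Dict[str, Filter] = field(default_factory=dict)
    inbox: Dict[str, List[Event]] = field(default_factory=dict)

    def subscribe(self, subscriber: str, flt: Filter) -> None:
        self.filters[subscriber] = flt
        self.inbox.setdefault(subscriber, [])

    def publish(self, event: Event) -> List[str]:
        """Deliver the event to every subscriber whose filter accepts it."""
        delivered = []
        for name, flt in self.filters.items():
            if flt(event):
                self.inbox[name].append(event)
                delivered.append(name)
        return delivered


broker = SuspicionBroker()
# Strict subscriber: only process "p1", and only once phi reaches 8
broker.subscribe("app_a", lambda e: e["process"] == "p1" and e["phi"] >= 8.0)
# Cautious subscriber: early warnings (phi >= 1) about any process
broker.subscribe("app_b", lambda e: e["phi"] >= 1.0)

history = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]  # seconds between past heartbeats
late = {"process": "p1", "phi": phi(11.1, 10.0, history)}       # phi is about 1.1
very_late = {"process": "p1", "phi": phi(12.0, 10.0, history)}  # phi is about 45
print(broker.publish(late))       # ['app_b']
print(broker.publish(very_late))  # ['app_a', 'app_b']
```

Each subscriber applies its own threshold to the same φ value, which is what an accrual detector allows: one application may already treat a process as suspected while another does not, and the filter keeps notifications away from processes that would discard them anyway.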

Xavier Défago | Makoto Takizawa | Naohiro Hayashibara | Takuya Katayama
