Comparative Analysis of QoS and Memory Usage of Adaptive Failure Detectors

This paper compares several parametric and adaptive failure detection schemes in terms of their respective QoS. We introduce improvements over existing methods and evaluate their benefits. First, we propose an optimization to enhance the adaptation of Chen's FD, which significantly improves QoS, especially in the aggressive range and when the network is unstable. Second, we address a problem shared by most adaptive schemes: their need for a large window of samples. We study a scheme that is designed to use a fixed and very limited amount of memory for each monitored-monitoring link. Our experimental results over several kinds of networks (cluster, WiFi, wired LAN, WAN) show the respective properties of the existing adaptive FDs and indicate that the optimization is reasonable and acceptable. Furthermore, the extensive experimental results show the effect of memory size on the overall QoS of each adaptive FD.
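For readers unfamiliar with Chen's FD, the following minimal Python sketch of its arrival-time estimator illustrates why the window size determines memory use per link. It is not the optimization proposed in this paper. Assume heartbeats are sent every eta time units and heartbeat number s_i arrives at local time A_i. The expected arrival of the next heartbeat is then estimated from the n most recent samples as EA = (1/n) * sum(A_i - eta * s_i) + eta * (s_last + 1). The monitored process is suspected once the local clock passes EA + alpha, where alpha is a constant safety margin. The class name ChenEstimator and its method names are hypothetical.

```python
from collections import deque


class ChenEstimator:
    """Hypothetical sketch of Chen's arrival-time estimator over a bounded window."""

    def __init__(self, eta: float, alpha: float, window_size: int) -> None:
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.eta = eta        # heartbeat sending interval (assumed constant)
        self.alpha = alpha    # constant safety margin added to the estimate
        # A deque with maxlen evicts the oldest sample automatically, so memory
        # per monitored link is bounded by window_size (sequence, arrival) pairs.
        self.samples = deque(maxlen=window_size)

    def heartbeat(self, seq: int, arrival: float) -> None:
        """Record heartbeat number `seq` received at local time `arrival`."""
        self.samples.append((seq, arrival))

    def expected_arrival(self) -> float:
        """Estimate EA for the heartbeat following the newest one in the window."""
        if not self.samples:
            raise ValueError("no heartbeat received yet")
        # Subtracting eta * s_i removes the nominal sending schedule, leaving the
        # mean offset of recent arrivals from that schedule.
        offset = sum(a - self.eta * s for s, a in self.samples) / len(self.samples)
        next_seq = max(s for s, _ in self.samples) + 1
        return offset + self.eta * next_seq

    def freshness_point(self) -> float:
        """Local time after which the monitored process is suspected."""
        return self.expected_arrival() + self.alpha

    def suspects(self, now: float) -> bool:
        return now > self.freshness_point()


if __name__ == "__main__":
    fd = ChenEstimator(eta=1.0, alpha=0.5, window_size=3)
    for seq, t in [(1, 1.1), (2, 2.0), (3, 3.2), (4, 4.1)]:
        fd.heartbeat(seq, t)
    print(round(fd.freshness_point(), 3))  # 5.6: only heartbeats 2-4 are kept
    print(fd.suspects(5.5), fd.suspects(5.7))  # False True
```

Because the deque is bounded by window_size, this sketch stores at most window_size samples per monitored link. A larger window averages over more arrivals at the cost of proportionally more memory. That trade-off is what the memory-size experiments examine.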
