Derivation of Fail-Aware Membership Service Specifications

We derive the specifications of a primary-partition and a partitionable fail-aware node membership service in a top-down fashion. The derived specifications are fail-aware in the sense that each client of a membership server can learn whether the server currently provides its standard semantics or, because too many failures have occurred, an exception semantics. We first propose the specification of an ideal membership service and then transform this ideal specification step by step to derive the two fail-aware specifications, both of which are implementable in timed asynchronous systems. In each step we address an implementation problem or a change in the system or failure model.
