Using light-weight groups to handle timing failures in quasi-synchronous systems

In a quasi-synchronous environment, the worst-case times associated with a given activity are usually much higher than the average time needed for that activity. Always using those worst-case times can make a system useless; however, not using them may lead to timing failures. On the other hand, fully synchronous behavior is usually restricted to small parts of the global system. In a previously defined architecture, we use this small synchronous part to control and validate the other parts of the system. In this paper we present a light-weight group protocol that, together with the previously defined architecture, makes it possible to handle timing failures efficiently in a quasi-synchronous system. This is especially interesting when active replication is used. The protocol provides applications with support for fail-safe behavior or for controlled (timely and safe) switching between different qualities of service.
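The trade-off above (tight bounds risk timing failures, worst-case bounds waste resources) can be illustrated with a small sketch of the switching decision alone. It is a hypothetical illustration, not the paper's light-weight group protocol: the names `QoSLevel` and `QoSManager`, the numeric bounds, and the rule "move one level towards the worst case on each timing failure, then fall back to fail-safe" are all assumptions. The detection is assumed to run on the small synchronous part of the system, which is not modelled here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class QoSLevel:
    name: str
    bound_ms: float  # assumed upper bound on the activity's duration


@dataclass
class QoSManager:
    """Hypothetical controller, not the paper's protocol.

    Levels are ordered from the tightest bound (close to the average case)
    to the loosest (the worst case). Each detected timing failure moves the
    application one level towards the worst case; a failure at the last
    level puts it in a fail-safe state.
    """
    levels: List[QoSLevel]
    current: int = 0
    failsafe: bool = False
    failures: List[Tuple[str, float]] = field(default_factory=list)

    def bound(self) -> float:
        return self.levels[self.current].bound_ms

    def observe(self, duration_ms: float) -> bool:
        """Check one completed activity against the current bound.

        Assumed to run on the synchronous part of the system, so the check
        itself is timely (not modelled here).
        """
        if duration_ms <= self.bound():
            return True
        # Timing failure: record it, then switch QoS or fall back to fail-safe.
        self.failures.append((self.levels[self.current].name, duration_ms))
        if self.current + 1 < len(self.levels):
            self.current += 1
        else:
            self.failsafe = True
        return False


# Usage with made-up durations (milliseconds).
levels = [QoSLevel("fast", 10.0), QoSLevel("safe", 50.0), QoSLevel("worst-case", 200.0)]
mgr = QoSManager(levels)
for d in [4.0, 8.0, 12.0, 30.0, 60.0]:
    mgr.observe(d)
print(mgr.levels[mgr.current].name, mgr.failsafe)  # worst-case False
```

Starting from the tightest bound keeps the common case efficient, while each detected timing failure triggers a controlled move to a more conservative quality of service instead of an uncontrolled failure.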
