Phoenix: A Toolkit for Building Fault-Tolerant Distributed Applications in Large Scale

Large-scale systems are becoming increasingly common. Many emerging distributed applications exploit the capabilities of world-wide internetworking. Because many of these applications must guarantee consistency even in the presence of failures, adequate support for fault tolerance is necessary. Such support can be provided by different paradigms and their implementations. Unfortunately, most of these implementations target only local area networks, whereas our system, called Phoenix, will target large-scale systems, where additional failure types have to be overcome. This paper presents the problems that arise at large scale, the limits of current implementations, and our proposal for solving them.
