The Chameleon infrastructure for adaptive, software implemented fault tolerance

This paper presents Chameleon, an adaptive software infrastructure for supporting different levels of availability requirements in a heterogeneous networked environment. Chameleon provides dependability through the use of ARMORs (Adaptive, Reconfigurable, and Mobile Objects for Reliability). Three broad classes of ARMORs are defined: Managers, Daemons, and Common ARMORs. Key concepts that support adaptive fault tolerance include the construction of fault tolerance execution strategies from a comprehensive set of ARMORs, the creation of ARMORs from a library of reusable basic building blocks, the dynamic adaptation to changing fault tolerance requirements, and the ability to detect and recover from errors in applications and in ARMORs.
