Using dynamic atomic actions to build fault tolerant systems

This note proposes a model for building fault-tolerant systems, based on the object paradigm. To keep the system consistent when failures occur, we provide two basic mechanisms: the persistent state of an object and the recovery unit. Methods are called within recovery units. To define global checkpoints, we build dynamic distributed atomic actions from the dependencies between recovery units. One implementation of this model is then briefly presented.
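The interplay between persistent state, recovery units, and dynamically built atomic actions can be illustrated with a minimal single-process sketch. Everything in it is an assumption made for illustration, not the implementation the note describes: the names `PersistentObject`, `RecoveryUnit`, `atomic_action`, `commit` and `abort` are hypothetical, the committed copy held in memory merely stands in for stable storage, and the dependency rule (two recovery units become dependent when they access the same object's uncommitted state) is one plausible reading of "dependencies between recovery units".

```python
class PersistentObject:
    """Object with a committed (checkpointed) state and a working state.
    The in-memory `committed` dict is a stand-in for stable storage."""
    def __init__(self, **fields):
        self.committed = dict(fields)
        self.working = dict(fields)
        self.users = set()  # recovery units with uncommitted access


class RecoveryUnit:
    """Context in which methods are called on persistent objects."""
    def __init__(self, name):
        self.name = name
        self.objects = set()  # objects accessed since the last checkpoint
        self.deps = set()     # recovery units this one depends on

    def call(self, obj, method):
        # Assumed rule: sharing uncommitted state creates a mutual dependency.
        for other in obj.users:
            if other is not self:
                self.deps.add(other)
                other.deps.add(self)
        obj.users.add(self)
        self.objects.add(obj)
        return method(obj.working)


def atomic_action(unit):
    """Units reachable through dependencies: they must commit or abort together."""
    seen, stack = set(), [unit]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(u.deps - seen)
    return seen


def _finish(unit, do_commit):
    group = atomic_action(unit)
    for u in group:
        for obj in u.objects:
            if do_commit:
                obj.committed = dict(obj.working)  # new global checkpoint
            else:
                obj.working = dict(obj.committed)  # roll back to last checkpoint
            obj.users.discard(u)
        u.objects.clear()
        u.deps.clear()
    return group


def commit(unit):
    return _finish(unit, True)


def abort(unit):
    return _finish(unit, False)


# Usage: a and b share `account`, so they form one atomic action; c is independent.
def withdraw(state):
    state["balance"] -= 30

def record(state):
    state["entries"] += 1

account = PersistentObject(balance=100)
log = PersistentObject(entries=0)
other = PersistentObject(entries=0)
a, b, c = RecoveryUnit("a"), RecoveryUnit("b"), RecoveryUnit("c")

a.call(account, withdraw)
b.call(account, withdraw)  # b becomes dependent on a
b.call(log, record)
c.call(other, record)

rolled_back = sorted(u.name for u in abort(a))
```

In this sketch, aborting `a` also rolls back `b`, including its update to `log`, while `c` keeps its uncommitted work. The atomic action is never declared in advance; it is computed from the dependencies that exist at the moment of the commit or abort, which is the sense in which it is dynamic here.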
