Process Replication with Log-Based Amnesia Support

Process replication is used to build highly available and fault-tolerant systems. Traditionally, for the sake of simplicity, such systems have assumed the crash-stop failure model. This paper instead advocates the crash-recovery with partial amnesia failure model for processes that manage large amounts of state. It presents the problems that arise under this assumption and outlines how they can be managed. Finally, an overhead analysis is presented.
