Supporting amnesia in log-based recovery protocols

Replicated systems are commonly used to provide highly available and fault-tolerant applications, relying on replication and recovery protocols. Traditionally, the literature has focused on replicated systems that adopt the fail-stop failure model, which performs well for replicated systems that manage little state. This paper points out that the crash-recovery with partial amnesia failure model is more accurate for replicated systems with huge state, but that adopting it brings the amnesia phenomenon as a drawback. The paper then analyzes this phenomenon and how to deal with it in a basic configuration using a log-based recovery approach. Finally, it analyzes how amnesia is supported and managed in other replication configurations.
