A Cost Analysis of Solving the Amnesia Problems

Transactional replicated systems have usually adopted the crash-recovery with partial amnesia failure model. Such systems must deal with the amnesia phenomenon: uncommitted state is lost when a node crashes. If recovery processes do not handle this phenomenon correctly, the replicated state can become inconsistent. A general solution has been proposed, and its validity demonstrated, that consists of persisting messages atomically as part of the delivery process. Its use, however, comes at a cost: the overhead of persisting each message atomically at delivery time. This paper analyses that overhead by simulating the proposed solution for a certification-based transactional replication protocol, and shows how solid-state memories can reduce the overhead to an acceptable level.
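The idea of persisting a message as part of its delivery can be illustrated with a minimal sketch. It is not the authors' implementation: the class `PersistentDelivery`, its methods, the JSON log format, and the file name are all hypothetical, and a real system would perform this step inside the group communication layer. The sketch only shows that a message is forced to stable storage before the application sees it, so that it can be recovered after a crash; the synchronous write is where the overhead discussed in the abstract would arise.

```python
import json
import os
import tempfile


class PersistentDelivery:
    """Hypothetical wrapper: log each message durably, then deliver it."""

    def __init__(self, log_path, deliver):
        self.log_path = log_path
        self.deliver = deliver  # application callback (assumed interface)

    def on_receive(self, seq, payload):
        record = json.dumps({"seq": seq, "payload": payload})
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the record to stable storage
        # Only after the record is durable is the message handed over.
        self.deliver(seq, payload)

    def recover(self):
        """Return the logged messages in the order they were persisted."""
        if not os.path.exists(self.log_path):
            return []
        with open(self.log_path, encoding="utf-8") as log:
            return [json.loads(line) for line in log if line.strip()]


def demo():
    delivered = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "delivery.log")
        node = PersistentDelivery(path, lambda s, p: delivered.append((s, p)))
        node.on_receive(1, "writeset-A")
        node.on_receive(2, "writeset-B")
        # Simulate a crash: the in-memory (uncommitted) state is lost.
        delivered.clear()
        # A restarted node rebuilds the delivered sequence from the log.
        recovered = PersistentDelivery(path, lambda s, p: None).recover()
        return [(m["seq"], m["payload"]) for m in recovered]
```

Calling `demo()` replays both messages from the log even though the in-memory list was cleared. Faster stable storage would shorten the `os.fsync` step, which is the kind of effect the paper attributes to solid-state memories.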
