Pronto: High availability for standard off-the-shelf databases

Enterprise applications typically store their state in databases. If a database fails, the application is unavailable while the database recovers. Database recovery is time-consuming because it involves replaying the persistent transaction log. To isolate end users from database failures, we introduce Pronto, a protocol that orchestrates transaction processing across multiple standard databases so that they collectively provide the illusion of a single, highly available database. Pronto is a novel replication protocol that handles non-determinism without relying on perfect failure detection, requires no modifications to existing applications or databases, and allows databases from different providers to be part of the replicated compound.
