Middleware-based database replication: the gaps between theory and practice

The need for high availability and performance in data management systems has fueled a long-running interest in database replication in both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solutions, while commercial teams take a holistic approach that often misses opportunities for fundamental innovation. Over time, this has created a gap between academic research and industrial practice. This paper aims to characterize the gap along three axes: performance, availability, and administration. We build on our own experience developing and deploying replication systems in commercial and academic settings, as well as on a large body of prior related work. We sift through representative examples from the last decade of open-source, academic, and commercial database replication systems and combine this material with case studies from real systems deployed at Fortune 500 customers. We propose two agendas, one for academic research and one for industrial R&D, which we believe can bridge the gap within 5-10 years. In this way, we hope to motivate and help researchers to make the theory and practice of middleware-based database replication more relevant to each other.