Correctness proof of a database replication protocol under the perspective of the I/O automaton model

The correctness of recent database replication protocols has been justified rather informally, focusing only on safety properties and without any rigorous formalism. Since a database replication protocol must ensure some degree of replica consistency and must guarantee that transactions follow a given isolation level, previous proofs focused only on these two issues. This paper proposes a formalization based on the I/O automaton model that identifies the components of the distributed system involved in replication support (replication protocol, group communication system, and database replicas) and clearly specifies their actions within the global replicated system architecture. A general certification-based replication protocol guaranteeing the snapshot isolation level is then proven correct. To this end, several safety and liveness properties are identified, checked, and proved. Our work shows that some details of replication protocols that were ignored in previous correctness justifications are in fact needed to guarantee the proposed correctness criteria.
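
To make the idea of a certification-based protocol concrete, here is a minimal sketch of one replica written in the input/output action style of the I/O automaton model. It is not the paper's specification. It assumes that the group communication system delivers every transaction in the same total order at all replicas, and that certification uses the usual snapshot-isolation "first committer wins" rule: a transaction aborts if its write-set intersects that of any transaction committed after its snapshot was taken. The names `Txn`, `Replica`, `to_deliver`, `commit_output`, `start_ts`, and `run` are hypothetical illustrations, not identifiers from the paper.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Txn:
    """A transaction as seen by certification (hypothetical representation)."""
    tid: str
    start_ts: int           # assumed: number of commits visible in the txn's snapshot
    write_set: frozenset    # keys written by the transaction


@dataclass
class Replica:
    """I/O-automaton-style sketch of one replica running SI certification.

    State: the committed write-sets in commit order, plus a queue of
    commit/abort decisions waiting to be emitted as output actions.
    """
    committed: list = field(default_factory=list)  # committed[i] = write-set of commit i+1
    outbox: deque = field(default_factory=deque)    # pending (tid, decision) pairs

    def to_deliver(self, txn: Txn) -> None:
        """Input action: always enabled, as I/O automaton inputs must be.

        Assumes the group communication system invokes it in total order.
        """
        # Transactions committed after txn's snapshot are concurrent with it.
        concurrent = self.committed[txn.start_ts:]
        # First committer wins: abort on any write-write overlap.
        if any(ws & txn.write_set for ws in concurrent):
            self.outbox.append((txn.tid, "abort"))
        else:
            self.committed.append(txn.write_set)
            self.outbox.append((txn.tid, "commit"))

    def commit_output(self):
        """Output action: enabled only while a decision is pending.

        Returning None models the precondition being false (action not enabled).
        """
        if not self.outbox:
            return None
        return self.outbox.popleft()


def run(replica: Replica, txns: list) -> list:
    """Deliver txns in the given order, then drain every enabled output action."""
    for t in txns:
        replica.to_deliver(t)
    decisions = []
    while (d := replica.commit_output()) is not None:
        decisions.append(d)
    return decisions


t1 = Txn("t1", 0, frozenset({"x"}))
t2 = Txn("t2", 0, frozenset({"x", "y"}))  # concurrent with t1 and also writes x
t3 = Txn("t3", 1, frozenset({"y"}))       # snapshot already includes t1's commit
history = [t1, t2, t3]                    # one total order, shared by all replicas

print(run(Replica(), history))
# → [('t1', 'commit'), ('t2', 'abort'), ('t3', 'commit')]
```

Because each decision depends only on the delivery order and the committed log, two replicas fed the same total order reach identical commit/abort decisions. This is the kind of replica-consistency safety property the abstract refers to. Liveness is a separate matter: it depends on the group communication system eventually delivering every broadcast transaction, which this sketch simply assumes.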
