Multi-shot distributed transaction commit

The Atomic Commit Problem (ACP) is a single-shot agreement problem similar to consensus, meant to model the properties of transaction commit protocols in fault-prone distributed systems. We argue that ACP is too restrictive to capture the complexities of modern transactional data stores, where commit protocols are integrated with concurrency control and their executions for different transactions are interdependent. As an alternative, we introduce the Transaction Certification Service (TCS), a new formal problem that captures the safety guarantees of multi-shot transaction commit protocols with integrated concurrency control. TCS is parameterized by a certification function that can be instantiated to support common isolation levels, such as serializability and snapshot isolation. We then derive a provably correct crash-resilient protocol for implementing TCS through successive refinement. Our protocol achieves better time complexity than mainstream approaches that layer two-phase commit on top of Paxos-style replication.
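
To make the idea of a service parameterized by a certification function more concrete, the following is a minimal single-process sketch, not the distributed protocol derived in the paper. All names (`Txn`, `CertificationService`, `certify_serializable`, `certify_snapshot_isolation`) and the snapshot/version encoding are assumptions introduced for illustration. The sketch assumes each transaction records the version of the snapshot it read from, and that a certification function decides commit or abort from the log of previously committed transactions.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

COMMIT, ABORT = "commit", "abort"


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction payload submitted for certification."""
    snapshot: int            # version of the snapshot the transaction read from
    reads: FrozenSet[str]    # objects read
    writes: FrozenSet[str]   # objects written


# A log entry pairs a commit version with the committed transaction.
Log = List[Tuple[int, Txn]]
CertifyFn = Callable[[Log, Txn], str]


def certify_serializable(log: Log, t: Txn) -> str:
    """Abort if a transaction committed after t's snapshot wrote something t read."""
    for version, s in log:
        if version > t.snapshot and s.writes & t.reads:
            return ABORT
    return COMMIT


def certify_snapshot_isolation(log: Log, t: Txn) -> str:
    """Abort only on write-write conflicts with transactions committed after t's snapshot."""
    for version, s in log:
        if version > t.snapshot and s.writes & t.writes:
            return ABORT
    return COMMIT


class CertificationService:
    """Multi-shot service: each decision depends on earlier committed transactions."""

    def __init__(self, certify: CertifyFn) -> None:
        self.certify = certify
        self.log: Log = []

    def submit(self, t: Txn) -> str:
        decision = self.certify(self.log, t)
        if decision == COMMIT:
            # Commit versions are assigned sequentially, starting at 1.
            self.log.append((len(self.log) + 1, t))
        return decision


# Write-skew example: both transactions read x and y from snapshot 0,
# then write disjoint objects.
t1 = Txn(snapshot=0, reads=frozenset({"x", "y"}), writes=frozenset({"x"}))
t2 = Txn(snapshot=0, reads=frozenset({"x", "y"}), writes=frozenset({"y"}))

ser = CertificationService(certify_serializable)
ser_decisions = [ser.submit(t1), ser.submit(t2)]

si = CertificationService(certify_snapshot_isolation)
si_decisions = [si.submit(t1), si.submit(t2)]
```

In this sketch the same two transactions are decided differently depending on the certification function: the serializability check aborts the second transaction because it read an object the first one overwrote, while the snapshot isolation check commits both because their write sets are disjoint. The decision for `t2` depends on the earlier outcome for `t1`, which is the kind of interdependence a single-shot formulation does not capture.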
