Epoch-based Commit and Replication in Distributed OLTP Databases

Many modern data-oriented applications are built on top of distributed OLTP databases for both scalability and high availability. Such distributed databases enforce atomicity, durability, and consistency through two-phase commit (2PC) and synchronous replication at the granularity of every single transaction. In this paper, we present COCO, a new distributed OLTP database that supports epoch-based commit and replication. The key idea behind COCO is that it separates transactions into epochs and treats a whole epoch of transactions as the commit unit. In this way, the overhead of 2PC and synchronous replication is significantly reduced. We support two variants of optimistic concurrency control (OCC), one using physical time and one using logical time, with various optimizations that are enabled by the epoch-based execution. Our evaluation on two popular benchmarks (YCSB and TPC-C) shows that COCO outperforms systems with fine-grained 2PC and synchronous replication by up to a factor of four.

PVLDB Reference Format: Yi Lu, Xiangyao Yu, Lei Cao, and Samuel Madden. Epoch-based Commit and Replication in Distributed OLTP Databases. PVLDB, 14(5): 743-756,
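To make the epoch-as-commit-unit idea from the abstract concrete, here is a minimal single-process sketch. It is not COCO's implementation: the names `Txn`, `EpochStore`, `epoch_size`, and `commit_rounds` are hypothetical, and one "commit round" merely stands in for a full 2PC plus synchronous replication exchange. Concurrency control, aborts, and failures are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    txn_id: int
    writes: dict  # key -> value written by this transaction


@dataclass
class EpochStore:
    """Toy model: buffers transactions and commits them one epoch at a time."""
    epoch_size: int                                # transactions per epoch (assumed knob)
    committed: dict = field(default_factory=dict)  # durable state
    pending: list = field(default_factory=list)    # current, uncommitted epoch
    commit_rounds: int = 0                         # simulated 2PC/replication rounds

    def execute(self, txn: Txn) -> None:
        # Results stay invisible to clients until the whole epoch commits.
        self.pending.append(txn)
        if len(self.pending) >= self.epoch_size:
            self.commit_epoch()

    def commit_epoch(self) -> None:
        if not self.pending:
            return
        # One coordination round covers every transaction in the epoch.
        for txn in self.pending:
            self.committed.update(txn.writes)
        self.pending.clear()
        self.commit_rounds += 1


def run(epoch_size: int, n_txns: int) -> EpochStore:
    store = EpochStore(epoch_size=epoch_size)
    for i in range(n_txns):
        store.execute(Txn(i, {f"k{i % 4}": i}))
    store.commit_epoch()  # flush a partially filled final epoch, if any
    return store


per_txn = run(epoch_size=1, n_txns=100)     # commit after every transaction
per_epoch = run(epoch_size=25, n_txns=100)  # commit once per 25 transactions
print(per_txn.commit_rounds, per_epoch.commit_rounds)
```

Both runs end with the same committed state, but the per-transaction run pays 100 simulated coordination rounds while the epoch-based run pays 4. The trade-off in this toy model is that a transaction's effects become durable only when its epoch closes.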
