Strong consistency is not hard to get: Two-Phase Locking and Two-Phase Commit on Thousands of Cores

Concurrency control is a cornerstone of distributed database engines and storage systems. In pursuit of scalability, a common assumption is that Two-Phase Locking (2PL) and Two-Phase Commit (2PC) are not viable solutions due to their communication overhead. Recent results, however, have hinted that 2PL and 2PC might not perform as poorly as assumed. Nevertheless, there has been no attempt to actually measure how a state-of-the-art implementation of 2PL and 2PC would perform on modern hardware. The goal of this paper is to establish a baseline for concurrency control mechanisms on thousands of cores connected through a low-latency network. We develop a distributed lock table supporting all the standard locking modes used in database engines. We focus on strong consistency in the form of strict serializability implemented through strict 2PL, but also explore read-committed and repeatable-read, two common isolation levels used in many systems. We do not leverage any known optimizations in the locking or commit parts of the protocols. The surprising result is that, for TPC-C, 2PL and 2PC can be made to scale to thousands of cores and hundreds of machines, reaching a throughput of over 21 million transactions per second with 9.5 million New Order operations per second. Since most existing relational database engines use some form of locking for implementing concurrency control, our findings provide a path for such systems to scale without having to significantly redesign transaction management. To achieve these results, our implementation relies on Remote Direct Memory Access (RDMA). Today, this technology is commonly available on both InfiniBand and Ethernet networks, making the results valid across a wide range of systems and platforms, including database appliances, data centers, and cloud environments.

PVLDB Reference Format: Claude Barthels, Ingo Müller, Konstantin Taranov, Gustavo Alonso, and Torsten Hoefler. Strong consistency is not hard to get: Two-Phase Locking and Two-Phase Commit on Thousands of Cores. PVLDB, 12(13): 2325-2338, 2019. DOI: https://doi.org/10.14778/3358701.3358702

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment. Proceedings of the VLDB Endowment, Vol. 12, No. 13, ISSN 2150-8097.
