Towards a Non-2PC Transaction Management in Distributed Database Systems

Shared-nothing architectures are widely used in distributed databases to achieve good scalability. While they offer superior performance for local transactions, the overhead of processing distributed transactions can degrade system performance significantly. The key contributor to this degradation is the expensive two-phase commit (2PC) protocol used to ensure atomic commitment of distributed transactions. In this paper, we propose a transaction management scheme called LEAP that avoids the 2PC protocol in distributed transaction processing. Instead of processing a distributed transaction across multiple nodes, LEAP converts the distributed transaction into a local transaction. This improves processing locality and facilitates adaptive data repartitioning when the data access pattern changes. Based on LEAP, we develop an online transaction processing (OLTP) system, L-Store, and compare it with two systems: H-Store, the state-of-the-art distributed in-memory OLTP system, which relies on the 2PC protocol for distributed transaction processing, and H^L-Store, a version of H-Store modified to use LEAP. An extensive experimental evaluation shows that our LEAP-based engines outperform H-Store by a wide margin, especially for workloads that exhibit locality-based data accesses.
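
The idea of converting a distributed transaction into a local one can be illustrated with a minimal sketch. The abstract does not describe the conversion mechanism, so the sketch assumes one plausible reading: records a transaction needs are migrated to the node that executes it, after which the transaction touches only local data and needs no atomic commit across nodes. The names `Node`, `Cluster`, `localize`, and `run_local` are hypothetical and are not taken from L-Store or H-Store; concurrency control, logging, and failure handling during migration are deliberately omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A shared-nothing node holding a subset of the records (hypothetical)."""
    name: str
    store: dict = field(default_factory=dict)


class Cluster:
    """Tracks which node currently owns each record (assumed design)."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        # Owner map: key -> name of the node currently holding the record.
        self.owner = {k: n.name for n in nodes for k in n.store}

    def localize(self, keys, target):
        """Move every record in `keys` to `target`; return how many moved."""
        moved = 0
        for k in keys:
            src = self.owner[k]
            if src != target:
                self.nodes[target].store[k] = self.nodes[src].store.pop(k)
                self.owner[k] = target
                moved += 1
        return moved

    def run_local(self, target, keys, txn):
        """Localize the needed keys, then run `txn` on the target's store only."""
        moved = self.localize(keys, target)
        txn(self.nodes[target].store)  # single-node execution, no 2PC round
        return moved


def transfer(store):
    """Example transaction: move 10 units from alice to bob."""
    store["alice"] -= 10
    store["bob"] += 10


a = Node("A", {"alice": 100})
b = Node("B", {"bob": 50})
cluster = Cluster([a, b])

first = cluster.run_local("A", ["alice", "bob"], transfer)   # migrates "bob"
second = cluster.run_local("A", ["alice", "bob"], transfer)  # already local
```

In this sketch the first run migrates one record and the second migrates none, which mirrors the abstract's point about workloads with locality-based accesses: once data sits where it is used, repeated transactions stay local. The migrated placement also acts as a simple form of repartitioning that follows the access pattern.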
