Fast and general distributed transactions using RDMA and HTM

Recent transaction processing systems leverage advanced hardware features such as RDMA and HTM to significantly boost performance. These designs, however, come with several limitations, such as requiring a priori knowledge of the read/write sets of transactions and providing no availability support. In this paper, we present DrTM+R, a fast in-memory transaction processing system that retains the performance benefit of advanced hardware features while supporting general transactional workloads and high availability through replication. DrTM+R addresses the generality issue with a hybrid OCC and locking scheme, which leverages the strong atomicity of HTM and the strong consistency of RDMA to preserve strict serializability with high performance. Records updated by HTM transactions become visible immediately, before their replication is ready. To resolve this race condition, DrTM+R uses an optimistic replication scheme with seqlock-like versioning to distinguish the visibility of tuples from the readiness of record replication. Evaluation using typical OLTP workloads such as TPC-C and SmallBank shows that DrTM+R scales well on a 6-node cluster and achieves over 5.69 and 94 million transactions per second without replication for TPC-C and SmallBank, respectively. Enabling 3-way replication on DrTM+R incurs at most 41% overhead before reaching the network bottleneck, and is still an order of magnitude faster than a state-of-the-art distributed transaction system (Calvin).
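To make the seqlock-like versioning idea concrete, the following is a minimal single-process sketch of the general seqlock pattern, not DrTM+R's actual protocol. It assumes a per-record sequence number that is odd while an update is visible locally but not yet replicated, and even once replication has completed. The names `Record`, `begin_update`, `finish_replication`, and `try_read` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: int = 0
    seq: int = 0  # assumption: even = replicated, odd = update not yet replicated


def begin_update(rec: Record, new_value: int) -> None:
    """Make the new value visible locally and mark it as not yet replicated."""
    rec.seq += 1          # seq becomes odd
    rec.value = new_value


def finish_replication(rec: Record) -> None:
    """Mark the record as fully replicated."""
    rec.seq += 1          # seq becomes even again


def try_read(rec: Record):
    """Return the value only if it is replicated and stable, else None."""
    before = rec.seq
    if before % 2 == 1:   # update in flight: visible but not replicated
        return None
    value = rec.value
    if rec.seq != before: # a writer intervened while reading
        return None
    return value


rec = Record()
print(try_read(rec))      # initial value is readable
begin_update(rec, 7)
print(try_read(rec))      # rejected until replication finishes
finish_replication(rec)
print(try_read(rec))      # new value is now readable
```

A reader that gets `None` would typically retry or abort. A real implementation would also need atomic operations and memory ordering, which this sketch omits.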
