Fast in-memory transaction processing using RDMA and HTM

We present DrTM, a fast in-memory transaction processing system that exploits advanced hardware features (i.e., RDMA and HTM) to improve latency and throughput by over one order of magnitude compared to state-of-the-art distributed transaction systems. The high performance of DrTM is enabled by mostly offloading concurrency control within a local machine to HTM and by leveraging the strong consistency between RDMA and HTM to ensure serializability among concurrent transactions across machines. We further build an efficient hash table for DrTM by leveraging HTM and RDMA to simplify the design and notably improve performance. We describe how DrTM supports common database features such as read-only transactions and logging for durability. Evaluation using typical OLTP workloads, including TPC-C and SmallBank, shows that DrTM scales well on a 6-node cluster and achieves over 5.52 and 138 million transactions per second for TPC-C and SmallBank, respectively. This throughput outperforms a state-of-the-art distributed transaction system (namely Calvin) by at least 17.9X for TPC-C.
