Fast General Distributed Transactions with Opacity using Global Time

Transactions can simplify distributed applications by hiding data distribution, concurrency, and failures from the application developer. Ideally, the developer would see the abstraction of a single large machine that runs transactions sequentially and never fails. This requires the transactional subsystem to provide opacity (strict serializability for both committed and aborted transactions), as well as transparent fault tolerance with high availability. Because even the best abstractions are unlikely to be used if they perform poorly, the system must also provide high performance. Existing distributed transactional designs either weaken this abstraction or are not designed for the best performance within a data center. This paper extends the design of FaRM, which provides strict serializability only for committed transactions, to provide opacity while maintaining FaRM's high throughput, low latency, and high availability within a modern data center. The extended design uses timestamp ordering based on real time, with clocks synchronized to within tens of microseconds across a cluster, and a failover protocol that ensures correctness across clock master failures. FaRM with opacity can commit 5.4 million New-Order transactions per second when running the TPC-C transaction mix on 90 machines with 3-way replication.
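
To make the idea of timestamp ordering over loosely synchronized clocks more concrete, the following is a minimal sketch, not the protocol from the paper. It assumes each machine can bound true time to an interval of half-width `epsilon_us`, picks the interval's upper bound as the timestamp, and then waits until the lower bound has passed that timestamp. The names `TrueTime`, `MachineClock`, and `acquire_timestamp` are hypothetical, and true time is simulated so the example runs deterministically.

```python
class TrueTime:
    """Simulated global time in microseconds (an assumption for this sketch)."""

    def __init__(self, start_us: float) -> None:
        self.now_us = start_us


class MachineClock:
    """Hypothetical local clock whose reading is within epsilon_us of true time."""

    def __init__(self, true_time: TrueTime, offset_us: float, epsilon_us: float) -> None:
        if abs(offset_us) > epsilon_us:
            raise ValueError("offset must stay within the synchronization bound")
        self.true_time = true_time
        self.offset_us = offset_us
        self.epsilon_us = epsilon_us

    def interval(self) -> tuple[float, float]:
        """Return (lower, upper) bounds that are guaranteed to contain true time."""
        reading = self.true_time.now_us + self.offset_us
        return (reading - self.epsilon_us, reading + self.epsilon_us)


def acquire_timestamp(clock: MachineClock, tick_us: float = 1.0) -> float:
    """Choose a timestamp, then wait out the clock uncertainty.

    The timestamp is the upper bound of the current interval. The loop
    simulates waiting by advancing true time until the lower bound exceeds
    the timestamp, at which point the timestamp is in the past on every
    machine that respects the same bound.
    """
    timestamp = clock.interval()[1]
    while clock.interval()[0] <= timestamp:
        clock.true_time.now_us += tick_us
    return timestamp


if __name__ == "__main__":
    shared_time = TrueTime(start_us=1000.0)
    fast_machine = MachineClock(shared_time, offset_us=8.0, epsilon_us=10.0)
    slow_machine = MachineClock(shared_time, offset_us=-8.0, epsilon_us=10.0)

    first = acquire_timestamp(fast_machine)
    # The second acquisition starts only after the first one has returned.
    second = acquire_timestamp(slow_machine)
    print(first, second, first < second)
```

In this sketch the wait is what makes timestamps respect real-time order: a timestamp acquired on a slow clock after the first acquisition has returned is still larger than the one acquired earlier on a fast clock. The wait grows with the uncertainty bound, which is why tight synchronization matters for latency.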
