Performance-Optimal Read-Only Transactions

Read-only transactions are critical for consistently reading data spread across a distributed storage system, but they perform worse than simple, non-transactional reads. We identify three properties of simple reads that are necessary for read-only transactions to be performance-optimal, i.e., to come as close as possible to simple reads. We demonstrate a fundamental tradeoff in the design of read-only transactions by proving that performance optimality is impossible to achieve with strict serializability, the strongest consistency model. Guided by this result, we present PORT, a performance-optimal design with the strongest consistency to date. Central to PORT are version clocks, a specialized logical clock that concisely captures the necessary ordering constraints. We show the generality of PORT with two applications. Scylla-PORT provides process-ordered serializability with simple writes and shows performance comparable to its non-transactional base system. Eiger-PORT provides causal consistency with write transactions and significantly improves the performance of its transactional base system.
