The SNOW Theorem and Latency-Optimal Read-Only Transactions

Scalable storage systems that shard data across many machines are now the norm for Web services, whose data has grown beyond what a single machine can handle. Consistently reading data across different shards requires transactional isolation for the reads. Yet a Web service may read from its data store hundreds or thousands of times for a single page load, so it must minimize read latency to keep response times low. Examining the read-only transaction algorithms of many recent academic and industrial scalable storage systems suggests a tradeoff between their latency and their power, where power is expressed as the consistency they provide and their compatibility with other types of transactions. We show that this tradeoff is fundamental by proving the SNOW Theorem, an impossibility result stating that no read-only transaction algorithm can provide both the lowest latency and the highest power. We then use the tight boundary from the theorem to guide the design of new read-only transaction algorithms for two scalable storage systems, COPS and Rococo. We implement the new algorithms and evaluate them to demonstrate that they provide lower latency for read-only transactions and to understand their impact on overall throughput.
