Consistency, Availability, and Convergence

We examine the limits of consistency in highly available and fault-tolerant distributed storage systems. We introduce a new property, convergence, to explore these limits in a useful manner. Like consistency and availability, convergence formalizes a fundamental requirement of a storage system: writes by one correct node must eventually become observable to other connected correct nodes. Using convergence as our driving force, we make two additional contributions. First, we close the gap between what is known to be impossible (i.e., the consistency, availability, and partition-tolerance (CAP) theorem) and existing systems that are highly available but provide weaker consistency, such as causal consistency. Specifically, in an asynchronous system, we show that natural causal consistency, a strengthening of causal consistency that respects the real-time ordering of operations, provides a tight bound on the consistency semantics that can be enforced without compromising availability and convergence. Second, in an asynchronous system with Byzantine failures, we show that it is impossible to implement many of the recently introduced forking-based consistency semantics without sacrificing either availability or convergence. Finally, we show that availability and convergence need not be compromised: there exist practically useful semantics that can be enforced by available, convergent, and Byzantine-fault-tolerant systems.
