Dual-Quorum: A Highly Available and Consistent Replication System for Edge Services

This paper introduces dual-quorum replication, a novel data replication algorithm designed to support Internet edge services. Edge services allow clients to access Internet services via distributed edge servers that operate on a shared collection of underlying data. Although it is generally difficult to share data while providing high availability, good performance, and strong consistency, replication algorithms designed for specific access patterns can offer nearly ideal trade-offs among these metrics. In this paper, we focus on the key problem of sharing read/write data objects across a collection of edge servers when the references to each object (1) tend not to exhibit high concurrency across multiple nodes and (2) tend to exhibit bursts of read-dominated or write-dominated behavior. Dual-quorum replication combines volume leases and quorum-based techniques to achieve excellent availability, response time, and consistency for such workloads. In particular, through both analytical and experimental evaluations, we show that the dual-quorum protocol can (for the workloads of interest) approach the optimal performance and availability of Read-One/Write-All-Asynchronously (ROWA-A) epidemic algorithms without suffering the weak consistency guarantees and resulting design complexity inherent in ROWA-A systems.
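As background for the quorum-based techniques mentioned above, the following is a minimal sketch of the classic quorum intersection condition. It is not the dual-quorum protocol itself, whose details are not given in this abstract. The function names `quorums_intersect` and `threshold_rule` are illustrative assumptions. The sketch checks by brute force that every read quorum of size r overlaps every write quorum of size w among n replicas, and compares the result with the familiar rule r + w > n.

```python
from itertools import combinations


def quorums_intersect(n: int, r: int, w: int) -> bool:
    """Brute-force check (hypothetical helper): does every read quorum of
    size r share at least one replica with every write quorum of size w,
    given n replicas in total?"""
    nodes = range(n)
    return all(
        set(read_q) & set(write_q)
        for read_q in combinations(nodes, r)
        for write_q in combinations(nodes, w)
    )


def threshold_rule(n: int, r: int, w: int) -> bool:
    """Closed-form condition for size-based quorums: r + w > n."""
    return r + w > n


if __name__ == "__main__":
    n = 5
    # Read-one/write-all: reads contact 1 replica, writes contact all n.
    print("read-one/write-all:", quorums_intersect(n, 1, n))
    # Majority quorums for both reads and writes.
    print("majority/majority: ", quorums_intersect(n, 3, 3))
    # Quorums that are too small can miss each other.
    print("r=2, w=3:          ", quorums_intersect(n, 2, 3))
```

A read-one configuration gives the cheapest reads but requires writes to reach every replica, while majority quorums balance the two costs. This is the general trade-off that quorum-based designs navigate.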
