Adaptive and dynamic funnel replication in clouds

We consider the problem of strongly consistent replication in a multi-data-center cloud setting. This environment is characterized by high-latency communication between data centers, significant fluctuations in the performance of seemingly identical virtual machines (VMs), and temporary disconnects of data centers from the rest of the cloud. In this paper we introduce the adaptive and dynamic Funnel Replication (FR) protocol, which is designed to achieve high throughput and low latency for reads, to accommodate arbitrary latency/throughput tradeoffs for writes, to maximize performance in the face of VM performance variations, and to provide high availability for read requests in the presence of network partitions. FR is based on the idea of flexible write dissemination topologies, which enable it to achieve, per message, the desired tradeoff between latency and throughput, depending on the message size, the observed network conditions, and the importance of latency as indicated by the client. We demonstrate the benefits of flexible dissemination topologies and show that, in a cloud setting with N identical replicas, FR can improve write latency by up to a factor of N/2 for N ≥ 2 compared to the well-known chain replication (CR) protocol, at the expense of a slight decrease in write throughput. In a setting with potentially high variability in the performance of replicas, e.g., as in Amazon EC2, FR can achieve throughput up to a factor of 16 higher than CR while also improving latency. FR does this by adopting a topology that consists of concurrent disjoint data replication paths, so that load on high-throughput paths is adaptively increased while load on congested replicas is reduced.
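
The idea of shifting load toward faster replication paths can be illustrated with a small sketch. This is not the FR algorithm itself, which the abstract does not specify; it is a hypothetical example that assumes each disjoint path reports a measured throughput, smooths those measurements with an exponentially weighted moving average, and splits a batch of writes in proportion to the smoothed estimates. The class name `AdaptiveSplitter`, the parameter `alpha`, and the path names are all illustrative assumptions.

```python
class AdaptiveSplitter:
    """Hypothetical load splitter over disjoint replication paths.

    Illustrative only: the smoothing rule and the proportional split are
    assumptions made for this sketch, not details taken from the FR protocol.
    """

    def __init__(self, path_names, alpha=0.5, initial=1.0):
        self.alpha = alpha  # weight given to the newest measurement
        self.estimates = {name: initial for name in path_names}

    def observe(self, name, measured_throughput):
        """Fold a new throughput measurement into the smoothed estimate."""
        old = self.estimates[name]
        self.estimates[name] = (1 - self.alpha) * old + self.alpha * measured_throughput

    def shares(self):
        """Return the fraction of load each path should carry."""
        total = sum(self.estimates.values())
        if total == 0:
            # No path is making progress: fall back to an even split.
            return {name: 1 / len(self.estimates) for name in self.estimates}
        return {name: est / total for name, est in self.estimates.items()}

    def assign(self, num_writes):
        """Split num_writes across paths using largest-remainder rounding."""
        quotas = {name: share * num_writes for name, share in self.shares().items()}
        counts = {name: int(quota) for name, quota in quotas.items()}
        leftover = num_writes - sum(counts.values())
        # Hand out the remaining writes to the paths with the largest remainders.
        by_remainder = sorted(quotas, key=lambda n: quotas[n] - counts[n], reverse=True)
        for name in by_remainder[:leftover]:
            counts[name] += 1
        return counts


if __name__ == "__main__":
    splitter = AdaptiveSplitter(["path_a", "path_b"], alpha=1.0)
    splitter.observe("path_a", 300.0)  # fast path
    splitter.observe("path_b", 100.0)  # congested path
    print(splitter.assign(8))
```

With `alpha=1.0` the estimates track the latest measurement exactly, so the fast path receives three quarters of the writes. A smaller `alpha` would react more slowly to transient VM performance fluctuations, which is a tuning choice this sketch leaves open.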
