Highly Available Primary-Backup Mechanism for Internet Services with Optimistic Consensus

We present an optimistic primary-backup (passive replication) mechanism for highly available Internet services on intercloud platforms. The method aims to keep Internet services running even when a large-scale disaster occurs. To this end, each service creates replicas in different data centers and coordinates them with an optimistic consensus algorithm instead of a majority-based consensus algorithm such as Paxos. Although the method allows temporary inconsistencies among replicas, it eventually converges on the desired state without interrupting service. In particular, it tolerates the simultaneous failure of a majority of nodes as well as network partitioning. Moreover, through interservice communication, members of the service groups are autonomously reorganized according to the type of failure and changes in system load. This enables load balancing and power savings, as well as provisioning for the next disaster. Through experiments with a prototype implementation, we demonstrate the service availability our approach provides under simulated failure patterns and its adaptation to workload changes for load balancing and power savings.
