How to Choose a Timing Model

When employing a consensus algorithm for state machine replication, should one optimize for the case that all communication links are usually timely or for fewer timely links? Does optimizing a protocol for better message complexity hamper the time complexity? In this paper, we investigate these types of questions using mathematical analysis as well as experiments over PlanetLab (WAN) and a LAN. We present a new and efficient leader-based consensus protocol that has O(n) stable-state message complexity (in a system with n processes) and requires only O(n) links to be timely at stable times. We compare this protocol with several previously suggested protocols. Our results show that a protocol that requires fewer timely links can achieve better performance, even if it sends fewer messages.

Idit Keidar | Alexander Shraer
