Time and Communication Efficient Consensus for Crash Failures

We study consensus algorithms optimized simultaneously for time and communication complexity. The computing environment is a synchronous message-passing system whose processors are prone to crashes. The number $f$ of crashes can be arbitrary as long as it is smaller than the number $n$ of processors in the system. As a building block for our consensus solutions, we consider the gossiping problem, in which every processor starts with an input rumor and the goal of every processor is to learn all the rumors of the processors that have not crashed. We show that gossiping can be achieved by a deterministic algorithm working in $\mathcal{O}(\log^3 n)$ time and sending $\mathcal{O}(n\log^4 n)$ point-to-point messages. This improves upon the best previously known deterministic gossiping solution, which operated in $\mathcal{O}(\log^2 n)$ time and generated $\mathcal{O}(n^{1+\varepsilon})$ messages, for any constant $\varepsilon>0$. The efficient gossiping algorithm is then applied to the problem of reaching consensus. In the Consensus problem, each processor starts with its input value and the goal is to have all processors agree on exactly one value among the inputs. First, we develop a deterministic algorithm solving Consensus in $\mathcal{O}(n)$ time while sending $\mathcal{O}(n \log^5 n)$ messages. The best previously known algorithms solving Consensus in $\mathcal{O}(n)$ time had message complexity bounded by $\mathcal{O}(n^{1+\varepsilon})$, for any constant $\varepsilon>0$. Next, we improve the Consensus solution so that it is early stopping, which means that it terminates in $\mathcal{O}(f+1)$ time, where $f$ is the number of crashes that actually occur in an execution, while preserving the message complexity $\mathcal{O}(n \log^5 n)$.
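To make the problem setting concrete, the following is a minimal sketch of the textbook flooding approach to Consensus in a synchronous system with crashes. It is not the algorithm of this paper: it runs for $f+1$ rounds and sends $\mathcal{O}(n^2)$ messages per round, which is the kind of communication cost the results above are meant to avoid. The function name `flood_consensus` and the `crash_plan` encoding (processor id mapped to the round in which it crashes and the recipients it still reaches in that round) are assumptions made for illustration only.

```python
def flood_consensus(inputs, crash_plan):
    """Simulate flooding-based consensus under crash failures.

    inputs:     list of input values, one per processor.
    crash_plan: hypothetical encoding {pid: (round, recipients)}; the
                processor crashes in that round after reaching only
                the listed recipients.
    Returns (decisions of surviving processors, messages sent).
    """
    n = len(inputs)
    known = [{v} for v in inputs]        # values each processor has seen
    alive = [True] * n
    messages = 0
    rounds = len(crash_plan) + 1         # f + 1 rounds guarantee a crash-free round

    for r in range(rounds):
        inbox = [set() for _ in range(n)]
        for p in range(n):
            if not alive[p]:
                continue
            recipients = range(n)
            if p in crash_plan and crash_plan[p][0] == r:
                # A crash mid-round delivers to a subset only.
                recipients = crash_plan[p][1]
                alive[p] = False
            for q in recipients:
                if q != p:
                    inbox[q] |= known[p]
                    messages += 1
        # All messages of a round are delivered before the next round starts.
        for q in range(n):
            if alive[q]:
                known[q] |= inbox[q]

    # Every survivor decides on the minimum value it has seen.
    decisions = {p: min(known[p]) for p in range(n) if alive[p]}
    return decisions, messages


# Processor 0 crashes in round 0 reaching only processor 1;
# processor 1 crashes in round 1 reaching only processor 2.
decisions, messages = flood_consensus(
    [0, 5, 7, 9], {0: (0, [1]), 1: (1, [2])}
)
```

In this run the value 0 survives only through a chain of partially delivered messages, yet both remaining processors still decide on it after the third round. The quadratic per-round message count of this baseline is what motivates replacing all-to-all flooding with an efficient gossiping subroutine.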
