Emulating Shared-Memory Do-All Algorithms in Asynchronous Message-Passing Systems

A fundamental problem in distributed computing is performing a set of tasks despite failures and delays. Stated abstractly, the problem is to perform N tasks using P failure-prone processors. This paper studies the efficiency of emulating shared-memory task-performing algorithms on asynchronous message-passing processors with quantifiable message latency. Efficiency is measured in terms of work and communication, and the challenge is to obtain subquadratic work and message complexity. While prior solutions assumed synchrony and constant delays, the solutions given here yield subquadratic efficiency with asynchronous processors when delays and failures are suitably constrained. The solutions replicate shared objects using a quorum system, provided the quorum system is not disabled. One algorithm has subquadratic work and communication when the delays and the number K of processors owning object replicas are \(O(P^{0.41})\); it tolerates \(\lceil \frac{K-1}{2} \rceil\) crashes. It is also shown that there exists an algorithm with subquadratic work and communication that tolerates \(o(P)\) failures, provided message delays are sublinear.
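
To make the quorum-based replication concrete, the following minimal Python sketch illustrates how a read/write object replicated over K servers can be emulated with majority quorums, in the spirit of the Attiya-Bar-Noy-Dolev register emulation. It is not the paper's algorithm: the class names (Replica, QuorumRegister), the single register, and the failure-free sequential control flow are illustrative assumptions, and the asynchrony, crashes, and message delays that the paper's analysis bounds are omitted here.

from dataclasses import dataclass

@dataclass
class Replica:
    """One of the K servers holding a copy of the shared object."""
    value: object = None
    timestamp: int = 0

class QuorumRegister:
    """A register replicated over K servers.

    A write installs (value, timestamp) at a majority of replicas; a read
    collects values from a majority and returns the one with the highest
    timestamp. Any two majorities intersect, so a read sees the latest
    completed write even if a minority of replicas has crashed.
    """
    def __init__(self, k: int):
        self.replicas = [Replica() for _ in range(k)]
        self.quorum_size = k // 2 + 1  # any majority of the K replicas

    def write(self, value, timestamp: int) -> None:
        acks = 0
        for r in self.replicas:              # "send" the update to replicas
            if timestamp > r.timestamp:      # keep only the newer value
                r.value, r.timestamp = value, timestamp
            acks += 1
            if acks >= self.quorum_size:     # return once a quorum has acked
                return

    def read(self):
        replies = self.replicas[: self.quorum_size]   # any majority suffices
        latest = max(replies, key=lambda r: r.timestamp)
        return latest.value

if __name__ == "__main__":
    reg = QuorumRegister(k=5)
    reg.write("task 7 done", timestamp=1)
    print(reg.read())   # -> task 7 done

In the paper's setting, task-performing (Do-All) algorithms written for shared memory access such replicated objects instead of shared variables; the work and communication bounds then depend on the message delays and on K, as stated in the abstract.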
