A Work-Optimal Deterministic Algorithm for the Certified Write-All Problem with a Nontrivial Number of Asynchronous Processors

Martel, Park, and Subramonian [C. Martel, A. Park, and R. Subramonian, SIAM J. Comput., 21 (1992), pp. 1070--1099] posed the question of developing a work-optimal deterministic asynchronous algorithm for the fundamental load-balancing and synchronization problem called Certified Write-All (CWA). In this problem, introduced in a slightly different form by Kanellakis and Shvartsman in a PODC'89 paper [P. C. Kanellakis and A. A. Shvartsman, Distributed Computing, 5 (1992), pp. 201--247], $p$ processors must update $n$ memory cells and only then signal the completion of the updates. It is known that solutions to this problem can be used to simulate synchronous parallel programs on asynchronous systems with worst-case guarantees on the overhead of the simulation. Such simulations are interesting because they may increase productivity in parallel computing, since synchronous parallel programs are easier to reason about than asynchronous ones. This paper presents the first solution to the question of Martel, Park, and Subramonian. Specifically, we show a deterministic asynchronous algorithm for the CWA problem with work complexity $O(n + p^4 \log n)$. This work complexity is asymptotically optimal for a nontrivial number of processors $p \leq \left(n/\log n\right)^{1/4}$, since then $p^4 \log n \leq n$ and the bound becomes $O(n)$. In contrast, all known deterministic algorithms require work superlinear in $n$ when $p = n^{1/r}$ for any fixed $r \geq 1$. Our algorithm generalizes the collision principle introduced by Buss et al. [J. Buss, P. C. Kanellakis, P. L. Ragde, and A. A. Shvartsman, J. Algorithms, 20 (1996), pp. 45--86] in 1996, which has not been previously generalized despite various attempts. Each processor maintains a collection of intervals of $\{1, 2, \ldots, n\}$. Each processor iteratively selects an interval and works from one tip toward the other until it either finishes the work or collides with another processor. Collisions are detected effectively using a special read-modify-write operation. In either case, the processor transforms its collection appropriately. Our analysis shows that the transformations preserve some structural properties of collections of intervals. This guarantees that work is assigned to processors in an efficient manner.
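The collision idea is easiest to see for two processors that start at opposite tips of a single interval. The Python sketch below illustrates only that special case, in the spirit of the collision principle; it is not the p-processor algorithm of this paper. The function name `write_all_two_processors`, the simulated `test_and_set` primitive, and the random scheduler standing in for asynchrony are all assumptions made for illustration.

```python
import random


def write_all_two_processors(n, schedule_seed=0):
    """Simulate two asynchronous processors writing n cells from opposite tips.

    Illustrative assumption: asynchrony is modeled by a seeded random
    scheduler that picks which active processor takes the next step.
    Returns the final cell array and the total number of steps (work).
    """
    rng = random.Random(schedule_seed)
    cells = [0] * n        # shared memory: 0 = not yet written, 1 = written
    pos = [0, n - 1]       # processor 0 starts at the left tip, 1 at the right
    step = [1, -1]         # each processor walks toward the other tip
    active = [True, True]
    work = 0

    def test_and_set(i):
        # Simulated atomic read-modify-write (hypothetical primitive):
        # returns the old value of cell i and sets the cell to 1.
        old = cells[i]
        cells[i] = 1
        return old

    while any(active):
        # The scheduler picks any processor that is still active.
        pid = rng.choice([q for q in (0, 1) if active[q]])
        i = pos[pid]
        if i < 0 or i >= n:
            # Walked past the far tip: this processor wrote every cell itself.
            active[pid] = False
            continue
        work += 1
        if test_and_set(i) == 1:
            # Collision: the cell was already written by the other processor,
            # which writes contiguously from its own tip, so everything
            # beyond this cell is done and this processor can stop.
            active[pid] = False
        else:
            pos[pid] += step[pid]
    return cells, work


if __name__ == "__main__":
    final_cells, total_work = write_all_two_processors(10, schedule_seed=1)
    print(all(c == 1 for c in final_cells), total_work)
```

In this toy setting every cell is written exactly once and each processor wastes at most one step on a collision, so the total work stays between n and n + 2 regardless of the schedule. Handling many processors while keeping the wasted work small is the hard part that the interval collections described above address.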
