Towards practical deterministic write-all algorithms

The problem of performing t tasks on n asynchronous or undependable processors is a basic problem in parallel and distributed computing. We consider an abstraction of this problem called the Write-All problem: using n processors, write 1's into all locations of an array of size t. The most efficient known deterministic asynchronous algorithms for this problem are due to Anderson and Woll. The first class of algorithms has work complexity of O(t · n^ε) for n ≤ t and any ε > 0, and these algorithms are the best known for the full range of processors (n ≤ t). To schedule the work of the processors, the algorithms use sets of q permutations on [q] (q ≤ n) that have certain combinatorial properties. Instantiating such an algorithm for a specific ε either requires substantial pre-processing (exponential in 1/ε^2) to find the requisite permutations, or imposes a prohibitive constant (exponential in 1/ε^3) hidden by the asymptotic analysis. The second class deals with the specific case of t = n^u, u ≥ 2, and these algorithms have work complexity of O(t log t). They also use sets of permutations with the same combinatorial properties; however, instantiating these algorithms requires preprocessing exponential in n to find the permutations. To alleviate this costly instantiation, Kanellakis and Shvartsman proposed a simple way of computing the permutation schedules. They conjectured that their construction has the desired properties, but they provided no analysis. In this paper we show, for the first time, an analysis of the properties of the set of permutations proposed by Kanellakis and Shvartsman. Our result is hybrid in that it includes analytical and empirical parts. The analytical result covers a subset of the possible adversarial patterns of asynchrony. The empirical results provide strong evidence that our analysis covers the worst-case scenario, and we state this formally as a conjecture. We use these results to analyze an algorithm for t = n^u (u ≥ 2) tasks that takes advantage of processor slackness and that has work O(t log^2 t), conditioned on our conjecture. This algorithm requires only O(n log n) time to instantiate. Next we study the case of the full range of processors, n ≤ t. We define a family of deterministic asynchronous Write-All algorithms with work O(t · n^ε), contingent upon our conjecture. We show that our method yields a faster construction of O(t · n^ε) Write-All algorithms than the method developed by Anderson and Woll. Finally, we show that our approach yields more efficient Write-All algorithms as compared to the algorithms induced by the constructions of Naor and Roth for the same asymptotic work complexity.
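As a concrete illustration of the Write-All abstraction and of permutation-based scheduling, the following is a minimal sketch (not the Anderson and Woll or Kanellakis and Shvartsman constructions; the function and variable names are hypothetical): each of n processors sweeps the array in the order given by its own permutation, skipping cells that already hold 1, while an adversarially chosen interleaving decides which processor takes the next step. The total number of cell visits is the work measure to which the bounds above refer.

```python
# Minimal Write-All simulation sketch (illustrative only; uses random
# permutations rather than schedules with the combinatorial properties
# required by the algorithms discussed in the paper).
import random

def write_all(t, permutations, schedule):
    """Simulate Write-All: return total work (cell visits), or None if the
    interleaving in `schedule` ends before every cell holds 1."""
    array = [0] * t
    position = [0] * len(permutations)  # next index into each processor's permutation
    remaining = t
    work = 0
    for p in schedule:                  # adversary picks which processor steps next
        if remaining == 0:
            break
        if position[p] >= len(permutations[p]):
            continue                    # this processor has finished its pass
        cell = permutations[p][position[p]]
        position[p] += 1
        work += 1                       # every visit counts, even to an already-written cell
        if array[cell] == 0:
            array[cell] = 1
            remaining -= 1
    return work if remaining == 0 else None

if __name__ == "__main__":
    t, n = 16, 4
    perms = [random.sample(range(t), t) for _ in range(n)]          # hypothetical schedules
    interleaving = [random.randrange(n) for _ in range(4 * t * n)]  # hypothetical adversary
    print("total work:", write_all(t, perms, interleaving))
```

The quality of the permutation set determines how much redundant work such an interleaving can force; the combinatorial properties referred to in the abstract are what bound this redundancy.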

[1]  Jan Friso Groote, et al. Waitfree Distributed Memory Management by Create, and Read Until Deletion (CRUD), 1999.

[2]  Joseph Naor, et al. Constructions of Permutation Arrays for Certain Scheduling Cost Measures, 1995, Random Struct. Algorithms.

[3]  Yonatan Aumann, et al. Highly efficient asynchronous execution of large-grained parallel programs, 1993, Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science.

[4]  Steven Fortune, et al. Parallelism in random access machines, 1978, STOC.

[5]  Richard M. Karp, et al. Parallel Algorithms for Shared-Memory Machines, 1991, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity.

[6]  Franco P. Preparata, et al. Deterministic P-RAM Simulation with Constant Redundancy, 1991, Inf. Comput.

[7]  Paul G. Spirakis, et al. “Dynamic-fault-prone BSP”: a paradigm for robust computations in changing environments, 1998, SPAA '98.

[8]  Richard Cole, et al. The APRAM: incorporating asynchrony into the PRAM model, 1989, SPAA '89.

[9]  Charles U. Martel, et al. Work-Optimal Asynchronous Algorithms for Shared Memory Parallel Computers, 1992, SIAM J. Comput.

[10]  Alexander A. Shvartsman, et al. Efficient parallel algorithms can be made robust, 1989, PODC '89.

[11]  Alexander A. Shvartsman. Achieving Optimal CRCW PRAM Fault-Tolerance, 1991, Inf. Process. Lett.

[12]  Donald Ervin Knuth, et al. The Art of Computer Programming, 1968.

[13]  Donald E. Knuth, et al. The Art of Computer Programming, Vol. 2, 1981.

[14]  Leslie G. Valiant, et al. A bridging model for parallel computation, 1990, CACM.

[15]  Mihalis Yannakakis, et al. Towards an Architecture-Independent Analysis of Parallel Algorithms, 1990, SIAM J. Comput.

[16]  Paul G. Spirakis, et al. Lectures on parallel computation, 1993.

[17]  Richard Cole, et al. The expected advantage of asynchrony, 1990, SPAA '90.

[18]  Naomi Nishimura, et al. Asynchronous shared memory parallel computation, 1990, SPAA '90.

[19]  Partha Dasgupta, et al. Parallel processing on networks of workstations: a fault-tolerant, high performance approach, 1995, Proceedings of 15th International Conference on Distributed Computing Systems.

[20]  Richard J. Anderson, et al. Algorithms for the Certified Write-All Problem, 1997, SIAM J. Comput.

[21]  Donald E. Knuth, et al. The Art of Computer Programming, Vol. 3: Sorting and Searching (2nd ed.), 1998.

[22]  Leslie G. Valiant, et al. General Purpose Parallel Architectures, 1991, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity.

[23]  Alexander A. Shvartsman, et al. Fault-Tolerant Parallel Computation, 1997.

[24]  Phillip B. Gibbons. A more practical PRAM model, 1989, SPAA '89.

[25]  Prabhakar Ragde, et al. Parallel Algorithms with Processor Failures and Delays, 1996, J. Algorithms.

[26]  Jan Friso Groote, et al. An algorithm for the asynchronous Write-All problem based on process collision, 2001, Distributed Computing.

[27]  R. Guy. Unsolved Problems in Number Theory, 1981.

[28]  Ramesh Subramonian, et al. LogP: towards a realistic model of parallel computation, 1993, PPOPP '93.

[29]  Komaravolu Chandrasekharan, et al. Introduction to Analytic Number Theory, 1969.

[30]  Paul G. Spirakis, et al. Efficient Robust Parallel Computations (Extended Abstract), 1990, STOC.

[31]  R. Subramonian. Asynchronous PRAMs are (almost) as good as synchronous PRAMs, 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.

[32]  Krishna V. Palem, et al. Efficient program transformations for resilient parallel computation via randomization (preliminary version), 1992, STOC '92.

[33]  Y. Aumann, et al. Clock construction in fully asynchronous parallel systems and PRAM simulation, 1992, Proceedings of the 33rd Annual Symposium on Foundations of Computer Science.

[34]  Andrew Chi-Chih Yao, et al. Analysis of the subtractive algorithm for greatest common divisors, 1975, SIGS.