Efficient Parallel Computing with Memory Faults

In this paper we show two results on PRAMs with a constant fraction of memory faults. First, we show how to preprocess a faulty EREW PRAM with n/log n processors and O(n) memory cells in O(log n) time, i.e. how to connect a constant fraction of the processors into a binary tree. This preprocessing is a basic step of the simulations from [7, 9, 17]. Our algorithm, together with the results from [17], gives the first fully work-optimal randomized simulation of an EREW PRAM on an EREW PRAM with faults with logarithmic overhead. In the second part of the paper, we consider the CRCW PRAM with memory faults. We show that, after O(log* n)-time preprocessing, any algorithm for an O(n)-processor PRAM can be simulated with optimal work in O(log* n) time on a CRCW PRAM with memory faults. This simulation improves the result of [7]. All simulations assume static faults, i.e. that the errors are determined before the computation starts and that no new errors occur during the computation.
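For readers unfamiliar with the O(log* n) bound quoted above, the following minimal sketch computes the iterated logarithm. It is an illustration only and not part of the paper: the function name `log_star` and the choice of base 2 are assumptions made here.

```python
import math


def log_star(n: float) -> int:
    """Iterated logarithm (base 2, an assumption of this sketch):
    the number of times log2 must be applied to n until the value is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


if __name__ == "__main__":
    # log* grows extremely slowly: it stays tiny even for large inputs.
    for value in (2, 16, 65536):
        print(value, log_star(value))
```

Because log* n grows so slowly, an O(log* n) bound is nearly constant for any practical input size.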

[1] Friedhelm Meyer auf der Heide, et al. Fault-Tolerant Shared Memory Simulations, 1996, STACS.

[2] Colin McDiarmid, et al. Surveys in Combinatorics, 1989: On the method of bounded differences, 1989.

[3] Noga Alon, et al. Explicit construction of linear sized tolerant networks, 1988, Discret. Math.

[4] Alexander A. Shvartsman, et al. Efficient parallel algorithms can be made robust, 1989, PODC '89.

[5] Andrzej Pelc, et al. Fast Deterministic Simulation of Computations on Faulty Parallel Machines, 1995, ESA.

[6] Z. M. Kedem, et al. Combining tentative and definite executions for very fast dependable parallel computing, 1991, STOC '91.

[7] William Feller, et al. An Introduction to Probability Theory and Its Applications, 1951.

[8] Alexander A. Shvartsman, et al. Controlling Memory Access Concurrency in Efficient Fault-Tolerant Parallel Algorithms, 1995, Nord. J. Comput.

[9] Richard M. Karp, et al. Parallel Algorithms for Shared-Memory Machines, 1991, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity.

[10] Wojciech Rytter, et al. Efficient parallel algorithms, 1988.

[11] David S. Greenberg, et al. Computing with faulty shared memory, 1992, PODC '92.

[12] Daniel A. Spielman, et al. Expander codes, 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[13] Krzysztof Diks, et al. Reliable Computations on Faulty EREW PRAM, 1996, Theor. Comput. Sci.

[14] Aravind Srinivasan, et al. Chernoff-Hoeffding bounds for applications with limited independence, 1995, SODA '93.

[15] Zvi Galil, et al. Explicit Constructions of Linear-Sized Superconcentrators, 1981, J. Comput. Syst. Sci.

[16] Piotr Indyk, et al. PRAM Computations Resilient to Memory Faults, 1994, ESA.

[17] Uzi Vishkin, et al. Recursive Star-Tree Parallel Data Structure, 1993, SIAM J. Comput.

[18] Joseph JáJá, et al. An Introduction to Parallel Algorithms, 1992.

[19] Alan Siegel, et al. On universal classes of fast high performance hash functions, their time-space tradeoff, and their applications, 1989, 30th Annual Symposium on Foundations of Computer Science.

[20] Torben Hagerup, et al. The Log-Star Revolution, 1992, STACS.

[21] Michael O. Rabin, et al. Efficient dispersal of information for security, load balancing, and fault tolerance, 1989, JACM.

[22] Friedhelm Meyer auf der Heide, et al. Hashing Strategies for Simulating Shared Memory on Distributed Memory Machines, 1992, Heinz Nixdorf Symposium.

[23] Piotr Indyk, et al. Shared-Memory Simulations on a Faulty-Memory DMM, 1996, ICALP.

[24] Torben Hagerup, et al. Fast and reliable parallel hashing, 1991, SPAA '91.

[25] Piotr Indyk, et al. On Word-Level Parallelism in Fault-Tolerant Computing, 1996, STACS.

[26] Paul G. Spirakis, et al. Efficient robust parallel computations, 1990, STOC '90.

[27] Torben Hagerup, et al. A Guided Tour of Chernoff Bounds, 1990, Inf. Process. Lett.

[28] Prabhakar Ragde, et al. Parallel Algorithms with Processor Failures and Delays, 1996, J. Algorithms.