Highly Fault-Tolerant Parallel Computation (Extended Abstract)

We re-introduce the coded model of fault-tolerant computation, in which the input and output of a computational device are treated as words in an error-correcting code. A computational device correctly computes a function in the coded model if its input and output, once decoded, are a valid input and output of the function. In the coded model, it is reasonable to hope to simulate all computational devices by devices whose size is greater by only a constant factor but which are exponentially reliable even if each of their components can fail with some constant probability. We consider fine-grained parallel computations in which each processor has a constant probability of producing the wrong output at each time step. We show that any parallel computation that runs for time t on w processors can be performed reliably on a faulty machine in the coded model using w log^{O(1)} w processors and time t log^{O(1)} w. The failure probability of the computation will be at most t · exp(−w^{1/4}). The codes used to communicate with our fault-tolerant machines are generalized Reed-Solomon codes and can thus be encoded and decoded in O(n log^{O(1)} n) sequential time; they are independent of the machine they are used to communicate with. We also show how coded computation can be used to self-correct many linear functions in parallel with arbitrarily small overhead.
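To make the last claim concrete, the sketch below shows the standard random self-correction idea for a single linear function, in the spirit of Blum, Luby, and Rubinfeld's self-testing/correcting: since f(x) = f(x + r) − f(r) for linear f, a program that is correct on most inputs can be queried only at uniformly random points and corrected by majority vote. This is an illustrative Python sketch under assumed parameters (the modulus, error rate, trial count, and names such as `faulty_prog` are not the paper's notation, and it does not reproduce the paper's parallel construction).

```python
# A minimal sketch of random self-correction for a linear function f over Z_P,
# i.e. f(x + y) = f(x) + f(y) (mod P). A program that is correct on most
# inputs is queried at random points x + r and r; the difference recovers
# f(x), and a majority vote over independent trials amplifies reliability.

import random
from collections import Counter

P = 2_147_483_647  # prime modulus used for this example

def true_f(x: int) -> int:
    """The linear function we pretend not to trust: f(x) = 7x mod P."""
    return (7 * x) % P

def faulty_prog(x: int, error_rate: float = 0.1) -> int:
    """Computes true_f correctly except on a random `error_rate`
    fraction of calls, where it returns an arbitrary value."""
    if random.random() < error_rate:
        return random.randrange(P)
    return true_f(x)

def self_correct(prog, x: int, trials: int = 15) -> int:
    """Recover f(x) from a mostly-correct program for a linear f mod P.

    Each trial uses f(x) = f(x + r) - f(r) for a uniformly random r,
    so both queries to `prog` land on uniformly distributed points.
    """
    votes = Counter()
    for _ in range(trials):
        r = random.randrange(P)
        guess = (prog((x + r) % P) - prog(r)) % P
        votes[guess] += 1
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    x = 123_456_789
    print("true value:     ", true_f(x))
    print("self-corrected: ", self_correct(faulty_prog, x))
```

Because each trial touches the program only at uniformly random inputs, a program wrong on a small constant fraction of inputs is still correct in most trials, and independent repetition drives the overall failure probability down exponentially; this is the kind of reliability amplification that coded computation aims to achieve for many linear functions at once with small overhead.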
