Highly fault-tolerant parallel computation

We re-introduce the coded model of fault-tolerant computation in which the input and output of a computational device are treated as words in an error-correcting code. A computational device correctly computes a function in the coded model if its input and output, once decoded, are a valid input and output of the function. In the coded model, it is reasonable to hope to simulate all computational devices by devices whose size is greater by a constant factor but which are exponentially reliable even if each of their components can fail with some constant probability. We consider fine-grained parallel computations in which each processor has a constant probability of producing the wrong output at each time step. We show that any parallel computation that runs for time $t$ on $w$ processors can be performed reliably on a faulty machine in the coded model using $w \log^{O(1)} w$ processors and time $t \log^{O(1)} w$. The failure probability of the computation will be at most $t \cdot \exp(-w^{1/4})$. The codes used to communicate with our fault-tolerant machines are generalized Reed-Solomon codes and can thus be encoded and decoded in $O(n \log^{O(1)} n)$ sequential time and are independent of the machine they are used to communicate with. We also show how coded computation can be used to self-correct many linear functions in parallel with arbitrarily small overhead.
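To make the idea of self-correcting a linear function concrete, the following is a minimal sketch of the classical random self-reduction approach: a linear function satisfies f(x) = f(x + r) - f(r), so a program that is wrong on a small fraction of inputs can be queried at random points and corrected by majority vote. This is not the coded construction of the paper. The modulus `P`, the coefficient `A`, the fault pattern in `faulty_program`, and the helper `self_correct` are all illustrative assumptions.

```python
import random
from collections import Counter

P = 101  # small prime modulus (illustrative assumption)
A = 37   # coefficient of the linear function f(x) = A*x mod P (assumption)


def faulty_program(x):
    """Hypothetical unreliable implementation of f: wrong when x % 10 == 3."""
    if x % 10 == 3:
        return (A * x + 1) % P  # corrupted output
    return (A * x) % P


def self_correct(program, x, trials=31, rng=None):
    """Estimate f(x) using linearity: f(x) = f(x + r) - f(r) for any r.

    Each trial queries the program at two points that are individually
    uniform, so a trial is correct unless exactly one query hits a faulty
    input. The majority over independent trials is returned.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    votes = Counter()
    for _ in range(trials):
        r = rng.randrange(P)
        votes[(program((x + r) % P) - program(r)) % P] += 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    x = 3  # an input on which faulty_program is wrong
    print("raw:", faulty_program(x), "corrected:", self_correct(faulty_program, x))
```

In this sketch each trial costs two queries, so correcting one value multiplies the work by a constant times the number of trials. The abstract's claim concerns a different regime, in which many linear functions are corrected in parallel with arbitrarily small overhead.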
