Fault-secure algorithms for multiple-processor systems

In this paper we describe techniques for achieving fault secureness at low cost in multiple-processor systems. To do this, we consider the relationships between algorithms, parallel architectures, and fault tolerance. The concept of fault-secure algorithms, described in this paper, applies the ideas of system-level fault tolerance to high-performance multiple-processor algorithms so that the results of the computation are reliable. Algorithms are classified into broad classes, called paradigms, which are determined exclusively by the communication patterns of the processors. Fault-secure techniques are presented for three powerful paradigms: the multiplex, the recursive combination, and the multiplex-demultiplex paradigms. The basic idea used in the design of fault-tolerant algorithms is that the algorithms operate on encoded input data and produce encoded output data, while keeping the overhead in time and in number of processors low. This technique is distinguished by three characteristics: the encoding of the data used by the algorithm, the redesign of the algorithm to operate on the encoded data, and the distribution of the computation steps in the algorithm among the computation units.
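The encode, compute, and check cycle described above can be illustrated with checksum-encoded matrix multiplication, a well-known algorithm-based fault-tolerance scheme. This is a minimal sketch under stated assumptions, not the encodings this paper develops for the multiplex, recursive-combination, or multiplex-demultiplex paradigms: it assumes integer data, at most one faulty entry in the result, and uses a sequential product as a stand-in for work distributed across processors. All function names (`matmul`, `column_checksum`, `row_checksum`, `strip_checksums`, `locate_and_correct`) are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical helpers illustrating checksum (ABFT-style) encoding of the data;
# a swapped-in example, not the paper's own scheme. Integer matrices keep the
# checksum comparisons exact; floating-point data would need a tolerance.
Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Ordinary product a @ b; stands in for the distributed computation."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a]


def column_checksum(a: Matrix) -> Matrix:
    """Encode A by appending a row that holds the sum of each column."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]


def row_checksum(b: Matrix) -> Matrix:
    """Encode B by appending to each row the sum of its entries."""
    return [row + [sum(row)] for row in b]


def strip_checksums(c: Matrix) -> Matrix:
    """Decode: drop the checksum row and column to recover A @ B."""
    return [row[:-1] for row in c[:-1]]


def locate_and_correct(c: Matrix) -> Optional[Tuple[int, int]]:
    """Check a full-checksum product in place.

    Returns None if every checksum is consistent, or the (row, col) of a single
    corrupted data entry after restoring it from its row checksum. Any other
    mismatch pattern (several errors, or an error in a checksum entry) is
    outside this sketch's single-error assumption and raises ValueError.
    """
    n, m = len(c) - 1, len(c[0]) - 1  # size of the data part
    bad_rows = [i for i in range(n) if sum(c[i][:m]) != c[i][m]]
    bad_cols = [j for j in range(m) if sum(c[i][j] for i in range(n)) != c[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        c[i][j] += c[i][m] - sum(c[i][:m])  # add back the row's discrepancy
        return (i, j)
    raise ValueError("mismatch pattern is not a single data-entry error")
```

For example, with A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], the encoded product `matmul(column_checksum(A), row_checksum(B))` is [[19, 22, 41], [43, 50, 93], [62, 72, 134]]. Adding 5 to entry (1, 0) makes row 1 and column 0 disagree with their checksums, which both locates the fault and allows the entry to be restored. The product of the two encoded inputs is itself encoded, so the algorithm needs only one extra row and column of work rather than full duplication.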
