Self-diagnosis of regular arrays of processors

Abstract: This correspondence deals with self-diagnosis and system-level diagnosis of regular arrays of N processor units, where each unit can test its L neighbors. It is first shown that such an array is one-step L-fault diagnosable iff N ⩾ 2L + 1. To produce an array that is t-fault diagnosable for values of t larger than L, the following four methods are investigated: (1) limit the fault patterns in the system; (2) employ sequential diagnosis; (3) employ inexact (t/s) diagnosis; (4) make the diagnosis independent of t. If the distance between faulty units is at least d + 1, then an n × n array is one-step n⌊n/(d + 1)⌋-fault diagnosable if d > 1 and n > d + 1. Simple algorithms for identifying faulty units are presented. If N > t, then the array is at most ⌈t/L⌉-step t-fault diagnosable. Also, the array is one-step t/s diagnosable iff N ⩾ s + L + 1. The exact expression for s is presented, and in most cases s ≈ t²/(4L). A relationship is derived between diagnosis resolution, i.e. the number of good units diagnosed as potentially being faulty, and the number of steps required in the sequential diagnostic procedure. By increasing the number of steps in the test process, the number of good units flagged as potentially faulty can be reduced. The paper concludes with some comments on the control unit required to decode the syndrome and identify good and faulty units.
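The bounds quoted in the abstract can be collected into a small calculator. The following is only an illustrative sketch: the function names are hypothetical and do not come from the paper, the code merely evaluates the stated conditions rather than performing any diagnosis, and s is computed from the approximation t²/(4L) because the exact expression is not given in the abstract.

```python
def one_step_l_fault_diagnosable(N: int, L: int) -> bool:
    """Abstract: the array is one-step L-fault diagnosable iff N >= 2L + 1."""
    return N >= 2 * L + 1


def spaced_fault_bound(n: int, d: int):
    """Number of faults an n x n array tolerates in one step when faulty
    units are at least d + 1 apart: n * floor(n / (d + 1)).
    The abstract states this only for d > 1 and n > d + 1, so None is
    returned outside that range."""
    if d > 1 and n > d + 1:
        return n * (n // (d + 1))
    return None


def sequential_step_bound(N: int, t: int, L: int):
    """Upper bound ceil(t / L) on the number of sequential diagnosis steps,
    stated in the abstract under the condition N > t."""
    if N > t:
        return -(-t // L)  # integer ceiling division
    return None


def approx_s(t: int, L: int) -> float:
    """Approximation s ~ t^2 / (4L) that the abstract says holds in most
    cases. Assumption: 't 2 /4 L' in the source is read as t^2 / (4L)."""
    return t * t / (4 * L)


def one_step_t_over_s_diagnosable(N: int, s: int, L: int) -> bool:
    """Abstract: the array is one-step t/s diagnosable iff N >= s + L + 1."""
    return N >= s + L + 1


if __name__ == "__main__":
    # Hypothetical example: 100 units, each testing L = 4 neighbors.
    print(one_step_l_fault_diagnosable(100, 4))
    print(sequential_step_bound(100, 10, 4))
    print(approx_s(8, 4))
```

With L = 4, for instance, the first condition requires at least 9 units, and diagnosing up to t = 10 faults sequentially takes at most ⌈10/4⌉ = 3 steps.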
