Sequential diagnosis of processor array systems

We examine the diagnosis of processor array systems formed as bounded two-dimensional arrays in which each interior processor has either four or eight neighbors. We employ a parallel test schedule in which neighboring processors test each other and report the results. Our diagnostic objective is to find a fault-free processor or set of processors. The system may then be sequentially diagnosed by repairing those processors that the identified fault-free set tests as faulty, or a job may be run on the identified fault-free processors. We establish an upper bound on the maximum number of faults that can be sustained without invalidating the test results under worst-case conditions. We give test schedules and diagnostic algorithms that meet the upper bound as far as the highest-order term. We compare these near-optimal diagnostic algorithms with alternative algorithms, both new and already in the literature, and with an ideal-case algorithm that attains the upper bound but is not necessarily practically realizable. For eight-way array systems with $N$ processors, an ideal algorithm has diagnosability $3N^{2/3} - 2N^{1/2}$ plus lower-order terms; no algorithm can exceed this. We give an algorithm that starts with tests on diagonally connected processors and achieves approximately this diagnosability, so it is optimal to within the two most significant terms of the maximum diagnosability. Similarly, for four-way array systems with $N$ processors, no algorithm can have diagnosability exceeding $3N^{2/3}/2^{1/3} - 2N^{1/2}$ plus lower-order terms. We give an algorithm that begins with tests arranged in a zigzag pattern, formed by pairing nodes for tests in two different directions in two consecutive test stages; this algorithm achieves diagnosability $(3/2)(5/2)^{1/3}N^{2/3} - (5/4)N^{1/2}$ plus lower-order terms, which is about 0.85 of the upper bound set by an ideal algorithm.
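The quoted figure of about 0.85 can be checked from the leading terms alone. The following is a minimal sketch, assuming the lower-order terms are simply dropped (the abstract does not give them); the function names are illustrative and do not come from the paper.

```python
# Sketch: evaluate the two leading terms of the diagnosability expressions
# stated in the abstract. Lower-order terms are omitted (an assumption),
# so the values are approximations that are only meaningful for large N.

def ideal_eight_way(n: float) -> float:
    """Upper bound for eight-way arrays: 3 N^(2/3) - 2 N^(1/2)."""
    return 3 * n ** (2 / 3) - 2 * n ** 0.5


def ideal_four_way(n: float) -> float:
    """Upper bound for four-way arrays: 3 N^(2/3) / 2^(1/3) - 2 N^(1/2)."""
    return 3 * n ** (2 / 3) / 2 ** (1 / 3) - 2 * n ** 0.5


def zigzag_four_way(n: float) -> float:
    """Zigzag algorithm: (3/2)(5/2)^(1/3) N^(2/3) - (5/4) N^(1/2)."""
    return 1.5 * 2.5 ** (1 / 3) * n ** (2 / 3) - 1.25 * n ** 0.5


def leading_ratio_four_way() -> float:
    """Ratio of the N^(2/3) coefficients: zigzag over ideal bound."""
    return (1.5 * 2.5 ** (1 / 3)) / (3 / 2 ** (1 / 3))


if __name__ == "__main__":
    print(f"leading-term ratio: {leading_ratio_four_way():.4f}")
    for n in (10**4, 10**6, 10**8):
        print(n, round(ideal_four_way(n)), round(zigzag_four_way(n)))
```

Algebraically the coefficient ratio simplifies to $5^{1/3}/2 \approx 0.855$, which agrees with the stated figure. For finite $N$ the ratio of the two truncated expressions differs slightly from this limit because the $N^{1/2}$ coefficients are not the same.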
