Optimal adaptive fault diagnosis for simple multiprocessor systems

We study adaptive system-level fault diagnosis for multiprocessor systems. Processors can test each other, and each new test can be chosen on the basis of previous test results. Fault-free testers always give correct test results, while faulty testers are completely unreliable. The aim of diagnosis is to correctly determine the fault status of every processor. We present adaptive diagnosis algorithms for systems modeled by trees, rings, and tori. In each case, these algorithms use the smallest possible number of tests. Our results also imply optimal diagnosis for more general systems, assuming a small number of faults. The cost of adaptive diagnosis turns out to be significantly smaller than that of classical (one-step) diagnosis.
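
The abstract does not spell out the algorithms, so the following is only a minimal sketch of the adaptive testing model itself. It is not the paper's tree, ring, or torus algorithm. It assumes a completely connected system (any unit may test any other) with at most t faulty units and n > 2t, and uses a simple chain-based strategy. The strategy keeps a chain in which each unit declared its successor fault-free. Because a fault-free tester never lies, faulty units can occupy only a prefix of such a chain. A "faulty" verdict means at least one of the two units involved is faulty, so both are set aside. The names `make_test`, `find_fault_free`, and `diagnose` are hypothetical, and faulty testers are simulated here with random verdicts.

```python
import random
from typing import Callable, List, Set

# A verdict is True when the tester declares the tested unit fault-free.
TestFn = Callable[[int, int], bool]


def make_test(faulty: Set[int], rng: random.Random) -> TestFn:
    """Hypothetical test oracle: a fault-free tester reports the truth,
    a faulty tester reports an arbitrary (here random) verdict."""
    def test(tester: int, tested: int) -> bool:
        if tester in faulty:
            return rng.random() < 0.5
        return tested not in faulty
    return test


def find_fault_free(n: int, t: int, test: TestFn) -> int:
    """Return a unit that is certainly fault-free, assuming at most t faults and n > 2t."""
    if n <= 2 * t:
        raise ValueError("this sketch needs n > 2t")
    pool = list(range(n))   # units not yet tested or discarded
    chain: List[int] = []   # each unit declared its successor fault-free
    budget = t              # upper bound on faults among non-discarded units
    while True:
        if not chain:
            chain.append(pool.pop())
        # Faulty units form a prefix of the chain, and there are at most
        # `budget` of them, so a chain of length budget + 1 ends in a good unit.
        if len(chain) >= budget + 1:
            return chain[-1]
        candidate = pool.pop()
        if test(chain[-1], candidate):
            chain.append(candidate)
        else:
            # Tester or candidate is faulty: discard both, lower the bound.
            chain.pop()
            budget -= 1


def diagnose(n: int, t: int, test: TestFn) -> Set[int]:
    """Find one fault-free unit adaptively, then let it test every other unit."""
    good = find_fault_free(n, t, test)
    return {u for u in range(n) if u != good and not test(good, u)}
```

For example, `diagnose(10, 3, make_test({1, 4, 6}, random.Random(0)))` returns `{1, 4, 6}` regardless of how the faulty testers answer. Each test is chosen only after the previous verdict is known. This is what separates adaptive diagnosis from the one-step setting, where the whole set of tests is fixed in advance.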