Bayesian analysis for fault location in homogeneous distributed systems

A simple, practical probabilistic comparison-based model for fault location in distributed systems is proposed; it applies the concept of multiple incomplete tests and a Bayesian analysis procedure. The approach is more practical and complete than previous ones because it does not assume permanently faulty units, complete tests, or perfect or non-malicious environments. Fault-free systems are handled without overhead, so the test procedure can also be used to monitor a functioning system. Given a system S with a specific test graph, the conditional distribution relating the comparison test results (the syndrome) to the fault patterns of S can be generated. To avoid the complex global Bayesian estimation process, a simple bitwise Bayesian algorithm is developed for fault location in S; it locates system failures with linear complexity, making it suitable for hard real-time systems.
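The abstract does not give the algorithm itself, so the following is only a minimal sketch of what a per-unit ("bitwise") Bayesian update over a comparison syndrome might look like. It is not the authors' method. The function names, the prior fault probability, the test coverage, the false-alarm rate, and the independence assumption between tests are all hypothetical choices made for illustration.

```python
def bitwise_posteriors(n_units, tests, syndrome,
                       prior=0.1, coverage=0.9, false_alarm=0.01):
    """Return P(unit faulty | syndrome) for each unit, one unit at a time.

    tests    -- list of (i, j) unit pairs compared in the test graph
    syndrome -- list of 0/1 results, 1 meaning the pair disagreed
    All numeric parameters are illustrative assumptions:
      prior       -- P(a unit is faulty) before any test
      coverage    -- P(mismatch | at least one unit in the pair is faulty);
                     a value below 1 models incomplete tests
      false_alarm -- P(mismatch | both units fault-free)
    """
    # Probability of a mismatch given the state of ONE unit in the pair,
    # with the partner's unknown state marginalized using the prior.
    mismatch_if_faulty = coverage
    mismatch_if_ok = prior * coverage + (1 - prior) * false_alarm

    like_faulty = [1.0] * n_units
    like_ok = [1.0] * n_units

    # Single pass over the syndrome: each test updates its two endpoints,
    # treating tests as independent evidence (a simplifying assumption).
    for (i, j), result in zip(tests, syndrome):
        for unit in (i, j):
            if result:
                like_faulty[unit] *= mismatch_if_faulty
                like_ok[unit] *= mismatch_if_ok
            else:
                like_faulty[unit] *= 1 - mismatch_if_faulty
                like_ok[unit] *= 1 - mismatch_if_ok

    # Bayes' rule applied independently to each unit.
    posteriors = []
    for unit in range(n_units):
        numerator = prior * like_faulty[unit]
        denominator = numerator + (1 - prior) * like_ok[unit]
        posteriors.append(numerator / denominator)
    return posteriors


def locate_faults(posteriors, threshold=0.5):
    """Flag every unit whose posterior fault probability exceeds threshold."""
    return [unit for unit, p in enumerate(posteriors) if p > threshold]


# Hypothetical 4-unit ring test graph; unit 1 disagrees with both neighbours.
ring_tests = [(0, 1), (1, 2), (2, 3), (3, 0)]
faulty_syndrome = [1, 1, 0, 0]
clean_syndrome = [0, 0, 0, 0]

located = locate_faults(bitwise_posteriors(4, ring_tests, faulty_syndrome))
located_clean = locate_faults(bitwise_posteriors(4, ring_tests, clean_syndrome))
```

Under these assumptions the work is one pass over the tests plus one pass over the units, which is the sense in which a per-unit update stays linear and avoids enumerating all fault patterns of S. An all-zero syndrome flags no unit, so the same routine could in principle double as a monitor for a functioning system.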
