A message-based fault diagnosis procedure

A new diagnostic message protocol is described that provides fault diagnosis for communications in a distributed system. The protocol operates in conjunction with a standard end-to-end communication protocol and uses special diagnosis messages to determine the system fault state. Each diagnosis message is represented in a test dependency model derived from the system topology. An adaptive strategy uses these messages to achieve specific objectives, such as reduced testing cost. From the test dependency model, a general-purpose algorithm is developed that generates these strategies according to an information-theoretic criterion. Specific properties of the protocol are discussed, and several example strategies for a distributed system topology are provided.
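To make the idea of an adaptive strategy driven by an information-theoretic criterion concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a single-fault model, deterministic pass/fail diagnosis messages, and a greedy rule that picks the message with the highest outcome entropy per unit cost. The three-node topology (A, B, C in a line), the message names, the costs, and the prior probabilities are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical test dependency model for a line topology A - B - C.
# Each diagnosis message maps to the set of fault states that make it fail.
TESTS = {
    "echo_A_B": {"link_AB", "node_B"},
    "echo_A_C": {"link_AB", "node_B", "link_BC", "node_C"},
    "echo_C_B": {"link_BC", "node_C", "node_B"},
}
# Assumed relative cost of sending each message.
COSTS = {"echo_A_B": 1.0, "echo_A_C": 2.0, "echo_C_B": 1.5}
# Assumed prior probability of each fault state ("none" means fault-free).
PRIORS = {"none": 0.6, "link_AB": 0.1, "node_B": 0.1, "link_BC": 0.1, "node_C": 0.1}


def entropy(weights):
    """Shannon entropy in bits of a distribution given as unnormalized weights."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def best_test(candidates, tests, costs, priors):
    """Greedy choice: the message with the most outcome entropy per unit cost.

    For deterministic tests, the entropy of the pass/fail outcome equals the
    information gained about the fault state.
    """
    best, best_score = None, 0.0
    for name, fails_on in tests.items():
        p_fail = sum(priors[s] for s in candidates if s in fails_on)
        p_pass = sum(priors[s] for s in candidates if s not in fails_on)
        if p_fail == 0 or p_pass == 0:
            continue  # outcome is already determined, so nothing is learned
        score = entropy([p_fail, p_pass]) / costs[name]
        if score > best_score:
            best, best_score = name, score
    return best


def build_strategy(candidates, tests, costs, priors):
    """Recursively build a decision tree; leaves are lists of fault states."""
    test = best_test(candidates, tests, costs, priors)
    if test is None:
        # Either one state remains or the rest cannot be told apart.
        return sorted(candidates)
    failing = candidates & tests[test]
    passing = candidates - tests[test]
    return {
        "test": test,
        "fail": build_strategy(failing, tests, costs, priors),
        "pass": build_strategy(passing, tests, costs, priors),
    }


def diagnose(tree, tests, true_state):
    """Follow the strategy for a given true fault state and return the leaf."""
    while isinstance(tree, dict):
        branch = "fail" if true_state in tests[tree["test"]] else "pass"
        tree = tree[branch]
    return tree


strategy = build_strategy(set(PRIORS), TESTS, COSTS, PRIORS)
```

Calling `diagnose(strategy, TESTS, "node_B")` walks the tree as if node B had failed and returns the fault states consistent with the observed message outcomes. In this toy model `link_BC` and `node_C` produce identical outcomes for every message, so they end up in the same leaf; distinguishing them would need an additional message in the dependency model. A greedy rule like this is a heuristic, not a guarantee of a minimum-cost strategy.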
