Probabilistic approaches to fault detection in networked discrete event systems

In this paper, we consider distributed systems that can be modeled as finite state machines with known behavior under fault-free conditions, and we study the detection of a general class of faults that manifest themselves as permanent changes in the next-state transition functionality of the system. This scenario can arise in a variety of situations encountered in communication networks, including faults that occur due to design or implementation errors during the execution of communication protocols. In our approach, fault diagnosis is performed by an external observer/diagnoser that functions as a finite state machine and has access to the input sequence applied to the system, but only limited access to the system state or output. In particular, we assume that the observer/diagnoser can obtain only partial information about the state of the given system, and only at intermittent times determined by certain synchronizing conditions between the system and the observer/diagnoser. By adopting a probabilistic framework, we analyze ways to choose these synchronizing conditions optimally and develop adaptive strategies that achieve a low probability of aliasing, i.e., a low probability that the external observer/diagnoser incorrectly declares a faulty system to be fault-free. An application of these ideas in the context of protocol testing/classification is provided as an example.
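
As a rough illustration of the setting described above, the following sketch simulates an observer that tracks the fault-free model on the same inputs and compares partial state information only at synchronization points, then estimates the aliasing probability by Monte Carlo. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 4-state machine `GOOD`, the single corrupted transition in `FAULTY`, the choice of the low bit of the state as the partial observation, and the fixed synchronization period that stands in for the paper's more general synchronizing conditions.

```python
import random

# Hypothetical fault-free machine: GOOD[state][input] -> next state.
GOOD = [[1, 2], [2, 3], [3, 0], [0, 1]]

# Assumed permanent fault: from state 2 on input 0 the system goes to 0, not 3.
FAULTY = [row[:] for row in GOOD]
FAULTY[2][0] = 0


def partial(state):
    """Partial state information seen by the observer (assumed: low bit only)."""
    return state & 1


def detects(system, inputs, period):
    """Return True if the observer sees a mismatch at some synchronization point.

    The observer runs the fault-free model on the same inputs and compares
    partial state information every `period` steps (a simplifying assumption).
    """
    sys_state = obs_state = 0
    for t, u in enumerate(inputs, start=1):
        sys_state = system[sys_state][u]
        obs_state = GOOD[obs_state][u]
        if t % period == 0 and partial(sys_state) != partial(obs_state):
            return True
    return False


def aliasing_probability(period, length=50, trials=2000, seed=0):
    """Estimate the probability that the faulty system is declared fault-free."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        inputs = [rng.randrange(2) for _ in range(length)]
        if not detects(FAULTY, inputs, period):
            missed += 1
    return missed / trials


if __name__ == "__main__":
    for period in (1, 2, 4, 8):
        print(period, aliasing_probability(period))
```

Comparing the estimates across several values of `period` gives a crude sense of how the choice of synchronization points affects aliasing in this toy model; it does not reproduce the optimal or adaptive strategies analyzed in the paper.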
