Distributed Repair of Nondiagnosability

Automated fault diagnosis has significant practical impact because it improves reliability and facilitates maintenance of systems [1]. Given a monitor that continuously receives observations from a dynamic event-driven system, diagnosis algorithms infer possible fault events that explain the observations. For many applications, it is not sufficient to identify which faults could have occurred; rather, one wishes to know which faults have definitely occurred. Computing the latter requires diagnosability of the system, that is, the guarantee that the occurrence of a fault can be detected with certainty after a finite number of subsequent observations [2]. This paper defines a distributed framework that assists in assessing and improving the diagnosability of discrete-event systems. In this context, a system is diagnosable iff the presence or absence of each unobservable fault event can always be deduced once sufficiently many subsequent observable events have occurred. Otherwise, the system must be altered, for example by adding sensors, so that the ambiguous system behaviours can be distinguished. Several past approaches deal with the problem of selecting sensor placements to ensure diagnosability of a system. However, computing an optimal sensor set of minimal size has a complexity exponential in the number of possible sensor placements [6]. Moreover, existing sensor placement algorithms are based on a global representation of the system, which may not be computable for large systems. In this paper we address the diagnosability problem in a distributed way by identifying those system behaviours that require modification to restore diagnosability. In particular, we show how to determine those subsystems whose modification is guaranteed to make the entire system diagnosable.
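To make the definition of diagnosability concrete, the following is a minimal sketch of a centralized, twin-plant style check on a small finite automaton. It is not the distributed framework proposed in this paper; it works on a global model and is meant only to illustrate what "ambiguous system behaviours" are. The function name `is_diagnosable`, the encoding of transitions as `(source, event, target)` triples, and the example events are assumptions made for illustration. The sketch also assumes a single unobservable fault event, a live system, and no cycles consisting only of unobservable events.

```python
from collections import deque


def is_diagnosable(transitions, initial, observable, fault):
    """Twin-plant style diagnosability check (illustrative sketch).

    transitions: iterable of (source, event, target) triples
    initial:     initial state
    observable:  set of observable events
    fault:       the (unobservable) fault event
    """
    # Index outgoing transitions by source state.
    out = {}
    for src, ev, dst in transitions:
        out.setdefault(src, []).append((ev, dst))

    def successors(node):
        # A node pairs two copies of the system; each copy carries a flag
        # recording whether the fault has occurred on its path.
        (q1, f1), (q2, f2) = node
        for ev, d1 in out.get(q1, []):
            if ev in observable:
                # Observable events must be taken by both copies together.
                for ev2, d2 in out.get(q2, []):
                    if ev2 == ev:
                        yield ((d1, f1), (d2, f2))
            else:
                # Unobservable events are taken by one copy alone.
                yield ((d1, f1 or ev == fault), (q2, f2))
        for ev, d2 in out.get(q2, []):
            if ev not in observable:
                yield ((q1, f1), (d2, f2 or ev == fault))

    # Explore all reachable pairs of the twin plant.
    start = ((initial, False), (initial, False))
    seen = {start}
    edges = {}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        edges[node] = list(successors(node))
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    # Ambiguous pairs: first copy is faulty, second copy is fault-free,
    # yet both have produced the same observations.
    remaining = {n for n in seen if n[0][1] and not n[1][1]}

    # Repeatedly drop pairs with no successor among the remaining pairs.
    # Whatever survives lies on, or leads into, a cycle of ambiguous pairs.
    changed = True
    while changed:
        changed = False
        for n in list(remaining):
            if not any(s in remaining for s in edges[n]):
                remaining.discard(n)
                changed = True

    # A surviving ambiguous cycle means the fault can stay undetected forever.
    return not remaining


# Faulty branch (f) and normal branch (u) both emit 'a' forever: ambiguous.
ambiguous_system = [(0, "f", 1), (1, "a", 1), (0, "u", 2), (2, "a", 2)]
# Giving the normal branch its own observable event 'b' removes the ambiguity,
# which mirrors the effect of adding a sensor.
repaired_system = [(0, "f", 1), (1, "a", 1), (0, "u", 2), (2, "b", 2)]
```

In this sketch, `is_diagnosable(ambiguous_system, 0, {"a"}, "f")` evaluates to `False`, while `is_diagnosable(repaired_system, 0, {"a", "b"}, "f")` evaluates to `True`. The global pair construction used here is exactly the kind of representation that may become too large for big systems, which is the motivation for a distributed treatment.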

[1]  Raja Sengupta, et al. Diagnosability of discrete-event systems, 1995, IEEE Trans. Autom. Control.

[2]  Stéphane Lafortune, et al. On the computational complexity of some problems arising in partially-observed discrete-event systems, 2001, Proceedings of the 2001 American Control Conference (Cat. No.01CH37148).

[3]  Yannick Pencolé, et al. Scalable Diagnosability Checking of Event-Driven Systems, 2007, IJCAI.

[4]  Characterization of diagnosability and repairability for self-healing Web Services, 2007.

[5]  Markus Stumptner, et al. Semantic Web Service Composition by Consistency-Based Model Refinement, 2007, The 2nd IEEE Asia-Pacific Service Computing Conference (APSCC 2007).

[6]  Gianfranco Lamperti, et al. Diagnosis of Active Systems, 1998, ECAI.