Assessing the Impact of Imperfect Diagnosis on Service Reliability: A Parsimonious Model Approach

Modelling imperfect diagnosis performance in service reliability models can help identify the best recovery strategies and diagnosis settings. In this work, a parsimonious Markov model of imperfect diagnosis is proposed. Capturing complex diagnosis behavior in such a model is non-trivial. In our approach, we define representative diagnosis performance metrics and derive closed-form expressions for them in the Markov model. These expressions enable the model to be parameterized from traces of implemented diagnosis components. The diagnosis model is then integrated into a reliability model that assesses the impact of diagnosis imperfections on the reliability of time-constrained SCTP/TCP-based services. This enables (a) a model-based analysis of the sensitivity of service reliability to the diagnosis performance metrics, and (b) an investigation of whether the chosen metrics characterize the diagnosis functions in sufficient detail for the studied reliability problem. Finally, in a simulation study, we analyze the trade-off properties of diagnosis heuristics from the literature, map them to the analytic Markov model, and investigate the model's suitability for service reliability optimization.
