A Framework for Automatic Debugging of Functional and Degradation Failures

Software diagnosis is a particularly challenging problem for modern systems, which may consist of dozens, if not hundreds, of components computing on concurrent and potentially distributed platforms, and using infrastructure and services built by many organizations. We propose a framework that generalizes state-of-the-art classical reasoning-based fault diagnosis so that it tolerates observation uncertainty and addresses degradation of quality of service. An empirical evaluation involving 27 000 highly realistic synthetic scenarios demonstrates an average accuracy improvement of 20% (with 99% statistical significance), which is considerable in the domain of Software Fault Localization (SFL). We measure the improvement in accuracy using well-established SFL performance metrics.

Introduction

One of the most important ways to improve the trustworthiness of software systems is to increase their robustness in the face of (run-time) failures. While design-time methods are useful in improving confidence in software (e.g., [3, 13, 16, 19, 22, 31]), they cannot by themselves eliminate the possibility of run-time failures, which are induced by a variety of factors largely outside the control of the organization producing that software: faults in run-time infrastructure and components provided by third parties, unpredictable loads, variable resources, and malicious attempts to break a system. Moreover, as mentioned in [12], the distinction between “healthy” and “broken” is often fuzzy, and there is a gradual transition, over time, between these two states. Consequently, stakeholders must take increasing responsibility for improving the trustworthiness of their systems by building in automatic run-time problem detection and repair [12, 23].

Diagnosis for today’s complex systems, however, is particularly challenging. First, the presence of concurrency makes it difficult to identify which computation might have caused a problem.
Second, reliance on middleware for distributed communication, and more generally the use of components and infrastructure produced by many organizations, means that in many cases neither specifications nor code is available for all parts of the system. Third, in many systems, problems may be intermittent, caused by transient faults or variability in loads. Fourth, many of the “faults” that we care about are reflected indirectly by violation of a system’s quality-of-service goals, such as degradation of response latency, rather than by a direct failure such as a server or system crash. Such “soft” faults may be difficult to detect and diagnose [12]. Consequently, although fault diagnosis has been studied extensively for both hardware and software systems as a development-time activity, the ability to do this at run-time (i.e., while the system is operational) in a systematic way for complex systems has remained an elusive goal [8].

1 University of Porto and HASLab / INESC TEC, Portugal. email: nunopcardoso@gmail.com
2 Palo Alto Research Center, Inc, USA. email: rui@parc.com
3 Palo Alto Research Center, Inc, USA. email: afeldman@parc.com
4 Palo Alto Research Center, Inc, USA. email: dekleer@parc.com

As no behavioral models are typically available, current approaches to software diagnosis abstract the system under analysis in terms of component activity and correct/incorrect behavior, and notably lack mechanisms to encode soft faults. We propose a framework that generalizes state-of-the-art classical reasoning-based fault diagnosis (such as Spectrum-based Fault Localization (SFL) [4] and GDE [19]) to accommodate functional and degradation failures. In particular, the framework can reason under uncertainty (which has a variety of sources, as the ability to observe the behavior of a system may be limited by the kinds of monitoring infrastructure available) and handle soft faults.
In many cases the existence of a fault is linked to degradation of quality of service. For example, high latencies of responses to queries may indicate that servers are overloaded, that a network connection is faulty, or both. Our framework improves on the classical reasoning approach in 65% of the cases and achieves at least equal performance in 94% of the cases. The overall relative improvement in diagnostic quality was 20% on average, at a 99% confidence level.

This paper makes the following contributions:
• We discuss the limitations imposed by classical reasoning-based fault diagnosis;
• We propose a generalization of the classical reasoning-based diagnostic framework aimed at improving its accuracy when diagnosing soft faults;
• We compare the accuracies of the classical approach and our novel approach using a simulation-based setup, which has been shown to be able to generate realistic scenarios [9].

Reasoning-Based Diagnosis

In this section we introduce concepts and definitions used throughout the paper, as well as the reasoning-based SFL approach to diagnosis.

Definition 1 (Diagnostic System). A diagnostic system $DS$ is a set of components $\mathit{COMPS} = \{c_1, c_2, \ldots, c_m\}$.

The systems we consider typically consist of hundreds to thousands of components. These components can be code blocks, assembly-level instructions, or whole state machines. The systems can be distributed or hybrid, and can contain network components such as routers and balancers.

ECAI 2016, G.A. Kaminka et al. (Eds.). © 2016 The Authors and IOS Press. This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0). doi:10.3233/978-1-61499-672-9-569

Definition 2 (Transaction). A transaction $\langle A, e \rangle$ is a pair consisting of a set of components $A \subseteq \mathit{COMPS}$ and the transaction outcome $e \in \{0, 1\}$.

Transactions are typically computed by executing programs or program tasks and recording success or failure $e$.
The program components that participate in $A$ are instrumented using debugger-like methods [14]. By convention, $e = 1$ means failure and $e = 0$ means success. Transactions where $e = 1$ are also known as conflicts [19]. A conflict represents a set of components that cannot all be healthy, given the observed erroneous behavior.

Definition 3 (Hit Spectrum). A hit spectrum $\mathcal{A}$ is a set of transactions $\mathcal{A} = \{\langle A_1, e_1 \rangle, \langle A_2, e_2 \rangle, \ldots, \langle A_n, e_n \rangle\}$.

We assume that an external diagnostic engine [21, 28, 11, 1, 5] computes a set of diagnoses. These diagnoses are used as input to our algorithm.

Definition 4 (Diagnosis). A diagnosis $\langle d, \Pr(d) \rangle$ is defined by a set of components $d \subseteq \mathit{COMPS}$ and a prior probability $\Pr(d)$.

The prior probability $\Pr(d)$ estimates to what extent a candidate, without further evidence, is responsible for the system’s malfunction. To define $\Pr(d)$, let $p_j$ denote the prior probability that a component $c_j$ is at fault. The value of $p_j$ is application dependent. In the context of development-time fault localization, $p_j$ is often approximated as $p_j = 1/1000$, i.e., 1 fault for each 1000 lines of code [7]. Assuming that components fail independently, the prior probability for a particular diagnosis $d$ is given by

$$\Pr(d) = \prod_{j \in d} p_j \cdot \prod_{j \in \mathit{COMPS} \setminus d} (1 - p_j) \tag{1}$$

When all $p_j$ are equal and smaller than $0.5$, the larger the candidate, the smaller its a priori probability. This leads us to our main goal, which is to apply a Bayes conditioning rule.

Definition 5 (Bayes Conditioning). Given a diagnosis $d$ and a hit spectrum $\mathcal{A}$, the a posteriori probability $\Pr(d \mid \mathcal{A})$ is:

$$\Pr(d \mid \mathcal{A}) = \Pr(d) \cdot \prod_{i \in \{1, 2, \ldots, n\}} \frac{\Pr(A_i, e_i \mid d)}{\Pr(A_i)} \tag{2}$$

In order to characterize the optimality of our algorithm, we need to compute the a posteriori probability (given $\mathcal{A}$) for a whole set of diagnoses and to rank them. This gives us our main computational problem:

Problem 1 (Diagnostic Ordering). Given a set of diagnoses $D = \{\langle d_k, \Pr(d_k) \rangle\}$ for $k \in \{1, 2, \ldots, K\}$ and a hit spectrum $\mathcal{A}$, compute $\Pr(d_k \mid \mathcal{A})$ and order $D$ such that:

$$\forall k \in \{1, 2, \ldots, K - 1\} : \Pr(d_k \mid \mathcal{A}) \geq \Pr(d_{k+1} \mid \mathcal{A}) \tag{3}$$

In what follows we describe the relevant aspects of the classical reasoning-based SFL approach to the ranking problem [3, 19]. To simplify computation we assume conditional independence throughout the process, i.e., our Bayes classifier is naïve.

The denominator $\Pr(A_i)$ is a normalizing term that is identical for all $d \in D$ and need not be computed for ranking purposes, as it does not alter the rank order.

To update the prior probability with run-time information (i.e., observations), $\Pr(A_i, e_i \mid d)$ (referred to as the likelihood) is defined as

$$\Pr(A_i, e_i \mid d) = \begin{cases} G(d, A_i) & \text{if } e_i = 0 \\ 1 - G(d, A_i) & \text{otherwise} \end{cases} \tag{4}$$

$G(d, A_i)$ (referred to as transaction goodness) accounts for the fact that components may fail intermittently; it estimates the probability of nominal system behavior under an activation pattern $A_i$ and a diagnostic candidate $d$. Let $g_j$ (referred to as component goodness) denote the probability that a component $c_j$ performs nominally. Considering that all components must perform nominally to observe nominal system behavior, $G(d, A_i)$ is defined as

$$G(d, A_i) = \prod_{j \in (d \cap A_i)} g_j \tag{5}$$

In scenarios where the values of $g_j$ are not otherwise available, they can be estimated by maximizing $\Pr(A, e \mid d)$ (Maximum Likelihood Estimation (MLE) for the naïve Bayes classifier) under parameters $\{g_j \mid j \in d \wedge 0 \leq g_j \leq 1\}$ [3].

Approach

In this section we discuss how degradation failures can be more accurately detected and represented, and how the diagnostic framework presented in the previous section can be enhanced to diagnose such failures more accurately.

Fuzzy Error Detection

The first challenge in diagnosing soft failures relates to their detection.
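Before turning to detection in more detail, the classical ranking of the previous section can be made concrete. The following is a minimal Python sketch of Equations (1), (4) and (5), combined as in Equation (2) without the normalizing denominator. The function names, the three-component system, the two candidate diagnoses, and the fixed component goodness values of 0.5 are illustrative assumptions and not part of the paper, where goodness values would be estimated by MLE when not otherwise available.

```python
from math import prod

def prior(d, comps, p):
    """Equation (1): components are assumed to fail independently."""
    return prod(p[j] for j in d) * prod(1 - p[j] for j in comps - d)

def transaction_goodness(d, activity, g):
    """Equation (5): product of g[j] over candidate components that were active."""
    return prod(g[j] for j in d & activity)  # an empty product evaluates to 1

def likelihood(d, activity, e, g):
    """Equation (4): e == 0 is a passing transaction, e == 1 a failing one."""
    goodness = transaction_goodness(d, activity, g)
    return goodness if e == 0 else 1 - goodness

def rank(diagnoses, spectrum, comps, p, g):
    """Order candidates by the unnormalized posterior of Equation (2)."""
    scored = []
    for d in diagnoses:
        score = prior(d, comps, p)
        for activity, e in spectrum:
            score *= likelihood(d, activity, e, g)
        scored.append((d, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical system with three components and three transactions.
comps = frozenset({"c1", "c2", "c3"})
p = {c: 1 / 1000 for c in comps}   # prior fault probability used in the text
g = {c: 0.5 for c in comps}        # assumed goodness values, for illustration only
spectrum = [
    (frozenset({"c1", "c2"}), 1),  # failing transaction (a conflict)
    (frozenset({"c2", "c3"}), 0),  # passing transaction
    (frozenset({"c1", "c3"}), 1),  # failing transaction (a conflict)
]
diagnoses = [frozenset({"c2", "c3"}), frozenset({"c1"})]

ranked = rank(diagnoses, spectrum, comps, p, g)
for d, score in ranked:
    print(sorted(d), score)
```

In this sketch the single-fault candidate {c1} ranks above {c2, c3}: it has the larger prior, and it is not penalized by the passing transaction, in which c1 was not active. A candidate that shares no component with a failing transaction receives likelihood 0 for that transaction, which reflects the notion of a conflict.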
Existing approaches to error detection (e.g., [8], SFL [4], and GDE [19]) make use of first-order logic descriptions of the correct behavior of the system (weak-fault models) to assign transactions to one of two possible sets: the pass set and the fail set ($P$ and $F$ respectively, where $F = \overline{P}$). A consequence of such fault models is the crisp distinction between correct and incorrect

[1]  Rui Abreu,et al.  A Low-Cost Approximate Minimal Hitting Set Algorithm and its Application to Model-Based Diagnosis , 2009, SARA.

[2]  Andy Zaidman,et al.  Improving Service Diagnosis through Increased Monitoring Granularity , 2013, 2013 IEEE 7th International Conference on Software Security and Reliability.

[3]  Peter Zoeteweij,et al.  A New Bayesian Approach to Multiple Intermittent Fault Diagnosis , 2009, IJCAI.

[4]  Franz Wotawa,et al.  Spectrum Enhanced Dynamic Slicing for better Fault Localization , 2012, ECAI.

[5]  Eric A. Brewer,et al.  Pinpoint: problem determination in large, dynamic Internet services , 2002, Proceedings International Conference on Dependable Systems and Networks.

[6]  Rui Abreu,et al.  Threats to the validity and value of empirical assessments of the accuracy of coverage-based fault locators , 2013, ISSTA.

[7]  Rajeev Gandhi,et al.  Kahuna: Problem diagnosis for Mapreduce-based cloud computing environments , 2010, 2010 IEEE Network Operations and Management Symposium - NOMS 2010.

[8]  Rui Abreu,et al.  MHS2: A Map-Reduce Heuristic-Driven Minimal Hitting Set Search Algorithm , 2013, MUSEPAT.

[9]  Brian C. Williams,et al.  Mode Estimation of Probabilistic Hybrid Systems , 2002, HSCC.

[10]  Rui Abreu,et al.  Online Spectrum-based Fault Localization for Health Monitoring and Fault Recovery of Self-Adaptive Systems , 2012, International Conference on Autonomic and Autonomous Systems.

[11]  L. Zadeh Probability measures of Fuzzy events , 1968 .

[12]  Mary Jean Harrold,et al.  Empirical evaluation of the tarantula automatic fault-localization technique , 2005, ASE.

[13]  Debanjan Ghosh,et al.  Self-healing systems - survey and synthesis , 2007, Decis. Support Syst..

[14]  Gregory M. Provan,et al.  Computing Minimal Diagnoses by Greedy Stochastic Search , 2008, AAAI.

[15]  Rajeev Gandhi,et al.  Black-Box Problem Diagnosis in Parallel File Systems , 2010, FAST.

[16]  Brian C. Williams,et al.  Diagnosing Multiple Faults , 1987, Artif. Intell..

[17]  Meir Kalech,et al.  Using Model-Based Diagnosis to Improve Software Testing , 2014, AAAI.

[18]  Markus Stumptner,et al.  Model-Based Debugging using Multiple Abstract Models , 2003, ArXiv.

[19]  Yu Qi,et al.  Bp Neural Network-Based Effective Fault Localization , 2009, Int. J. Softw. Eng. Knowl. Eng..

[20]  A.J.C. van Gemund,et al.  On the Accuracy of Spectrum-based Fault Localization , 2007, Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007).

[21]  Franz Wotawa,et al.  A variant of Reiter's hitting-set algorithm , 2001, Inf. Process. Lett..

[22]  Rui Abreu,et al.  A Kernel Density Estimate-Based Approach to Component Goodness Modeling , 2013, AAAI.

[23]  Gregg Rothermel,et al.  An empirical investigation of program spectra , 1998, PASTE '98.

[24]  Peter Zoeteweij,et al.  Spectrum-Based Multiple Fault Localization , 2009, 2009 IEEE/ACM International Conference on Automated Software Engineering.

[25]  Luca Console,et al.  Readings in Model-Based Diagnosis , 1992 .

[26]  Bradley R. Schmerl,et al.  Diagnosing architectural run-time failures , 2013, 2013 8th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS).

[27]  Stéphane Lafortune,et al.  Failure diagnosis using discrete event models , 1994, Proceedings of 1994 33rd IEEE Conference on Decision and Control.

[28]  Robert LIN,et al.  NOTE ON FUZZY SETS , 2014 .

[29]  Rui Abreu,et al.  Spectrum-Based Sequential Diagnosis , 2011, AAAI.