The Importance of Being Positive in Causal Statistical Fault Localization: Important Properties of Baah et al.'s CSFL Regression Model

This paper investigates the performance of Baah et al.'s causal regression model for fault localization when positivity, an important precondition for causal inference, is violated. Two kinds of positivity violations are considered: structural and random. We prove that random, but not structural, nonpositivity may harm the performance of Baah et al.'s causal estimator. To address the problem of random nonpositivity, we propose a modification to the way suspiciousness scores are assigned. Empirical results indicate that this modification improves the performance of Baah et al.'s technique. We also present a probabilistic characterization of Baah et al.'s estimator, which provides a more efficient way to compute it.
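As a rough illustration of the positivity precondition mentioned above, the sketch below checks whether every covariate stratum in a sample contains both treated and untreated observations. It relies on the standard definition of positivity, which requires each treatment value to occur with nonzero probability in every covariate stratum. The function name `positivity_violations` and the `(covariate, treatment)` encoding are hypothetical and are not taken from Baah et al.'s model. In a fault localization setting, one might read the treatment as "statement covered by the test" and the covariate as some coverage-derived value, but that mapping is an assumption made here for illustration only.

```python
from collections import defaultdict


def positivity_violations(records):
    """Return covariate strata in which only one treatment value is observed.

    records: iterable of (covariate, treatment) pairs, treatment in {0, 1}.
    Hypothetical encoding for illustration; not Baah et al.'s model.
    """
    seen = defaultdict(set)
    for covariate, treatment in records:
        seen[covariate].add(treatment)
    # A stratum satisfies positivity in the sample only if both
    # treatment values (0 and 1) were observed within it.
    return sorted(c for c, values in seen.items() if values != {0, 1})


# Stratum 1 has treated and untreated runs; stratum 0 has only untreated runs.
sample = [(1, 1), (1, 0), (0, 0), (0, 0)]
violating = positivity_violations(sample)
```

A check of this kind only reveals that a stratum lacks one treatment value in the sample. It cannot by itself tell whether the violation is structural (the combination is impossible) or random (the combination is possible but happened not to occur in the test runs).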
