Causal Program Dependence Analysis

We introduce Causal Program Dependence Analysis (CPDA), a dynamic dependence analysis that applies causal inference to model the strength of program dependence relations in a continuous space. CPDA observes the association between program elements by constructing and executing modified versions of a program. One advantage of CPDA is that this construction requires only light-weight parsing rather than sophisticated static analysis. The result is a collection of observations based on how often a change in the value produced by a mutated program element affects the behavior of other elements. From this set of observations, CPDA discovers a causal structure capturing the causal (i.e., dependence) relation between program elements. Qualitative evaluation finds that CPDA concisely expresses key dependence relationships between program elements. As an example application, we apply CPDA to the problem of fault localization. Using minimal test suites, our approach can rank twice as many faults compared to SBFL.

[1]  Joseph Robert Horgan,et al.  Fault localization using execution slices and dataflow tests , 1995, Proceedings of Sixth International Symposium on Software Reliability Engineering. ISSRE'95.

[2]  Matti Järvisalo,et al.  Learning Optimal Cyclic Causal Graphs from Interventional Data , 2020, PGM.

[3]  Clark Glymour,et al.  A million variables and more: the Fast Greedy Equivalence Search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images , 2016, International Journal of Data Science and Analytics.

[4]  Rui Abreu,et al.  A Survey on Software Fault Localization , 2016, IEEE Transactions on Software Engineering.

[5]  Thomas W. Reps,et al.  The use of program dependence graphs in software engineering , 1992, International Conference on Software Engineering.

[6]  Andy Podgurski,et al.  The Probabilistic Program Dependence Graph and Its Application to Fault Diagnosis , 2008, IEEE Transactions on Software Engineering.

[7]  Andy Podgurski,et al.  Matching Test Cases for Effective Fault Localization , 2011 .

[8]  Wes Masri,et al.  Cleansing Test Suites from Coincidental Correctness to Enhance Fault-Localization , 2010, 2010 Third International Conference on Software Testing, Verification and Validation.

[9]  Joseph Robert Horgan,et al.  Dynamic program slicing , 1990, PLDI '90.

[10]  J. Pearl Causal inference in statistics: An overview , 2009 .

[11]  Shin Yoo,et al.  FLUCCS: using code and change metrics to improve fault localization , 2017, ISSTA.

[12]  Ross Gore,et al.  Reducing confounding bias in predicate-level statistical debugging metrics , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[13]  Robert Feldt,et al.  Finding test data with specific properties via metaheuristic search , 2013, 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE).

[14]  Robert Feldt,et al.  MOAD: Modeling Observation-Based Approximate Dependency , 2019, 2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM).

[15]  A. Jefferson Offutt,et al.  Mutation analysis using mutant schemata , 1993, ISSTA '93.

[16]  Joe D. Warren,et al.  The program dependence graph and its use in optimization , 1987, TOPL.

[17]  Seongmin Lee Scalable and Approximate Program Dependence Analysis , 2020, 2020 IEEE/ACM 42nd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion).

[18]  Peter Spirtes,et al.  Causal discovery and inference: concepts and recent methodological advances , 2016, Applied Informatics.

[19]  Ciarán M Lee,et al.  Improving the accuracy of medical diagnosis with causal machine learning , 2020, Nature Communications.

[20]  Collin McMillan,et al.  Do Programmers do Change Impact Analysis in Debugging? , 2016, Empirical Software Engineering.

[21]  A.J.C. van Gemund,et al.  On the Accuracy of Spectrum-based Fault Localization , 2007, Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007).

[22]  Thomas J. Ostrand,et al.  Experiments on the effectiveness of dataflow- and control-flow-based test adequacy criteria , 1994, Proceedings of 16th International Conference on Software Engineering.

[23]  Ran Ettinger,et al.  Untangling: a slice extraction refactoring , 2004, AOSD '04.

[24]  David W. Binkley,et al.  Semantics Guided Regression Test Cost Reduction , 1997, IEEE Trans. Software Eng..

[25]  Nikolai Kosmatov,et al.  Frama-C: A software analysis perspective , 2015, Formal Aspects of Computing.

[26]  Judea Pearl,et al.  The seven tools of causal inference, with reflections on machine learning , 2019, Commun. ACM.

[27]  Illtyd Trethowan Causality , 1938 .

[28]  Xiao Liu,et al.  The Bayesian Network based program dependence graph and its application to fault localization , 2017, J. Syst. Softw..

[29]  Gregg Rothermel,et al.  Supporting Controlled Experimentation with Testing Techniques: An Infrastructure and its Potential Impact , 2005, Empirical Software Engineering.

[30]  Jonathan I. Maletic,et al.  srcML: An Infrastructure for the Exploration, Analysis, and Manipulation of Source Code: A Tool Demonstration , 2013, 2013 IEEE International Conference on Software Maintenance.

[31]  Rui Abreu,et al.  Threats to the validity and value of empirical assessments of the accuracy of coverage-based fault locators , 2013, ISSTA.

[32]  Václav Rajlich,et al.  Hidden dependencies in program comprehension and change propagation , 2001, Proceedings 9th International Workshop on Program Comprehension. IWPC 2001.

[33]  Wei Li,et al.  DeepFL: integrating multiple fault diagnosis dimensions for deep fault localization , 2019, ISSTA.

[34]  A. Aussem,et al.  Hip Fracture in the Elderly: A Re-Analysis of the EPIDOS Study with Causal Bayesian Networks , 2015, PloS one.

[35]  Keith Brian Gallagher,et al.  Using Program Slicing in Software Maintenance , 1991, IEEE Trans. Software Eng..

[36]  David Lo,et al.  Information retrieval and spectrum based bug localization: better together , 2015, ESEC/SIGSOFT FSE.

[37]  Lars Grunske,et al.  A learning-to-rank based fault localization approach using likely invariants , 2016, ISSTA.

[38]  David W. Binkley,et al.  Interprocedural slicing using dependence graphs , 1990, TOPL.

[39]  Feng Cao,et al.  MFL: Method-Level Fault Localization with Causal Inference , 2013, 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation.

[40]  Frank Tip,et al.  Platform-Independent Dynamic Taint Analysis for JavaScript , 2020, IEEE Transactions on Software Engineering.

[41]  Baowen Xu,et al.  A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization , 2013, TSEM.

[42]  Andy Podgurski,et al.  Causal inference for statistical fault localization , 2010, ISSTA '10.