Precise Learn-to-Rank Fault Localization Using Dynamic and Static Features of Target Programs

Finding the root cause of a bug requires significant effort from developers. Automated fault localization techniques seek to reduce this cost by computing suspiciousness scores (i.e., the likelihood that each program entity is faulty). Existing techniques compute these scores from input features of specific types, such as program spectra or mutation analysis results. This article presents a novel learn-to-rank fault localization technique called PRecise machINe-learning-based fault loCalization tEchnique (PRINCE). PRINCE uses genetic programming (GP) to combine multiple sets of localization input features that have so far been studied separately. For dynamic features, PRINCE encompasses both Spectrum-Based Fault Localization (SBFL) and Mutation-Based Fault Localization (MBFL) techniques. It also uses static features, such as dependency information and the structural complexity of program entities. GP uses all of this information to train a ranking model for fault localization. An empirical evaluation on 65 real-world faults from CoREBench, 84 artificial faults from SIR, and 310 real-world faults from Defects4J shows that PRINCE significantly outperforms state-of-the-art SBFL, MBFL, and learn-to-rank techniques. On average, PRINCE localizes a fault after reviewing 2.4% of the executed statements (4.2 and 3.0 times more precise than the best of the compared SBFL and MBFL techniques, respectively). PRINCE also ranks 52.9% of the target faults within the top ten suspicious statements.
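
To make the notion of a suspiciousness score concrete, the following minimal sketch computes one well-known SBFL score (Ochiai) from a toy program spectrum and ranks statements by it. It is an illustration only: the abstract does not say which SBFL formulas PRINCE uses as features, and the function names, data layout, and toy coverage data are all assumptions introduced here.

```python
import math


def ochiai(ef, ep, total_failed):
    """Ochiai score: ef / sqrt(total_failed * (ef + ep)).

    ef / ep: number of failing / passing tests that execute the statement.
    Returns 0.0 when the denominator is zero.
    """
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom > 0 else 0.0


def rank_statements(coverage, outcomes):
    """Rank statements from most to least suspicious.

    coverage: dict mapping test name -> set of executed statement ids
    outcomes: dict mapping test name -> True if the test passed
    (Hypothetical data layout, chosen for this sketch.)
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    statements = set().union(*coverage.values())
    scores = {}
    for s in statements:
        ef = sum(1 for t, cov in coverage.items() if s in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if s in cov and outcomes[t])
        scores[s] = ochiai(ef, ep, total_failed)
    # Sort by descending score; break ties by statement id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy spectrum: one failing test (t1) and two passing tests (t2, t3).
coverage = {"t1": {1, 2, 3}, "t2": {1, 3}, "t3": {1}}
outcomes = {"t1": False, "t2": True, "t3": True}

for stmt, score in rank_statements(coverage, outcomes):
    print(f"statement {stmt}: {score:.3f}")
```

In this toy spectrum, statement 2 is executed only by the failing test and therefore ranks first. A learn-to-rank approach such as the one described above would treat scores like this as one input feature among many, rather than as the final ranking.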
