Mulr4FL: Effective Fault Localization of Evolution Software Based on Multivariate Logistic Regression Model

Fault localization is tedious and costly work during software maintenance. Studies indicate that combining the structural features and behavioral characteristics of programs can improve the efficiency of fault localization. In this paper, we propose a framework, called Mulr4FL, for fault localization using a multivariate logistic regression model that combines static and dynamic features collected from the program under debugging. First, a hybrid metrics data set combining program structural features and behavioral characteristics is constructed through static program analysis and dynamic tracing, guided by a designed set of metrics. Meanwhile, fault information for the legacy program is obtained from the bug tracking system. Second, bivariate logistic analysis is performed to select the metrics that are significantly related to faults, and a multivariate logistic regression model is then constructed and trained on the selected metrics and their measurements. Finally, based on the trained logistic model, we conduct multivariate logistic analysis on the features of the evolved software and predict which class methods are buggy. An empirical study was conducted on a set of benchmarks widely used in program debugging research. The results indicate that Mulr4FL significantly improves the effectiveness of fault localization compared with five baseline techniques.
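The screen-then-model pipeline summarized above can be sketched in a few dozen lines. What follows is a minimal, dependency-free Python illustration under stated assumptions, not the Mulr4FL implementation. The abstract does not specify the significance test, the fitting procedure, or the threshold, so this sketch assumes a likelihood-ratio test of each single-metric model against an intercept-only model at alpha = 0.05, z-scored metrics, and maximum-likelihood fitting by batch gradient ascent. The metric names (`complexity`, `churn`, `noise`), all helper functions, and the toy data are hypothetical. The trained model estimates P(faulty | x) = 1 / (1 + exp(-(w0 + w·x))) for each method; reporting the evolved version's methods as a list ranked by that probability is a further assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Sketch only: the LR test, alpha, z-scoring and gradient ascent are assumptions, not Mulr4FL's.
Metrics = Dict[str, List[float]]  # metric name -> one value per method
Model = Tuple[List[str], Dict[str, Tuple[float, float]], List[float]]

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_sigmoid(z: float) -> float:
    """log(sigmoid(z)) computed without overflow."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def linear(w: List[float], x: Sequence[float]) -> float:
    """Intercept w[0] plus the dot product of the remaining weights with x."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def zscore_params(col: Sequence[float]) -> Tuple[float, float]:
    """Mean and population std of a metric column (std falls back to 1 if constant)."""
    mean = sum(col) / len(col)
    std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
    return mean, std

def to_rows(metrics: Metrics, names: List[str],
            scalers: Dict[str, Tuple[float, float]]) -> List[List[float]]:
    """Standardize the named metric columns and transpose them into per-method rows."""
    cols = [[(v - scalers[n][0]) / scalers[n][1] for v in metrics[n]] for n in names]
    return [list(r) for r in zip(*cols)]

def fit_logistic(rows: List[List[float]], y: Sequence[int],
                 lr: float = 0.5, iters: int = 5000) -> List[float]:
    """Maximum-likelihood logistic regression via batch gradient ascent; w[0] is the intercept."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for x, t in zip(rows, y):
            err = t - sigmoid(linear(w, x))
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(rows) for wj, g in zip(w, grad)]
    return w

def log_likelihood(w: List[float], rows: List[List[float]], y: Sequence[int]) -> float:
    """Bernoulli log-likelihood; log(1 - sigmoid(z)) equals log_sigmoid(-z)."""
    return sum(log_sigmoid(linear(w, x) if t else -linear(w, x)) for x, t in zip(rows, y))

def bivariate_screen(metrics: Metrics, y: Sequence[int], alpha: float = 0.05) -> List[str]:
    """Keep each metric whose one-predictor model beats the intercept-only model
    under an (assumed) likelihood-ratio test, chi-square with 1 degree of freedom."""
    n, n1 = len(y), sum(y)
    ll_null = n1 * math.log(n1 / n) + (n - n1) * math.log(1 - n1 / n)
    selected = []
    for name, col in metrics.items():
        rows = to_rows(metrics, [name], {name: zscore_params(col)})
        stat = max(0.0, 2.0 * (log_likelihood(fit_logistic(rows, y), rows, y) - ll_null))
        if math.erfc(math.sqrt(stat / 2.0)) < alpha:  # chi2(1) upper-tail p-value
            selected.append(name)
    return selected

def train(metrics: Metrics, y: Sequence[int], alpha: float = 0.05) -> Model:
    """Screen metrics on the legacy version, then fit the multivariate model."""
    selected = bivariate_screen(metrics, y, alpha)
    if not selected:
        raise ValueError("no metric passed the bivariate screen")
    scalers = {name: zscore_params(metrics[name]) for name in selected}
    return selected, scalers, fit_logistic(to_rows(metrics, selected, scalers), y)

def rank_methods(model: Model, method_ids: List[str], metrics: Metrics) -> List[str]:
    """Score methods of the evolved version with the legacy model; most fault-prone first."""
    selected, scalers, w = model
    rows = to_rows(metrics, selected, scalers)
    score = {m: sigmoid(linear(w, x)) for m, x in zip(method_ids, rows)}
    return sorted(method_ids, key=score.__getitem__, reverse=True)

if __name__ == "__main__":
    # Hypothetical legacy version: 8 methods, m2 and m5 were fixed in the bug tracker.
    legacy = {"complexity": [1, 2, 9, 2, 3, 10, 1, 2],
              "churn": [0, 1, 5, 0, 2, 4, 1, 0],
              "noise": [3, 7, 4, 5, 5, 6, 2, 8]}
    faulty = [0, 0, 1, 0, 0, 1, 0, 0]
    model = train(legacy, faulty)
    evolved = {"complexity": [1, 10, 3], "churn": [0, 4, 2], "noise": [3, 6, 5]}
    print(model[0], rank_methods(model, ["e0", "e1", "e2"], evolved))
```

Training on the legacy version and scoring the evolved version mirrors the two phases in the abstract; the standardization parameters come from the legacy data so both versions share one scale. For real projects, a statistics package (for example, statsmodels `Logit`) would give Newton-based fits and proper p-values; the hand-written loop here only keeps the sketch self-contained.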
