Predicting Crash Fault Residence via Simplified Deep Forest Based on A Reduced Feature Set

Software inevitably crashes, and developers spend considerable effort locating the fault that caused the crash (the crashing fault, for short). Developing automatic methods to identify where the crashing fault resides is therefore crucial for software quality assurance. Researchers have proposed methods that predict whether the crashing fault resides in the stack trace, using features collected from the stack trace and the faulty code, with the aim of reducing developers' debugging effort. However, previous work usually neglected feature preprocessing of the crash data and used only traditional classification models. In this paper, we propose a novel crashing fault residence prediction framework, called ConDF, which consists of a consistency-based feature subset selection method and a state-of-the-art deep forest model. Specifically, the feature selection method first obtains an optimal feature subset, reducing the feature dimension by retaining the representative features. Then, a simplified deep forest model is employed to build the classification model on the reduced feature set. Experiments on seven open source software projects show that ConDF performs significantly better than 17 baseline methods on three performance indicators.
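To make the feature selection step concrete, the following is a minimal sketch of a consistency-based subset search, not the ConDF implementation itself. It assumes discrete feature values (continuous features would first need discretization) and a greedy forward search; the search strategy, the function names `inconsistency_rate` and `select_consistent_subset`, and the toy data are illustrative assumptions. The deep forest classifier that would then be trained on the reduced feature set is not shown.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, features):
    """Fraction of instances that disagree with the majority label among
    instances sharing the same values on the chosen features."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[f] for f in features)
        groups[pattern][label] += 1
    # Within each pattern, every instance outside the majority class is inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


def select_consistent_subset(rows, labels):
    """Greedy forward search (an assumption, not necessarily the paper's
    strategy): repeatedly add the feature that lowers the inconsistency
    rate most, until the subset is as consistent as the full feature set."""
    all_features = list(range(len(rows[0])))
    target = inconsistency_rate(rows, labels, all_features)
    selected = []
    while inconsistency_rate(rows, labels, selected) > target:
        remaining = [f for f in all_features if f not in selected]
        best = min(
            remaining,
            key=lambda f: inconsistency_rate(rows, labels, selected + [f]),
        )
        selected.append(best)
    return sorted(selected)


# Hypothetical toy crash data: three discrete features per crash instance.
# Only the feature at index 1 determines whether the fault is inside the stack trace.
rows = [
    (0, 0, 1),
    (1, 0, 0),
    (0, 1, 1),
    (1, 1, 0),
    (0, 0, 0),
    (1, 1, 1),
]
labels = ["inside", "inside", "outside", "outside", "inside", "outside"]

print(select_consistent_subset(rows, labels))
```

On this toy data the search keeps only the feature at index 1, because that feature alone separates the two classes as well as all three features together.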
