Improving defect prediction with deep forest

Abstract Context Software defect prediction is important to ensure the quality of software. Nowadays, many supervised learning techniques have been applied to identify defective instances (e.g., methods, classes, and modules). Objective However, the performance of these supervised learning techniques are still far from satisfactory, and it will be important to design more advanced techniques to improve the performance of defect prediction models. Method We propose a new deep forest model to build the defect prediction model (DPDF). This model can identify more important defect features by using a new cascade strategy, which transforms random forest classifiers into a layer-by-layer structure. This design takes full advantage of ensemble learning and deep learning. Results We evaluate our approach on 25 open source projects from four public datasets (i.e., NASA, PROMISE, AEEEM and Relink). Experimental results show that our approach increases AUC value by 5% compared with the best traditional machine learning algorithms. Conclusion The deep strategy in DPDF is effective for software defect prediction.

[1]  Andrew P. Bradley,et al.  The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..

[2]  Tracy Hall,et al.  A Systematic Literature Review on Fault Prediction Performance in Software Engineering , 2012, IEEE Transactions on Software Engineering.

[3]  Dino Isa,et al.  PIPELINE DEFECT PREDICTION USING SUPPORT VECTOR MACHINES , 2009, Appl. Artif. Intell..

[4]  Jin Liu,et al.  Dictionary learning based software defect prediction , 2014, ICSE.

[5]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  David Lo,et al.  Supervised vs Unsupervised Models: A Holistic Look at Effort-Aware Just-in-Time Defect Prediction , 2017, 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[7]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[8]  Zhihua Cui,et al.  A model for software defect prediction using support vector machine based on CBA , 2016, Int. J. Intell. Syst. Technol. Appl..

[9]  Xiang Chen,et al.  FECAR: A Feature Selection Framework for Software Defect Prediction , 2014, 2014 IEEE 38th Annual Computer Software and Applications Conference.

[10]  Anas N. Al-Rabadi,et al.  A comparison of modified reconstructability analysis and Ashenhurst‐Curtis decomposition of Boolean functions , 2004 .

[11]  Audris Mockus,et al.  Towards building a universal defect prediction model with rank transformed predictors , 2016, Empirical Software Engineering.

[12]  Eibe Frank,et al.  Logistic Model Trees , 2003, Machine Learning.

[13]  Ahmed E. Hassan,et al.  An Experience Report on Defect Modelling in Practice: Pitfalls and Challenges , 2017, 2018 IEEE/ACM 40th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP).

[14]  Kai Zhang,et al.  How security bugs are fixed and what can be improved: an empirical study with Mozilla , 2018, Science China Information Sciences.

[15]  Ruchika Malhotra,et al.  An empirical framework for defect prediction using machine learning techniques with Android software , 2016, Appl. Soft Comput..

[16]  William Marsh,et al.  On the effectiveness of early life cycle defect prediction with Bayesian Nets , 2008, Empirical Software Engineering.

[17]  Tianqi Chen,et al.  XGBoost: A Scalable Tree Boosting System , 2016, KDD.

[18]  Jin Liu,et al.  MICHAC: Defect Prediction via Feature Selection Based on Maximal Information Coefficient with Hierarchical Agglomerative Clustering , 2016, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[19]  Xiang Chen,et al.  MULTI: Multi-objective effort-aware just-in-time software defect prediction , 2018, Inf. Softw. Technol..

[20]  Hareton K. N. Leung,et al.  MSR4SM: Using topic models to effectively mining software repositories for software maintenance tasks , 2015, Inf. Softw. Technol..

[21]  Janez Demsar,et al.  Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..

[22]  Maurice H. Halstead,et al.  Elements of software science , 1977 .

[23]  Michele Lanza,et al.  An extensive comparison of bug prediction approaches , 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[24]  Ying Zou,et al.  Cross-Project Defect Prediction Using a Connectivity-Based Unsupervised Classifier , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[25]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[26]  Jun Yang,et al.  Defect Prediction on Unlabeled Datasets by Using Unsupervised Clustering , 2016, 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS).

[27]  Lech Madeyski,et al.  Towards identifying software project clusters with regard to defect prediction , 2010, PROMISE '10.

[28]  Ye Yang,et al.  An investigation on the feasibility of cross-project defect prediction , 2012, Automated Software Engineering.

[29]  Bin Li,et al.  An Empirical Study on Real Bugs for Machine Learning Programs , 2017, 2017 24th Asia-Pacific Software Engineering Conference (APSEC).

[30]  Chris Cheadle,et al.  Application of z-score transformation to Affymetrix data. , 2003, Applied bioinformatics.

[31]  Mohammad Alshayeb,et al.  Software defect prediction using ensemble learning on selected features , 2015, Inf. Softw. Technol..

[32]  Xiaoyuan Jing,et al.  Multiple kernel ensemble learning for software defect prediction , 2015, Automated Software Engineering.

[33]  Jin Liu,et al.  The Impact of Feature Selection on Defect Prediction Performance: An Empirical Comparison , 2016, 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE).

[34]  Lu Wang,et al.  Construct Bug Knowledge Graph for Bug Resolution , 2017, 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C).

[35]  Hareton K. N. Leung,et al.  Change impact analysis and changeability assessment for a change proposal: An empirical study ☆☆ , 2014, J. Syst. Softw..

[36]  Olcay Taner Yildiz,et al.  Software defect prediction using Bayesian networks , 2012, Empirical Software Engineering.

[37]  Florin Gorunescu,et al.  Data Mining - Concepts, Models and Techniques , 2011, Intelligent Systems Reference Library.

[38]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[39]  Xiaobing Sun,et al.  Enhancing developer recommendation with supplementary information via mining historical commits , 2017, J. Syst. Softw..

[40]  Chris F. Kemerer,et al.  A Metrics Suite for Object Oriented Design , 2015, IEEE Trans. Software Eng..

[41]  Bin Li,et al.  Bug Localization for Version Issues With Defect Patterns , 2019, IEEE Access.

[42]  Shane McIntosh,et al.  Automated Parameter Optimization of Classification Techniques for Defect Prediction Models , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[43]  D. Cox Two further applications of a model for binary regression , 1958 .

[44]  Rongxin Wu,et al.  ReLink: recovering links between bugs and changes , 2011, ESEC/FSE '11.

[45]  Tracy Hall,et al.  Researcher Bias: The Use of Machine Learning in Software Defect Prediction , 2014, IEEE Transactions on Software Engineering.

[46]  Pedro M. Domingos,et al.  On the Optimality of the Simple Bayesian Classifier under Zero-One Loss , 1997, Machine Learning.

[47]  Ji Feng,et al.  Deep Forest: Towards An Alternative to Deep Neural Networks , 2017, IJCAI.

[48]  Ayse Basar Bener,et al.  Defect prediction from static code features: current results, limitations, new approaches , 2010, Automated Software Engineering.

[49]  Yuming Zhou,et al.  Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models , 2016, SIGSOFT FSE.

[50]  Zhaowei Shang,et al.  Negative samples reduction in cross-company software defects prediction , 2015, Inf. Softw. Technol..

[51]  Brendan Murphy,et al.  Can developer-module networks predict failures? , 2008, SIGSOFT '08/FSE-16.

[52]  Shane McIntosh,et al.  Revisiting the Impact of Classification Techniques on the Performance of Defect Prediction Models , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[53]  Xiao-Yuan Jing,et al.  Label propagation based semi-supervised learning for software defect prediction , 2016, Automated Software Engineering.

[54]  Jaechang Nam,et al.  CLAMI: Defect Prediction on Unlabeled Datasets (T) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[55]  David Lo,et al.  Cross-project build co-change prediction , 2015, 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[56]  Zibin Zheng,et al.  Online Reliability Prediction via Motifs-Based Dynamic Bayesian Networks for Service-Oriented Systems , 2017, IEEE Transactions on Software Engineering.

[57]  Ayse Basar Bener,et al.  On the relative value of cross-company and within-company data for defect prediction , 2009, Empirical Software Engineering.

[58]  Rajiv Shivpuri,et al.  On line prediction of surface defects in hot bar rolling based on Bayesian hierarchical modeling , 2015, J. Intell. Manuf..

[59]  Xiao-Yuan Jing,et al.  Heterogeneous Defect Prediction Through Multiple Kernel Learning and Ensemble Learning , 2017, 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[60]  Ömer Faruk Arar,et al.  A feature dependent Naive Bayes approach and its application to the software defect prediction problem , 2017, Appl. Soft Comput..

[61]  Qinbao Song,et al.  Data Quality: Some Comments on the NASA Software Defect Datasets , 2013, IEEE Transactions on Software Engineering.

[62]  Baowen Xu,et al.  An Improved SDA Based Defect Prediction Framework for Both Within-Project and Cross-Project Class-Imbalance Problems , 2017, IEEE Transactions on Software Engineering.

[63]  David Lo,et al.  File-Level Defect Prediction: Unsupervised vs. Supervised Models , 2017, 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).

[64]  Hoh Peter In,et al.  Developer Micro Interaction Metrics for Software Defect Prediction , 2016, IEEE Transactions on Software Engineering.

[65]  Steffen Herbold,et al.  Comments on ScottKnottESD in Response to “An Empirical Comparison of Model Validation Techniques for Defect Prediction Models” , 2017, IEEE Transactions on Software Engineering.

[66]  David Lo,et al.  HYDRA: Massively Compositional Model for Cross-Project Defect Prediction , 2016, IEEE Transactions on Software Engineering.

[67]  Rakesh Rana,et al.  Analyzing defect inflow distribution and applying Bayesian inference method for software defect prediction in large software projects , 2016, J. Syst. Softw..

[68]  Jian Li,et al.  Software Defect Prediction via Convolutional Neural Network , 2017, 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS).

[69]  Ünal Çavusoglu,et al.  A novel defect prediction method for web pages using k-means++ , 2015, Expert Syst. Appl..

[70]  Premkumar T. Devanbu,et al.  Recalling the "imprecision" of cross-project defect prediction , 2012, SIGSOFT FSE.

[71]  Shane McIntosh,et al.  An Empirical Comparison of Model Validation Techniques for Defect Prediction Models , 2017, IEEE Transactions on Software Engineering.

[72]  Xinli Yang,et al.  Deep Learning for Just-in-Time Defect Prediction , 2015, 2015 IEEE International Conference on Software Quality, Reliability and Security.

[73]  Tim Menzies,et al.  Is "Better Data" Better Than "Better Data Miners"? , 2017, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).

[74]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett..

[75]  Bojan Cukic,et al.  Defect Prediction between Software Versions with Active Learning and Dimensionality Reduction , 2014, 2014 IEEE 25th International Symposium on Software Reliability Engineering.

[76]  Jun Wang,et al.  Compressed C4.5 Models for Software Defect Prediction , 2012, 2012 12th International Conference on Quality Software.

[77]  Song Wang,et al.  Automatically Learning Semantic Features for Defect Prediction , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[78]  Bin Li,et al.  IPSETFUL: an iterative process of selecting test cases for effective fault localization by exploring concept lattice of program spectra , 2016, Frontiers of Computer Science.

[79]  Tara N. Sainath,et al.  FUNDAMENTAL TECHNOLOGIES IN MODERN SPEECH RECOGNITION Digital Object Identifier 10.1109/MSP.2012.2205597 , 2012 .

[80]  Baowen Xu,et al.  Heterogeneous cross-company defect prediction by unified metric representation and CCA-based transfer learning , 2015, ESEC/SIGSOFT FSE.

[81]  Bart Baesens,et al.  Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings , 2008, IEEE Transactions on Software Engineering.

[82]  Guangchun Luo,et al.  Transfer learning for cross-company software defect prediction , 2012, Inf. Softw. Technol..