Software defect prediction based on kernel PCA and weighted extreme learning machine

Abstract

Context: Software defect prediction strives to detect defect-prone software modules by mining historical data. Effective prediction enables reasonable allocation of testing resources, which eventually leads to more reliable software.

Objective: The complex structures and the imbalanced class distribution in software defect data make it challenging to obtain suitable data features and to learn an effective defect prediction model. In this paper, we propose a method to address these two challenges.

Method: We propose a defect prediction framework called KPWE that combines two techniques, i.e., Kernel Principal Component Analysis (KPCA) and Weighted Extreme Learning Machine (WELM). Our framework consists of two major stages. In the first stage, KPWE aims to extract representative data features. It leverages the KPCA technique to project the original data into a latent feature space by nonlinear mapping. In the second stage, KPWE aims to alleviate the class imbalance. It exploits the WELM technique to learn an effective defect prediction model with a weighting-based scheme.

Results: We have conducted extensive experiments on 34 projects from the PROMISE dataset and 10 projects from the NASA dataset. The experimental results show that KPWE achieves promising performance compared with 41 baseline methods, including seven basic classifiers with KPCA, five variants of KPWE, eight representative feature selection methods with WELM, and 21 imbalanced learning methods.

Conclusion: In this paper, we propose KPWE, a new software defect prediction framework that considers the feature extraction and class imbalance issues. The empirical study on 44 software projects indicates that KPWE is superior to the baseline methods in most cases.
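The two stages described in the Method can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the kernel, the hidden-layer activation, the weighting rule, or any hyperparameters, so the RBF kernel, the sigmoid activation, the inverse class-frequency weights, and all names and default values below (rbf_kernel, KPCA, WELM, class_weights, kpwe_fit_predict, gamma, n_hidden, C) are assumptions chosen for illustration, and the data is synthetic.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise RBF kernel between rows of A and rows of B (kernel choice is an assumption).
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class KPCA:
    """Stage 1: nonlinear projection into a latent feature space."""
    def __init__(self, n_components, gamma):
        self.n_components, self.gamma = n_components, gamma

    def fit(self, X):
        self.X_ = X
        K = rbf_kernel(X, X, self.gamma)
        self.col_mean_, self.all_mean_ = K.mean(axis=0), K.mean()
        # Center the kernel matrix in feature space.
        Kc = K - self.col_mean_[None, :] - K.mean(axis=1)[:, None] + self.all_mean_
        vals, vecs = np.linalg.eigh(Kc)
        idx = np.argsort(vals)[::-1][: self.n_components]
        vals, vecs = vals[idx], vecs[:, idx]
        keep = vals > 1e-10  # drop numerically zero directions
        self.alphas_ = vecs[:, keep] / np.sqrt(vals[keep])
        return self

    def transform(self, Z):
        K = rbf_kernel(Z, self.X_, self.gamma)
        Kc = K - K.mean(axis=1)[:, None] - self.col_mean_[None, :] + self.all_mean_
        return Kc @ self.alphas_

def class_weights(y):
    # Assumed weighting rule: each sample is weighted by 1 / (size of its class).
    counts = np.bincount(y, minlength=2)
    return 1.0 / counts[y]

class WELM:
    """Stage 2: single-hidden-layer network with random hidden weights and a
    weighted, regularized least-squares output layer."""
    def __init__(self, n_hidden, C, seed):
        self.n_hidden, self.C, self.seed = n_hidden, C, seed

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W_ + self.b_)))  # sigmoid (assumption)

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.W_ = rng.uniform(-1, 1, (X.shape[1], self.n_hidden))
        self.b_ = rng.uniform(-1, 1, self.n_hidden)
        H = self._hidden(X)
        t = np.where(y == 1, 1.0, -1.0)   # defective = +1, clean = -1
        HtW = H.T * class_weights(y)      # equals H^T W with W diagonal
        A = np.eye(self.n_hidden) / self.C + HtW @ H
        self.beta_ = np.linalg.solve(A, HtW @ t)
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta_ > 0).astype(int)

def kpwe_fit_predict(X_train, y_train, X_test, n_components=5, n_hidden=30, C=1.0, seed=0):
    gamma = 1.0 / X_train.shape[1]        # illustrative default, not from the paper
    kpca = KPCA(n_components, gamma).fit(X_train)
    Z_train, Z_test = kpca.transform(X_train), kpca.transform(X_test)
    model = WELM(n_hidden, C, seed).fit(Z_train, y_train)
    return model.predict(Z_test), Z_train

# Synthetic imbalanced data: 180 "clean" and 20 "defective" modules, 6 metrics each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (180, 6)), rng.normal(1.5, 1.0, (20, 6))])
y = np.concatenate([np.zeros(180, dtype=int), np.ones(20, dtype=int)])
perm = rng.permutation(200)
X, y = X[perm], y[perm]
pred, Z_train = kpwe_fit_predict(X[:160], y[:160], X[160:])
```

With inverse class-frequency weights, the defective and clean classes contribute equal total weight to the output-layer fit, so the minority class is not swamped by the majority. Other weighting schemes would fit the same structure by changing only class_weights.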
