Asymmetric Learning Based on Kernel Partial Least Squares for Software Defect Prediction

Software defect prediction is an essential part of software quality analysis and has been extensively studied in the domain of software-reliability engineering [1]–[5]. However, as pointed out by Menzies et al. [2] and Khoshgoftaar et al. [4], the class imbalance problem encountered in real-world data sets often degrades the performance of defect predictors. A software defect data set is class-imbalanced when most of the defects in a software system are concentrated in a small percentage of the program modules. Existing approaches to the class imbalance problem fall mainly into data-level and algorithm-level methods, which are compared in [4]. The results reported there show that the algorithm-level method AdaBoost almost always outperforms even the best data-level methods in software defect prediction. Most recently, Qu et al. [6] proposed an asymmetric classifier, APLSC, which is based on linear partial least squares, to tackle the class imbalance problem. In this paper, we develop a kernel-based asymmetric learning method, called Asymmetric Kernel Partial Least Squares Classification (AKPLSC), which extracts favorable features nonlinearly and recovers the loss caused by the class imbalance problem.

[1] Taghi M. Khoshgoftaar, et al. Improving Software-Quality Predictions With Data Sampling and Boosting, 2009, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.

[2] Roman Rosipal, et al. Kernel PLS-SVC for Linear and Nonlinear Classification, 2003, ICML.

[3] Bojan Cukic, et al. Robust prediction of fault-proneness by random forests, 2004, 15th International Symposium on Software Reliability Engineering.

[4] Guangchun Luo, et al. Transfer learning for cross-company software defect prediction, 2012, Inf. Softw. Technol.

[5] Guo-Zheng Li, et al. An asymmetric classifier based on partial least squares, 2010, Pattern Recognit.

[6] Tim Menzies, et al. Data Mining Static Code Attributes to Learn Defect Predictors, 2007, IEEE Transactions on Software Engineering.

[7] M. Barker, et al. Partial least squares for discrimination, 2003.

[8] Taghi M. Khoshgoftaar, et al. Using regression trees to classify fault-prone software modules, 2002, IEEE Trans. Reliab.