Software defect prediction using cost-sensitive neural network

A software defect prediction model was built with an Artificial Neural Network (ANN). The ANN connection weights were optimized by the Artificial Bee Colony (ABC) algorithm. A parametric cost-sensitivity feature was added to the ANN by using a new error function. The model was applied to five publicly available datasets from the NASA repository. Results were compared with other cost-sensitive and non-cost-sensitive studies.

The software development life cycle generally includes analysis, design, implementation, test, and release phases. The testing phase must be carried out effectively in order to release bug-free software to end users. In the last two decades, academics have taken an increasing interest in the software defect prediction problem, and several machine learning techniques have been applied to make prediction more robust. This paper proposes a different classification approach to the problem: a combination of a traditional Artificial Neural Network (ANN) and the novel Artificial Bee Colony (ABC) algorithm. The neural network is trained by the ABC algorithm in order to find optimal weights. The objective optimized by the ABC algorithm is formed from the False Positive Rate (FPR) and the False Negative Rate (FNR), each multiplied by a parametric cost coefficient. Software defect data are inherently class-imbalanced because of the skewed distribution of defective and non-defective modules, so the conventional error functions of a neural network produce unbalanced FPR and FNR results. The proposed approach was applied to five publicly available datasets from the NASA Metrics Data Program repository. Accuracy, probability of detection, probability of false alarm, balance, Area Under Curve (AUC), and Normalized Expected Cost of Misclassification (NECM) are the main performance indicators of our classification approach. In order to prevent random results, the dataset was shuffled and the algorithm was executed 10 times, with n-fold cross-validation in each iteration. Our experimental results showed that a cost-sensitive neural network can be created successfully by using the ABC optimization algorithm for the purpose of software defect prediction.
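
The cost-weighted objective described above can be illustrated with a minimal sketch. It assumes the two weighted rates are combined as a simple sum, `c_fp * FPR + c_fn * FNR`; the abstract does not state the exact form of the error function, so the combination rule, the function names `rates` and `cost_sensitive_error`, and the parameter names `c_fp` and `c_fn` are all hypothetical and chosen only for illustration.

```python
def rates(y_true, y_pred):
    """Return (FPR, FNR) for binary labels: 1 = defective, 0 = non-defective."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against a class being absent from the sample.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


def cost_sensitive_error(y_true, y_pred, c_fp=1.0, c_fn=1.0):
    """Assumed objective: weighted sum of FPR and FNR (lower is better)."""
    fpr, fnr = rates(y_true, y_pred)
    return c_fp * fpr + c_fn * fnr


# Imbalanced toy sample: 2 defective modules, 8 non-defective ones.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]  # one missed defect, one false alarm

equal_cost = cost_sensitive_error(y_true, y_pred)            # 0.125 + 0.5
fn_heavy = cost_sensitive_error(y_true, y_pred, c_fn=5.0)    # 0.125 + 5 * 0.5
print(equal_cost, fn_heavy)  # 0.625 2.625
```

In a setup like the one the abstract describes, an optimizer such as ABC would score each candidate weight vector by running the network on the training data and evaluating an objective of this kind. Raising `c_fn` relative to `c_fp` penalizes missed defects more heavily than false alarms, which is one way to counteract the class imbalance.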
