An Approach for the Prediction of Number of Software Faults Based on the Dynamic Selection of Learning Techniques

Determining the most appropriate learning technique(s) is vital for accurate and effective software fault prediction (SFP). Techniques used earlier for SFP have reported varying performance across different software projects, and none of them has consistently performed best across projects. This problem of varying performance can be addressed by an approach that partitions the fault dataset into different module subsets, trains learning techniques for each subset, and integrates the outcomes of all the learning techniques. This paper presents such an approach, which dynamically selects learning techniques to predict the number of software faults. For a given testing module, the approach first locates the neighbor module subset containing modules similar to the testing module, using a distance function, and then chooses the best learning technique in the region of that module subset to make the prediction for the testing module. The learning technique is selected based on its past performance in the region of the module subset. We evaluated the proposed approach using fault datasets gathered from the PROMISE data repository and the Eclipse bug data repository. Experimental results showed that the proposed approach improved performance when predicting the number of faults in software systems.
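To make the selection mechanism concrete, the sketch below illustrates one possible realization of the approach described above. It is not the authors' implementation: the use of k-means to partition modules into subsets, the two candidate regressors, the hold-out split, Euclidean distance to cluster centroids as the distance function, and mean absolute error as the measure of "past performance" are all illustrative assumptions.

```python
# Minimal sketch of dynamic learner selection for fault-count prediction.
# Assumptions (not taken from the paper): k-means partitions the modules,
# Euclidean distance to centroids locates the neighbor subset, and mean
# absolute error on a held-out split estimates each technique's past performance.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

def build_regions(X, y, n_regions=3, seed=0):
    """Partition training modules into subsets and pick the best learner per subset."""
    km = KMeans(n_clusters=n_regions, random_state=seed).fit(X)
    regions = {}
    for r in range(n_regions):
        mask = km.labels_ == r
        Xr, yr = X[mask], y[mask]
        # Hold out part of the subset to estimate each technique's past performance.
        Xtr, Xval, ytr, yval = train_test_split(Xr, yr, test_size=0.3, random_state=seed)
        candidates = {
            "linear": LinearRegression(),
            "tree": DecisionTreeRegressor(random_state=seed),
        }
        best_err, best_model = np.inf, None
        for name, model in candidates.items():
            model.fit(Xtr, ytr)
            err = mean_absolute_error(yval, model.predict(Xval))
            if err < best_err:
                best_err, best_model = err, model
        regions[r] = best_model  # learning technique chosen for this module subset
    return km, regions

def predict_faults(km, regions, x_test):
    """Locate the neighbor subset of a testing module and apply its best learner."""
    # Euclidean distance to each centroid identifies the closest module subset.
    dists = np.linalg.norm(km.cluster_centers_ - x_test, axis=1)
    region = int(np.argmin(dists))
    return regions[region].predict(x_test.reshape(1, -1))[0]
```

In this sketch the "region" of a testing module is simply its nearest cluster; the paper's actual partitioning scheme, candidate learner pool, and performance measure may differ.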
