Ensemble of Software Defect Predictors: an AHP-Based Evaluation Method

Classification algorithms that help to identify software defects or faults play a crucial role in software risk management. Experimental results have shown that ensembles of classifiers are often more accurate, more robust to the effects of noisy data, and achieve a lower average error rate than any of their constituent classifiers. However, findings are inconsistent across studies, and the performance of learning algorithms may vary with the performance measure used and the circumstances of the experiment. Therefore, more research is needed to evaluate the performance of ensemble algorithms in software defect prediction. The goal of this paper is to assess the quality of ensemble methods in software defect prediction with the analytic hierarchy process (AHP), a multicriteria decision-making approach that prioritizes decision alternatives based on pairwise comparisons. Applying the AHP, this study experimentally compares the performance of several popular ensemble methods using 13 different performance metrics over 10 public-domain software defect datasets from the NASA Metrics Data Program (MDP) repository. The results indicate that ensemble methods can, in general, improve the classification results of software defect prediction, and that AdaBoost gives the best results. In addition, tree- and rule-based classifiers perform better in software defect prediction than the other types of classifiers included in the experiment. Among single classifiers, K-nearest-neighbor, C4.5, and Naive Bayes tree rank higher than the others.
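
To make the pairwise-comparison idea concrete, the following is a minimal sketch of how AHP can turn a reciprocal comparison matrix into a priority vector and a consistency ratio. It is not the paper's implementation: the function names, the power-iteration approach to the principal eigenvector, and the example judgments for three unnamed alternatives are all assumptions made for illustration. The random index values are Saaty's commonly tabulated ones.

```python
from typing import List

# Saaty's commonly tabulated random consistency index, keyed by matrix size.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def mat_vec(matrix: List[List[float]], vector: List[float]) -> List[float]:
    """Multiply a square matrix by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def ahp_priorities(matrix: List[List[float]], iterations: int = 100) -> List[float]:
    """Approximate the principal eigenvector by power iteration.

    The result is normalized to sum to 1, so each entry can be read as
    the relative priority of one alternative.
    """
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = mat_vec(matrix, weights)
        total = sum(product)
        weights = [p / total for p in product]
    return weights


def consistency_ratio(matrix: List[List[float]], weights: List[float]) -> float:
    """Return CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    product = mat_vec(matrix, weights)
    lambda_max = sum(p / w for p, w in zip(product, weights)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


if __name__ == "__main__":
    # Hypothetical judgments (not taken from the paper): alternative 0 is
    # judged twice as good as alternative 1 and four times as good as 2.
    judgments = [
        [1.0, 2.0, 4.0],
        [1 / 2, 1.0, 2.0],
        [1 / 4, 1 / 2, 1.0],
    ]
    priorities = ahp_priorities(judgments)
    print("priorities:", [round(p, 3) for p in priorities])
    print("consistency ratio:", round(consistency_ratio(judgments, priorities), 3))
```

In a setting like the one described above, the alternatives would be the classifiers or ensemble methods and the judgments would be derived from their scores on each performance metric. A consistency ratio below about 0.1 is the threshold conventionally treated as acceptable.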
