Software Defect Prediction Using Supervised Machine Learning and Ensemble Techniques: A Comparative Study

An essential objective of software development is to locate and fix, as early as possible, the defects that may arise under diverse circumstances. Because many software development activities are performed by people, bugs of various kinds can be introduced throughout development, which can cause failures later on. The prediction of software defects in the early stages has therefore become a primary interest in software engineering. Over the last two decades, various software defect prediction (SDP) approaches based on software metrics have been proposed. Bagging, support vector machine (SVM), decision tree (DT), and random forest (RF) classifiers are known to perform well at predicting defects. This paper studies and compares these supervised machine learning and ensemble classifiers on 10 NASA datasets. The experimental results showed that RF outperformed the other classifiers in the majority of cases.
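
The kind of comparison described above can be illustrated with a minimal Python sketch that uses only the standard library. It is not the paper's experimental setup. Decision stumps (one-level decision trees) stand in for full decision trees, and SVM and random forest are omitted. The data are synthetic modules with two hypothetical metric columns (lines of code and cyclomatic complexity) rather than the NASA datasets. The sketch contrasts a single stump with bagging (bootstrap samples plus majority vote), scored by the mean F-measure on the defective class under plain, unstratified 5-fold cross-validation. The function names (`fit_stump`, `fit_bagging`, `f1_defective`, `cross_validate`, `make_toy_data`) are illustrative, and any printed numbers say nothing about the study's results.

```python
import random
from collections import Counter

# Minimal sketch (assumptions): pure-Python decision stumps as base learners,
# synthetic data instead of the NASA datasets, plain (unstratified) k-fold CV.


def fit_stump(X, y):
    """Fit a one-level decision tree (decision stump) on 0/1 labels.

    Tries every (feature, threshold, polarity) and keeps the first one with
    the fewest training errors; returns a predict(row) -> 0/1 function.
    """
    best = None  # (errors, feature, threshold, label for rows above threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            # Errors if rows above t are called defective (1) and the rest clean (0).
            err = sum((row[f] > t) != bool(label) for row, label in zip(X, y))
            # Flipping the polarity turns every error into a hit and vice versa.
            for errors, above in ((err, 1), (len(y) - err, 0)):
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    _, feat, thr, above_label = best

    def predict(row):
        return above_label if row[feat] > thr else 1 - above_label

    return predict


def fit_bagging(X, y, n_estimators=15, seed=0):
    """Bagging: fit each stump on a bootstrap sample, predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        # An odd n_estimators avoids ties between the two classes.
        return Counter(m(row) for m in models).most_common(1)[0][0]

    return predict


def f1_defective(y_true, y_pred):
    """F-measure for the defective class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validate(fit, X, y, k=5, seed=0):
    """Mean F-measure of `fit` over k shuffled, unstratified folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for fold in range(k):
        test = idx[fold::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(f1_defective([y[i] for i in test], [model(X[i]) for i in test]))
    return sum(scores) / k


def make_toy_data(n=120, seed=1):
    """Synthetic modules with two hypothetical metrics: lines of code, complexity."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0  # roughly 30% defective
        X.append([rng.gauss(120 if label else 60, 30), rng.gauss(12 if label else 6, 3)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for name, fit in [("decision stump", fit_stump), ("bagging", fit_bagging)]:
        print(f"{name}: mean F-measure = {cross_validate(fit, X, y):.3f}")
```

A random forest would go one step further than `fit_bagging` by also restricting each split to a randomly chosen subset of the features. This sketch leaves that out to stay short.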
