Software defect prediction using Bayesian networks

Many different software metrics have been proposed and used for defect prediction in the literature. Rather than dealing with all of them, it is more practical to identify the metrics that matter most and focus on those when predicting defectiveness. We use Bayesian networks to determine the probabilistic influential relationships among software metrics and defect proneness. In addition to the metrics in the PROMISE data repository, we define two new metrics: NOD, the number of developers, and LOCQ, which measures the lack of source code quality. We extract these metrics by inspecting the source code repositories of the selected PROMISE data sets. Our modeling yields the marginal defect proneness probability of the whole software system, the set of most effective metrics, and the influential relationships among metrics and defectiveness. Our experiments on nine open-source PROMISE data sets show that response for class (RFC), lines of code (LOC), and lack of coding quality (LOCQ) are the most effective metrics, whereas coupling between objects (CBO), weighted methods per class (WMC), and lack of cohesion of methods (LCOM) are less effective. Furthermore, number of children (NOC) and depth of inheritance tree (DIT) have very limited effect and are unreliable indicators of defect proneness. In addition, experiments on the Poi, Tomcat, and Xalan data sets show a positive correlation between the number of developers (NOD) and the level of defectiveness. However, further investigation involving a larger number of projects is needed to confirm this finding.
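To make "marginal defect proneness probability" and metric "effectiveness" concrete, here is a minimal Python sketch under strong simplifying assumptions. It uses three metrics (RFC, LOC, LOCQ), each discretized into low/high states and modeled as an independent parent of a binary Defective node. All priors and conditional probabilities are made up, and the names `p_defective` and `metric_effect` are hypothetical. None of this is taken from the study. The networks in the study are learned from data and may contain edges between metrics. The effect measure used here, P(Defective | metric high) minus P(Defective | metric low), is one simple choice and not necessarily the one the authors use.

```python
from itertools import product

# Hypothetical priors: probability that each discretized metric is "high".
# Illustrative values only, NOT learned from PROMISE data.
PRIORS = {"RFC": 0.3, "LOC": 0.4, "LOCQ": 0.2}
PARENTS = ("RFC", "LOC", "LOCQ")

# Hypothetical CPT: P(Defective = True | RFC, LOC, LOCQ),
# keyed by (RFC_high, LOC_high, LOCQ_high).
CPT = {
    (False, False, False): 0.05,
    (False, False, True): 0.20,
    (False, True, False): 0.15,
    (False, True, True): 0.35,
    (True, False, False): 0.20,
    (True, False, True): 0.40,
    (True, True, False): 0.45,
    (True, True, True): 0.70,
}


def joint_prior(config):
    """Probability of one parent configuration (parents assumed independent)."""
    p = 1.0
    for name, high in zip(PARENTS, config):
        p *= PRIORS[name] if high else 1.0 - PRIORS[name]
    return p


def p_defective(evidence=None):
    """P(Defective = True | evidence) by exact enumeration over parent states."""
    evidence = evidence or {}
    num = den = 0.0
    for config in product((False, True), repeat=len(PARENTS)):
        # Skip configurations that contradict the observed metric states.
        if any(evidence.get(n, h) != h for n, h in zip(PARENTS, config)):
            continue
        w = joint_prior(config)
        num += w * CPT[config]
        den += w
    return num / den


def metric_effect(name):
    """Shift in defect probability between a metric's high and low states."""
    return p_defective({name: True}) - p_defective({name: False})


if __name__ == "__main__":
    print(f"Marginal P(defective) = {p_defective():.3f}")
    for name in sorted(PARENTS, key=metric_effect, reverse=True):
        print(f"{name}: effect = {metric_effect(name):+.3f}")
```

Exact enumeration grows exponentially with the number of parents. That cost is acceptable for a handful of discretized metrics. For larger learned networks, a dedicated Bayesian network tool would typically perform structure learning and inference instead.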
