Predicting Defect-Prone Software Modules Using Shifted-Scaled Dirichlet Distribution

Effective prediction of defect-prone software modules lets developers avoid unnecessary expenditure of resources and effort and focus efficiently on quality assurance activities. Various classification methods have been applied to categorize each module in a system into one of two classes: defective or non-defective. Among the successful approaches, finite mixture modeling has been applied effectively to this problem. This paper proposes the shifted-scaled Dirichlet model (SSDM) and evaluates its capability in predicting defect-prone software modules on four NASA datasets. The results indicate that the prediction performance of SSDM is competitive with that of some previously used generative models.
