A Bayesian approach for software quality prediction

Many statistical algorithms have been proposed for software quality prediction, that is, for classifying program modules as fault-prone or non-fault-prone. Their main goal is to improve software development processes. In this paper, we introduce a new software quality prediction algorithm. Our approach is purely Bayesian and is based on finite Dirichlet mixture models, with inference carried out by a Gibbs sampler. We present experimental results on simulated data, together with a real application to software module classification.
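To make the idea of Gibbs sampling over a finite Dirichlet mixture concrete, the sketch below runs one partial sweep in Python with NumPy: it samples a component label for each module given the current parameters, then resamples the mixing weights given those labels. This is an illustration under stated assumptions, not the paper's implementation. The function names (`dirichlet_logpdf`, `gibbs_sweep`), the symmetric Dirichlet prior on the weights, and the synthetic data are all hypothetical. Each module is assumed to be described by a vector of proportions that sums to 1. The update of the component parameters themselves is not conjugate and is omitted here.

```python
import numpy as np
from math import lgamma


def dirichlet_logpdf(x, alpha):
    """Log density of Dirichlet(alpha) at each row of x (rows sum to 1)."""
    log_norm = lgamma(alpha.sum()) - sum(lgamma(a) for a in alpha)
    return log_norm + ((alpha - 1.0) * np.log(x)).sum(axis=1)


def gibbs_sweep(x, alphas, weights, rng, prior=1.0):
    """One partial sweep: labels given parameters, then weights given labels.

    The component parameters `alphas` are held fixed (assumption for brevity).
    """
    k = len(weights)
    # Unnormalized log posterior of each component for each observation.
    log_post = np.stack(
        [np.log(weights[j]) + dirichlet_logpdf(x, alphas[j]) for j in range(k)],
        axis=1,
    )
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(log_post)
    probs /= probs.sum(axis=1, keepdims=True)
    # Draw one label per row by inverting the cumulative distribution.
    u = rng.random(len(x))
    z = (probs.cumsum(axis=1) < u[:, None]).sum(axis=1)
    z = np.minimum(z, k - 1)  # guard against floating-point round-off
    # Conjugate update: weights ~ Dirichlet(prior + component counts).
    counts = np.bincount(z, minlength=k)
    new_weights = rng.dirichlet(prior + counts)
    return z, new_weights


# Synthetic two-class data standing in for module measurements (hypothetical).
rng = np.random.default_rng(0)
alphas = np.array([[20.0, 2.0, 2.0], [2.0, 2.0, 20.0]])
x = np.vstack([rng.dirichlet(alphas[0], 100), rng.dirichlet(alphas[1], 100)])
z, w = gibbs_sweep(x, alphas, np.array([0.5, 0.5]), rng)
```

A full sampler would repeat such sweeps and add a step that updates each component's parameter vector, for example with a Metropolis-Hastings move, before averaging over the retained draws.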
