Software quality prediction using mixture models with EM algorithm

This paper investigates mixture model analysis as a statistical tool for early prediction of fault-prone program modules. The expectation-maximization (EM) algorithm is used to fit the model. Using only software size and complexity metrics, the technique can produce a model for predicting software quality even without prior knowledge of the number of faults in the modules. In addition, the Akaike Information Criterion (AIC) is used to select the number of mixture components, which is taken to be the number of classes into which the program modules should be classified. The technique successfully classifies software into fault-prone and non-fault-prone modules with a relatively low error rate, providing a reliable indicator for software quality prediction.
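The approach described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single metric per module (for example a size measure), univariate Gaussian components, a quantile-based initialisation, a fixed iteration count, and a variance floor, none of which are stated in the abstract. The function names `fit_mixture`, `aic`, `select_by_aic`, and `classify` are hypothetical. Note that no fault counts are used anywhere; the classes come only from the fitted mixture.

```python
import math
import random


def _log_normal(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def _e_step(xs, weights, means, variances):
    """Return per-point responsibilities and the total log-likelihood."""
    resp, loglik = [], 0.0
    for x in xs:
        logs = [math.log(w) + _log_normal(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp for numerical stability
        log_total = top + math.log(sum(math.exp(l - top) for l in logs))
        loglik += log_total
        resp.append([math.exp(l - log_total) for l in logs])
    return resp, loglik


def fit_mixture(xs, k, iters=200):
    """Fit a k-component 1-D Gaussian mixture by EM (assumes xs is not constant)."""
    n = len(xs)
    ordered = sorted(xs)
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n
    weights = [1.0 / k] * k
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]  # quantile init
    variances = [grand_var] * k
    floor = 1e-3 * grand_var  # assumed floor to avoid degenerate components
    for _ in range(iters):
        resp, _ll = _e_step(xs, weights, means, variances)
        for j in range(k):  # M-step: responsibility-weighted updates
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs))
            variances[j] = max(spread / nj, floor)
    resp, loglik = _e_step(xs, weights, means, variances)
    return {"weights": weights, "means": means, "variances": variances,
            "resp": resp, "loglik": loglik}


def aic(loglik, k):
    """AIC = 2p - 2 log L, with p = 3k - 1 free parameters in one dimension."""
    return 2 * (3 * k - 1) - 2 * loglik


def select_by_aic(xs, max_k=4):
    """Fit 1..max_k components and keep the fit with the lowest AIC."""
    fits = {k: fit_mixture(xs, k) for k in range(1, max_k + 1)}
    best_k = min(fits, key=lambda k: aic(fits[k]["loglik"], k))
    return best_k, fits[best_k]


def classify(fit):
    """Assign each point to the component with the highest responsibility."""
    return [max(range(len(r)), key=r.__getitem__) for r in fit["resp"]]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic metric values standing in for real module measurements.
    sizes = ([rng.gauss(10, 2) for _ in range(150)]
             + [rng.gauss(40, 4) for _ in range(50)])
    k, fit = select_by_aic(sizes)
    print("components chosen by AIC:", k)
    print("component means:", [round(m, 1) for m in fit["means"]])
```

In a two-component fit, one plausible reading is to treat the component with the larger mean metric value as the fault-prone class; that labelling rule is an assumption of this sketch and would need checking against actual fault data.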
