Improving tree-based models of software quality with principal components analysis

Software quality classification models predict which modules are likely to be fault-prone, based on software product metrics, process metrics, and execution metrics. Such predictions can be used to target improvement efforts at the modules that need them most. Classification-tree modeling is a robust technique for building such software quality models. However, when the predictors are highly correlated, the model structure may be unstable and accuracy may suffer. This paper presents an empirical case study of four releases of a very large telecommunications system. The study shows that tree-based models can be improved by transforming the predictors with principal components analysis, so that the transformed predictors are uncorrelated. The case study used the regression-tree algorithm in the S-Plus package and then applied a general decision rule to classify the modules.
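
The decorrelating effect of principal components analysis can be illustrated with a minimal sketch for two predictors. This is not the case study's S-Plus procedure: the metric names (loc, branches), the data values, and the helper functions are hypothetical, and standardizing the metrics before the transformation is an assumption made here because the two metrics are on different scales. For two variables the principal axes can be found in closed form from the 2x2 covariance matrix, so only the Python standard library is needed.

```python
import math

def standardize(values):
    """Convert raw metric values to z-scores (mean 0, sample std 1)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]

def correlation(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def pca_2d(x, y):
    """Rotate two centred predictors onto their principal axes.

    Returns the scores on the first and second principal components.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cx = [v - mx for v in x]
    cy = [v - my for v in y]
    sxx = sum(a * a for a in cx) / (n - 1)
    syy = sum(b * b for b in cy) / (n - 1)
    sxy = sum(a * b for a, b in zip(cx, cy)) / (n - 1)
    # Angle of the first principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * a + s * b for a, b in zip(cx, cy)]
    pc2 = [-s * a + c * b for a, b in zip(cx, cy)]
    return pc1, pc2

# Hypothetical module metrics (illustrative values, not data from the study).
loc = [120, 340, 560, 200, 800, 150, 430, 950]
branches = [14, 35, 60, 22, 85, 15, 48, 99]

pc1, pc2 = pca_2d(standardize(loc), standardize(branches))

print("correlation of raw predictors:        ", correlation(loc, branches))
print("correlation of transformed predictors:", correlation(pc1, pc2))
```

In this sketch the raw predictors are strongly correlated, while the two component scores are uncorrelated up to floating-point error. The scores, rather than the raw metrics, would then be supplied as predictors to the tree-building algorithm.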
