PREDICTING FAULT-PRONE SOFTWARE MODULES IN EMBEDDED SYSTEMS WITH CLASSIFICATION TREES

Embedded-computer systems have become essential to life in modern society. For example, telecommunications is the backbone of society's information infrastructure. Embedded systems must have highly reliable software to avoid the severe consequences of failures, intolerable down-time, and expensive repairs in remote locations. Moreover, today's fast-moving technology marketplace mandates that embedded systems evolve, resulting in multiple software releases embedded in multiple products. Software quality models can be valuable tools for software engineering of embedded systems, because some software-enhancement techniques are so expensive or time-consuming that it is not practical to apply them to all modules. Targeting such techniques at the modules most likely to have faults is an effective way to reduce the likelihood of faults discovered in the field. Research has shown software metrics to be useful predictors of software faults. A software quality model is developed using measurements and fault data from a past release. The calibrated model is then applied to modules currently under development, yielding predictions on a module-by-module basis. This paper examines the Classification And Regression Trees (CART) algorithm for building tree-based models that predict which software modules have a high risk of faults to be discovered during operations. CART is attractive because it emphasizes pruning to achieve robust models. This paper presents details on the CART algorithm in the context of software engineering of embedded systems. We illustrate this approach with a case study of four consecutive releases of software embedded in a large telecommunications system. The level of accuracy achieved in the case study would be useful to developers of an embedded system. The case study also indicated that the model would continue to be useful over several releases as the system evolves.
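The workflow described above (calibrate a model on a past release, then apply it to modules under development) can be sketched with a single CART-style split. This is a minimal illustration only, not the paper's model: the metric values, the labels, and the function names `gini`, `fit_stump`, and `predict` are all hypothetical. It grows just one node by minimizing weighted Gini impurity and omits the recursive tree growth and the pruning step that CART emphasizes.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels (1 = fault-prone)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels: List[int]) -> int:
    """Most common label; ties and empty lists default to 0."""
    return 1 if sum(labels) * 2 > len(labels) else 0


def fit_stump(values: List[float], labels: List[int]) -> Tuple[float, int, int]:
    """Fit one binary split 'value <= threshold' on a single metric.

    Returns (threshold, label_if_left, label_if_right), choosing the
    threshold that minimizes the size-weighted Gini impurity of the
    two child nodes.
    """
    n = len(values)
    distinct = sorted(set(values))
    best_t, best_score = distinct[0], float("inf")
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    left = [y for x, y in zip(values, labels) if x <= best_t]
    right = [y for x, y in zip(values, labels) if x > best_t]
    return best_t, majority(left), majority(right)


def predict(values: List[float], stump: Tuple[float, int, int]) -> List[int]:
    """Classify each module with the fitted split."""
    t, left_label, right_label = stump
    return [left_label if x <= t else right_label for x in values]


# Hypothetical past release: one metric per module, plus known fault labels.
past_metric = [5, 12, 8, 40, 55, 60, 7, 48]
past_faulty = [0, 0, 0, 1, 1, 1, 0, 1]
stump = fit_stump(past_metric, past_faulty)

# Hypothetical modules currently under development (fault data not yet known).
current_metric = [10, 50, 30]
current_prediction = predict(current_metric, stump)
```

With this made-up data the two classes separate cleanly, so the chosen threshold falls between the largest not-fault-prone value and the smallest fault-prone value. Real module data is noisier, which is where growing a deeper tree and then pruning it back becomes important.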
