Fault Prediction using Early Lifecycle Data

The prediction of fault-prone modules in a software project has been the topic of many studies. In this paper, we investigate whether metrics available early in the development lifecycle can be used to identify fault-prone software modules. More precisely, we build predictive models using the metrics that characterize textual requirements. We compare the performance of requirements-based models against the performance of code-based models and models that combine requirement and code metrics. Using a range of modeling techniques and the data from three NASA projects, our study indicates that the early lifecycle metrics can play an important role in project management, either by pointing to the need for increased quality monitoring during the development or by using the models to assign verification and validation activities.

[1]  Yan Ma,et al.  Adequate and Precise Evaluation of Quality Models in Software Engineering Studies , 2007, Third International Workshop on Predictor Models in Software Engineering (PROMISE'07: ICSE Workshops 2007).

[2]  Tim Menzies,et al.  When can we test less? , 2003, Proceedings. 5th International Workshop on Enterprise Networking and Computing in Healthcare Industry (IEEE Cat. No.03EX717).

[3]  Bojan Cukic,et al.  Robust prediction of fault-proneness by random forests , 2004, 15th International Symposium on Software Reliability Engineering.

[4]  Taghi M. Khoshgoftaar,et al.  An application of zero-inflated Poisson regression for software fault prediction , 2001, Proceedings 12th International Symposium on Software Reliability Engineering.

[5]  Raghu Ramakrishnan,et al.  Database Management Systems , 1976 .

[6]  Robert C. Holte,et al.  Severe Class Imbalance: Why Better Algorithms Aren't the Answer , 2005, ECML.

[7]  Alaa F. Sheta,et al.  Prediction of software reliability: a comparison between regression and neural network non-parametric models , 2001, Proceedings ACS/IEEE International Conference on Computer Systems and Applications.

[8]  Brendan Murphy,et al.  Using Historical In-Process and Product Metrics for Early Estimation of Software Failures , 2006, 2006 17th International Symposium on Software Reliability Engineering.

[9]  Darrel C. Ince,et al.  A critique of three metrics , 1994, J. Syst. Softw..

[10]  Shari Lawrence Pfleeger,et al.  Software Metrics : A Rigorous and Practical Approach , 1998 .

[11]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[12]  Linda H. Rosenberg,et al.  Automated Analysis of Requirement Specifications , 1997, Proceedings of the (19th) International Conference on Software Engineering.

[13]  Anas N. Al-Rabadi,et al.  A comparison of modified reconstructability analysis and Ashenhurst‐Curtis decomposition of Boolean functions , 2004 .

[14]  Elaine J. Weyuker,et al.  Predicting the location and number of faults in large software systems , 2005, IEEE Transactions on Software Engineering.

[15]  Norman E. Fenton,et al.  A Critique of Software Defect Prediction Models , 1999, IEEE Trans. Software Eng..

[16]  William Wilson,et al.  Automated quality analysis of Natural Language requirement specifications , 1996 .

[17]  Norman F. Schneidewind,et al.  Investigation of logistic regression as a discriminant of software quality , 2001, Proceedings Seventh International Software Metrics Symposium.

[18]  Venkata U. B. Challagulla,et al.  A Unified Framework for Defect Data Analysis Using the MBR Technique , 2006, 2006 18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'06).

[19]  N. Nagappan,et al.  Use of relative code churn measures to predict system defect density , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[20]  Tim Menzies,et al.  Data Mining Static Code Attributes to Learn Defect Predictors , 2007, IEEE Transactions on Software Engineering.

[21]  Steven R. Rakitin,et al.  Software verification and validation for practitioners and managers , 2001 .

[22]  Norman E. Fenton,et al.  Quantitative Analysis of Faults and Failures in a Complex Software System , 2000, IEEE Trans. Software Eng..

[23]  Robert C. Holte,et al.  C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling , 2003 .

[24]  Maurice H. Halstead,et al.  Elements of software science , 1977 .