Predicting Fault Incidence Using Software Change History

This paper is an attempt to understand the processes by which software ages. We define code to be aged or decayed if its structure makes it unnecessarily difficult to understand or change, and we measure the extent of decay by counting the number of faults in the code over a period of time. Using change management data from a very large, long-lived software system, we explore the extent to which measurements from the change history succeed in predicting how these faults are distributed over modules. In general, process measures based on the change history are more useful in predicting fault rates than product metrics of the code: for instance, the number of times code has been changed is a better indication of how many faults it will contain than is its length. We also compare the fault rates of code of various ages, finding that if a module is, on average, a year older than an otherwise similar module, the older module will have roughly a third fewer faults. Our most successful model measures the fault potential of a module as the sum of contributions from all of the times the module has been changed, with large, recent changes receiving the most weight.
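The idea behind that last model can be illustrated with a minimal sketch. The abstract states only that each change contributes to a module's fault potential and that large, recent changes weigh most; the exponential damping with age, the logarithmic size weight, the `decay_per_year` value, and the names `Change` and `fault_potential` are all assumptions made here for illustration, not the fitted model from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Change:
    age_years: float    # how long ago the change was made (hypothetical field)
    lines_changed: int  # size of the change (hypothetical field)


def fault_potential(changes, decay_per_year=0.75):
    """Sum one contribution per change; larger and more recent changes weigh more.

    The exponential damping and the log-size weight are illustrative
    assumptions, and decay_per_year is an arbitrary placeholder value.
    """
    return sum(
        math.exp(-decay_per_year * c.age_years) * math.log1p(c.lines_changed)
        for c in changes
    )


# Two modules with identical change sizes but different change dates.
old_module = [Change(age_years=5.0, lines_changed=400),
              Change(age_years=4.0, lines_changed=50)]
hot_module = [Change(age_years=0.2, lines_changed=400),
              Change(age_years=0.1, lines_changed=50)]

# The recently changed module receives the higher score.
print(fault_potential(old_module) < fault_potential(hot_module))
```

Under these assumed weights, a module whose changes are all several years old scores far lower than one with the same changes made in the last few months, which matches the qualitative claim that recent changes dominate.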
