An investigation of the relationships between lines of code and defects

It is desirable to be able to assess the quality of a software system from its static code metrics. In this paper, we analyze the relationships between Lines of Code (LOC) and defects, covering both pre-release and post-release defects. We confirm the ranking ability of LOC previously observed by Fenton and Ohlsson: when modules are ranked in descending order of LOC, the largest modules account for a disproportionately large share of the defects. Furthermore, we find that this ranking ability can be formally described using Weibull functions. The defect density calculated from a small percentage of the largest modules can be used to predict the total number of defects accurately. We also find that, using LOC as the predictor, typical classification techniques can predict the number of defective components reasonably well. We perform extensive experiments on the public Eclipse dataset and replicate the study on the NASA dataset. Our results confirm that simple static code attributes such as LOC can be useful predictors of software quality.
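The ranking and estimation ideas summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the (LOC, defects) records are made up; the ranking curve (fraction of modules inspected, largest first, versus fraction of defects covered) is assumed to follow the two-parameter Weibull CDF P(x) = 1 - exp(-(x/scale)^shape), fitted here by ordinary least squares on a Weibull probability plot; and total defects are extrapolated by applying the defect density of the largest modules to the total LOC. All function names are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical module record: (lines of code, number of defects).
Module = Tuple[int, int]


def ranking_curve(modules: List[Module]) -> List[Tuple[float, float]]:
    """Rank modules by LOC, largest first, and return points
    (fraction of modules inspected, fraction of all defects covered)."""
    ranked = sorted(modules, key=lambda m: m[0], reverse=True)
    total_defects = sum(d for _, d in ranked)
    points, found = [], 0
    for i, (_, defects) in enumerate(ranked, start=1):
        found += defects
        points.append((i / len(ranked), found / total_defects))
    return points


def weibull_cdf(x: float, scale: float, shape: float) -> float:
    """Assumed model form: two-parameter Weibull CDF 1 - exp(-(x / scale) ** shape)."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def fit_weibull(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Estimate (scale, shape) by least squares on the Weibull-plot
    linearization ln(-ln(1 - P)) = shape * ln(x) - shape * ln(scale)."""
    xs, ys = [], []
    for x, p in points:
        if 0.0 < p < 1.0:  # the transform is undefined at P = 0 and P = 1
            xs.append(math.log(x))
            ys.append(math.log(-math.log(1.0 - p)))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    shape = sxy / sxx  # slope of the fitted line
    intercept = mean_y - shape * mean_x  # equals -shape * ln(scale)
    scale = math.exp(-intercept / shape)
    return scale, shape


def estimate_total_defects(modules: List[Module], top_fraction: float) -> float:
    """Hypothetical estimator: apply the defect density (defects per LOC) of the
    largest `top_fraction` of modules to the total LOC of the system."""
    ranked = sorted(modules, key=lambda m: m[0], reverse=True)
    k = max(1, round(top_fraction * len(ranked)))
    top = ranked[:k]
    density = sum(d for _, d in top) / sum(loc for loc, _ in top)
    return density * sum(loc for loc, _ in ranked)


if __name__ == "__main__":
    # Made-up data for illustration only; not taken from Eclipse or NASA.
    data = [(1000, 10), (500, 4), (200, 1), (100, 1), (50, 0)]
    curve = ranking_curve(data)
    scale, shape = fit_weibull(curve)
    print("ranking curve:", curve)
    print(f"fitted Weibull scale={scale:.3f}, shape={shape:.3f}")
    print(f"estimated total defects: {estimate_total_defects(data, 0.2):.1f}")
```

The linearized least-squares fit is used only to keep the sketch free of third-party dependencies; a nonlinear fit of the same CDF (for example with SciPy) would be an equally reasonable choice, and the paper's own fitting procedure may differ.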

[1] R. Ramakumar. Engineering Reliability: Fundamentals and Applications, 1996.

[2] Hui Zhang. Research about Software Fault Injection Technology Based on Distributed System, 2010, 2010 International Conference on Machine Vision and Human-machine Interface.

[3] Nachiappan Nagappan et al. Predicting defects using network analysis on dependency graphs, 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[4] Khaled El Emam et al. The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics, 2001, IEEE Transactions on Software Engineering.

[5] Hongyu Zhang. On the Distribution of Software Faults, 2008, IEEE Transactions on Software Engineering.

[6] Per Runeson et al. A Replicated Quantitative Analysis of Fault Distributions in Complex Software Systems, 2007, IEEE Transactions on Software Engineering.

[7] Xiuzhen Zhang et al. Comments on "Data Mining Static Code Attributes to Learn Defect Predictors", 2007, IEEE Transactions on Software Engineering.

[8] Elaine J. Weyuker et al. Predicting the location and number of faults in large software systems, 2005, IEEE Transactions on Software Engineering.

[9] Xiuzhen Zhang et al. Predicting Defective Software Components from Code Complexity Measures, 2007.

[10] Taghi M. Khoshgoftaar et al. The necessity of assuring quality in software measurement data, 2004, 10th International Symposium on Software Metrics, 2004. Proceedings.

[11] Ian H. Witten et al. Data mining: practical machine learning tools and techniques with Java implementations, 2002, SGMD.

[12] Norman E. Fenton et al. A Critique of Software Defect Prediction Models, 1999, IEEE Transactions on Software Engineering.

[13] Ian Witten et al. Data Mining, 2000.

[14] Hongfang Liu et al. Building effective defect-prediction models in practice, 2005, IEEE Software.

[15] Carol Withrow et al. Prediction and control of ADA software defects, 1990, Journal of Systems and Software.

[16] A. Zeller et al. Predicting Defects for Eclipse, 2007, Third International Workshop on Predictor Models in Software Engineering (PROMISE'07: ICSE Workshops 2007).

[17] Tim Menzies et al. Assessing Predictors of Software Defects, 2004.

[18] Ingunn Myrtveit et al. Reliability and validity in comparative studies of software prediction models, 2005, IEEE Transactions on Software Engineering.

[19] Martin Shepperd et al. Derivation and Validation of Software Metrics, 1993.

[20] Shari Lawrence Pfleeger et al. Software Metrics: A Rigorous and Practical Approach, 1998.

[21] Norman E. Fenton et al. Quantitative Analysis of Faults and Failures in a Complex Software System, 2000, IEEE Transactions on Software Engineering.

[22] Tim Menzies et al. Data Mining Static Code Attributes to Learn Defect Predictors, 2007, IEEE Transactions on Software Engineering.

[23] Hongyu Zhang et al. An Empirical Study of Class Sizes for Large Java Systems, 2007, 14th Asia-Pacific Software Engineering Conference (APSEC'07).