Cyclomatic Complexity and Lines of Code: Empirical Evidence of a Stable Linear Relationship

Researchers have often commented on the high correlation between McCabe’s Cyclomatic Complexity (CC) and lines of code (LOC). Many have believed this correlation high enough to justify adjusting CC by LOC or even substituting LOC for CC. However, from an empirical standpoint the relationship of CC to LOC remains an open question. We undertake the largest statistical study of this relationship to date. Employing modern regression techniques, we find that the linearity of this relationship has been severely underestimated, so much so that CC can be said to have no explanatory power of its own. This research presents evidence that LOC and CC have a stable, practically perfect linear relationship that holds across programmers, languages, code paradigms (procedural versus object-oriented), and software processes. Linear models relating LOC and CC are developed and then verified against over 1.2 million randomly selected source files from the SourceForge code repository. These files represent software projects in three target languages (C, C++, and Java) and a variety of programmer experience levels, software architectures, and development methodologies. The models are found to account for roughly 90% of CC’s variance using LOC alone. This suggests not only that the linear relationship between LOC and CC is stable, but also that the aspects of code complexity that CC measures, such as the size of the test case space, grow linearly with source code size across languages and programming paradigms.
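To make the phrase "variance explained by LOC alone" concrete, the following minimal sketch fits an ordinary least-squares line CC = intercept + slope × LOC and reports the coefficient of determination R². It is an illustration only: the (LOC, CC) pairs are invented for the example, the function names `fit_line` and `r_squared` are hypothetical, and plain least squares stands in for whatever regression techniques the study actually used.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys accounted for by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Invented per-file measurements (assumption: not data from the study).
loc = [10, 25, 40, 80, 120, 200, 350, 500]
cc = [2, 5, 9, 15, 26, 41, 68, 103]

slope, intercept = fit_line(loc, cc)
r2 = r_squared(loc, cc, slope, intercept)
print(f"CC ~ {intercept:.2f} + {slope:.3f} * LOC, R^2 = {r2:.3f}")
```

An R² near 0.9 on real data would correspond to the abstract's claim that LOC alone accounts for roughly 90% of CC's variance.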
