Empirical Analysis of the Relationship between CC and SLOC in a Large Corpus of Java Methods

Measuring the internal quality of source code is one of the traditional goals in turning software development into an engineering discipline. Cyclomatic Complexity (CC) is a frequently used source code quality metric, alongside Source Lines of Code (SLOC). However, the use of the CC metric is challenged by the repeated claim that CC is redundant with respect to SLOC due to a strong linear correlation. We test this claim by studying a corpus of 17.8M methods in 13K open-source Java projects. Our results show that the direct linear correlation between SLOC and CC is only moderate, owing to high variance. We observe that aggregating CC and SLOC over larger units of code improves the correlation, which explains the strong linear correlations reported in the literature. We suggest that aggregation is the primary cause of this correlation. Our conclusion is that there is no strong linear correlation between CC and SLOC of Java methods, so we do not conclude that CC is redundant with SLOC. This conclusion contradicts earlier claims from the literature, but concurs with the widely accepted practice of measuring CC alongside SLOC.
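The aggregation argument can be illustrated with a small simulation. The Python sketch below generates hypothetical per-method (SLOC, CC) pairs, computes Pearson's r at method level, and then recomputes it after summing groups of methods into "files". Everything about the data is an assumption chosen for illustration and is not the paper's corpus or pipeline: SLOC is drawn from a lognormal distribution, CC is 1 plus a beta-binomial count of decision points, file-level CC is taken as the sum of method CCs, and each file holds 1 to 40 methods. The helpers `pearson`, `simulate_methods` and `aggregate` are hypothetical names.

```python
import math
import random


def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient r."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_methods(n, rng):
    """Hypothetical (SLOC, CC) pairs, one per method.

    Assumption: SLOC is lognormal (right-skewed), and each line independently
    holds a decision point with a method-specific probability p ~ Beta(1, 6).
    CC = 1 + decision points, so CC grows with SLOC but with high variance.
    """
    methods = []
    for _ in range(n):
        sloc = max(1, round(rng.lognormvariate(2.0, 0.9)))
        p = rng.betavariate(1.0, 6.0)
        decisions = sum(1 for _ in range(sloc) if rng.random() < p)
        methods.append((sloc, 1 + decisions))
    return methods


def aggregate(methods, rng, max_methods=40):
    """Sum SLOC and CC over consecutive groups of 1..max_methods methods.

    Assumption: a file's CC is the sum of its methods' CC (tools differ).
    """
    files = []
    i = 0
    while i < len(methods):
        k = rng.randint(1, max_methods)
        chunk = methods[i:i + k]
        files.append((sum(s for s, _ in chunk), sum(c for _, c in chunk)))
        i += k
    return files


rng = random.Random(42)  # fixed seed so the run is reproducible
methods = simulate_methods(20_000, rng)
files = aggregate(methods, rng)

r_method = pearson(*zip(*methods))
r_file = pearson(*zip(*files))
print(f"method level: n={len(methods)}, r={r_method:.2f}")
print(f"file level:   n={len(files)}, r={r_file:.2f}")
```

Under these assumptions the file-level r comes out markedly higher than the method-level r, even though the per-method relationship is unchanged: both sums grow with the number of methods in a file, so variation in file size dominates the per-method noise.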
