Effects of measurements on correlations of software code metrics

Context: Software metrics play a significant role in many areas of the software life-cycle, including forecasting defects and, through predictive analysis, anticipating maintenance effort, cost, and related qualities. Many studies have found code metrics to be correlated with each other at such a high level that the correlated metrics are considered redundant, implying that it is enough to keep track of a single metric from a list of highly correlated ones.

Objective: Software is developed incrementally over a period of time. Traditionally, code metrics are measured cumulatively, as a cumulative or running sum. When a code metric is instead measured from the values of individual revisions or commits, without consolidating values from past revisions and thus reflecting the natural development of the software, this study identifies such a measure as organic. Density and average are two other ways of measuring metrics. This empirical study focuses on whether measurement types influence the correlations of code metrics.

Method: To investigate this question, the study collected 24 code metrics, classified into four categories according to their measurement types, from 11,874 software revisions (i.e., commits) of 21 open source projects from eight well-known organizations. Kendall's τ-B is used for computing correlations. To determine whether there is a significant difference between cumulative and organic metrics, the Mann-Whitney U test, the Wilcoxon signed-rank test, and the paired-samples sign test are performed.

Results: The cumulative metrics are found to be highly correlated with each other, with an average coefficient of 0.79; for the corresponding organic metrics, it is 0.49. When the individual correlation coefficients of the two measurement types are compared, correlations between organic metrics are found to be significantly lower (p < 0.01) than those between cumulative metrics. The results indicate that the cumulative nature of the metrics makes them highly correlated, implying that cumulative measurement is a major source of collinearity between cumulative metrics. Another interesting observation is that correlations between metrics from different categories are weak.

Conclusions: The results of this study reveal that measurement types may have a significant impact on the correlations of code metrics, and that transforming metrics into a different measurement type can yield metrics with low collinearity. These findings provide a simple understanding of how transforming features to a different measurement type can produce new, non-collinear input features for predictive models.
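
As a minimal sketch (not the study's tooling; the metric names and per-commit values below are hypothetical), the following Python snippet illustrates the difference between the organic and cumulative measurement types and computes Kendall's τ-B for every metric pair under each type:

```python
import numpy as np
from scipy.stats import kendalltau  # SciPy's default variant is tau-B when ties are present

rng = np.random.default_rng(42)
n_commits = 200

# Hypothetical per-commit ("organic") values for four code metrics.
organic = {
    "loc_added":      rng.poisson(40, n_commits).astype(float),
    "mccabe_added":   rng.poisson(6, n_commits).astype(float),
    "methods_added":  rng.poisson(3, n_commits).astype(float),
    "comments_added": rng.poisson(12, n_commits).astype(float),
}

# Cumulative counterparts: running sum over the revision history.
cumulative = {name: np.cumsum(values) for name, values in organic.items()}

def pairwise_tau(metrics):
    """Kendall's tau-B for every unordered pair of metric series."""
    names = list(metrics)
    taus = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            tau, _p = kendalltau(metrics[a], metrics[b])
            taus[(a, b)] = tau
    return taus

tau_organic = pairwise_tau(organic)
tau_cumulative = pairwise_tau(cumulative)

for pair in tau_organic:
    print(f"{pair[0]} ~ {pair[1]}: organic tau = {tau_organic[pair]:+.2f}, "
          f"cumulative tau = {tau_cumulative[pair]:+.2f}")
```

On such synthetic data the cumulative series, being running sums of non-negative per-commit values, grow almost monotonically and are therefore highly concordant with one another even though the underlying per-commit values are unrelated; this is the measurement-induced collinearity the study attributes to cumulative metrics.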

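The significance tests named in the Method section can be sketched in the same hedged way; the paired coefficient values below are hypothetical stand-ins for the per-pair τ-B coefficients of the two measurement types, and the paired-samples sign test is implemented as a binomial test on the signs of the paired differences:

```python
import numpy as np
from scipy.stats import mannwhitneyu, wilcoxon, binomtest  # binomtest requires SciPy >= 1.7

# Hypothetical paired Kendall tau-B coefficients for the same metric pairs,
# measured cumulatively and organically.
tau_cumulative = np.array([0.91, 0.85, 0.88, 0.79, 0.74, 0.83, 0.90, 0.77])
tau_organic    = np.array([0.52, 0.47, 0.55, 0.41, 0.38, 0.60, 0.49, 0.44])

# Mann-Whitney U test: compares the two samples without using the pairing.
u_stat, u_p = mannwhitneyu(tau_cumulative, tau_organic, alternative="greater")

# Wilcoxon signed-rank test: paired, uses the magnitudes of the differences.
w_stat, w_p = wilcoxon(tau_cumulative, tau_organic, alternative="greater")

# Paired-samples sign test: uses only the sign of each non-zero difference.
diffs = tau_cumulative - tau_organic
n_positive = int(np.sum(diffs > 0))
n_nonzero = int(np.sum(diffs != 0))
s_p = binomtest(n_positive, n_nonzero, p=0.5, alternative="greater").pvalue

print(f"Mann-Whitney U: U = {u_stat:.1f}, p = {u_p:.4f}")
print(f"Wilcoxon:       W = {w_stat:.1f}, p = {w_p:.4f}")
print(f"Sign test:      p = {s_p:.4f}")
```

All three tests agree on this toy input because every cumulative coefficient exceeds its organic counterpart; the study reports the analogous comparison on its real metric data with p < 0.01.
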