Reading Beside the Lines: Indentation as a Proxy for Complexity Metric

Maintainers face the daunting task of wading through a collection of both new and old revisions, trying to ferret out those that warrant personal inspection. One can rank revisions by size in lines of code (LOC), but because of how change sizes are distributed, many revisions are of similar size. If we cannot rank revisions by LOC, perhaps we can rank them by Halstead's and McCabe's complexity metrics. However, these metrics are problematic when applied to code fragments (revisions) written in multiple languages: they require special parsers, which may not support the language or dialect used, and analysis tools may not understand code fragments. We propose using the statistical moments of indentation as a lightweight, language-independent, revision/diff-friendly metric that proxies classical complexity metrics. We have extensively evaluated our approach against the entire CVS histories of 278 of the most popular and most active SourceForge projects. We found that our measures are linearly correlated and rank-correlated with traditional measures of complexity, suggesting that measuring indentation is a cheap and accurate proxy for the code complexity of revisions. Thus, ranking revisions by the standard deviation and summation of indentation will be very similar to ranking revisions by complexity.
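To make the proposed metric concrete, the following is a minimal Python sketch of how the summation and standard deviation of indentation might be computed for a revision and used to rank revisions. It is an illustration under stated assumptions, not the tooling used in the evaluation: the function names (`indentation`, `added_lines`, `indentation_stats`, `rank_revisions`) are hypothetical, the input is assumed to be a unified diff, only added lines are measured, blank lines are skipped, a tab is assumed to count as 8 columns, and the population standard deviation is used.

```python
import statistics

TAB_WIDTH = 8  # assumption: one tab counts as 8 columns


def indentation(line: str, tab_width: int = TAB_WIDTH) -> int:
    """Return the number of leading whitespace columns, with tabs expanded."""
    expanded = line.expandtabs(tab_width)
    return len(expanded) - len(expanded.lstrip(" "))


def added_lines(diff_text: str) -> list[str]:
    """Collect non-blank added lines from a unified diff.

    Simplification: any line starting with '+++' is treated as a file
    header, so an added line whose content begins with '++' is missed.
    """
    lines = []
    for raw in diff_text.splitlines():
        if raw.startswith("+") and not raw.startswith("+++"):
            body = raw[1:]  # drop the diff marker, keep the code's own indent
            if body.strip():
                lines.append(body)
    return lines


def indentation_stats(diff_text: str) -> dict:
    """Summarize the indentation of a revision's added lines."""
    depths = [indentation(line) for line in added_lines(diff_text)]
    if not depths:
        return {"count": 0, "sum": 0, "mean": 0.0, "stdev": 0.0}
    return {
        "count": len(depths),
        "sum": sum(depths),
        "mean": statistics.fmean(depths),
        "stdev": statistics.pstdev(depths),  # population standard deviation
    }


def rank_revisions(diffs: dict[str, str]) -> list[str]:
    """Order revision ids from most to least 'complex' by (stdev, sum)."""
    def key(rev_id: str):
        stats = indentation_stats(diffs[rev_id])
        return (stats["stdev"], stats["sum"])
    return sorted(diffs, key=key, reverse=True)


if __name__ == "__main__":
    flat = "--- a/f.c\n+++ b/f.c\n@@ -1,0 +1,3 @@\n+int x;\n+int y;\n+int z;\n"
    nested = (
        "--- a/g.c\n+++ b/g.c\n@@ -1,0 +1,5 @@\n"
        "+if (a) {\n+    if (b) {\n+        c();\n+    }\n+}\n"
    )
    print(indentation_stats(nested))
    print(rank_revisions({"r1": flat, "r2": nested}))
```

Both example revisions are small and close in size, so ranking by LOC separates them poorly, yet the nested one has a larger indentation sum and spread and is ranked first. Because the sketch only counts leading whitespace, it needs no parser and treats every language the same way.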
