How to aggregate software metrics?

Maintaining a software system resembles renovating a house: it usually takes longer and costs more than planned. Just as a house owner identifies potential problems before a renovation, a software owner should assess the maintainability of a system before renovating or extending it. Maintainability is often measured with metrics, which associate software artifacts with numbers. Unfortunately, metrics are commonly measured at the method or class level and fail to provide an adequate picture of the maintainability of the entire system. To continue the analogy, metrics detail the state of every brick but bury the overall assessment in a multitude of details. To see the forest of a software system for the trees of individual measurements, one uses aggregation techniques such as the mean, median, or sum, or, more recently, the Gini, Theil, Kolm, Atkinson, and Hoover inequality indices. A formal comparison of these techniques has been missing until now. We present an extensive correlation study of the aforementioned techniques, applied to size metrics (e.g., number of lines of code, semicolons, or statements) and complexity metrics (e.g., percentage of branching statements, depth of inheritance tree, or number of children). We conducted an empirical evaluation on the 106 open source Java systems that make up the Qualitas Corpus. We observed, for example, that size and complexity metrics aggregated by Gini, Theil, Hoover, and Atkinson correlate strongly, whereas mean and Kolm correlate on size metrics but not on complexity metrics [1]. Based on our study, a software owner can choose an appropriate aggregation technique depending on, e.g., the presence of negative values or the relative importance of high versus low values.
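To make the listed aggregation techniques concrete, the following minimal Python sketch applies each of them to a list of per-class metric values. It uses the standard textbook definitions of these indices; the function names, the parameter values (eps for Atkinson, alpha for Kolm), and the example data are illustrative assumptions, not the settings or implementation used in the study.

```python
import math
from statistics import mean, median


def gini(xs):
    """Gini index: mean absolute difference over all ordered pairs,
    divided by twice the mean. Assumes non-negative values, positive mean."""
    n, mu = len(xs), mean(xs)
    total = sum(abs(a - b) for a in xs for b in xs)  # O(n^2), for readability
    return total / (2 * n * n * mu)


def theil(xs):
    """Theil T index. Zero values contribute 0 (the limit of x*ln x).
    Assumes non-negative values with a positive mean."""
    mu = mean(xs)
    return sum((x / mu) * math.log(x / mu) for x in xs if x > 0) / len(xs)


def hoover(xs):
    """Hoover index: fraction of the total that would have to be
    redistributed for all values to be equal. Assumes non-negative values."""
    mu = mean(xs)
    return sum(abs(x - mu) for x in xs) / (2 * sum(xs))


def atkinson(xs, eps=0.5):
    """Atkinson index with inequality-aversion parameter eps > 0
    (eps = 0.5 is an illustrative default). For eps >= 1 all values
    must be strictly positive; eps == 1 uses the geometric mean."""
    n, mu = len(xs), mean(xs)
    if eps == 1:
        equal_equiv = math.exp(sum(math.log(x) for x in xs) / n)
    else:
        equal_equiv = (sum(x ** (1 - eps) for x in xs) / n) ** (1 / (1 - eps))
    return 1 - equal_equiv / mu


def kolm(xs, alpha=1.0):
    """Kolm index with parameter alpha > 0 (alpha = 1 is illustrative).
    It depends only on deviations from the mean, so negative values are
    allowed. A log-sum-exp shift keeps math.exp from overflowing."""
    mu = mean(xs)
    ts = [alpha * (mu - x) for x in xs]
    m = max(ts)
    log_mean_exp = m + math.log(sum(math.exp(t - m) for t in ts)) - math.log(len(xs))
    return log_mean_exp / alpha


def aggregate(xs):
    """Aggregate a list of per-class (or per-method) metric values."""
    return {
        "mean": mean(xs), "median": median(xs), "sum": sum(xs),
        "gini": gini(xs), "theil": theil(xs), "hoover": hoover(xs),
        "atkinson": atkinson(xs), "kolm": kolm(xs),
    }


if __name__ == "__main__":
    # Hypothetical lines-of-code counts for the classes of one system.
    loc_per_class = [12, 40, 7, 300, 55]
    for name, value in aggregate(loc_per_class).items():
        print(f"{name:>8}: {value:.3f}")
```

Because the Kolm index depends only on deviations from the mean, shifting every value by a constant leaves it unchanged, and it stays defined for negative values; the Gini, Theil, Hoover, and Atkinson indices as written here assume non-negative data. This is one concrete way in which the presence of negative values can constrain the choice of technique.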

[1] Alexander Serebrenik et al. You can't control the unfamiliar: A study on the relations between aggregation techniques for software metrics. 2011 27th IEEE International Conference on Software Maintenance (ICSM), 2011.