An analysis of static metrics and faults in C software

Abstract In this empirical study, we evaluate the extent to which a set of software measures are correlated with the number of faults and the total estimated repair effort for a large software system. The measures we use are basic counts reflecting program size and structure and metrics proposed by McCabe and Halstead. The effect of program size has a major influence on these metrics, and we present a suitable method of adjusting the metrics for size. In modeling faults or repair effort as a function of one variable, a number of measures individually explain approximately one-quarter of the variation observed in the fault data. No one measure does significantly better than size in explaining the variation in faults found across software units, and thus multiple variable models are necessary to find metrics of importance in addition to program size. The “best” multivariate model explains approximately one-half the variation in the fault data. The metrics included in this model (in addition to size) are: the ratio of block comments to total lines of code, the number of decisions per function, and the relative vocabulary of program variables and operators. These metrics have potential for future use in the quality control of software.