An investigation of the effect of module size on defect prediction using static measures

We used several machine learning algorithms to predict the defective modules in five NASA products, namely, CM1, JM1, KC1, KC2, and PC1. A set of static measures were used as predictor variables. While doing so, we observed that a large portion of the modules were small, as measured by lines of code (LOC). When we experimented on the data subsets created by partitioning according to module size, we obtained higher prediction performance for the subsets that include larger modules. We also performed defect prediction using class-level data for KC1 rather than method-level data. In this case, the use of class-level data resulted in improved prediction performance compared to using method-level data. These findings suggest that quality assurance activities can be guided even better if defect predictions are made by using data that belong to larger modules.
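
The size-based partitioning described above can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout, the LOC boundaries, and the choice of probability of detection (pd) and probability of false alarm (pf) as performance measures are all assumptions made for illustration, and the sample records are invented toy values, not NASA data.

```python
from typing import List, Tuple

# Hypothetical record layout: (loc, predicted_defective, actually_defective).
Module = Tuple[int, bool, bool]


def partition_by_loc(modules: List[Module], boundaries: List[int]) -> List[List[Module]]:
    """Split modules into len(boundaries) + 1 size bins.

    `boundaries` is assumed to be sorted in ascending order. A module lands
    in bin i when exactly i boundaries are less than or equal to its LOC.
    """
    bins: List[List[Module]] = [[] for _ in range(len(boundaries) + 1)]
    for module in modules:
        index = sum(1 for b in boundaries if module[0] >= b)
        bins[index].append(module)
    return bins


def pd_and_pf(modules: List[Module]) -> Tuple[float, float]:
    """Return (pd, pf) for one subset; a rate is 0.0 when its denominator is empty."""
    tp = sum(1 for _, pred, actual in modules if pred and actual)
    fn = sum(1 for _, pred, actual in modules if not pred and actual)
    fp = sum(1 for _, pred, actual in modules if pred and not actual)
    tn = sum(1 for _, pred, actual in modules if not pred and not actual)
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return pd, pf


# Toy records, invented purely to exercise the functions.
toy_modules: List[Module] = [
    (5, False, True),
    (8, False, False),
    (40, True, True),
    (120, True, True),
    (150, True, False),
]

# Hypothetical LOC boundaries giving three subsets: <10, 10-99, >=100.
subsets = partition_by_loc(toy_modules, [10, 100])
for subset in subsets:
    print(len(subset), pd_and_pf(subset))
```

In a real experiment the predictions would come from a classifier trained and evaluated within each subset; here they are hard-coded so that only the partitioning and scoring steps are shown.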