Software metric classification trees help guide the maintenance of large-scale systems

The 80:20 rule states that approximately 20% of a software system is responsible for 80% of its errors. The authors propose an automated method for generating empirically-based models of error-prone software objects. These models are intended to help localize the troublesome 20%. The method uses a recursive algorithm to automatically generate classification trees whose nodes are multivalued functions based on software metrics. The purpose of the classification trees is to identify components that are likely to be error prone or costly, so that developers can focus their resources accordingly. A feasibility study was conducted using 16 NASA projects. On average, the classification trees correctly identified 79.3% of the software modules that had high development effort or faults.<<ETX>>