Learning to recognize reusable software modules using an inductive classification system

An inductive classification system is described that uses variations on J.R. Quinlan's (1986) ID3 system to produce a decision tree for distinguishing reusable from nonreusable code modules. To accomplish that task, a set of high-level concepts used by software engineers to characterize structurally understandable code is identified. Each of these concepts is operationalized in terms of code complexity metrics that can be easily calculated during the compilation process. These metrics relate to various aspects of the program structure, including its coupling, cohesion, data structure, control structure, and documentation. The decision tree produced for a sample of 81 Pascal programs outperformed similar classification efforts by a group of professional programmers.
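
To illustrate the kind of induction described above, here is a minimal sketch of plain ID3 in Python. It is not the authors' system, which uses variations on ID3 that the abstract does not detail. The feature names (`coupling`, `cohesion`, `documentation`), their discretized values, and the six training rows are hypothetical and chosen only to show how information gain selects the splitting attribute.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one discrete feature."""
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, features):
    """Return a nested-dict decision tree, or a class label at a leaf."""
    if len(set(labels)) == 1:          # pure node: stop splitting
        return labels[0]
    if not features:                   # nothing left to split on: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    tree = {"feature": best, "branches": {}}
    for value in set(row[best] for row in rows):
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        tree["branches"][value] = id3(
            [rows[i] for i in keep],
            [labels[i] for i in keep],
            [f for f in features if f != best],
        )
    return tree


def classify(tree, row, default="nonreusable"):
    """Walk the tree; fall back to `default` for an unseen feature value."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], default)
    return tree


# Hypothetical, hand-made training data (NOT taken from the 81 Pascal programs).
FEATURES = ["coupling", "cohesion", "documentation"]
ROWS = [
    {"coupling": "low",  "cohesion": "high", "documentation": "good"},
    {"coupling": "low",  "cohesion": "high", "documentation": "poor"},
    {"coupling": "low",  "cohesion": "low",  "documentation": "good"},
    {"coupling": "high", "cohesion": "high", "documentation": "good"},
    {"coupling": "high", "cohesion": "low",  "documentation": "poor"},
    {"coupling": "high", "cohesion": "high", "documentation": "poor"},
]
LABELS = ["reusable", "reusable", "nonreusable",
          "nonreusable", "nonreusable", "nonreusable"]

tree = id3(ROWS, LABELS, FEATURES)
print(tree)
print(classify(tree, {"coupling": "low", "cohesion": "high",
                      "documentation": "poor"}))
```

On this toy data the tree splits first on `coupling` and then on `cohesion`, because those splits yield the largest information gain, while `documentation` carries no gain at the root. In practice the inputs would be numeric complexity metrics, which would have to be discretized or handled with threshold splits before a sketch like this could be applied.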