A Scheme for Feature Construction and a Comparison of Empirical Methods

A class of concept learning algorithms, CL, augments standard similarity-based learning (SBL) techniques by performing feature construction based on the SBL output. Pagallo and Haussler's FRINGE, Pagallo's extension Symmetric FRINGE (SymFringe), and a refinement we call DCFringe are all instances of this class that use decision trees as their underlying representation. All three methods use patterns at the fringe of the tree to guide feature construction, but DCFringe restricts itself to a limited construction of conjunctions and disjunctions. Experiments with small DNF and CNF concepts show that DCFringe outperforms both the purely conjunctive FRINGE and the less restrictive SymFringe in accuracy, conciseness, and efficiency. Further, the gain from these methods is linked to the size of the training set. We discuss the apparent limitation of current methods to concepts exhibiting a low degree of feature interaction and suggest ways to alleviate it. This leads to a feature construction approach based on a wider variety of patterns, restricted by statistical measures and optional knowledge.
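
To make the idea of a fringe pattern concrete, the following is a minimal sketch of the conjunctive, FRINGE-style heuristic as it is commonly described: for each positive leaf, the last two tests on the path from the root are paired into a candidate conjunctive feature. The tuple encoding of the tree and the name fringe_features are assumptions made for illustration only; the specific patterns used by SymFringe and DCFringe are not reproduced here.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a bool class label; an internal node is
# (feature_name, subtree_if_false, subtree_if_true).
Tree = Union[bool, Tuple[str, "Tree", "Tree"]]
Literal = Tuple[str, bool]  # (feature name, value required on the path)


def fringe_features(tree: Tree, path: Tuple[Literal, ...] = ()) -> Set[FrozenSet[Literal]]:
    """Collect, for every positive leaf, the conjunction of the last two
    tests on its root-to-leaf path (a FRINGE-style candidate feature)."""
    if isinstance(tree, bool):
        # Only positive leaves reached by at least two tests yield a feature.
        if tree and len(path) >= 2:
            return {frozenset(path[-2:])}
        return set()
    feature, if_false, if_true = tree
    return (fringe_features(if_false, path + ((feature, False),))
            | fringe_features(if_true, path + ((feature, True),)))


# A tree for the small DNF concept (x1 AND x2) OR (x3 AND x4).
# The x3/x4 subtree is replicated, which is what constructed features
# are meant to remove on the next learning pass.
x3_and_x4: Tree = ("x3", False, ("x4", False, True))
tree: Tree = ("x1", x3_and_x4, ("x2", x3_and_x4, True))

candidates = fringe_features(tree)
for conj in sorted(sorted(c) for c in candidates):
    print(" AND ".join(name if value else f"NOT {name}" for name, value in conj))
```

In an iterative scheme of this kind, the candidate conjunctions would be added to the feature set and the tree relearned, repeating until no new features appear; that loop is omitted from the sketch.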