Feature extraction has traditionally been a manual process and something of an art. Methods derived from statistics and linear systems theory have been proposed, but by general consensus effective feature extraction remains a difficult problem. Recently, W. Tackett (1993) showed that genetic programming (GP) can automatically construct features that identify potential targets in digital images with high accuracy. From a basis set of simple arithmetic functions, he constructed numerical features that outperformed manually constructed features when used as inputs to several classifiers, including a binary-tree classifier and a multi-layer perceptron trained by back-propagation. Seeking a more generic feature-construction procedure, we developed a GP-based algorithm that extracts features in a variety of domains and for most classification methods, including decision trees, feed-forward neural networks, and Bayesian classifiers. We have successfully tested the technique by extracting features for three different types of problems: Boolean functions with binary features, a NASA telemetry problem with multiple classes and real-valued time-series inputs, and a wine variety classification problem with real-valued features from the UCI Machine Learning Repository. We formally define the feature-construction method and show in some detail how it applies to specific classification problems.
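The abstract does not specify the representation, fitness measure, or genetic operators, so the following is only a minimal Python sketch of the general idea under stated assumptions. A constructed feature is represented as an expression tree over the raw inputs, built from a small arithmetic basis set (add, sub, mul, protected div). Its fitness is an assumed Fisher-style class-separability score for a two-class problem. The search is a simplified mutation-only evolutionary loop with truncation selection rather than full GP with crossover. All names (`random_tree`, `evaluate`, `fitness`, `mutate`, `evolve`) and the toy dataset are hypothetical and are not taken from the paper.

```python
import math
import random
import statistics

# Assumed basis set of simple arithmetic functions (binary operators only).
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    # Protected division: return 1.0 when the denominator is (nearly) zero.
    "div": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}


def random_tree(n_inputs, depth, rng):
    """Grow a random expression tree whose leaves are raw inputs ("x", i)."""
    if depth <= 0 or rng.random() < 0.3:
        return ("x", rng.randrange(n_inputs))
    op = rng.choice(list(FUNCTIONS))
    return (op, random_tree(n_inputs, depth - 1, rng), random_tree(n_inputs, depth - 1, rng))


def evaluate(tree, x):
    """Compute the constructed feature's value for one input vector x."""
    if tree[0] == "x":
        return x[tree[1]]
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, x), evaluate(right, x))


def fitness(tree, samples, labels):
    """Assumed fitness: Fisher-style separability of the feature between classes 0 and 1."""
    pos = [evaluate(tree, x) for x, y in zip(samples, labels) if y == 1]
    neg = [evaluate(tree, x) for x, y in zip(samples, labels) if y == 0]
    diff = statistics.fmean(pos) - statistics.fmean(neg)
    score = diff * diff / (statistics.pvariance(pos) + statistics.pvariance(neg) + 1e-9)
    return score if math.isfinite(score) else 0.0


def mutate(tree, n_inputs, rng, depth=3):
    """Replace one randomly chosen subtree with a freshly grown subtree."""
    if tree[0] == "x" or rng.random() < 0.3:
        return random_tree(n_inputs, depth, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, n_inputs, rng, depth - 1), right)
    return (op, left, mutate(right, n_inputs, rng, depth - 1))


def evolve(samples, labels, n_inputs, pop_size=30, generations=20, seed=0):
    """Mutation-only evolution with truncation selection (a simplification of full GP)."""
    rng = random.Random(seed)
    population = [random_tree(n_inputs, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism) and refill with mutated copies.
        population.sort(key=lambda t: fitness(t, samples, labels), reverse=True)
        parents = population[: pop_size // 2]
        children = [mutate(rng.choice(parents), n_inputs, rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda t: fitness(t, samples, labels))


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Invented toy problem: the label is 1 when x0 * x1 > 0.25; x2 is pure noise.
    X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
    y = [1 if x[0] * x[1] > 0.25 else 0 for x in X]
    best = evolve(X, y, n_inputs=3)
    print("best feature:", best)
    print("fitness:", round(fitness(best, X, y), 3))
```

In this sketch the fitness score does not depend on any particular classifier, so the evolved feature can be computed for every sample and passed to a decision tree, a neural network, or a Bayesian classifier alike. That separation is one plausible way to obtain the classifier-independence the abstract describes, though the paper's actual fitness function may differ.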
[1] J. Ross Quinlan et al., "Learning Efficient Classification Procedures and Their Application to Chess End Games," 1983.
[2] Laveen N. Kanal et al., "Problem-Solving Models and Search Strategies for Pattern Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1979.
[3] David Haussler et al., "Learnability and the Vapnik-Chervonenkis Dimension," JACM, 1989.
[4] Hitoshi Iba et al., "Genetic Programming Using a Minimum Description Length Principle," 1994.
[5] David Haussler et al., "Boolean Feature Discovery in Empirical Learning," 1990.
[6] Walter Alden Tackett et al., "Genetic Programming for Feature Discovery and Image Discrimination," ICGA, 1993.