Constructing high order perceptrons with genetic algorithms

Constructive induction, defined as the process of constructing new and useful features from existing ones, has been studied extensively in the literature. Since the number of possible high order features for any given learning problem is exponential in the number of input attributes (the order of a feature being the number of attributes from which it is composed), the central problem in constructive induction is selecting which features to use from this exponentially large set of candidates. For any chosen feature set, the desirable characteristics are minimality and generalization performance. This paper uses a combination of genetic algorithms and linear programming techniques to generate feature sets. The genetic algorithm searches for higher order features while simultaneously minimizing the size of the feature set, in order to produce a feature set with good generalization accuracy. The chosen features serve as inputs to a high order perceptron network, which is trained with an interior-point linear programming method. Performance on a holdout set is used in conjunction with complexity penalization to ensure that the final feature set generated by the genetic algorithm does not overfit the training data.
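
The abstract describes the search procedure but does not show it. Purely as an illustrative sketch (none of this code comes from the paper), the following toy Python program evolves sets of product-of-attribute features, with fitness equal to holdout accuracy minus a size penalty. The paper trains the high order perceptron with an interior-point linear programming method; this sketch substitutes the classical perceptron update rule for brevity, and all function names and parameter values (population size, mutation rate, penalty weight) are hypothetical.

```python
# Illustrative sketch only, NOT the paper's implementation. Each high order
# feature is a product of raw attributes; the GA evolves small sets of them.
# Labels are assumed to be in {-1, +1}.
import itertools
import random

import numpy as np


def expand(X, feature_set):
    """Map raw attributes to high order features (products of attributes)."""
    cols = [np.prod(X[:, list(idx)], axis=1) for idx in feature_set]
    return np.column_stack(cols)


def train_perceptron(Z, y, epochs=50):
    """Plain perceptron training; a stand-in for the paper's LP method."""
    Zb = np.hstack([Z, np.ones((Z.shape[0], 1))])  # append bias column
    w = np.zeros(Zb.shape[1])
    for _ in range(epochs):
        for x, t in zip(Zb, y):
            if t * (w @ x) <= 0:  # misclassified under labels {-1, +1}
                w += t * x
    return w


def fitness(feature_set, X_tr, y_tr, X_ho, y_ho, penalty=0.01):
    """Holdout accuracy minus a complexity penalty on the set size."""
    w = train_perceptron(expand(X_tr, feature_set), y_tr)
    Zb = np.hstack([expand(X_ho, feature_set), np.ones((len(X_ho), 1))])
    acc = np.mean(np.sign(Zb @ w) == y_ho)
    return acc - penalty * len(feature_set)


def genetic_search(X_tr, y_tr, X_ho, y_ho, max_order=3, pop=20, gens=30):
    n = X_tr.shape[1]
    # Candidate features: all attribute combinations up to max_order.
    pool = [c for k in range(1, max_order + 1)
            for c in itertools.combinations(range(n), k)]
    population = [set(random.sample(pool, random.randint(1, min(5, len(pool)))))
                  for _ in range(pop)]
    for _ in range(gens):
        scored = sorted(population,
                        key=lambda s: fitness(tuple(s), X_tr, y_tr, X_ho, y_ho),
                        reverse=True)
        survivors = scored[:pop // 2]
        children = []
        for _ in range(pop - len(survivors)):
            a, b = random.sample(survivors, 2)  # crossover: mix two parents
            child = set(random.sample(sorted(a | b), max(1, len(a | b) // 2)))
            if random.random() < 0.3:  # mutation: toggle one pool feature
                child.symmetric_difference_update({random.choice(pool)})
            children.append(child or {random.choice(pool)})
        population = survivors + children
    return max(population,
               key=lambda s: fitness(tuple(s), X_tr, y_tr, X_ho, y_ho))


# Example: XOR-like data over {-1, +1} attributes, where the order-2
# product x0 * x1 is the key feature that makes the problem separable.
# rng = np.random.default_rng(0)
# X = rng.choice([-1.0, 1.0], size=(200, 4))
# y = np.sign(X[:, 0] * X[:, 1])
# best = genetic_search(X[:150], y[:150], X[150:], y[150:])
```

The penalty term plays the role of the paper's complexity penalization: under this fitness, a larger feature set must buy strictly better holdout accuracy to survive selection, which biases the search toward minimal sets that still generalize.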
