Structural adaptation of parsimonious higher-order neural classifiers

Abstract We exploit the potential of parsimonious higher-order neural classifiers to reduce hardware expense, speed up learning, and achieve robust generalization. Specifically, our neuron model allows the computation of input products of potentially unlimited order. Structural adaptation of the topology is achieved by two alternative algorithms that ultimately allocate resources only to the relevant nonlinear interactions, while keeping the combinatorial explosion of higher-order terms in check. The first algorithm, a deterministic pruning variant, starts with the complete higher-order neuron and performs an iterated process of weight elimination. The second algorithm, a stochastic search, explores the space of sparse topologies: it starts with a randomly allocated set of higher-order terms and modifies the resource allocation while keeping the size of the architecture fixed. Two challenging classification benchmarks demonstrate the performance of the presented approach: the two-spirals separation problem and the left-/right-shift classification problem for binary strings. Our simulation results show that the proposed model may be a powerful tool for a variety of hard classification problems.
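The abstract describes, at a high level, a higher-order unit whose net input is a weighted sum of products over subsets of the inputs, adapted structurally either by magnitude-based weight elimination or by stochastic reallocation of product terms at a fixed architecture size. The sketch below is only an illustration of these ideas under stated assumptions, not the authors' implementation; the class and method names (HigherOrderUnit, prune_weakest, reallocate_weakest) and the simple delta-rule update are assumptions chosen for clarity.

    # Illustrative sketch only -- not the authors' code. A higher-order unit whose
    # net input is a weighted sum of products over index subsets of the input,
    # plus toy versions of the two structural-adaptation steps described in the
    # abstract: deterministic weight elimination (pruning) and stochastic
    # reallocation of product terms while the term count stays fixed.
    import math
    import random


    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))


    class HigherOrderUnit:
        def __init__(self, terms, n_inputs):
            # Each term is a tuple of input indices; the empty tuple () is the bias.
            self.terms = list(terms)
            self.n_inputs = n_inputs
            self.weights = [random.uniform(-0.5, 0.5) for _ in self.terms]

        def _products(self, x):
            # Product of the selected input components for every allocated term.
            prods = []
            for idx in self.terms:
                p = 1.0
                for i in idx:
                    p *= x[i]
                prods.append(p)
            return prods

        def output(self, x):
            net = sum(w * p for w, p in zip(self.weights, self._products(x)))
            return sigmoid(net)

        def train_step(self, x, target, lr=0.1):
            # Plain gradient step on the squared error of a sigmoidal unit.
            prods = self._products(x)
            y = sigmoid(sum(w * p for w, p in zip(self.weights, prods)))
            delta = (target - y) * y * (1.0 - y)
            for k, p in enumerate(prods):
                self.weights[k] += lr * delta * p

        def prune_weakest(self, fraction=0.1):
            # Pruning variant (sketch): discard the smallest-magnitude weights.
            keep = max(1, int(len(self.terms) * (1.0 - fraction)))
            ranked = sorted(range(len(self.terms)),
                            key=lambda k: abs(self.weights[k]),
                            reverse=True)[:keep]
            self.terms = [self.terms[k] for k in ranked]
            self.weights = [self.weights[k] for k in ranked]

        def reallocate_weakest(self, max_order=4):
            # Stochastic variant (sketch): replace the weakest term by a randomly
            # drawn product term of random order, keeping the term count fixed.
            k = min(range(len(self.terms)), key=lambda j: abs(self.weights[j]))
            order = random.randint(1, min(max_order, self.n_inputs))
            self.terms[k] = tuple(sorted(random.sample(range(self.n_inputs), order)))
            self.weights[k] = random.uniform(-0.5, 0.5)

As a usage illustration, one could start from a few dozen randomly drawn product terms over binary inputs, interleave train_step over the training set with occasional reallocate_weakest calls (stochastic variant), or instead start from all terms up to some order and apply prune_weakest between training epochs (pruning variant).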
