An incremental learning algorithm that optimizes network size and sample size in one trial

A constructive learning algorithm is described that builds a feedforward neural network with an optimal number of hidden units to balance convergence and generalization. The method starts with a small training set and a small network, and expands the training set incrementally after each training phase. If training does not converge, the network is grown incrementally to increase its learning capacity. This process, called selective learning with flexible neural architectures (SELF), results in the construction of an optimally sized network that learns all the given data using only a minimal subset of it. The author shows that network size optimization combined with active example selection generalizes significantly better and converges faster than conventional methods.
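To make the alternation between example selection and network growth concrete, the following is a minimal sketch of a SELF-style loop, not the paper's actual procedure. It assumes a one-hidden-layer sigmoid network trained by plain batch backpropagation, "convergence" tested as mean squared error below a threshold, and new examples chosen as the pool points the current network predicts worst; the helper names (train_epochs, grow_network, select_examples), the toy XOR-like task, and all hyperparameters are illustrative assumptions.

```python
# Sketch of a SELF-style loop: start with a small training subset and a small
# network, grow the network when training fails to converge, otherwise add the
# worst-predicted pool examples to the training subset. All details here are
# assumptions for illustration, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def init_net(n_in, n_hidden, n_out):
    return {
        "W1": rng.normal(scale=0.5, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.5, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(net, X):
    H = sigmoid(X @ net["W1"] + net["b1"])
    Y = sigmoid(H @ net["W2"] + net["b2"])
    return H, Y

def train_epochs(net, X, T, epochs=3000, lr=0.5):
    """Plain batch backpropagation; returns final MSE on (X, T)."""
    for _ in range(epochs):
        H, Y = forward(net, X)
        # Output- and hidden-layer deltas for sigmoid units.
        dY = (Y - T) * Y * (1.0 - Y)
        dH = (dY @ net["W2"].T) * H * (1.0 - H)
        net["W2"] -= lr * H.T @ dY / len(X)
        net["b2"] -= lr * dY.mean(axis=0)
        net["W1"] -= lr * X.T @ dH / len(X)
        net["b1"] -= lr * dH.mean(axis=0)
    _, Y = forward(net, X)
    return float(np.mean((Y - T) ** 2))

def grow_network(net, extra=1):
    """Add hidden units with small random weights to increase capacity."""
    n_in = net["W1"].shape[0]
    net["W1"] = np.hstack([net["W1"], rng.normal(scale=0.1, size=(n_in, extra))])
    net["b1"] = np.concatenate([net["b1"], np.zeros(extra)])
    net["W2"] = np.vstack([net["W2"], rng.normal(scale=0.1, size=(extra, net["W2"].shape[1]))])
    return net

def select_examples(net, X_pool, T_pool, k=2):
    """Pick the k pool examples the current network predicts worst."""
    _, Y = forward(net, X_pool)
    errs = np.mean((Y - T_pool) ** 2, axis=1)
    return np.argsort(errs)[-k:]

# Toy task: a deterministic 2-D XOR-like labeling serves as the full data set.
X_all = rng.random((40, 2))
T_all = ((X_all[:, 0] > 0.5) ^ (X_all[:, 1] > 0.5)).astype(float)[:, None]

selected = list(range(2))            # start with a small training subset
net = init_net(2, 1, 1)              # and a small network (1 hidden unit)
tol, max_hidden = 0.05, 20           # convergence threshold and a safety cap

while True:
    err = train_epochs(net, X_all[selected], T_all[selected])
    if err > tol:                    # training did not converge:
        if net["W1"].shape[1] >= max_hidden:
            break                    # capacity cap, only for this sketch
        net = grow_network(net)      # grow the architecture and retrain
        continue
    remaining = [i for i in range(len(X_all)) if i not in selected]
    if not remaining:
        break                        # all data learned with the current net
    worst = select_examples(net, X_all[remaining], T_all[remaining])
    selected.extend(remaining[j] for j in worst)   # expand the training set

print(f"hidden units: {net['W1'].shape[1]}, training examples used: {len(selected)}")
```

Because both the architecture and the training subset start small and grow only on demand, the final hidden-layer size and the number of examples actually trained on are determined by the data, which is the balance between convergence and generalization the abstract refers to.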
