JNN, a randomized algorithm for training multilayer networks in polynomial time

Abstract From an analytical study of the multilayer network architecture, we derive a polynomial-time algorithm for learning from examples. We call it JNN, for “Jacobian Neural Network”. Although JNN is a randomized algorithm, it produces a correct network with probability 1. The JNN learning algorithm is defined for a wide variety of multilayer networks that compute real output vectors from real input vectors through one or several hidden layers, under mild assumptions on the activation functions of the hidden units. Starting from an exact learning algorithm for a given database, we propose a regularization technique that improves performance in applications, as verified on several benchmark problems. Moreover, the JNN algorithm does not require a priori assumptions about the network architecture, since the number of hidden units of a one-hidden-layer network is determined by learning. Finally, we show that a modular approach allows learning with a reduced number of weights.
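The abstract only summarizes the method, so the following is a minimal sketch, not the authors' JNN algorithm: it illustrates the general idea that a one-hidden-layer network with real-valued outputs can fit a finite database exactly by solving a linear system for the output weights, and that a ridge penalty (used here merely as a stand-in for the paper's regularization technique) trades exactness for smoothness. The function names, the random choice of hidden weights, and the tanh hidden units are assumptions made for illustration only.

# Hedged sketch (not the authors' JNN algorithm): fit a one-hidden-layer
# network to a finite database by solving a linear system for the output
# weights. Hidden-layer weights are drawn at random and kept fixed; with
# as many hidden units as examples the fit is exact on generic data, and
# ridge > 0 gives a regularized (approximate) fit instead.
import numpy as np

def fit_one_hidden_layer(X, Y, n_hidden, ridge=0.0, rng=None):
    """X: (n_examples, n_inputs), Y: (n_examples, n_outputs)."""
    rng = np.random.default_rng(rng)
    n_in = X.shape[1]
    # Random, fixed hidden-layer weights and biases (illustrative choice).
    W1 = rng.standard_normal((n_in, n_hidden))
    b1 = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W1 + b1)                      # hidden activations
    # Output weights from ridge-regularized least squares:
    # (H^T H + ridge * I) W2 = H^T Y
    A = H.T @ H + ridge * np.eye(n_hidden)
    W2 = np.linalg.solve(A, H.T @ Y)
    return W1, b1, W2

def predict(X, W1, b1, W2):
    return np.tanh(X @ W1 + b1) @ W2

# Usage: interpolate 20 random points; with ridge=0 and n_hidden equal to
# the number of examples, the residual is small up to numerical error.
X = np.random.default_rng(0).uniform(-1, 1, size=(20, 3))
Y = np.sin(X).sum(axis=1, keepdims=True)
W1, b1, W2 = fit_one_hidden_layer(X, Y, n_hidden=20, ridge=0.0, rng=1)
print(np.max(np.abs(predict(X, W1, b1, W2) - Y)))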
