An Efficient EM-based Training Algorithm for Feedforward Neural Networks

A fast training algorithm is developed for two-layer feedforward neural networks based on a probabilistic model of the hidden representations and the EM algorithm. The algorithm decomposes the training of the original two-layer network into the training of a set of single neurons, each of which is then trained via a weighted linear regression algorithm. The algorithm yields significant improvements in training speed on several benchmark problems. Copyright 1997 Elsevier Science Ltd. All Rights Reserved.
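To make the decomposition concrete, the following is a minimal NumPy sketch of an EM-style training loop of the kind described: an E-step proposes target activations for the hidden units, and an M-step fits the output layer and each hidden neuron independently by (weighted) least squares. The particular E-step heuristic (a gradient-based nudge of the activations), the ridge term, and all function and parameter names are illustrative assumptions, not the paper's exact probabilistic model or update equations.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_em_style(X, y, n_hidden=8, n_iter=100, step=0.5, ridge=1e-3, seed=0):
    """Illustrative EM-style decomposition for a two-layer network (assumed form).

    E-step: propose target activations for the hidden layer.
    M-step: fit the output layer by ridge regression on the hidden activations,
            and fit each hidden neuron separately by weighted least squares on
            the logits of its target activations.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])                  # inputs with bias column
    W = rng.normal(scale=0.5, size=(d + 1, n_hidden))     # input -> hidden weights
    v = rng.normal(scale=0.5, size=n_hidden + 1)          # hidden -> output weights

    for _ in range(n_iter):
        H = sigmoid(Xb @ W)                               # current hidden activations
        Hb = np.hstack([H, np.ones((n, 1))])

        # M-step (output neuron): ridge regression of y on hidden activations.
        A = Hb.T @ Hb + ridge * np.eye(n_hidden + 1)
        v = np.linalg.solve(A, Hb.T @ y)

        # E-step (assumed heuristic): nudge hidden activations downhill on the
        # squared output error to obtain per-unit targets, then clip them into
        # the open interval (0, 1) so the logit below is defined.
        err = Hb @ v - y
        T = np.clip(H - step * np.outer(err, v[:-1]), 1e-4, 1 - 1e-4)

        # M-step (hidden neurons): each neuron is fit independently by weighted
        # least squares of logit(target) on the inputs, weighted by the local
        # sigmoid slope h * (1 - h).
        Z = np.log(T / (1.0 - T))
        for j in range(n_hidden):
            w_diag = H[:, j] * (1.0 - H[:, j])
            Xw = Xb * w_diag[:, None]
            A = Xw.T @ Xb + ridge * np.eye(d + 1)
            W[:, j] = np.linalg.solve(A, Xw.T @ Z[:, j])

    return W, v


if __name__ == "__main__":
    # Tiny demonstration on the XOR problem, a common benchmark in this literature.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    y = np.array([0.0, 1.0, 1.0, 0.0])
    W, v = train_em_style(X, y, n_hidden=4, n_iter=200)
    H = sigmoid(np.hstack([X, np.ones((4, 1))]) @ W)
    print(np.round(np.hstack([H, np.ones((4, 1))]) @ v, 2))
```

The key point the sketch illustrates is that, once hidden-unit targets are available, each neuron's weights can be updated by solving a small linear system rather than by iterative gradient steps through the whole network.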
