Fast training of recurrent networks based on the EM algorithm
[1] Michael I. Jordan,et al. Hierarchical Mixtures of Experts and the EM Algorithm , 1994, Neural Computation.
[2] David Saad,et al. Learning by Choice of Internal Representations: An Energy Minimization Approach , 1990, Complex Syst..
[3] Michael C. Mozer,et al. A Unified Gradient-Descent/Clustering Architecture for Finite State Machine Induction , 1993, NIPS.
[4] Jing Peng,et al. An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories , 1990, Neural Computation.
[5] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[6] L. Baum,et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .
[7] Anders Krogh,et al. Introduction to the theory of neural computation , 1994, The advanced book program.
[8] Ronald J. Williams,et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.
[9] Yoshua Bengio,et al. The problem of learning long-term dependencies in recurrent networks , 1993, IEEE International Conference on Neural Networks.
[10] Geoffrey E. Hinton,et al. A time-delay neural network architecture for isolated word recognition , 1990, Neural Networks.
[11] Anders Krogh,et al. A Cost Function for Internal Representations , 1989, NIPS.
[12] Peter Tiño,et al. Learning and Extracting Initial Mealy Automata with a Modular Neural Network Model , 1995, Neural Comput..
[13] C. Lee Giles,et al. An experimental comparison of recurrent neural networks , 1994, NIPS.
[14] William J. Byrne,et al. Alternating minimization and Boltzmann machine learning , 1992, IEEE Trans. Neural Networks.
[15] Chuanyi Ji,et al. An Efficient EM-based Training Algorithm for Feedforward Neural Networks , 1997, Neural Networks.
[16] Jun Zhang. The mean field theory in EM procedures for Markov random fields , 1992, IEEE Trans. Signal Process..
[17] Esther Levin,et al. A statistical approach to learning and generalization in layered neural networks , 1989, Proc. IEEE.
[18] C. Lee Giles,et al. Extraction of rules from discrete-time recurrent neural networks , 1996, Neural Networks.
[19] Leo Breiman,et al. Hinging hyperplanes for regression, classification, and function approximation , 1993, IEEE Trans. Inf. Theory.
[20] M. Niranjan,et al. A Dynamic Neural Network Architecture by Sequential Partitioning of the Input Space , 1994, Neural Computation.
[21] D. Rubin,et al. Maximum Likelihood from Incomplete Data via the EM Algorithm (with discussion) , 1977 .
[22] David J. C. MacKay,et al. A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.
[23] Scott E. Fahlman,et al. The Recurrent Cascade-Correlation Architecture , 1990, NIPS.
[24] Esther Levin,et al. A statistical approach to learning and generalization in layered neural networks , 1989, COLT '89.
[25] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.
[26] Shun-ichi Amari,et al. Information geometry of Boltzmann machines , 1992, IEEE Trans. Neural Networks.
[27] Michael I. Jordan,et al. Mean Field Theory for Sigmoid Belief Networks , 1996, J. Artif. Intell. Res..
[28] Yoshua Bengio,et al. Credit Assignment through Time: Alternatives to Backpropagation , 1993, NIPS.
[29] Duc Truong Pham,et al. Adaptive control of dynamic systems using neural networks , 1993, Proceedings of IEEE Systems Man and Cybernetics Conference - SMC.
[30] Eduardo Sontag. Systems Combining Linearity and Saturations, and Relations of “Neural Nets” , 1992 .