Fast training of recurrent networks based on the EM algorithm

In this work, a probabilistic model is established for recurrent networks. The expectation-maximization (EM) algorithm, combined with a mean-field approximation, is then used to derive a new fast training algorithm. This algorithm converts the training of a complicated recurrent network into the training of an array of individual feedforward neurons, each of which is then fitted by a weighted linear regression algorithm. On benchmark problems, training time improves by a factor of five to fifteen.
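To make the decomposition concrete, the sketch below alternates a mean-field E-step (replacing the stochastic hidden states with their expected activations, computed by a forward pass) with an M-step that fits each hidden neuron independently by iteratively reweighted least squares, one standard form of weighted linear regression. This is a minimal illustration, not the paper's derivation: the network form, the use of IRLS, and the `targets` array (standing in for the posterior expectations the full EM algorithm would compute from the output layer) are all assumptions introduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def e_step(W, X):
    """Mean-field E-step: replace the stochastic hidden states by their
    expected activations, computed by a single forward pass.
    W has shape (n_hidden, d + n_hidden); X has shape (T, d)."""
    T = X.shape[0]
    n_hidden = W.shape[0]
    mu = np.zeros((T, n_hidden))
    h_prev = np.zeros(n_hidden)
    for t in range(T):
        mu[t] = sigmoid(W @ np.concatenate([X[t], h_prev]))
        h_prev = mu[t]
    return mu

def m_step(W, X, targets, mu, n_irls=5):
    """M-step: with the mean-field activations fixed, each hidden neuron is
    an independent feedforward unit, fitted here by iteratively reweighted
    least squares. `targets` (T, n_hidden) are assumed posterior
    expectations of the hidden activations, supplied externally."""
    n_hidden = mu.shape[1]
    # Input to every neuron: current x_t plus the previous mean-field state.
    Z = np.hstack([X, np.vstack([np.zeros(n_hidden), mu[:-1]])])
    for j in range(W.shape[0]):
        w = W[j].copy()
        for _ in range(n_irls):
            p = sigmoid(Z @ w)
            r = p * (1.0 - p) + 1e-8               # IRLS weights
            z_work = Z @ w + (targets[:, j] - p) / r  # working response
            A = Z.T @ (r[:, None] * Z) + 1e-6 * np.eye(Z.shape[1])
            w = np.linalg.solve(A, Z.T @ (r * z_work))
        W[j] = w
    return W

# Toy usage with synthetic target activations standing in for the
# posterior expectations the full EM derivation would supply.
rng = np.random.default_rng(0)
T, d, n_hidden = 50, 3, 4
X = rng.normal(size=(T, d))
W_true = rng.normal(size=(n_hidden, d + n_hidden))
targets = e_step(W_true, X)          # pretend these are the EM posteriors
W = rng.normal(scale=0.1, size=(n_hidden, d + n_hidden))
for _ in range(5):
    mu = e_step(W, X)                # mean-field E-step
    W = m_step(W, X, targets, mu)    # per-neuron weighted-regression M-step
```

The design point this illustrates is the one the abstract emphasizes: once the mean-field activations are fixed, the coupled recurrent training problem separates into one small convex fit per neuron, which is why the per-neuron regression can be solved quickly and in parallel.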
