Fast training of recurrent networks based on the EM algorithm
[1] Michael I. Jordan,et al. Hierarchical Mixtures of Experts and the EM Algorithm , 1994, Neural Computation.
[2] David Saad,et al. Learning by Choice of Internal Representations: An Energy Minimization Approach , 1990, Complex Syst..
[3] Michael C. Mozer,et al. A Unified Gradient-Descent/Clustering Architecture for Finite State Machine Induction , 1993, NIPS.
[4] Jing Peng,et al. An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories , 1990, Neural Computation.
[5] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[6] L. Baum,et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .
[7] Anders Krogh,et al. Introduction to the theory of neural computation , 1994, The advanced book program.
[8] Ronald J. Williams,et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.
[9] Yoshua Bengio,et al. The problem of learning long-term dependencies in recurrent networks , 1993, IEEE International Conference on Neural Networks.
[10] Geoffrey E. Hinton,et al. A time-delay neural network architecture for isolated word recognition , 1990, Neural Networks.
[11] Anders Krogh,et al. A Cost Function for Internal Representations , 1989, NIPS.
[12] Peter Tiño,et al. Learning and Extracting Initial Mealy Automata with a Modular Neural Network Model , 1995, Neural Comput..
[13] C. Lee Giles,et al. An experimental comparison of recurrent neural networks , 1994, NIPS.
[14] William J. Byrne,et al. Alternating minimization and Boltzmann machine learning , 1992, IEEE Trans. Neural Networks.
[15] Chuanyi Ji,et al. An Efficient EM-based Training Algorithm for Feedforward Neural Networks , 1997, Neural Networks.
[16] Jun Zhang. The mean field theory in EM procedures for Markov random fields , 1992, IEEE Trans. Signal Process..
[17] Esther Levin,et al. A statistical approach to learning and generalization in layered neural networks , 1989, Proc. IEEE.
[18] C. Lee Giles,et al. Extraction of rules from discrete-time recurrent neural networks , 1996, Neural Networks.
[19] Leo Breiman,et al. Hinging hyperplanes for regression, classification, and function approximation , 1993, IEEE Trans. Inf. Theory.
[20] M. Niranjan,et al. A Dynamic Neural Network Architecture by Sequential Partitioning of the Input Space , 1994, Neural Computation.
[21] D. Rubin,et al. Maximum Likelihood from Incomplete Data via the EM Algorithm (with discussion) , 1977 .
[22] David J. C. MacKay,et al. A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.
[23] Scott E. Fahlman,et al. The Recurrent Cascade-Correlation Architecture , 1990, NIPS.
[24] Esther Levin,et al. A statistical approach to learning and generalization in layered neural networks , 1989, COLT '89.
[25] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.
[26] Shun-ichi Amari,et al. Information geometry of Boltzmann machines , 1992, IEEE Trans. Neural Networks.
[27] Michael I. Jordan,et al. Mean Field Theory for Sigmoid Belief Networks , 1996, J. Artif. Intell. Res..
[28] Yoshua Bengio,et al. Credit Assignment through Time: Alternatives to Backpropagation , 1993, NIPS.
[29] Duc Truong Pham,et al. Adaptive control of dynamic systems using neural networks , 1993, Proceedings of IEEE Systems Man and Cybernetics Conference - SMC.
[30] Eduardo Sontag. Systems Combining Linearity and Saturations, and Relations of “Neural Nets” , 1992 .