Learning long-term dependencies with gradient descent is difficult
Yoshua Bengio | Paolo Frasconi | Patrice Y. Simard
[1] C. D. Gelatt,et al. Optimization by Simulated Annealing , 1983, Science.
[2] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[3] Y. L. Cun. Learning Process in an Asymmetric Threshold Network , 1986 .
[4] Yann LeCun,et al. Learning processes in an asymmetric threshold network , 1986 .
[5] Sandro Ridella,et al. Minimizing multimodal functions of continuous variables with the “simulated annealing” algorithm , 1987, TOMS.
[6] Eytan Domany,et al. Learning by Choice of Internal Representations , 1988, Complex Syst.
[7] Yann LeCun,et al. Improving the convergence of back-propagation learning with second-order methods , 1989 .
[8] M. Gori,et al. BPS: a learning algorithm for capturing the dynamic nature of speech , 1989, International 1989 Joint Conference on Neural Networks.
[9] Michael C. Mozer,et al. A Focused Backpropagation Algorithm for Temporal Pattern Recognition , 1989, Complex Syst.
[10] Richard Rohwer,et al. The "Moving Targets" Training Algorithm , 1989, NIPS.
[11] Ronald J. Williams,et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.
[12] Michael I. Jordan,et al. Advances in Neural Information Processing Systems 30 , 1995 .
[13] Yoshua Bengio,et al. Artificial neural networks and their application to sequence recognition , 1991 .
[14] Michael C. Mozer,et al. Induction of Multiscale Temporal Structure , 1991, NIPS.
[15] Charles M. Marcus,et al. Nonlinear dynamics and stability of analog neural networks , 1991 .
[16] Giovanni Soda,et al. Local Feedback Multilayered Networks , 1992, Neural Computation.
[17] C. L. Giles,et al. Inserting rules into recurrent neural networks , 1992, Neural Networks for Signal Processing II Proceedings of the 1992 IEEE Workshop.
[18] Peter L. Bartlett,et al. Using random weights to train multilayer networks of hard-limiting units , 1992, IEEE Trans. Neural Networks.
[19] Yoshua Bengio,et al. Global optimization of a neural network-hidden Markov model hybrid , 1992, IEEE Trans. Neural Networks.
[20] Yoshua Bengio,et al. The problem of learning long-term dependencies in recurrent networks , 1993, IEEE International Conference on Neural Networks.
[21] R. J. Gaynier,et al. A method of training multi-layer networks with heaviside characteristics using internal representations , 1993, IEEE International Conference on Neural Networks.
[22] Giovanni Soda,et al. Unified Integration of Explicit Knowledge and Learning by Example in Recurrent Networks , 1995, IEEE Trans. Knowl. Data Eng.