Hierarchical Recurrent Neural Networks for Long-Term Dependencies

We have already shown that extracting long-term dependencies from sequential data is difficult, both for deterministic dynamical systems such as recurrent networks and for probabilistic models such as hidden Markov models (HMMs) or input/output hidden Markov models (IOHMMs). In practice, to avoid this problem, researchers have used domain-specific a priori knowledge to give meaning to the hidden or state variables representing past context. In this paper, we propose to use a more general type of a priori knowledge, namely that the temporal dependencies are structured hierarchically, so that long-term dependencies are represented by variables operating on a long time scale. This principle is applied to a recurrent network that includes delays and multiple time scales. Experiments confirm the advantages of such structures. A similar approach is proposed for HMMs and IOHMMs.
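To make the multiple-time-scale idea concrete, here is a minimal sketch of a two-level recurrent network in which an upper ("slow") state is updated only every few steps, so it can carry context over longer spans, while a lower ("fast") state is updated at every step. This is an illustrative reading of the principle, not the authors' exact architecture; the class name, the `slow_period` parameter, and all dimensions are assumptions introduced for this example.

```python
import numpy as np

class TwoTimescaleRNN:
    """Hypothetical two-level RNN: a slow state updated every
    `slow_period` steps provides top-down context to a fast state
    updated at every step."""

    def __init__(self, n_in, n_fast, n_slow, slow_period=4, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.W_xf = rng.normal(0, s, (n_fast, n_in))    # input -> fast
        self.W_ff = rng.normal(0, s, (n_fast, n_fast))  # fast recurrence
        self.W_sf = rng.normal(0, s, (n_fast, n_slow))  # slow -> fast (top-down)
        self.W_fs = rng.normal(0, s, (n_slow, n_fast))  # fast -> slow (bottom-up)
        self.W_ss = rng.normal(0, s, (n_slow, n_slow))  # slow recurrence
        self.slow_period = slow_period
        self.n_fast, self.n_slow = n_fast, n_slow

    def forward(self, xs):
        """Run over a sequence `xs` of input vectors; return the
        fast-state trajectory."""
        h_fast = np.zeros(self.n_fast)
        h_slow = np.zeros(self.n_slow)
        states = []
        for t, x in enumerate(xs):
            # The slow state changes only every `slow_period` steps, so a
            # gradient flowing through it passes through fewer
            # nonlinearities per unit of time -- the intuition behind
            # representing long-term dependencies at a long time scale.
            if t % self.slow_period == 0:
                h_slow = np.tanh(self.W_fs @ h_fast + self.W_ss @ h_slow)
            h_fast = np.tanh(self.W_xf @ x + self.W_ff @ h_fast
                             + self.W_sf @ h_slow)
            states.append(h_fast.copy())
        return np.stack(states)

# Example: a random input sequence of length 12 with 5 features.
net = TwoTimescaleRNN(n_in=5, n_fast=8, n_slow=3, slow_period=4)
trajectory = net.forward(np.random.default_rng(1).normal(size=(12, 5)))
print(trajectory.shape)  # (12, 8)
```

The design choice sketched here (updating the upper level at a fixed subsampling period) is one simple way to obtain multiple time scales; the paper's recurrent network also uses explicit delays, which this sketch omits.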
