Learning Latent Multiscale Structure Using Recurrent Neural Networks

In this paper, we introduce a hierarchical recurrent neural network architecture that adaptively captures the underlying temporal dependencies in sequences at different timescales, without using explicit boundary information. In experiments on character-level language modelling, we demonstrate that the proposed model performs significantly better than previously proposed models, achieving state-of-the-art results.
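
To make the idea concrete, here is a minimal sketch (not the authors' exact equations) of a two-layer recurrent network in which a learned, binarized boundary gate decides when the slower upper layer updates, so the hierarchy of timescales is discovered from data rather than supplied as explicit segment boundaries. The class and variable names (TwoTimescaleRNN, boundary_gate) are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: a fast layer updates every step; a slow layer updates only when a
# learned binary boundary fires. The hard threshold uses a straight-through
# gradient so the boundary decision stays trainable.
import torch
import torch.nn as nn


class StraightThroughBinarize(torch.autograd.Function):
    """Hard-threshold in the forward pass, identity gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return (x > 0.5).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


class TwoTimescaleRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.fast = nn.GRUCell(input_size, hidden_size)    # updates at every time step
        self.slow = nn.GRUCell(hidden_size, hidden_size)   # updates only at detected boundaries
        self.boundary_gate = nn.Linear(hidden_size, 1)     # predicts a boundary from the fast state

    def forward(self, x):  # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        h_fast = x.new_zeros(batch, self.fast.hidden_size)
        h_slow = x.new_zeros(batch, self.slow.hidden_size)
        outputs = []
        for t in range(seq_len):
            h_fast = self.fast(x[t], h_fast)
            # Binary decision: does this step end a latent segment?
            z = StraightThroughBinarize.apply(torch.sigmoid(self.boundary_gate(h_fast)))
            # The slow layer integrates the fast summary only when a boundary fires.
            h_slow = z * self.slow(h_fast, h_slow) + (1.0 - z) * h_slow
            outputs.append(h_fast + h_slow)  # simple mix of both timescales
        return torch.stack(outputs)  # (seq_len, batch, hidden_size)


if __name__ == "__main__":
    model = TwoTimescaleRNN(input_size=16, hidden_size=32)
    y = model(torch.randn(20, 4, 16))
    print(y.shape)  # torch.Size([20, 4, 32])
```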
