Representing Compositionality based on Multiple Timescales Gated Recurrent Neural Networks with Adaptive Temporal Hierarchy for Character-Level Language Models

A novel character-level neural language model is proposed in this paper. The model incorporates a biologically inspired temporal hierarchy into its architecture to represent multiple compositions of language, enabling it to handle longer sequences in character-level language modeling. The temporal hierarchy is introduced by employing a Gated Recurrent Neural Network with multiple timescales, and a timescale adaptation mechanism is added to further enhance performance. We evaluate the proposed model on the widely used Penn Treebank and Text8 corpora. The experiments show that using multiple timescales in a Neural Language Model (NLM) improves performance while requiring fewer parameters and no additional computation. They also demonstrate that adaptive temporal hierarchies can represent multiple levels of compositionality without resorting to complex hierarchical architectures, and that better representation of longer sequences leads to improved performance of the probabilistic language model.
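
To make the architectural idea concrete, the sketch below illustrates one plausible reading of a multiple-timescale GRU with adaptive timescales: each layer's hidden state is a leaky integration of a standard GRU update governed by a timescale constant tau, and the adaptive variant learns tau during training. This is a minimal sketch under stated assumptions, not the paper's exact formulation; the class names (MultiTimescaleGRUCell, MultiTimescaleCharLM), the 1/tau mixing form, the log-tau parameterization, and the two-layer fast/slow configuration are all illustrative choices.

```python
# Minimal sketch of a multiple-timescale GRU character-level LM, assuming an
# MTGRU-style update in which each layer's state is a leaky integration of a
# standard GRU update: h_t = (1 - 1/tau) * h_{t-1} + (1/tau) * GRU(x_t, h_{t-1}).
# A larger tau yields a slower-changing (higher-level) layer; the "adaptive"
# variant is modelled here by making log(tau) a learnable parameter.
# All class and parameter names are illustrative, not from the paper.
import math

import torch
import torch.nn as nn


class MultiTimescaleGRUCell(nn.Module):
    def __init__(self, input_size, hidden_size, init_tau=1.0, adaptive=True):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        # Learnable log-timescale; requires_grad=False gives a fixed timescale.
        self.log_tau = nn.Parameter(torch.tensor(math.log(init_tau)),
                                    requires_grad=adaptive)

    def forward(self, x, h_prev):
        h_cand = self.cell(x, h_prev)                       # standard GRU update
        alpha = (1.0 / self.log_tau.exp()).clamp(max=1.0)   # mixing rate <= 1
        return (1.0 - alpha) * h_prev + alpha * h_cand


class MultiTimescaleCharLM(nn.Module):
    """Character-level LM with one fast and one slow recurrent layer (sketch)."""

    def __init__(self, vocab_size, embed_size=64, hidden_size=256):
        super().__init__()
        self.hidden_size = hidden_size
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.fast = MultiTimescaleGRUCell(embed_size, hidden_size, init_tau=1.0)
        self.slow = MultiTimescaleGRUCell(hidden_size, hidden_size, init_tau=4.0)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, chars):                    # chars: (batch, seq_len) int64
        batch = chars.size(0)
        h_fast = chars.new_zeros(batch, self.hidden_size, dtype=torch.float)
        h_slow = torch.zeros_like(h_fast)
        logits = []
        for t in range(chars.size(1)):
            x = self.embed(chars[:, t])
            h_fast = self.fast(x, h_fast)        # fast, character-level context
            h_slow = self.slow(h_fast, h_slow)   # slow, longer-range context
            logits.append(self.out(h_slow))
        return torch.stack(logits, dim=1)        # (batch, seq_len, vocab_size)
```

Predicting the next character from the slow layer alone is just one design choice; the fast and slow states could equally be concatenated before the output layer, and the initial timescale values and number of layers would need to be tuned on the target corpus.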
