Hierarchical and lateral multiple timescales gated recurrent units with pre-trained encoder for long text classification

Abstract: Text classification with deep learning has become an active research challenge in natural language processing. Most existing deep learning models for text classification struggle as the input grows longer: they work well on short texts, but their performance degrades with increasing input length. In this work, we introduce a model that alleviates this problem. We present the hierarchical and lateral multiple timescales gated recurrent unit (HL-MTGRU), combined with pre-trained encoders, to address long text classification. HL-MTGRU represents dependencies at multiple temporal scales for the discrimination task. By combining its slow and fast units, our model effectively classifies long, multi-sentence texts into the desired classes. We also show that the HL-MTGRU structure prevents the degradation of performance on longer inputs. Using the latest pre-trained encoders for feature extraction, the proposed network outperforms conventional models on several long text classification benchmark datasets.
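
To make the multiple-timescale idea concrete, below is a minimal sketch (in PyTorch) of a GRU cell augmented with a timescale constant tau, with a fast layer feeding a slow layer over pre-trained encoder features. The class name, the tau values, the 768-dimensional features, and the five-class output are illustrative assumptions, not the paper's exact HL-MTGRU configuration (which also includes lateral connections).

```python
import torch
import torch.nn as nn

class MTGRUCell(nn.Module):
    """GRU cell with a timescale constant tau (illustrative sketch).

    tau = 1 recovers the standard GRU; larger tau turns the hidden state
    into a leaky integrator that updates more slowly ("slow" units).
    """
    def __init__(self, input_size: int, hidden_size: int, tau: float = 1.0):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.tau = tau

    def forward(self, x: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        h_gru = self.cell(x, h_prev)  # ordinary GRU update
        # Interpolate between the previous state and the GRU update by 1/tau.
        return (1.0 / self.tau) * h_gru + (1.0 - 1.0 / self.tau) * h_prev

# Illustrative two-layer stack: a fast layer over pre-trained encoder features
# feeding a slow layer; the slowest state is used for classification.
fast = MTGRUCell(input_size=768, hidden_size=256, tau=1.0)
slow = MTGRUCell(input_size=256, hidden_size=256, tau=4.0)
classifier = nn.Linear(256, 5)

features = torch.randn(120, 1, 768)   # (seq_len, batch, dim), e.g. encoder token features
h_fast = torch.zeros(1, 256)
h_slow = torch.zeros(1, 256)
for x_t in features:                   # x_t: (1, 768)
    h_fast = fast(x_t, h_fast)
    h_slow = slow(h_fast, h_slow)
logits = classifier(h_slow)            # (1, num_classes)
```

Because the slow layer changes its state only gradually, it retains a summary of earlier parts of the document, which is what counteracts the performance degradation on longer inputs described above.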
