Learning text representation using recurrent convolutional neural network with highway layers

Recently, the rapid development of word embeddings and neural networks has brought new inspiration to various NLP and IR tasks. In this paper, we describe a staged hybrid model that combines Recurrent Convolutional Neural Networks (RCNN) with highway layers. The highway network module sits in the middle stage: it takes the output of the bi-directional Recurrent Neural Network (Bi-RNN) module in the first stage and provides the input to the Convolutional Neural Network (CNN) module in the last stage. Experiments show that our model outperforms common neural network models (CNN, RNN, Bi-RNN) on a sentiment analysis task. In addition, an analysis of how sequence length influences the RCNN with highway layers shows that our model can learn good representations for long text.
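
The abstract gives no implementation details, so the following PyTorch sketch shows one plausible reading of the three-stage pipeline. The layer sizes, the choice of a GRU cell for the Bi-RNN, the single highway layer, and the global max-pool after the convolution are all illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Highway(nn.Module):
    """One highway layer: y = t * H(x) + (1 - t) * x, with gate t = sigmoid(Wg x)."""

    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):
        h = F.relu(self.transform(x))
        t = torch.sigmoid(self.gate(x))
        return t * h + (1 - t) * x


class RCNNHighway(nn.Module):
    """Sketch of the staged model: Bi-RNN -> highway -> CNN -> classifier.

    All hyperparameters below are assumptions for illustration only.
    """

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=100,
                 num_filters=100, kernel_size=3, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Stage 1: bi-directional RNN over the word embeddings
        # (a GRU is assumed here; the paper only says "Bi-RNN").
        self.birnn = nn.GRU(embed_dim, hidden_dim,
                            bidirectional=True, batch_first=True)
        # Stage 2: highway layer applied position-wise to the Bi-RNN outputs.
        self.highway = Highway(2 * hidden_dim)
        # Stage 3: 1-D convolution over the highway outputs.
        self.conv = nn.Conv1d(2 * hidden_dim, num_filters, kernel_size)
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, tokens):            # tokens: (batch, seq_len) int ids
        x = self.embed(tokens)            # (batch, seq_len, embed_dim)
        x, _ = self.birnn(x)              # (batch, seq_len, 2 * hidden_dim)
        x = self.highway(x)               # same shape, gated position-wise
        x = x.transpose(1, 2)             # (batch, 2 * hidden_dim, seq_len)
        x = F.relu(self.conv(x))          # (batch, num_filters, seq_len')
        x = x.max(dim=2).values           # global max-pool over time
        return self.fc(x)                 # (batch, num_classes) logits
```

One design point this sketch makes concrete: because the highway gate can pass Bi-RNN features through unchanged when the transform is unhelpful, the middle stage preserves long-range context from the recurrent stage before the CNN extracts local features, which is consistent with the abstract's claim about long text.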
