Molding CNNs for text: non-linear, non-consecutive convolutions

The success of deep learning often derives from well-chosen operational building blocks. In this work, we revise the temporal convolution operation in CNNs to better adapt it to text processing. Instead of concatenating word representations, we appeal to tensor algebra and use low-rank n-gram tensors to directly exploit interactions between words already at the convolution stage. Moreover, we extend the n-gram convolution to non-consecutive words to recognize patterns with intervening words. Through a combination of low-rank tensors and pattern weighting, we can efficiently evaluate the resulting convolution operation via dynamic programming. We test the resulting architecture on standard sentiment classification and news categorization tasks. Our model achieves state-of-the-art performance in terms of both accuracy and training speed. For instance, we obtain 51.2% accuracy on the fine-grained sentiment classification task.
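
The abstract does not spell out the recurrence, so the NumPy sketch below is one plausible reading of how a low-rank factorization plus decay weighting makes the non-consecutive convolution computable by dynamic programming: a trigram (i, j, k) with i < j < k scores as an element-wise product of three per-slot projections, down-weighted by lam raised to the number of skipped positions, and decayed prefix sums avoid enumerating all O(T^3) trigrams. The function name `ncc_forward`, the factor matrices `P`, `Q`, `R`, and the `tanh` nonlinearity are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ncc_forward(X, P, Q, R, lam):
    """Non-consecutive trigram convolution via dynamic programming (sketch).

    X       : (T, d) word vectors for a sentence of length T
    P, Q, R : (d, m) low-rank factor matrices, one per trigram slot
    lam     : decay in [0, 1); each skipped word inside a pattern's span
              multiplies its weight by lam, so contiguous trigrams count most.

    Returns a (T, m) feature map: row t aggregates every (possibly
    non-consecutive) trigram ending at position t, weighted by lam**gaps.
    """
    T, _ = X.shape
    m = P.shape[1]
    U, V, W = X @ P, X @ Q, X @ R       # per-slot projections of each word
    s1 = np.zeros(m)                    # decayed sum over unigram prefixes
    s2 = np.zeros(m)                    # decayed sum over bigram prefixes
    H = np.zeros((T, m))
    for t in range(T):
        f2 = s1 * V[t]                  # extend each unigram prefix with word t
        f3 = s2 * W[t]                  # extend each bigram prefix with word t
        s2 = lam * s2 + f2              # fold the new bigrams into the prefix sum
        s1 = lam * s1 + U[t]            # fold word t in as a fresh unigram prefix
        H[t] = np.tanh(f3)              # trigram feature at position t
    return H

# Toy usage: a 10-word sentence with 50-d embeddings and 20 feature maps.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 50))
P, Q, R = (rng.standard_normal((50, 20)) * 0.1 for _ in range(3))
print(ncc_forward(X, P, Q, R, lam=0.5).shape)   # (10, 20)
```

Under this reading, the cost is O(T * d * m) per sentence rather than cubic in T, since each position only updates two running prefix sums; setting lam = 0 would recover an ordinary consecutive trigram convolution.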
