Resilient Combination of Complementary CNN and RNN Features for Text Classification through Attention and Ensembling

Natural language processing (NLP) pipelines are usually complex, comprising several components for extracting features and processing inputs and outputs. The difficulty of the task directly affects the complexity of the system: multiple modules work together, each extracting complementary information needed for good performance. In this work we focus on text classification and show that the same intuition applies to end-to-end neural NLP architectures: the best results are reliably obtained by combining information from different neural modules. Concretely, we combine convolutional, recurrent, and attention modules with ensembling and show that they are complementary. We show empirically that the combination is robust across varied and complex text classification tasks, and that it matches or surpasses the state of the art on a wide variety of datasets with no changes to the architecture. In addition, we show that ensembling CNN-RNN stacks with attention improves performance compared to using only a subset of these modules. These observations hold under both low and high data availability, as well as for multi-class problems.
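To make the architecture concrete, the snippet below is a minimal sketch of one ensemble member, not the authors' released code: a CNN-RNN stack whose recurrent outputs are pooled with additive attention before classification. The hyperparameters (embed_dim, num_filters, hidden_dim, dropout) and the ensemble_predict helper are illustrative assumptions chosen for clarity.

```python
# Minimal sketch (PyTorch) of a CNN-RNN stack with attention pooling.
# Hyperparameter values are assumptions, not taken from the paper.
import torch
import torch.nn as nn

class CNNRNNAttentionClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes,
                 embed_dim=300, num_filters=128, kernel_size=3,
                 hidden_dim=128, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Convolution extracts local n-gram features from the embeddings.
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size,
                              padding=kernel_size // 2)
        # A bidirectional GRU models longer-range order over the conv features.
        self.rnn = nn.GRU(num_filters, hidden_dim, batch_first=True,
                          bidirectional=True)
        # Additive attention scores each time step of the RNN output.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):                     # (batch, seq_len)
        x = self.embedding(token_ids)                 # (batch, seq_len, embed)
        x = torch.relu(self.conv(x.transpose(1, 2)))  # (batch, filters, seq_len)
        h, _ = self.rnn(x.transpose(1, 2))            # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # (batch, seq_len, 1)
        context = (weights * h).sum(dim=1)            # (batch, 2*hidden)
        return self.fc(self.dropout(context))         # (batch, num_classes)

# One common ensembling choice: average the class probabilities of
# independently trained stacks (e.g., different random seeds).
def ensemble_predict(models, token_ids):
    probs = [torch.softmax(m(token_ids), dim=-1) for m in models]
    return torch.stack(probs).mean(dim=0).argmax(dim=-1)
```

Averaging softmax probabilities is only one way to combine members; the paper's exact ensembling scheme may differ.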
