Investigating Capsule Networks with Dynamic Routing for Text Classification

In this study, we explore capsule networks with dynamic routing for text classification. We propose three strategies to stabilize the dynamic routing process and alleviate the disturbance caused by noise capsules, which may contain "background" information or may not have been successfully trained. A series of experiments is conducted with capsule networks on six text classification benchmarks. Capsule networks achieve state-of-the-art results on 4 out of 6 datasets, which shows the effectiveness of capsule networks for text classification. We additionally show that capsule networks exhibit significant improvement over strong baseline methods when transferring from single-label to multi-label text classification. To the best of our knowledge, this is the first work in which capsule networks have been empirically investigated for text modeling.
