Few-Shot Text Classification with Induction Network

Text classification tends to struggle when data is deficient or when it needs to adapt to unseen classes. In such challenging scenarios, recent studies often use meta learning to simulate the few-shot task, in which new queries are compared to a small support set on a sample-wise level. However, this sample-wise comparison may be severely disturbed by the various expressions in the same class. Therefore, we should be able to learn a general representation of each class in the support set and then compare it to new queries. In this paper, we propose a novel Induction Network to learn such generalized class-wise representations, innovatively combining the dynamic routing algorithm with the typical meta learning framework. In this way, our model is able to induce from particularity to university, which is a more human-like learning approach. We evaluate our model on a well-studied sentiment classification dataset (English) and a real-world dialogue intent classification dataset (Chinese). Experiment results show that, on both datasets, our model significantly outperforms existing state-of-the-art models and improves the average accuracy by more than 3%, which proves the effectiveness of class-wise generalization in few-shot text classification.

[1]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[2]  A. Atiya,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[3]  Yoshua Bengio,et al.  Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[4]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[6]  Pietro Perona,et al.  A Bayesian approach to unsupervised one-shot learning of object categories , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[7]  Xiaoyong Du,et al.  Analogical Reasoning on Chinese Morphological and Semantic Relations , 2018, ACL.

[8]  Geoffrey E. Hinton,et al.  Dynamic Routing Between Capsules , 2017, NIPS.

[9]  Ramakanth Kavuluru,et al.  Few-Shot and Zero-Shot Multi-Label Learning for Structured Label Spaces , 2018, EMNLP.

[10]  Philip S. Yu,et al.  Zero-shot User Intent Detection via Capsule Neural Networks , 2018, EMNLP.

[11]  Weiyao Lin,et al.  Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion , 2018, AAAI.

[12]  Bowen Zhou,et al.  A Structured Self-attentive Sentence Embedding , 2017, ICLR.

[13]  Min Yang,et al.  Investigating Capsule Networks with Dynamic Routing for Text Classification , 2018, EMNLP.

[14]  Ping Jian,et al.  Implicit discourse relation identification based on tree structure neural network , 2017, 2017 International Conference on Asian Language Processing (IALP).

[15]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[16]  Pieter Abbeel,et al.  A Simple Neural Attentive Meta-Learner , 2017, ICLR.

[17]  Danqi Chen,et al.  Reasoning With Neural Tensor Networks for Knowledge Base Completion , 2013, NIPS.

[18]  John Blitzer,et al.  Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification , 2007, ACL.

[19]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[20]  Xueqi Cheng,et al.  A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations , 2015, AAAI.

[21]  Michael I. Jordan,et al.  Unsupervised Domain Adaptation with Residual Transfer Networks , 2016, NIPS.

[22]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[23]  Yu Cheng,et al.  Diverse Few-Shot Text Classification with Multiple Metrics , 2018, NAACL.

[24]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..