Induction Networks for Few-Shot Text Classification

Text classification tends to struggle when data is scarce or when the model must adapt to unseen classes. In such challenging scenarios, recent studies have used meta-learning to simulate the few-shot task, in which new queries are compared to a small support set at the sample-wise level. However, this sample-wise comparison may be severely disturbed by the diverse expressions within the same class. Therefore, we should learn a general representation of each class in the support set and then compare it to new queries. In this paper, we propose a novel Induction Network that learns such a generalized class-wise representation by leveraging the dynamic routing algorithm in meta-learning. In this way, the model is better able to induce and generalize. We evaluate the proposed model on a well-studied sentiment classification dataset (English) and a real-world dialogue intent classification dataset (Chinese). Experimental results show that, on both datasets, the proposed model significantly outperforms existing state-of-the-art approaches, demonstrating the effectiveness of class-wise generalization in few-shot text classification.
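The key idea is the induction step: dynamic routing aggregates the K support-sample encodings of a class into a single class vector, which queries are then compared against. The snippet below is a minimal NumPy sketch of that routing-by-agreement idea only; it assumes pre-computed sample encodings, omits the learned transformation matrix and the query-comparison module of the full model, and the names (`squash`, `induce_class_vector`, `n_iterations`) are illustrative rather than taken from the paper.

```python
# Minimal sketch of dynamic-routing-based class induction (illustrative only).
import numpy as np

def squash(v, eps=1e-8):
    """Squashing non-linearity from the capsule literature:
    keeps the direction of v, maps its norm into [0, 1)."""
    norm_sq = np.sum(v ** 2)
    return (norm_sq / (1.0 + norm_sq)) * v / (np.sqrt(norm_sq) + eps)

def induce_class_vector(support_encodings, n_iterations=3):
    """Induce one class vector from K support-sample encodings
    (array of shape [K, d]) via iterative dynamic routing."""
    K, d = support_encodings.shape
    b = np.zeros(K)  # routing logits, one per support sample
    for _ in range(n_iterations):
        c = np.exp(b) / np.sum(np.exp(b))                    # coupling coefficients (softmax over samples)
        s = np.sum(c[:, None] * support_encodings, axis=0)   # weighted sum of sample encodings
        class_vec = squash(s)                                # squashed class representation
        b = b + support_encodings @ class_vec                # agreement with the class vector updates the logits
    return class_vec

# Usage: induce a class vector from a toy 5-shot support set of 64-d encodings.
rng = np.random.default_rng(0)
support = rng.normal(size=(5, 64))
print(induce_class_vector(support).shape)  # (64,)
```

In the full model, each support encoding would first pass through a shared learned transformation before routing, and the induced class vector would be scored against the query encoding; the sketch above only shows how routing lets samples that agree with the emerging class vector contribute more to it.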
