Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference

Intent detection is a core component of goal-oriented dialog systems, and detecting out-of-scope (OOS) intents is a practically important skill. Few-shot learning is attracting much attention as a way to mitigate data scarcity, but OOS detection becomes even more challenging in this setting. In this paper, we present a simple yet effective approach: discriminative nearest neighbor classification with deep self-attention. Unlike softmax classifiers, we leverage BERT-style pairwise encoding to train a binary classifier that estimates the best-matched training example for a user input. We propose to boost the discriminative ability by transferring a natural language inference (NLI) model. Our extensive experiments on a large-scale multi-domain intent detection task show that our method achieves more stable and accurate in-domain and OOS detection than RoBERTa-based classifiers and embedding-based nearest neighbor approaches. More notably, the NLI transfer enables our 10-shot model to perform competitively with 50-shot or even full-shot classifiers, while we can keep the inference time constant by leveraging a faster embedding retrieval model.
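
To make the pairwise formulation concrete, below is a minimal sketch of the inference step using the Hugging Face transformers API. The checkpoint name and the 0.5 OOS threshold are illustrative assumptions, not artifacts from the paper; the actual model is a BERT-style binary classifier fine-tuned by transferring an NLI model.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical checkpoint: a binary cross-encoder fine-tuned from an NLI
# model, as the paper proposes; substitute your own fine-tuned model.
MODEL = "nli-transferred-pair-matcher"  # assumption, not a released model

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def classify(query, examples, labels, oos_threshold=0.5):
    # Encode the user input paired with every training example
    # (BERT-style pairwise encoding: [CLS] query [SEP] example [SEP]).
    enc = tokenizer([query] * len(examples), list(examples),
                    padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    # Probability that each pair shares the same intent (positive class = 1).
    match_probs = logits.softmax(dim=-1)[:, 1]
    best = int(match_probs.argmax())
    # Return the nearest neighbor's intent, or OOS if even the best match is weak.
    return labels[best] if match_probs[best] >= oos_threshold else "oos"

# Few-shot support set: (utterance, intent) pairs.
support = [("what's my checking balance", "balance"),
           ("book me a table for two tonight", "restaurant_reservation")]
examples, labels = zip(*support)
print(classify("how much money is in my account", examples, labels))

Note that exhaustively scoring every training example, as above, costs one cross-encoder pass per (query, example) pair; the constant-time variant mentioned in the abstract first retrieves a small candidate set with a fast embedding model and only scores those pairs.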
