GOLD: Improving Out-of-Scope Detection in Dialogues using Data Augmentation

Practical dialogue systems require robust methods of detecting out-of-scope (OOS) utterances to avoid conversational breakdowns and related failure modes. Directly training a model with labeled OOS examples yields reasonable performance, but obtaining such data is a resource-intensive process. To tackle this limited-data problem, previous methods focus on better modeling the distribution of in-scope (INS) examples. We introduce GOLD as an orthogonal technique that augments existing data to train better OOS detectors operating in low-data regimes. GOLD generates pseudo-labeled candidates using samples from an auxiliary dataset and keeps only the most beneficial candidates for training through a novel filtering mechanism. In experiments across three target benchmarks, the top GOLD model outperforms all existing methods on all key metrics, achieving relative gains of 52.4%, 48.9% and 50.3% against median baseline performance. We also analyze the unique properties of OOS data to identify key factors for optimally applying our proposed method.
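The generate-then-filter idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how candidates are matched or scored, so the token-overlap retrieval (`jaccard`), the function names, the `per_seed` and `threshold` parameters, and the caller-supplied `oos_score` function are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two utterances (assumed matching rule)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def generate_candidates(
    seed_oos: List[str], auxiliary: List[str], per_seed: int = 2
) -> List[str]:
    """For each known OOS seed, pull the most similar auxiliary utterances."""
    candidates: List[str] = []
    for seed in seed_oos:
        ranked = sorted(auxiliary, key=lambda u: jaccard(seed, u), reverse=True)
        for utt in ranked[:per_seed]:
            if utt not in candidates:  # avoid duplicate candidates
                candidates.append(utt)
    return candidates


def filter_candidates(
    candidates: List[str],
    oos_score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Keep candidates whose (hypothetical) OOS score meets the threshold,
    and attach the pseudo-label "OOS" to each kept utterance."""
    return [(c, "OOS") for c in candidates if oos_score(c) >= threshold]


if __name__ == "__main__":
    seeds = ["what is the meaning of life"]
    aux = [
        "what is the meaning of this word",
        "book a flight to paris",
        "tell me the meaning of life",
    ]
    cands = generate_candidates(seeds, aux, per_seed=2)
    # Toy scorer standing in for a trained detector; purely illustrative.
    kept = filter_candidates(cands, lambda u: 0.2 if "word" in u else 0.9)
    print(kept)
```

In practice the scorer would presumably be a trained detector rather than a toy rule; the kept pairs would then be added to the training set alongside the original labeled OOS examples.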
