Likelihood Ratios and Generative Classifiers for Unsupervised Out-of-Domain Detection In Task Oriented Dialog

The task of identifying out-of-domain (OOD) input examples directly at test time has seen renewed interest recently due to the increased real-world deployment of models. In this work, we focus on OOD detection for natural language sentence inputs to task-based dialog systems. Our findings are three-fold: First, we curate and release ROSTD (Real Out-of-Domain Sentences From Task-oriented Dialog), a dataset of 4K OOD examples for the publicly available dataset of Schuster et al. (2019). In contrast to existing settings, which synthesize OOD examples by holding out a subset of classes, our examples were authored by annotators instructed a priori to write sentences that are out-of-domain with respect to an existing dataset. Second, we explore likelihood ratio based approaches as an alternative to the currently prevalent paradigms. Specifically, we reformulate and apply these approaches to natural language inputs. We find that they match or outperform the prevalent approaches on all datasets, with larger improvements on non-artificial OOD benchmarks such as our dataset. Our ablations validate that using likelihood ratios, rather than plain likelihoods, is necessary to discriminate well between OOD and in-domain data. Third, we propose learning a generative classifier and computing a marginal likelihood (ratio) for OOD detection. This allows us to use a principled likelihood while at the same time exploiting training-time labels. We find that this approach outperforms both the simple likelihood (ratio) based approaches and other prior approaches. To the best of our knowledge, we are the first to investigate the use of generative classifiers for OOD detection at test time.
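To make the scoring concrete, below is a minimal sketch of the two scores the abstract describes: a likelihood ratio between an in-domain model and a background model, and the generative-classifier variant that marginalizes class-conditional likelihoods over intent labels. This is not the authors' released code; the function names, the sign convention (higher score means more likely OOD), the zero threshold, and the assumption that log-likelihoods come from separately trained language models are all illustrative.

```python
import math

def logsumexp(log_vals):
    # Numerically stable log(sum(exp(v) for v in log_vals)).
    m = max(log_vals)
    return m + math.log(sum(math.exp(v - m) for v in log_vals))

def llr_score(log_p_in, log_p_bg):
    # Likelihood-ratio OOD score: log p_bg(x) - log p_in(x).
    # Large when the in-domain model finds x much less likely than
    # the background model does, suggesting x is out-of-domain.
    return log_p_bg - log_p_in

def marginal_llr_score(log_p_x_given_y, log_priors, log_p_bg):
    # Generative-classifier variant: marginalize the per-class
    # likelihoods, log p(x) = logsumexp_y(log p(x|y) + log p(y)),
    # then form the same ratio against the background model.
    log_p_in = logsumexp([lxy + lpy
                          for lxy, lpy in zip(log_p_x_given_y, log_priors)])
    return llr_score(log_p_in, log_p_bg)

# Toy usage with 3 intent classes under a uniform prior; in practice the
# log-likelihoods would come from trained language models, and the
# threshold tau would be tuned on held-out data.
log_priors = [math.log(1.0 / 3)] * 3
score = marginal_llr_score([-42.0, -55.3, -61.7], log_priors, log_p_bg=-48.1)
tau = 0.0
print(score, score > tau)  # True would flag the input as OOD
```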

[1] Hua Xu et al. Deep Unknown Intent Detection with Margin Loss. ACL, 2019.

[2] Wang Ling et al. Generative and Discriminative Text Classification with Recurrent Neural Networks. arXiv, 2017.

[3] Eric Jang et al. Generative Ensembles for Robust Anomaly Detection. arXiv, 2018.

[4] Sebastian Schuster et al. Cross-lingual Transfer Learning for Multilingual Task Oriented Dialog. NAACL, 2019.

[5] Luca Antiga et al. Automatic differentiation in PyTorch. 2017.

[6] Kevin Gimpel et al. A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks. ICLR, 2017.

[7] Alexander A. Alemi et al. WAIC, but Why? Generative Ensembles for Robust Anomaly Detection. arXiv, 2018.

[8] Hermann Ney et al. LSTM Neural Networks for Language Modeling. INTERSPEECH, 2012.

[9] Jeffrey Pennington et al. GloVe: Global Vectors for Word Representation. EMNLP, 2014.

[10] Gaël Varoquaux et al. Scikit-learn: Machine Learning in Python. JMLR, 2011.

[11] Kibok Lee et al. A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks. NeurIPS, 2018.

[12] Thomas G. Dietterich et al. Deep Anomaly Detection with Outlier Exposure. ICLR, 2019.

[13] Hans-Peter Kriegel et al. LOF: identifying density-based local outliers. SIGMOD, 2000.

[14] R. Srikant et al. Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks. ICLR, 2018.

[15] Kilian Q. Weinberger et al. On Calibration of Modern Neural Networks. ICML, 2017.

[16] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL, 2019.

[17] Mark Goadrich et al. The relationship between Precision-Recall and ROC curves. ICML, 2006.

[18] Chris Dyer. Notes on Noise Contrastive Estimation and Negative Sampling. arXiv, 2014.

[19] Jasper Snoek et al. Likelihood Ratios for Out-of-Distribution Detection. NeurIPS, 2019.

[20] Yee Whye Teh et al. Do Deep Generative Models Know What They Don't Know? ICLR, 2019.

[21] Jeffrey Dean et al. Distributed Representations of Words and Phrases and their Compositionality. NIPS, 2013.

[22] Kibok Lee et al. Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples. ICLR, 2018.

[23] Omer Levy et al. word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv, 2014.

[24] Francesco Caltagirone et al. Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces. arXiv, 2018.