A post-processing method for detecting unknown intent of dialogue system via pre-trained deep neural network classifier

Abstract With the maturity and popularity of dialogue systems, detecting user’s unknown intent in dialogue systems has become an important task. It is also one of the most challenging tasks since we can hardly get examples, prior knowledge or the exact numbers of unknown intents. In this paper, we propose SofterMax and deep novelty detection (SMDN), a simple yet effective post-processing method for detecting unknown intent in dialogue systems based on pre-trained deep neural network classifiers. Our method can be flexibly applied on top of any classifiers trained in deep neural networks without changing the model architecture. We calibrate the confidence of the softmax outputs to compute the calibrated confidence score (i.e., SofterMax) and use it to calculate the decision boundary for unknown intent detection. Furthermore, we feed the feature representations learned by the deep neural networks into traditional novelty detection algorithm to detect unknown intents from different perspectives. Finally, we combine the methods above to perform the joint prediction. Our method classifies examples that differ from known intents as unknown and does not require any examples or prior knowledge of it. We have conducted extensive experiments on three benchmark dialogue datasets. The results show that our method can yield significant improvements compared with the state-of-the-art baselines 1 .

[1]  Chongsheng Zhang,et al.  An empirical comparison on state-of-the-art multi-class imbalance learning algorithms and a new diversified ensemble learning scheme , 2018, Knowl. Based Syst..

[2]  Terrance E. Boult,et al.  Towards Open Set Deep Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Yee Whye Teh,et al.  Do Deep Generative Models Know What They Don't Know? , 2018, ICLR.

[4]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[5]  Jürgen Schmidhuber,et al.  Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition , 2005, ICANN.

[6]  Hua Xu,et al.  Deep Unknown Intent Detection with Margin Loss , 2019, ACL.

[7]  Giuseppe De Pietro,et al.  Multilingual POS tagging by a composite deep architecture based on character-level features and on-the-fly enriched Word Embeddings , 2019, Knowl. Based Syst..

[8]  R. Srikant,et al.  Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks , 2017, ICLR.

[9]  Anderson Rocha,et al.  Toward Open Set Recognition , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Chih-Li Huo,et al.  Slot-Gated Modeling for Joint Slot Filling and Intent Prediction , 2018, NAACL.

[11]  Lei Shu,et al.  DOC: Deep Open Classification of Text Documents , 2017, EMNLP.

[12]  Daniel W. Gerlich,et al.  A deep learning and novelty detection framework for rapid phenotyping in high-content screening , 2017, bioRxiv.

[13]  Yun Lei,et al.  Using Context Information for Dialog Act Classification in DNN Framework , 2017, EMNLP.

[14]  Yang Yu,et al.  Open Category Classification by Adversarial Sample Generation , 2017, IJCAI.

[15]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[16]  Stephen E. Fienberg,et al.  The Comparison and Evaluation of Forecasters. , 1983 .

[17]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[18]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[19]  Franck Dernoncourt,et al.  Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks , 2016, NAACL.

[20]  Ryan M. Rifkin,et al.  In Defense of One-Vs-All Classification , 2004, J. Mach. Learn. Res..

[21]  Xiaogang Wang,et al.  DeepID-Net: Deformable deep convolutional neural networks for object detection , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  George R. Doddington,et al.  The ATIS Spoken Language Systems Pilot Corpus , 1990, HLT.

[23]  Bing Liu,et al.  Breaking the Closed World Assumption in Text Classification , 2016, NAACL.

[24]  Jugal Kalita,et al.  Open Set Text Classification using Convolutional Neural Networks , 2017 .

[25]  Andreas Stolcke,et al.  Switchboard Discourse Language Modeling Project (Final Report) , 1997 .

[26]  Hongxia Jin,et al.  A Bi-Model Based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling , 2018, NAACL.

[27]  Ramón López-Cózar,et al.  Using knowledge on word-islands to improve the performance of spoken dialogue systems , 2015, Knowl. Based Syst..

[28]  Young-Bum Kim,et al.  Joint Learning of Domain Classification and Out-of-Domain Detection with Dynamic Class Weighting for Satisficing False Acceptance Rates , 2018, INTERSPEECH.

[29]  Kilian Q. Weinberger,et al.  On Calibration of Modern Neural Networks , 2017, ICML.

[30]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[31]  Pavel Král,et al.  Unsupervised Dialogue Act Induction using Gaussian Mixtures , 2016, EACL.