Uncertainty-Aware Reliable Text Classification

Deep neural networks have driven large gains in predictive accuracy on classification tasks. However, they tend to make over-confident predictions in real-world settings where domain shift and out-of-distribution (OOD) examples occur. Most research on uncertainty estimation focuses on computer vision, where uncertainty quality can be validated visually; comparatively little work addresses the natural language processing domain. Unlike Bayesian methods, which infer uncertainty indirectly through distributions over network weights, recent evidential methods model the uncertainty of class probabilities explicitly through subjective opinions. They further distinguish inherent uncertainty in the data by its root cause: vacuity (i.e., uncertainty due to a lack of evidence) and dissonance (i.e., uncertainty due to conflicting evidence). In this paper, we are the first to apply evidential uncertainty to OOD detection for text classification. We propose an inexpensive framework that uses both auxiliary outliers and pseudo off-manifold samples to train the model with prior knowledge of each class, so that OOD samples receive high vacuity. Extensive empirical experiments demonstrate that our evidential-uncertainty-based model outperforms competing approaches at detecting OOD examples. Our approach can be readily deployed on traditional recurrent neural networks as well as fine-tuned pre-trained transformers.
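To make the two evidential quantities concrete, the sketch below (an illustrative assumption, not the paper's released code) computes vacuity and dissonance from non-negative per-class evidence under the usual subjective-logic parameterization, where the Dirichlet parameters are alpha = evidence + 1 and the strength is S = sum(alpha).

import numpy as np

def vacuity_and_dissonance(evidence):
    """Subjective-logic vacuity and dissonance for one example's
    per-class evidence vector (hypothetical sketch)."""
    evidence = np.asarray(evidence, dtype=float)
    K = evidence.shape[-1]
    alpha = evidence + 1.0          # Dirichlet parameters
    S = alpha.sum()                 # Dirichlet strength
    belief = evidence / S           # belief mass per class
    vacuity = K / S                 # uncertainty from lack of evidence

    # Dissonance: uncertainty from conflicting (balanced) belief masses.
    def balance(b_j, b_k):
        return 0.0 if (b_j + b_k) == 0 else 1.0 - abs(b_j - b_k) / (b_j + b_k)

    dissonance = 0.0
    for k in range(K):
        others = [j for j in range(K) if j != k]
        denom = sum(belief[j] for j in others)
        if denom > 0:
            dissonance += belief[k] * sum(
                belief[j] * balance(belief[j], belief[k]) for j in others
            ) / denom
    return vacuity, dissonance

# Strong evidence for one class -> low vacuity, low dissonance;
# conflicting evidence -> high dissonance; scarce evidence -> high vacuity.
print(vacuity_and_dissonance([20.0, 1.0, 1.0]))
print(vacuity_and_dissonance([10.0, 10.0, 0.0]))
print(vacuity_and_dissonance([0.1, 0.1, 0.1]))

Under this view, an OOD input should yield little evidence for any class and hence high vacuity, which is the signal the proposed framework trains the model to produce.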
