Towards More Accurate Uncertainty Estimation in Text Classification

The uncertainty measurement of classified results is especially important in areas requiring limited human resources for higher accuracy. For instance, data-driven algorithms diagnosing diseases need accurate uncertainty score to decide whether additional but limited quantity of experts are needed for rectification. However, few uncertainty models focus on improving the performance of text classification where human resources are involved. To achieve this, we aim at generating accurate uncertainty score by improving the confidence of winning scores. Thus, a model called MSD, which includes three independent components as “mix-up”, “self-ensembling”, “distinctiveness score”, is proposed to improve the accuracy of uncertainty score by reducing the effect of overconfidence of winning score and considering the impact of different categories of uncertainty simultaneously. MSD can be applied with different Deep Neural Networks. Extensive experiments with ablation setting are conducted on four real-world datasets, on which, competitive results are obtained.

[1]  Timo Aila,et al.  Temporal Ensembling for Semi-Supervised Learning , 2016, ICLR.

[2]  Andrey Malinin,et al.  Reverse KL-Divergence Training of Prior Networks: Improved Uncertainty and Adversarial Robustness , 2019, NeurIPS.

[3]  Aaron Klein,et al.  Learning Curve Prediction with Bayesian Neural Networks , 2016, ICLR.

[4]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[5]  Ken Lang,et al.  NewsWeeder: Learning to Filter Netnews , 1995, ICML.

[6]  Jeremiah Liu,et al.  Accurate Uncertainty Estimation and Decomposition in Ensemble Learning , 2019, NeurIPS.

[7]  Sebastian Nowozin,et al.  Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift , 2019, NeurIPS.

[8]  Tao Shen,et al.  DiSAN: Directional Self-Attention Network for RNN/CNN-free Language Understanding , 2017, AAAI.

[9]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[10]  Xiang Zhang,et al.  Character-level Convolutional Networks for Text Classification , 2015, NIPS.

[11]  Yaniv Gurwicz,et al.  Modeling Uncertainty by Learning a Hierarchy of Deep Neural Connections , 2019, NeurIPS.

[12]  Yiming Yang,et al.  XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.

[13]  Alex Kendall,et al.  What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? , 2017, NIPS.

[14]  Zoubin Ghahramani,et al.  Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[15]  Gopinath Chennupati,et al.  On Mixup Training: Improved Calibration and Predictive Uncertainty for Deep Neural Networks , 2019, NeurIPS.

[16]  Jure Leskovec,et al.  Hidden factors and hidden topics: understanding rating dimensions with review text , 2013, RecSys.

[17]  Dustin Tran,et al.  Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches , 2018, ICLR.

[18]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[19]  Xuchao Zhang,et al.  Mitigating Uncertainty in Document Classification , 2019, NAACL.

[20]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[21]  Kibok Lee,et al.  A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks , 2018, NeurIPS.

[22]  Kaiming He,et al.  Exploring Randomly Wired Neural Networks for Image Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Huanbo Luan,et al.  Improving Back-Translation with Uncertainty-based Confidence Estimation , 2019, EMNLP.

[24]  Max Welling,et al.  Multiplicative Normalizing Flows for Variational Bayesian Neural Networks , 2017, ICML.

[25]  Mirella Lapata,et al.  Confidence Modeling for Neural Semantic Parsing , 2018, ACL.

[26]  Tengyu Ma,et al.  Verified Uncertainty Calibration , 2019, NeurIPS.

[27]  Kilian Q. Weinberger,et al.  On Calibration of Modern Neural Networks , 2017, ICML.

[28]  Il-Chul Moon,et al.  Adversarial Dropout for Supervised and Semi-supervised Learning , 2017, AAAI.

[29]  Quoc V. Le,et al.  Neural Architecture Search with Reinforcement Learning , 2016, ICLR.

[30]  Marcin Andrychowicz,et al.  Parameter Space Noise for Exploration , 2017, ICLR.

[31]  Kevin Gimpel,et al.  A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks , 2016, ICLR.

[32]  Hongyi Zhang,et al.  mixup: Beyond Empirical Risk Minimization , 2017, ICLR.

[33]  Rich Caruana,et al.  Predicting good probabilities with supervised learning , 2005, ICML.

[34]  Milos Hauskrecht,et al.  Obtaining Well Calibrated Probabilities Using Bayesian Binning , 2015, AAAI.

[35]  Jasper Snoek,et al.  Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling , 2018, ICLR.

[36]  Timothy Baldwin,et al.  Modelling Uncertainty in Collaborative Document Quality Assessment , 2019, EMNLP.

[37]  Mohammed Jabreel,et al.  Target-Dependent Sentiment Analysis of Tweets Using Bidirectional Gated Recurrent Neural Networks , 2018 .

[38]  William Yang Wang,et al.  Quantifying Uncertainties in Natural Language Processing Tasks , 2018, AAAI.