DEUP: Direct Epistemic Uncertainty Prediction

Epistemic uncertainty is the part of out-of-sample prediction error that is due to the learner's lack of knowledge. Whereas previous work has focused on model variance, we propose a principled approach for directly estimating epistemic uncertainty by learning to predict generalization error and subtracting an estimate of aleatoric uncertainty, i.e., intrinsic unpredictability. This estimator of epistemic uncertainty includes the effect of model bias and can be applied in non-stationary learning environments such as those arising in active learning or reinforcement learning. In addition to demonstrating these properties of Direct Epistemic Uncertainty Prediction (DEUP), we illustrate its advantage over existing methods for uncertainty estimation on downstream tasks including sequential model optimization and reinforcement learning. We also evaluate the quality of uncertainty estimates from DEUP for probabilistic classification of images and for estimating uncertainty about synergistic drug combinations.
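The core recipe stated above (predict the generalization error, then subtract an aleatoric estimate) can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D regression task, the cubic main predictor, the k-nearest-neighbor error predictor, the held-out set used to observe errors, and the assumption that the noise variance is known are all illustrative choices, and every name in the code is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
NOISE_STD = 0.1  # assumption: the aleatoric noise level is known exactly


def target(x):
    """Hypothetical ground-truth function for the toy task."""
    return np.sin(3.0 * x)


# Training inputs cover only [-1, 1]; held-out inputs cover [-2, 2],
# so part of the held-out set lies outside the training region.
x_train = rng.uniform(-1.0, 1.0, 200)
y_train = target(x_train) + NOISE_STD * rng.normal(size=200)
x_held = rng.uniform(-2.0, 2.0, 400)
y_held = target(x_held) + NOISE_STD * rng.normal(size=400)

# Main predictor: a cubic polynomial fit by least squares (illustrative).
coeffs = np.polyfit(x_train, y_train, deg=3)


def predict(x):
    return np.polyval(coeffs, x)


# Observed squared errors of the main predictor on held-out points.
# These are the regression targets for the error predictor.
sq_err_held = (predict(x_held) - y_held) ** 2


def predict_error(x, k=10):
    """Error predictor: k-NN average of held-out squared errors (illustrative)."""
    x = np.atleast_1d(x)
    dists = np.abs(x[:, None] - x_held[None, :])
    nearest = np.argsort(dists, axis=1)[:, :k]
    return sq_err_held[nearest].mean(axis=1)


def epistemic_uncertainty(x):
    """Predicted total error minus the aleatoric estimate, clipped at zero."""
    aleatoric = NOISE_STD ** 2
    return np.maximum(predict_error(x) - aleatoric, 0.0)


u_in = epistemic_uncertainty(np.array([0.0]))[0]   # inside the training region
u_out = epistemic_uncertainty(np.array([1.9]))[0]  # outside the training region
print(f"in-distribution: {u_in:.4f}, out-of-distribution: {u_out:.4f}")
```

Because the error predictor is trained on observed errors rather than on disagreement between models, the estimate also reflects the bias of the cubic fit when it extrapolates, which a pure variance-based estimate would not capture.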
