Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift

Modern neural networks have proven to be powerful function approximators, providing state-of-the-art performance in a multitude of applications. However, they fall short in their ability to quantify confidence in their predictions; this ability is crucial in high-stakes applications that involve critical decision-making. Bayesian neural networks (BNNs) aim to address this problem by placing a prior distribution over the network's parameters, thereby inducing a posterior distribution that encapsulates predictive uncertainty. While existing variants of BNNs based on Monte Carlo dropout produce reliable (albeit approximate) uncertainty estimates on in-distribution data, they tend to be over-confident on target data whose feature distribution differs from that of the training data, i.e., under covariate shift. In this paper, we develop an approximate Bayesian inference scheme based on posterior regularisation, wherein unlabelled target data provide "pseudo-labels" of model confidence that regularise the model's loss on labelled source data. We show that this approach significantly improves the accuracy of uncertainty quantification on covariate-shifted datasets, with minimal modification to the underlying model architecture. We demonstrate the utility of our method by transferring prognostic models of prostate cancer across globally diverse populations.
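The core idea described above, fitting labelled source data while penalising over-confident predictions on unlabelled, covariate-shifted target inputs, can be sketched in a few lines. The snippet below is a minimal PyTorch illustration rather than the paper's exact objective: the `MCDropoutNet` architecture, the maximum-entropy confidence pseudo-target, and the weight `lam` are assumptions introduced only for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MCDropoutNet(nn.Module):
    """Small classifier with dropout kept active at prediction time (MC dropout)."""
    def __init__(self, in_dim, hidden_dim, n_classes, p=0.5):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, n_classes)
        self.p = p

    def forward(self, x):
        # Dropout is applied with training=True so the forward pass stays stochastic,
        # allowing Monte Carlo sampling of the approximate posterior predictive.
        h = F.dropout(F.relu(self.fc1(x)), p=self.p, training=True)
        return self.fc2(h)

def predictive_entropy(model, x, n_samples=10):
    """Entropy of the MC-dropout predictive distribution (higher = less confident)."""
    probs = torch.stack(
        [F.softmax(model(x), dim=-1) for _ in range(n_samples)]
    ).mean(0)
    return -(probs * probs.clamp_min(1e-8).log()).sum(-1)

def regularised_loss(model, x_src, y_src, x_tgt, lam=0.1):
    """Supervised loss on source data plus a confidence penalty on target data.

    The penalty term is a hypothetical stand-in for the paper's posterior
    regulariser: it nudges the predictive distribution on shifted target
    inputs towards maximum entropy, i.e. away from over-confidence.
    """
    nll = F.cross_entropy(model(x_src), y_src)
    n_classes = model.fc2.out_features
    max_entropy = torch.log(torch.tensor(float(n_classes)))
    conf_penalty = (max_entropy - predictive_entropy(model, x_tgt)).clamp_min(0).mean()
    return nll + lam * conf_penalty
```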
