Sampling-free Uncertainty Estimation in Gated Recurrent Units with Applications to Normative Modeling in Neuroimaging

There has recently been a concerted effort to equip vision and machine learning systems with mechanisms that estimate the uncertainty of their predictions. Clearly, a system that is not only accurate but also has a sense of when it is not offers real benefits. Existing proposals center on Bayesian interpretations of modern deep architectures; these are effective but often computationally demanding. We show how classical ideas from the literature on exponential families in probabilistic networks provide an excellent starting point for deriving uncertainty estimates in Gated Recurrent Units (GRUs). Our proposal quantifies uncertainty deterministically, without the need for costly sampling-based estimation. Beyond showing that uncertainty is useful in its own right in computer vision and machine learning, we demonstrate that it can play a key role in enabling statistical analysis with deep networks in neuroimaging studies that use normative modeling. To our knowledge, this is the first result describing sampling-free uncertainty estimation for powerful sequential models such as GRUs.
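
To make "sampling-free" concrete, the sketch below propagates a Gaussian mean and variance deterministically through a single sigmoid gate of the kind used in a GRU, then compares the result with a Monte Carlo estimate. It is an illustration under stated assumptions, not the paper's method. The gate inputs are assumed to be independent Gaussians, the weights are fixed, and the sigmoid moments use two closed-form approximations chosen here for illustration: the probit approximation for the mean and the fit sigmoid(x)^2 ~ sigmoid(ALPHA * (x + BETA)) for the second moment. All function names are hypothetical.

```python
import math
import random

# Assumed closed-form approximations (illustrative, not taken from the paper):
ZETA_SQ = math.pi / 8.0              # probit approximation: sigmoid(x) ~ Phi(sqrt(pi/8) * x)
ALPHA = 4.0 - 2.0 * math.sqrt(2.0)   # fit: sigmoid(x)**2 ~ sigmoid(ALPHA * (x + BETA))
BETA = -math.log(math.sqrt(2.0) + 1.0)


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def linear_moments(w, b, means, variances):
    """Exact mean/variance of w . x + b for independent Gaussian inputs x."""
    mean = sum(wi * m for wi, m in zip(w, means)) + b
    var = sum(wi * wi * v for wi, v in zip(w, variances))
    return mean, var


def sigmoid_moments(mean, var):
    """Approximate mean/variance of sigmoid(a), a ~ N(mean, var), with no sampling."""
    out_mean = sigmoid(mean / math.sqrt(1.0 + ZETA_SQ * var))
    second = sigmoid(ALPHA * (mean + BETA) / math.sqrt(1.0 + ZETA_SQ * ALPHA ** 2 * var))
    return out_mean, max(second - out_mean ** 2, 0.0)


def gate_moments(w, b, means, variances):
    """Deterministic (mean, variance) of a sigmoid gate, e.g. a GRU update gate."""
    return sigmoid_moments(*linear_moments(w, b, means, variances))


def gate_moments_mc(w, b, means, variances, n=100_000, seed=0):
    """Sampling-based reference: the costly estimate the deterministic pass avoids."""
    rng = random.Random(seed)
    total = total_sq = 0.0
    for _ in range(n):
        a = b + sum(wi * rng.gauss(m, math.sqrt(v))
                    for wi, m, v in zip(w, means, variances))
        s = sigmoid(a)
        total += s
        total_sq += s * s
    mc_mean = total / n
    return mc_mean, total_sq / n - mc_mean ** 2


if __name__ == "__main__":
    w, b = [0.8, -0.5], 0.1
    means, variances = [1.0, 0.4], [0.5, 0.2]
    print("deterministic:", gate_moments(w, b, means, variances))
    print("monte carlo:  ", gate_moments_mc(w, b, means, variances))
```

The deterministic pass costs one evaluation per gate, whereas the Monte Carlo reference needs many forward passes to reach comparable precision; the two should agree only up to the error of the assumed approximations.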
