Toward a Kernel-Based Uncertainty Decomposition Framework for Data and Models

Abstract: This letter introduces a new framework for quantifying predictive uncertainty in both data and models. The framework projects the data into a Gaussian reproducing kernel Hilbert space (RKHS) and transforms the data probability density function (PDF) so that the flow of its gradient is expressed as a topological potential field, evaluated at every point of the sample space. This representation allows the PDF gradient flow to be decomposed by casting it as a moment decomposition problem, using operators from quantum physics, specifically Schrödinger's formulation. We show experimentally that the higher-order moments systematically cluster the different tail regions of the PDF, thereby providing unprecedented resolution in discriminating data regions of high epistemic uncertainty. In essence, the approach decomposes local realizations of the data PDF in terms of uncertainty moments. We apply this framework as a surrogate tool for predictive uncertainty quantification of point-prediction neural network models, overcoming several limitations of conventional Bayesian uncertainty quantification methods. Experimental comparisons with established methods illustrate the performance advantages of our framework.
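To make the construction concrete, the following is a minimal sketch of the zeroth-order step the abstract describes, under stated assumptions: a 1-D Gaussian kernel density estimate stands in for the RKHS-embedded PDF, treated as a wave function psi(x), and the time-independent Schrödinger equation is rearranged to recover the potential field V(x) that shapes the gradient flow of psi (the same rearrangement used in quantum clustering). Function names, the kernel width sigma, and the toy data are illustrative, not taken from the paper, and the higher-order moment decomposition claimed in the abstract is not reproduced here; only the base potential field is shown.

    import numpy as np

    def rkhs_psi(x_eval, samples, sigma):
        # Gaussian KDE used as the (unnormalized) wave function psi(x),
        # i.e., the empirical kernel mean embedding of the data.
        d2 = (x_eval[:, None] - samples[None, :]) ** 2
        return np.exp(-d2 / (2.0 * sigma**2)).mean(axis=1)

    def quantum_potential(x_eval, samples, sigma):
        # Rearranged time-independent Schrodinger equation:
        #   V(x) = E + (sigma^2 / 2) * psi''(x) / psi(x),
        # with E chosen so that min V = 0 (quantum-clustering convention).
        d2 = (x_eval[:, None] - samples[None, :]) ** 2
        g = np.exp(-d2 / (2.0 * sigma**2))
        psi = np.maximum(g.mean(axis=1), 1e-300)  # guard against underflow
        # Analytic second derivative of each Gaussian bump, then average.
        lap = (g * (d2 / sigma**4 - 1.0 / sigma**2)).mean(axis=1)
        v = 0.5 * sigma**2 * lap / psi
        return v - v.min()

    # Toy demo: the potential is small where data are dense and grows in
    # the tails, flagging regions of high epistemic uncertainty.
    rng = np.random.default_rng(0)
    samples = rng.normal(0.0, 1.0, size=500)
    grid = np.linspace(-4.0, 4.0, 201)
    V = quantum_potential(grid, samples, sigma=0.5)
    print(grid[np.argmin(V)], grid[np.argmax(V)])  # dense region vs. tail

Under these assumptions, points in low-density tails sit high on the potential surface, which is consistent with the abstract's account of the tail regions of the PDF carrying the epistemic-uncertainty signal.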
