Predictive Complexity Priors

Specifying a Bayesian prior is notoriously difficult for complex models such as neural networks. Reasoning about parameters is made challenging by the high dimensionality and over-parameterization of the space. Priors that seem benign and uninformative can have unintuitive and detrimental effects on a model's predictions. For this reason, we propose predictive complexity priors: a functional prior that is defined by comparing the model's predictions to those of a reference function. Although the prior is originally defined on the model outputs, we transfer it to the model parameters via a change of variables. The traditional Bayesian workflow can then proceed as usual. We apply our predictive complexity prior to modern machine learning tasks such as reasoning over neural network depth and sharing statistical strength for few-shot learning.
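
The change-of-variables step can be illustrated with a minimal sketch in Python. Every concrete choice here is an illustrative assumption rather than the paper's exact construction: a one-parameter model y | x, θ ~ N(θx, 1), a reference function given by the same model at θ = 0 (so it predicts N(0, 1)), the average KL divergence D(θ) between the two predictive distributions as the comparison, and an exponential prior with rate λ on D. Because D is strictly increasing for θ > 0, the density transfers to the parameter as p(θ) = λ exp(-λ D(θ)) |dD/dθ|. The inputs `XS` and the helper names (`divergence`, `divergence_grad`, `log_predictive_prior`) are hypothetical.

```python
import math

# Toy setup (illustrative assumptions, not the paper's construction):
#   model:     y | x, theta ~ Normal(theta * x, 1)
#   reference: y | x        ~ Normal(0, 1), i.e. the model at theta = 0
# XS is a hypothetical fixed set of inputs over which predictions are compared.
XS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]


def gaussian_kl(mu_p, mu_q, sigma=1.0):
    """KL(N(mu_p, sigma^2) || N(mu_q, sigma^2)) for two Gaussians with equal variance."""
    return (mu_p - mu_q) ** 2 / (2.0 * sigma ** 2)


def divergence(theta, xs=XS):
    """D(theta): average KL from the model's predictive to the reference's predictive."""
    return sum(gaussian_kl(theta * x, 0.0) for x in xs) / len(xs)


def divergence_grad(theta, xs=XS):
    """dD/dtheta for this toy model, where D = theta^2 * mean(x^2) / 2."""
    mean_sq = sum(x * x for x in xs) / len(xs)
    return theta * mean_sq


def log_predictive_prior(theta, rate=1.0):
    """Log prior density on theta obtained by a change of variables.

    An Exponential(rate) prior is placed on D(theta). D is strictly increasing
    for theta > 0, so p(theta) = rate * exp(-rate * D(theta)) * |dD/dtheta|.
    Requires rate > 0.
    """
    if theta <= 0.0:
        # theta = 0 has zero density (dD/dtheta = 0); theta < 0 lies outside
        # the half-line on which this sketch makes D invertible.
        return -math.inf
    return (math.log(rate) - rate * divergence(theta)
            + math.log(abs(divergence_grad(theta))))


# Sanity check: the transferred density should integrate to ~1 over theta >= 0
# (midpoint rule on [0, 10]; the tail beyond 10 is negligible for this model).
step = 1e-3
total = sum(math.exp(log_predictive_prior((i + 0.5) * step)) * step
            for i in range(10_000))
print(round(total, 4))  # approximately 1.0
```

For this toy model D(θ) = mθ²/2, where m is the mean of x² over the inputs, so the transferred prior has the closed form λmθ exp(-λmθ²/2). Larger rates λ penalize divergence from the reference function more strongly and concentrate prior mass near θ = 0.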
