Towards Expressive Priors for Bayesian Neural Networks: Poisson Process Radial Basis Function Networks

While Bayesian neural networks have many appealing characteristics, current priors do not easily allow users to specify basic properties such as the expected lengthscale or amplitude variance of the functions they induce. In this work, we introduce Poisson Process Radial Basis Function Networks, a novel prior that encodes amplitude stationarity and input-dependent lengthscale. We prove that this formulation allows these two properties to be specified independently of one another, and that the resulting regression estimator is consistent as the number of observations tends to infinity. We demonstrate its behavior on synthetic and real examples.
