A Noninformative Prior for Neural Networks

Many implementations of Bayesian neural networks use large, complex hierarchical priors, yet noninformative (flat) priors are common in much of modern Bayesian statistics. This paper introduces a noninformative prior for feed-forward neural networks and describes several theoretical and practical advantages of this approach. In particular, a simpler prior allows for a simpler Markov chain Monte Carlo (MCMC) algorithm. Details of the MCMC implementation are included.
