Robust Full Bayesian Learning for Radial Basis Networks

We propose a hierarchical full Bayesian model for radial basis networks. This model treats the model dimension (number of neurons), model parameters, regularization parameters, and noise parameters as unknown random variables. We develop a reversible-jump Markov chain Monte Carlo (MCMC) method to perform the Bayesian computation. We find that the results obtained using this method are not only better than those reported previously, but also robust with respect to the prior specification. In addition, we propose a novel and computationally efficient reversible-jump MCMC simulated annealing algorithm to optimize neural networks. This algorithm enables us to maximize the joint posterior distribution of the network parameters and the number of basis functions. It performs a global search in the joint space of the parameters and the number of parameters, thereby largely surmounting the problem of local minima. We show that by calibrating the full hierarchical Bayesian prior, we can obtain the classical Akaike information criterion, Bayesian information criterion, and minimum description length model selection criteria within a penalized likelihood framework. Finally, we present a geometric convergence theorem for the algorithm with a homogeneous transition kernel and a convergence theorem for the reversible-jump MCMC simulated annealing method.
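As context for the calibration result, the classical criteria can all be written as penalized likelihoods. In standard notation (not fixed by the abstract), with $N$ data points and $p$ free parameters in the $k$-neuron model, the selected model minimizes

$$
-\log p(\mathbf{y} \mid \hat{\boldsymbol{\theta}}_k, k) \;+\; \mathcal{P}(p, N),
\qquad
\mathcal{P}(p, N) =
\begin{cases}
p & \text{(AIC)} \\[2pt]
\tfrac{p}{2}\log N & \text{(BIC and MDL)}
\end{cases}
$$

so calibrating the hierarchical prior amounts to choosing it such that the log posterior takes one of these penalized forms.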

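To make the joint-space optimization concrete, here is a minimal sketch, not the paper's algorithm: a reversible-jump sampler with birth, death, and update moves over RBF centres, annealed toward a mode of the joint posterior over (number of basis functions, centres). The posterior (Gaussian likelihood with least-squares amplitudes, a Poisson prior on the model dimension), the moves, and the geometric cooling schedule are all illustrative assumptions.

```python
# Illustrative reversible-jump MCMC simulated annealing for a 1-D RBF network.
# All names and modelling choices here are hypothetical stand-ins for the
# paper's hierarchical model, moves, and cooling schedule.
import numpy as np

rng = np.random.default_rng(0)

def rbf_design(x, centers, width=1.0):
    """Gaussian RBF design matrix: one column per basis centre."""
    d = x[:, None] - centers[None, :]
    return np.exp(-0.5 * (d / width) ** 2)

def log_posterior(centers, x, y, noise_var=0.1, rate=1.0):
    """Unnormalised log joint posterior of (k, centres): Gaussian likelihood
    with least-squares amplitudes and a Poisson(rate) prior on k."""
    k = len(centers)
    if k == 0:
        return -np.inf
    Phi = rbf_design(x, centers)
    beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    resid = y - Phi @ beta
    log_lik = -0.5 * resid @ resid / noise_var
    log_prior = k * np.log(rate) - np.sum(np.log(np.arange(1, k + 1)))
    return log_lik + log_prior

def rjmcmc_sa(x, y, n_iter=5000, t0=1.0, t_min=1e-3):
    """Anneal the joint posterior over (k, centres) with birth/death/update moves."""
    centers = rng.choice(x, size=3, replace=False)
    lp = log_posterior(centers, x, y)
    best, best_lp = centers.copy(), lp
    for i in range(n_iter):
        temp = max(t_min, t0 * 0.999 ** i)          # geometric cooling schedule
        u = rng.random()
        if u < 1 / 3:                               # birth: add a centre at a random datum
            prop = np.append(centers, rng.choice(x))
        elif u < 2 / 3 and len(centers) > 1:        # death: drop a random centre
            prop = np.delete(centers, rng.integers(len(centers)))
        else:                                       # update: jitter one centre
            prop = centers.copy()
            prop[rng.integers(len(prop))] += 0.1 * rng.standard_normal()
        lp_prop = log_posterior(prop, x, y)
        # Annealed Metropolis-Hastings acceptance; proposal ratios are omitted
        # for brevity, so this simplifies the full reversible-jump acceptance.
        if np.log(rng.random()) < (lp_prop - lp) / temp:
            centers, lp = prop, lp_prop
            if lp > best_lp:
                best, best_lp = centers.copy(), lp
    return best, best_lp

# Toy usage: recover the bumps of a noisy 1-D signal.
x = np.linspace(-3, 3, 100)
y = np.exp(-0.5 * (x - 1) ** 2) - np.exp(-0.5 * (x + 1) ** 2)
y += 0.05 * rng.standard_normal(x.size)
centers, score = rjmcmc_sa(x, y)
print(f"{len(centers)} basis functions, log posterior {score:.2f}")
```

Because the birth and death moves change the dimension of the parameter space, a single annealed chain can search over model sizes and parameter values simultaneously, which is what lets the method escape the local minima that trap fixed-dimension optimizers.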