Bayesian information criteria and smoothing parameter selection in radial basis function networks

By extending Schwarz's (1978) basic idea we derive a Bayesian information criterion which enables us to evaluate models estimated by the maximum penalised likelihood method or the method of regularisation. The proposed criterion is applied to the choice of smoothing parameters and the number of basis functions in radial basis function network models. Monte Carlo experiments were conducted to examine the performance of the nonlinear modelling strategy of estimating the weight parameters by regularisation and then determining the adjusted parameters by the Bayesian information criterion. The simulation results show that our modelling procedure performs well in various situations. Copyright Biometrika Trust 2004, Oxford University Press.
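The strategy described above (estimate the weights by regularisation, then choose the smoothing parameter and the number of basis functions by an information criterion) can be illustrated with a minimal sketch. It does not reproduce the criterion derived in the paper. As a stand-in it uses Schwarz's BIC for a Gaussian model, with the parameter count replaced by the effective degrees of freedom (the trace of the hat matrix). The function names, the equally spaced centres, the width rule and the ridge penalty on all weights are assumptions made for illustration only.

```python
import numpy as np

def gaussian_rbf_design(x, centers, width):
    """Design matrix: an intercept column plus one Gaussian basis per centre."""
    d2 = (x[:, None] - centers[None, :]) ** 2
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), phi])

def fit_regularised(Phi, y, lam):
    """Minimise ||y - Phi w||^2 + n * lam * ||w||^2 (ridge penalty, assumed)."""
    n, p = Phi.shape
    A = Phi.T @ Phi + n * lam * np.eye(p)
    w = np.linalg.solve(A, Phi.T @ y)
    # Effective degrees of freedom: trace of the hat matrix Phi A^{-1} Phi^T.
    edf = np.trace(Phi @ np.linalg.solve(A, Phi.T))
    return w, edf

def bic_effective_df(Phi, y, lam):
    """Stand-in criterion: -2 * Gaussian log-likelihood + edf * log(n)."""
    n = y.shape[0]
    w, edf = fit_regularised(Phi, y, lam)
    sigma2 = np.sum((y - Phi @ w) ** 2) / n
    neg2_loglik = n * np.log(2.0 * np.pi * sigma2) + n
    return neg2_loglik + edf * np.log(n)

def select_by_bic(x, y, n_basis_grid, lam_grid):
    """Grid search over the number of basis functions and the smoothing parameter."""
    best = None
    for m in n_basis_grid:
        centers = np.linspace(x.min(), x.max(), m)  # assumed: equally spaced centres
        width = (x.max() - x.min()) / m             # assumed: simple width rule
        Phi = gaussian_rbf_design(x, centers, width)
        for lam in lam_grid:
            score = bic_effective_df(Phi, y, lam)
            if best is None or score < best[0]:
                best = (score, m, lam)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.0, 1.0, 100))
    y = np.sin(2.0 * np.pi * x) + 0.2 * rng.standard_normal(100)
    score, m, lam = select_by_bic(x, y, [4, 8, 12, 16], [1e-6, 1e-4, 1e-2])
    print(f"selected basis functions: {m}, smoothing parameter: {lam}")
```

The pair with the smallest criterion value is selected. Swapping in a different criterion only requires changing `bic_effective_df`.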

[1]  Edmund Taylor Whittaker On a New Method of Graduation , 1922, Proceedings of the Edinburgh Mathematical Society.

[2]  R. A. Leibler,et al.  On Information and Sufficiency , 1951 .

[3]  I. Good Nonparametric roughness penalties for probability densities , 1971 .

[4]  P. McCullagh,et al.  Generalized Linear Models , 1972, Predictive Analytics.

[5]  H. Akaike,et al.  Information Theory and an Extension of the Maximum Likelihood Principle , 1973 .

[6]  N. Sugiura Further analysis of the data by Akaike's information criterion and the finite corrections , 1978 .

[7]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[8]  B. Silverman,et al.  Some Aspects of the Spline Smoothing Approach to Non‐Parametric Regression Curve Fitting , 1985 .

[9]  B. Yandell,et al.  Semi-Parametric Generalized Linear Models. , 1985 .

[10]  A. Davison Approximate predictive likelihood , 1986 .

[11]  B. Yandell,et al.  Automatic Smoothing of Regression Functions in Generalized Linear Models , 1986 .

[12]  B. Efron How Biased is the Apparent Error Rate of a Prediction Rule , 1986 .

[13]  R. Tibshirani,et al.  Generalized additive models for medical research , 1986, Statistical methods in medical research.

[14]  L. Tierney,et al.  Accurate Approximations for Posterior Moments and Marginal Densities , 1986 .

[15]  David S. Broomhead,et al.  Multivariable Functional Interpolation and Adaptive Networks , 1988, Complex Syst..

[16]  D. Broomhead,et al.  Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks , 1988 .

[17]  John Moody,et al.  Fast Learning in Networks of Locally-Tuned Processing Units , 1989, Neural Computation.

[18]  L. Tierney,et al.  Fully Exponential Laplace Approximations to Expectations and Variances of Nonpositive Functions , 1989 .

[19]  Clifford M. Hurvich,et al.  Regression and time series model selection in small samples , 1989 .

[20]  F. Girosi,et al.  Networks for approximation and learning , 1990, Proc. IEEE.

[21]  G. Wahba Spline models for observational data , 1990 .

[22]  L. Tierney,et al.  The validity of posterior expansions based on Laplace's method , 1990 .

[23]  P. McCullagh,et al.  Generalized Linear Models, 2nd Edn. , 1990 .

[24]  John E. Moody,et al.  The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems , 1991, NIPS.

[25]  J. H. Schuenemeyer,et al.  Generalized Linear Models (2nd ed.) , 1992 .

[26]  A. Barron,et al.  Jeffreys' prior is asymptotically least favorable under entropy risk , 1994 .

[27]  David J. C. MacKay,et al.  A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.

[28]  John A. Nelder,et al.  Generalized linear models. 2nd ed. , 1993 .

[29]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[30]  Shun-ichi Amari,et al.  Network information criterion-determining the number of hidden units for an artificial neural network model , 1994, IEEE Trans. Neural Networks.

[31]  B. Silverman,et al.  Nonparametric regression and generalized linear models , 1994 .

[32]  L. Wasserman,et al.  A Reference Bayesian Test for Nested Hypotheses and its Relationship to the Schwarz Criterion , 1995 .

[33]  Yoshua Bengio,et al.  Pattern Recognition and Neural Networks , 1995 .

[34]  A. O'Hagan,et al.  Fractional Bayes factors for model comparison , 1995 .

[35]  Christopher M. Bishop,et al.  Neural networks for pattern recognition , 1995 .

[36]  Geoffrey E. Hinton,et al.  Bayesian Learning for Neural Networks , 1995 .

[37]  Brian D. Ripley,et al.  Pattern Recognition and Neural Networks , 1996 .

[38]  G. Kitagawa,et al.  Generalised information criteria in model selection , 1996 .

[39]  Joseph E. Cavanaugh,et al.  Regression and time series model selection using variants of the Schwarz information criterion , 1997 .

[40]  Huaiyu Zhu On Information and Sufficiency , 1997 .

[41]  Clifford M. Hurvich,et al.  Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion , 1998 .

[42]  C. C. Holmes,et al.  Bayesian Radial Basis Functions of Variable Dimension , 1998, Neural Computation.

[43]  Peter Müller,et al.  Feedforward Neural Networks for Nonparametric Regression , 1998 .

[44]  D. Pauler The Schwarz criterion and related methods for normal linear models , 1998 .

[45]  Seiya Imoto,et al.  Estimating Nonlinear Regression Models based on Radial Basis Function Networks , 2001 .

[46]  A. Lanterman Schwarz, Wallace, and Rissanen: Intertwining Themes in Theories of Model Selection , 2001 .

[47]  Nando de Freitas,et al.  Robust Full Bayesian Learning for Radial Basis Networks , 2001, Neural Computation.

[48]  Eric R. Ziegel,et al.  Generalized Linear Models , 2002, Technometrics.

[49]  Bradley P. Carlin,et al.  Bayesian measures of model complexity and fit , 2002 .

[50]  Sadanori Konishi,et al.  Nonlinear regression modeling via regularized wavelets and smoothing parameter selection , 2006 .