Bayesian nonlinear model selection and neural networks: a conjugate prior approach

To select the neural-network architecture with the best predictive performance among several candidate networks, we propose a general Bayesian model comparison procedure for nonlinear regression, based on the maximization of an expected utility criterion. This criterion selects the model under which the training set achieves the highest internal consistency, as measured by the predictive probability distribution of each model. The density of this distribution is the posterior predictive density of the model; it is approximated asymptotically from the assumed Gaussian likelihood of the data set and the corresponding conjugate prior density of the parameters. This conjugate prior allows the parameter posterior and posterior predictive densities to be computed analytically, in an empirical-Bayes-like approach. The resulting selection procedure can compare general nonlinear regression models, feedforward neural networks in particular, and is not restricted to nested models as the usual asymptotic comparison tests are.
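As a rough illustration of the conjugate-prior machinery, the sketch below compares candidate models by their closed-form log posterior predictive (evidence) score under a Gaussian likelihood with a Normal-Inverse-Gamma conjugate prior. All names, prior settings, and the polynomial toy models standing in for network architectures are illustrative assumptions, not the authors' implementation; the paper reaches this conjugate form for nonlinear networks only through an asymptotic Gaussian approximation of the likelihood, which is not reproduced here.

```python
# Minimal sketch (assumed, not the paper's code): conjugate-prior Bayesian
# model comparison for Gaussian regression.  Each candidate model is a
# linear(ized) design matrix; the selected model maximizes the log evidence.
import numpy as np
from scipy.special import gammaln


def log_evidence(X, y, mu0=None, V0=None, a0=1e-2, b0=1e-2):
    """Log marginal density of y under the Normal-Inverse-Gamma prior
    beta | s2 ~ N(mu0, s2 * V0),  s2 ~ InvGamma(a0, b0).
    The default hyperparameters are illustrative, weakly informative choices."""
    n, p = X.shape
    if mu0 is None:
        mu0 = np.zeros(p)
    if V0 is None:
        V0 = np.eye(p) * 1e2                      # weak prior on the weights
    V0_inv = np.linalg.inv(V0)
    Vn_inv = V0_inv + X.T @ X                     # posterior precision
    Vn = np.linalg.inv(Vn_inv)
    mun = Vn @ (V0_inv @ mu0 + X.T @ y)           # posterior mean
    an = a0 + 0.5 * n
    bn = b0 + 0.5 * (y @ y + mu0 @ V0_inv @ mu0 - mun @ Vn_inv @ mun)
    # Closed-form log evidence of the conjugate Normal-Inverse-Gamma model.
    _, logdet_V0 = np.linalg.slogdet(V0)
    _, logdet_Vn = np.linalg.slogdet(Vn)
    return (-0.5 * n * np.log(2 * np.pi)
            + 0.5 * (logdet_Vn - logdet_V0)
            + a0 * np.log(b0) - an * np.log(bn)
            + gammaln(an) - gammaln(a0))


# Toy comparison: polynomial "architectures" of increasing complexity stand in
# for candidate networks on a noisy sine regression problem.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = np.sin(2 * x) + 0.1 * rng.standard_normal(x.size)

candidates = {f"degree {d}": np.vander(x, d + 1, increasing=True)
              for d in range(1, 7)}
scores = {name: log_evidence(X, y) for name, X in candidates.items()}
best = max(scores, key=scores.get)
print("selected model:", best)
```

Under these assumptions, the candidate whose design matrix yields the largest log evidence is retained, mirroring the expected-utility selection described in the abstract.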
