Computation with Infinite Neural Networks

For neural networks with a wide class of weight priors, it can be shown that in the limit of an infinite number of hidden units, the prior over functions tends to a gaussian process. In this article, analytic forms are derived for the covariance function of the gaussian processes corresponding to networks with sigmoidal and gaussian hidden units. This allows predictions to be made efficiently using networks with an infinite number of hidden units and shows, somewhat paradoxically, that it may be easier to carry out Bayesian prediction with infinite networks than with finite ones.
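The following is a minimal sketch of how such covariance functions can be computed and then used for prediction. It rests on several assumptions. The sigmoid is taken to be the error function erf applied to u·x̃, where x̃ = (1, x) is the input augmented with a bias term and the input-to-hidden weights are u ~ N(0, Σ). The gaussian units are taken to be exp(-|x - u|²/(2σ_g²)), with centres u ~ N(0, σ_u² I). The network output is f(x) = b + Σ_j v_j h(x; u_j), with b ~ N(0, σ_b²) and v_j ~ N(0, ω²/H) for H hidden units. Under these choices the expectation E_u[h(x; u) h(x'; u)] has a closed form for each unit type. The function and parameter names (`erf_kernel`, `gauss_kernel`, `network_cov`, `gp_predict`, `Sigma`, `sigma_g`, `sigma_u`, `sigma_b`, `omega`) are illustrative and are not taken from the article.

```python
import math
import numpy as np


def erf_kernel(X1, X2, Sigma):
    """E_u[erf(u . x1~) erf(u . x2~)] for u ~ N(0, Sigma) (assumed erf sigmoid).

    Inputs are augmented with a leading 1 (bias), so Sigma is (d+1) x (d+1).
    """
    A1 = np.hstack([np.ones((X1.shape[0], 1)), X1])
    A2 = np.hstack([np.ones((X2.shape[0], 1)), X2])
    cross = 2.0 * A1 @ Sigma @ A2.T
    # Per-row quadratic forms x~^T Sigma x~
    d1 = 1.0 + 2.0 * np.einsum("ij,jk,ik->i", A1, Sigma, A1)
    d2 = 1.0 + 2.0 * np.einsum("ij,jk,ik->i", A2, Sigma, A2)
    return (2.0 / np.pi) * np.arcsin(cross / np.sqrt(np.outer(d1, d2)))


def gauss_kernel(X1, X2, sigma_g, sigma_u):
    """E_u[h(x1; u) h(x2; u)] for h(x; u) = exp(-|x-u|^2 / (2 sigma_g^2)),
    with centres u ~ N(0, sigma_u^2 I) (assumed unit and prior form)."""
    d = X1.shape[1]
    sigma_e2 = 1.0 / (2.0 / sigma_g**2 + 1.0 / sigma_u**2)
    sigma_s2 = 2.0 * sigma_g**2 + sigma_g**4 / sigma_u**2   # width of the stationary part
    sigma_m2 = 2.0 * sigma_u**2 + sigma_g**2                # width of the envelope
    sq1 = np.sum(X1**2, axis=1)
    sq2 = np.sum(X2**2, axis=1)
    dist2 = np.maximum(sq1[:, None] + sq2[None, :] - 2.0 * X1 @ X2.T, 0.0)
    scale = (math.sqrt(sigma_e2) / sigma_u) ** d
    return scale * np.exp(-sq1[:, None] / (2.0 * sigma_m2)
                          - dist2 / (2.0 * sigma_s2)
                          - sq2[None, :] / (2.0 * sigma_m2))


def network_cov(V, sigma_b, omega):
    """Prior covariance of f as H -> infinity: bias variance plus scaled E_u[h h]."""
    return sigma_b**2 + omega**2 * V


def gp_predict(K_train, K_cross, y, noise_var):
    """GP predictive mean K_cross (K_train + noise_var I)^{-1} y.

    Cost is O(n^3) in the number of training cases and does not depend on
    the (infinite) number of hidden units.
    """
    L = np.linalg.cholesky(K_train + noise_var * np.eye(len(y)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return K_cross @ alpha
```

As a usage example, one could build the training covariance as `network_cov(gauss_kernel(X, X, 1.0, 3.0), 1.0, 1.0)` and the test-by-train covariance the same way from `gauss_kernel(X_test, X, 1.0, 3.0)`. Both are then passed to `gp_predict`. No hidden units are ever sampled, so the infinite network is handled exactly through its covariance function.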
