A Bayesian Committee Machine

The Bayesian committee machine (BCM) is a novel approach to combining estimators that were trained on different data sets. Although the BCM can be applied to combining any kind of estimator, the main focus is on Gaussian process regression and related systems such as regularization networks and smoothing splines, for which the number of degrees of freedom grows with the number of training data. Somewhat surprisingly, we find that the performance of the BCM improves if several test points are queried at the same time and is optimal if the number of test points is at least as large as the degrees of freedom of the estimator. The BCM also provides a new solution for on-line learning with potential applications to data mining. We apply the BCM to systems with fixed basis functions and discuss its relationship to Gaussian process regression. Finally, we show how the ideas behind the BCM can be applied in a non-Bayesian setting to extend the input-dependent combination of estimators.
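As a rough illustration of the combination rule described above for Gaussian process regression, the sketch below combines the posteriors of several Gaussian process modules, each trained on a disjoint subset of the data, at a block of query points. The kernel, hyperparameters, and names (`rbf_kernel`, `gp_posterior`, `bcm_predict`) are assumptions made for this example rather than the paper's notation; the combination step follows the standard BCM rule of summing the module precisions and subtracting (M-1) copies of the prior precision at the query points.

```python
# Minimal BCM sketch for Gaussian process regression (illustrative only).
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(Xi, yi, Xq, noise_var=0.1):
    """Posterior mean and covariance at the query points Xq for one module."""
    K = rbf_kernel(Xi, Xi) + noise_var * np.eye(len(Xi))
    Ks = rbf_kernel(Xi, Xq)
    Kq = rbf_kernel(Xq, Xq)
    sol = np.linalg.solve(K, np.hstack([yi[:, None], Ks]))
    mean = Ks.T @ sol[:, 0]
    cov = Kq - Ks.T @ sol[:, 1:]
    return mean, cov

def bcm_predict(modules, Xq, jitter=1e-8):
    """Combine M module posteriors at the query points with the BCM rule:
       C^-1 = sum_i C_i^-1 - (M-1) Kqq^-1,  mean = C sum_i C_i^-1 mu_i."""
    M = len(modules)
    Kq_inv = np.linalg.inv(rbf_kernel(Xq, Xq) + jitter * np.eye(len(Xq)))
    prec = -(M - 1) * Kq_inv
    weighted_mean = np.zeros(len(Xq))
    for Xi, yi in modules:
        mu_i, C_i = gp_posterior(Xi, yi, Xq)
        C_i_inv = np.linalg.inv(C_i + jitter * np.eye(len(Xq)))
        prec += C_i_inv
        weighted_mean += C_i_inv @ mu_i
    cov = np.linalg.inv(prec)
    return cov @ weighted_mean, cov

# Toy usage: 3 modules, each seeing 50 of 150 noisy sine samples, queried at 20 test points.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (150, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(150)
modules = [(X[i::3], y[i::3]) for i in range(3)]
Xq = np.linspace(-3, 3, 20)[:, None]
mean, cov = bcm_predict(modules, Xq)
```

Because the modules are combined jointly at the query block, querying many test points at once gives the approximation more degrees of freedom, consistent with the observation in the abstract that performance improves with the number of simultaneous test points.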
