The generalized Bayesian committee machine

In this paper we introduce the Generalized Bayesian Committee Machine (GBCM) for applications with large data sets. In particular, the GBCM can be used in the context of kernel-based systems such as smoothing splines, kriging, regularization networks, and Gaussian process regression, which for computational reasons are otherwise limited to rather small data sets. The GBCM provides a novel and principled way of combining estimators trained for regression, classification, the prediction of counts, the prediction of lifetimes, and other applications that can be derived from the exponential family of distributions. We describe an online version of the GBCM that requires only one pass through the data set and the storage of a single matrix whose dimension is the number of query or test points. After training, prediction at additional test points requires resources that depend on the number of query points but are independent of the number of training data. We confirm the good scaling behavior on real and experimental data sets.
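
As a concrete illustration of the committee idea, the following is a minimal sketch of the prediction rule in the Gaussian regression special case (the Bayesian Committee Machine), which the GBCM generalizes to other exponential-family likelihoods. Each module is trained on its own partition of the data and queried only at the test points; the combination step then corrects for the (M-1)-fold over-counting of the prior. The kernel choice, module split, data, and all function names below are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch of the BCM prediction rule (Gaussian regression case).
# Assumed/illustrative: RBF kernel, random partitioning, synthetic data.
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_module_posterior(X, y, Xq, noise=0.1):
    """Posterior mean and covariance at the query points Xq,
    computed from one module's data partition (X, y)."""
    K = rbf_kernel(X, X) + noise ** 2 * np.eye(len(X))
    Kq = rbf_kernel(Xq, X)
    Kqq = rbf_kernel(Xq, Xq)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    V = np.linalg.solve(L, Kq.T)
    return Kq @ alpha, Kqq - V.T @ V        # mean, covariance

def bcm_combine(means, covs, Kqq_prior, jitter=1e-8):
    """Combine M module posteriors at the query points:
        C^{-1} = sum_i C_i^{-1} - (M - 1) * Kqq^{-1}
        m      = C * sum_i C_i^{-1} m_i
    The -(M-1) term removes the prior counted once per module."""
    M = len(means)
    nq = Kqq_prior.shape[0]
    prec = -(M - 1) * np.linalg.inv(Kqq_prior + jitter * np.eye(nq))
    weighted = np.zeros(nq)
    for m_i, C_i in zip(means, covs):
        Ci_inv = np.linalg.inv(C_i + jitter * np.eye(nq))
        prec += Ci_inv
        weighted += Ci_inv @ m_i
    C = np.linalg.inv(prec)
    return C @ weighted, C

# Illustrative usage: 1000 training points split across 10 modules,
# so no single kernel matrix larger than 100 x 100 is ever inverted.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(1000)
Xq = np.linspace(-3, 3, 25)[:, None]        # query/test points

parts = np.array_split(rng.permutation(1000), 10)
means, covs = zip(*(gp_module_posterior(X[p], y[p], Xq) for p in parts))
m, C = bcm_combine(list(means), list(covs), rbf_kernel(Xq, Xq))
```

Note how the combination touches only matrices of the size of the query set, which is the property the abstract highlights: after the one pass over the partitions, the cost of prediction is governed by the number of query points, not the number of training data.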
