Bounds on the Generalization Ability of Bayesian Inference and Gibbs Algorithms

Recent theoretical work applying the methods of statistical learning theory has renewed interest in long-established learning paradigms such as Bayesian inference and Gibbs algorithms. Sample complexity bounds have been given for these paradigms in the zero-error case. This paper studies the behavior of these algorithms without that assumption. Results include uniform convergence of the Gibbs algorithm towards Bayesian inference, the rate of convergence of the empirical loss towards the generalization loss, and convergence of the generalization error towards the optimal loss in the underlying class of functions.
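To make the two paradigms concrete, the following is a minimal sketch and not the paper's construction. It assumes a finite class of binary threshold classifiers and a posterior given as a list of weights; the hypothesis class, the weights, the toy data, and all function names are hypothetical choices made for illustration. Under these assumptions, the Bayesian predictor takes a posterior-weighted majority vote, while the Gibbs predictor draws a single hypothesis from the posterior and uses it alone.

```python
import random


def make_threshold(t):
    """Hypothetical hypothesis: predict 1 when x >= t, else 0."""
    return lambda x: 1 if x >= t else 0


def bayes_predict(hypotheses, posterior, x):
    """Posterior-weighted majority vote over binary hypotheses."""
    mass_for_one = sum(p for h, p in zip(hypotheses, posterior) if h(x) == 1)
    return 1 if mass_for_one >= 0.5 else 0


def gibbs_predict(hypotheses, posterior, x, rng):
    """Draw one hypothesis from the posterior and predict with it."""
    h = rng.choices(hypotheses, weights=posterior, k=1)[0]
    return h(x)


def empirical_loss(predict, data):
    """Fraction of labelled points (x, y) that predict gets wrong."""
    return sum(predict(x) != y for x, y in data) / len(data)


def expected_gibbs_loss(hypotheses, posterior, data):
    """Empirical loss of the Gibbs predictor, averaged over its random draw."""
    return sum(p * empirical_loss(h, data) for h, p in zip(hypotheses, posterior))


# Assumed toy setup: three thresholds, an arbitrary posterior, noisy labels.
hypotheses = [make_threshold(t) for t in (0.2, 0.5, 0.8)]
posterior = [0.2, 0.5, 0.3]
data = [(0.1, 0), (0.3, 0), (0.4, 1), (0.6, 1), (0.7, 0), (0.9, 1)]

bayes_loss = empirical_loss(lambda x: bayes_predict(hypotheses, posterior, x), data)
gibbs_loss = expected_gibbs_loss(hypotheses, posterior, data)

rng = random.Random(0)
one_gibbs_draw = [gibbs_predict(hypotheses, posterior, x, rng) for x, _ in data]
```

The labels here are deliberately noisy, so no hypothesis in the class reaches zero empirical loss; this mirrors the setting without the zero-error assumption. The quantities `bayes_loss` and `gibbs_loss` are empirical losses on the toy sample only and say nothing about generalization by themselves.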
