Distribution-Dependent PAC-Bayes Priors

We develop the idea that the PAC-Bayes prior can be informed by the data-generating distribution. We prove sharp bounds for an existing framework, develop insights into function class complexity in this model, and suggest means of controlling it with new algorithms. In particular, we consider controlling capacity with respect to the unknown geometry of the data-generating distribution. Finally, we extend this localization to more practical learning methods.
