PAC-Bayesian analysis of distribution dependent priors: Tighter risk bounds and stability analysis

We analyze distribution dependent priors in the PAC-Bayes framework. We refine the analysis of the generalization ability of the Gibbs and Bayes classifiers. We review and refine the current state-of-the-art risk bounds. We apply the Algorithmic Stability framework to the PAC-Bayes one. We show that Catoni's data dependent posterior distribution is stable.

In this paper we bound the risk of the Gibbs and Bayes classifiers (GC and BC) when the prior is defined in terms of the data generating distribution and the posterior is defined in terms of the observed one, as proposed by Catoni (2007). We approach this problem from two different perspectives. On the one hand, we briefly review and further develop the classical PAC-Bayes analysis, refining the current state-of-the-art risk bounds. On the other hand, we propose a novel approach based on the concept of Algorithmic Stability, which we call Distribution Stability (DS), and derive new risk bounds for the GC and BC based on the DS. Finally, we show that the data dependent posterior distribution associated with the data generating prior also has attractive and previously unknown properties.
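As background for the quantities mentioned above, the following is a minimal sketch of the classical PAC-Bayes-kl bound that the paper reviews and refines, of the standard relation between the Gibbs and Bayes classifiers, and of the classical uniform stability definition of Bousquet and Elisseeff. The notation (i.i.d. sample $S$ of size $m$, prior $P$, posterior $Q$, Gibbs risk $R(G_Q)$, empirical Gibbs risk $\hat{R}_S(G_Q)$, confidence $\delta$) is the standard one from the PAC-Bayes literature and is assumed here; these displays are the classical results, not the paper's refined bounds nor its Distribution Stability notion.

% Classical PAC-Bayes-kl bound (Seeger/Maurer form): with probability at least
% 1 - \delta over the i.i.d. draw of a sample S of size m, simultaneously for
% every posterior Q over the hypothesis space,
\[
  \mathrm{kl}\!\left(\hat{R}_S(G_Q)\,\middle\|\,R(G_Q)\right)
  \le \frac{\mathrm{KL}(Q\|P) + \ln\frac{2\sqrt{m}}{\delta}}{m},
  \qquad
  \mathrm{kl}(q\|p) = q\ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p}.
\]
% The Bayes (majority vote) classifier B_Q and the Gibbs classifier G_Q satisfy
% the well-known factor-of-two relation
\[
  R(B_Q) \le 2\,R(G_Q).
\]
% Background on the stability side: an algorithm A is \beta-uniformly stable
% (Bousquet and Elisseeff, 2002) if removing any single example i from S changes
% the loss on any point z by at most \beta:
\[
  \sup_{z}\,\bigl|\ell(A_S, z) - \ell(A_{S^{\setminus i}}, z)\bigr| \le \beta
  \quad \text{for every } S \text{ and every } i \in \{1,\dots,m\}.
\]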
