PAC-Bayes Risk Bounds for Stochastic Averages and Majority Votes of Sample-Compressed Classifiers

We propose a PAC-Bayes theorem for the sample-compression setting, where each classifier is described by a compression subset of the training data and a message string of additional information. This setting, which is the appropriate one for describing many learning algorithms, strictly generalizes the usual data-independent setting, where classifiers are represented only by data-independent message strings (or parameters taken from a continuous set). The proposed PAC-Bayes theorem for the sample-compression setting reduces to the PAC-Bayes theorem of Seeger (2002) and Langford (2005) when the compression subset of each classifier vanishes. For posteriors having all their weight on a single sample-compressed classifier, the general risk bound reduces to a bound similar to the tight sample-compression bound proposed in Laviolette et al. (2005). Finally, we extend our results to the case where each sample-compressed classifier of a data-dependent ensemble may abstain from predicting a class label.
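
For context, the data-independent PAC-Bayes theorem of Seeger (2002) and Langford (2005), which the proposed sample-compression bound recovers when every compression subset vanishes, can be stated roughly as follows (the constant inside the logarithm varies slightly between statements; the form below follows Langford's tutorial): for any prior P over the classifier set and any delta in (0, 1], with probability at least 1 - delta over the draw of an m-example sample S, simultaneously for all posteriors Q,

\[
\mathrm{kl}\bigl(R_S(G_Q)\,\|\,R(G_Q)\bigr) \;\le\; \frac{\mathrm{KL}(Q\|P) + \ln\frac{m+1}{\delta}}{m},
\qquad
\mathrm{kl}(q\|p) \;=\; q\ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p},
\]

where R_S(G_Q) and R(G_Q) denote the empirical and true risks of the Gibbs classifier G_Q. Since the majority vote B_Q can err only where at least half of Q's weight errs, R(B_Q) <= 2 R(G_Q), which is how Gibbs-risk bounds of this kind transfer to majority votes. The minimal sketch below shows how such a kl bound is typically turned into a numerical upper bound on R(G_Q) by inverting the binary kl with bisection; the helper names and the example numbers are illustrative and are not taken from the paper.

import math

def kl_bernoulli(q, p):
    """Binary KL divergence kl(q || p) between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def pac_bayes_kl_bound(emp_gibbs_risk, kl_q_p, m, delta):
    """Invert kl(emp_gibbs_risk || r) <= (kl_q_p + ln((m+1)/delta)) / m
    by bisection on r, returning an upper bound on the true Gibbs risk."""
    rhs = (kl_q_p + math.log((m + 1) / delta)) / m
    lo, hi = emp_gibbs_risk, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if kl_bernoulli(emp_gibbs_risk, mid) > rhs:
            hi = mid
        else:
            lo = mid
    return hi

# Illustrative numbers only: empirical Gibbs risk 0.10, KL(Q||P) = 5 nats,
# m = 1000 training examples, confidence parameter delta = 0.05.
print(pac_bayes_kl_bound(0.10, 5.0, 1000, 0.05))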

[1] Manfred K. Warmuth, et al. Relating Data Compression and Learnability, 2003.

[2] R. Rivest. Learning Decision Lists, 1987, Machine Learning.

[3] Colin Campbell, et al. Bayes Point Machines, 2001, J. Mach. Learn. Res.

[4] David A. McAllester. Simplified PAC-Bayesian Margin Bounds, 2003, COLT.

[5] Leo Breiman, et al. Bagging Predictors, 1996, Machine Learning.

[6] Thomas Hofmann, et al. PAC-Bayes Bounds for the Risk of the Majority Vote and the Variance of the Gibbs Classifier, 2007.

[7] Matthias W. Seeger, et al. Bayesian Gaussian process models: PAC-Bayesian generalisation error bounds and sparse approximations, 2003.

[8] François Laviolette, et al. A PAC-Bayes approach to the Set Covering Machine, 2005, NIPS.

[9] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.

[10] Sally Floyd, et al. Sample compression, learnability, and the Vapnik-Chervonenkis dimension, 2004, Machine Learning.

[11] John Shawe-Taylor, et al. PAC-Bayesian Compression Bounds on the Prediction Error of Learning Algorithms for Classification, 2005, Machine Learning.

[12] Mario Marchand, et al. Learning with Decision Lists of Data-Dependent Features, 2005, J. Mach. Learn. Res.

[13] David A. McAllester. Some PAC-Bayesian Theorems, 1998, COLT '98.

[14] J. Langford. Tutorial on Practical Prediction Theory for Classification, 2005, J. Mach. Learn. Res.

[15] O. Catoni. A PAC-Bayesian approach to adaptive classification, 2004.

[16] D. L. Reilly, et al. A neural model for category learning, 1982, Biological Cybernetics.

[17] François Laviolette, et al. Margin-Sparsity Trade-Off for the Set Covering Machine, 2005, ECML.

[18] François Laviolette, et al. A PAC-Bayes Risk Bound for General Loss Functions, 2006, NIPS.

[19] John Shawe-Taylor, et al. The Set Covering Machine, 2003, J. Mach. Learn. Res.

[20] Simon Haykin, et al. An Approach to Adaptive Classification, 2001.

[21] John Shawe-Taylor, et al. PAC-Bayes & Margins, 2002, NIPS.

[22] David A. McAllester. PAC-Bayesian Stochastic Model Selection, 2003, Machine Learning.

[23] David G. Stork, et al. Pattern Classification, 1973.

[24] Amos Storkey, et al. Advances in Neural Information Processing Systems 20, 2007.

[25] François Laviolette, et al. PAC-Bayes risk bounds for sample-compressed Gibbs classifiers, 2005, ICML '05.

[26] Leslie G. Valiant. A theory of the learnable, 1984, STOC '84.