Still No Free Lunches: The Price to Pay for Tighter PAC-Bayes Bounds

"No free lunch" results state the impossibility of obtaining meaningful bounds on the error of a learning algorithm without prior assumptions and modelling. Some models are expensive (strong assumptions, such as as subgaussian tails), others are cheap (simply finite variance). As it is well known, the more you pay, the more you get: in other words, the most expensive models yield the more interesting bounds. Recent advances in robust statistics have investigated procedures to obtain tight bounds while keeping the cost minimal. The present paper explores and exhibits what the limits are for obtaining tight PAC-Bayes bounds in a robust setting for cheap models, addressing the question: is PAC-Bayes good value for money?
