Towards self-certified learning: Probabilistic neural networks trained by PAC-Bayes with Backprop

The result of training a probabilistic neural network is a probability distribution over the network weights. This learnt distribution is the basis of a prediction scheme, e.g. building a stochastic predictor or averaging the predictions over all possible parameter settings. In this paper we experiment with training probabilistic neural networks from a PAC-Bayesian approach. We call PAC-Bayes with Backprop (PBB) the family of (probabilistic) neural network training methods derived from PAC-Bayes bounds and optimized by stochastic gradient descent. We show that the methods studied here are promising candidates for self-certified learning, achieving state-of-the-art test performance on several datasets while obtaining reasonably tight certificates of the risk on unseen data, without the need for data-splitting protocols (both for testing and for model selection).
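
The central idea above, treating a PAC-Bayes bound itself as the training objective and minimising it by backprop with stochastic gradient descent, can be illustrated with a short sketch. The Python/PyTorch snippet below is a minimal illustration, not the paper's exact setup: the names ProbLinear and pbb_objective, the fixed Gaussian prior scale, the confidence level delta, and the use of cross-entropy in place of a bounded surrogate loss are all assumptions made for the example. It places a factorised Gaussian posterior over the weights of a single linear layer and minimises empirical risk plus a McAllester-style complexity term.

```python
# Minimal sketch of a "PAC-Bayes with Backprop" style objective (illustrative only).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProbLinear(nn.Module):
    """Linear layer with a factorised Gaussian posterior over its weights."""

    def __init__(self, in_features, out_features, prior_std=0.1):
        super().__init__()
        # Posterior parameters: mean and "rho", with std = softplus(rho).
        self.w_mu = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), -4.0))
        self.b_mu = nn.Parameter(torch.zeros(out_features))
        self.b_rho = nn.Parameter(torch.full((out_features,), -4.0))
        # Fixed zero-mean Gaussian prior with this standard deviation (assumed).
        self.prior_std = prior_std

    def forward(self, x):
        # Reparameterisation trick: w = mu + sigma * eps, so gradients flow
        # through mu and rho while the sampled noise is treated as fixed.
        w_sigma = F.softplus(self.w_rho)
        b_sigma = F.softplus(self.b_rho)
        w = self.w_mu + w_sigma * torch.randn_like(w_sigma)
        b = self.b_mu + b_sigma * torch.randn_like(b_sigma)
        return F.linear(x, w, b)

    def kl_to_prior(self):
        # KL divergence between the diagonal Gaussian posterior and the
        # N(0, prior_std^2) prior, summed over all weights and biases.
        kl = 0.0
        for mu, rho in ((self.w_mu, self.w_rho), (self.b_mu, self.b_rho)):
            sigma = F.softplus(rho)
            kl = kl + (torch.log(self.prior_std / sigma)
                       + (sigma ** 2 + mu ** 2) / (2 * self.prior_std ** 2)
                       - 0.5).sum()
        return kl


def pbb_objective(model, x, y, n, delta=0.025):
    """Empirical risk plus a McAllester-style PAC-Bayes complexity term.

    Cross-entropy stands in for the bounded loss that appears in the actual
    bound; n is the number of training examples, delta the confidence level.
    """
    emp_risk = F.cross_entropy(model(x), y)
    kl = model.kl_to_prior()
    complexity = torch.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))
    return emp_risk + complexity


# One SGD step on synthetic data (shapes are illustrative only).
model = ProbLinear(784, 10)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
loss = pbb_objective(model, x, y, n=60000)  # n = training-set size in the bound
loss.backward()
opt.step()
```

After training, the same KL term together with an estimate of the empirical risk of the learnt posterior can be plugged back into the bound to compute a numerical risk certificate of the kind the abstract refers to, without holding out a separate test set.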
