Towards self-certified learning: Probabilistic neural networks trained by PAC-Bayes with Backprop

The result of training a probabilistic neural network is a probability distribution over the network weights. This learnt distribution is the basis of a prediction scheme, e.g. building a stochastic predictor or averaging the predictions over all possible parameter settings. In this paper we experiment with training probabilistic neural networks from a PAC-Bayesian approach. We call PAC-Bayes with Backprop (PBB) the family of (probabilistic) neural network training methods derived from PAC-Bayes bounds and optimized by stochastic gradient descent. We show that the methods studied here are promising candidates for self-certified learning, achieving state-of-the-art test performance on several datasets while obtaining reasonably tight certificates of the risk on unseen data, without the need for data-splitting protocols (both for testing and for model selection).
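
The central idea above, treating a PAC-Bayes bound itself as the training objective and minimising it by backprop with stochastic gradient descent, can be illustrated with a short sketch. The Python/PyTorch snippet below is a minimal illustration, not the paper's exact setup: the names ProbLinear and pbb_objective, the fixed Gaussian prior scale, the confidence level delta, and the use of cross-entropy in place of a bounded surrogate loss are all assumptions made for the example. It places a factorised Gaussian posterior over the weights of a single linear layer and minimises empirical risk plus a McAllester-style complexity term.

```python
# Minimal sketch of a "PAC-Bayes with Backprop" style objective (illustrative only).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProbLinear(nn.Module):
    """Linear layer with a factorised Gaussian posterior over its weights."""

    def __init__(self, in_features, out_features, prior_std=0.1):
        super().__init__()
        # Posterior parameters: mean and "rho", with std = softplus(rho).
        self.w_mu = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), -4.0))
        self.b_mu = nn.Parameter(torch.zeros(out_features))
        self.b_rho = nn.Parameter(torch.full((out_features,), -4.0))
        # Fixed zero-mean Gaussian prior with this standard deviation (assumed).
        self.prior_std = prior_std

    def forward(self, x):
        # Reparameterisation trick: w = mu + sigma * eps, so gradients flow
        # through mu and rho while the sampled noise is treated as fixed.
        w_sigma = F.softplus(self.w_rho)
        b_sigma = F.softplus(self.b_rho)
        w = self.w_mu + w_sigma * torch.randn_like(w_sigma)
        b = self.b_mu + b_sigma * torch.randn_like(b_sigma)
        return F.linear(x, w, b)

    def kl_to_prior(self):
        # KL divergence between the diagonal Gaussian posterior and the
        # N(0, prior_std^2) prior, summed over all weights and biases.
        kl = 0.0
        for mu, rho in ((self.w_mu, self.w_rho), (self.b_mu, self.b_rho)):
            sigma = F.softplus(rho)
            kl = kl + (torch.log(self.prior_std / sigma)
                       + (sigma ** 2 + mu ** 2) / (2 * self.prior_std ** 2)
                       - 0.5).sum()
        return kl


def pbb_objective(model, x, y, n, delta=0.025):
    """Empirical risk plus a McAllester-style PAC-Bayes complexity term.

    Cross-entropy stands in for the bounded loss that appears in the actual
    bound; n is the number of training examples, delta the confidence level.
    """
    emp_risk = F.cross_entropy(model(x), y)
    kl = model.kl_to_prior()
    complexity = torch.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))
    return emp_risk + complexity


# One SGD step on synthetic data (shapes are illustrative only).
model = ProbLinear(784, 10)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
loss = pbb_objective(model, x, y, n=60000)  # n = training-set size in the bound
loss.backward()
opt.step()
```

After training, the same KL term together with an estimate of the empirical risk of the learnt posterior can be plugged back into the bound to compute a numerical risk certificate of the kind the abstract refers to, without holding out a separate test set.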
