Improved PAC-Bayesian Bounds for Linear Regression

In this paper, we improve the PAC-Bayesian error bound for linear regression derived by Germain et al. [10]. The improvements are twofold. First, the proposed error bound is tighter and converges to the generalization loss when the temperature parameter is chosen appropriately. Second, the error bound also holds when the training data are not independently sampled. In particular, it applies to certain time series generated by well-known classes of dynamical models, such as ARX models.
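To make the dependent-data setting concrete, the following minimal sketch simulates a first-order ARX model and treats it as a linear regression problem whose samples are not independent. It does not compute the PAC-Bayesian bound itself; the model order, the coefficients `a` and `b`, the noise level, and the function names `simulate_arx` and `fit_least_squares` are all illustrative assumptions rather than anything taken from the paper.

```python
import random


def simulate_arx(a, b, n, noise_std=0.1, seed=0):
    """Simulate y[t] = a*y[t-1] + b*u[t-1] + e[t] (hypothetical ARX(1,1) example)."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]  # exogenous input
    y = [0.0] * n
    for t in range(1, n):
        y[t] = a * y[t - 1] + b * u[t - 1] + rng.gauss(0.0, noise_std)
    return u, y


def fit_least_squares(u, y):
    """Least squares with regressor x_t = (y[t-1], u[t-1]) and target y[t].

    Consecutive regressors share past outputs, so the samples are dependent.
    The 2x2 normal equations are solved with Cramer's rule.
    """
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for t in range(1, len(y)):
        x1, x2, target = y[t - 1], u[t - 1], y[t]
        s_yy += x1 * x1
        s_yu += x1 * x2
        s_uu += x2 * x2
        r_y += x1 * target
        r_u += x2 * target
    det = s_yy * s_uu - s_yu * s_yu
    a_hat = (r_y * s_uu - r_u * s_yu) / det
    b_hat = (r_u * s_yy - r_y * s_yu) / det
    return a_hat, b_hat


def empirical_loss(u, y, a_hat, b_hat):
    """Mean squared one-step prediction error on the training sequence."""
    errors = [
        (y[t] - a_hat * y[t - 1] - b_hat * u[t - 1]) ** 2
        for t in range(1, len(y))
    ]
    return sum(errors) / len(errors)


u, y = simulate_arx(a=0.5, b=1.0, n=2000)
a_hat, b_hat = fit_least_squares(u, y)
loss = empirical_loss(u, y, a_hat, b_hat)
```

Here `loss` is only the empirical (training) loss; a bound of the kind described in the abstract would relate such a quantity to the generalization loss, even though the regressors above are correlated across time.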

[1] Petre Stoica, et al. Decentralized Control, 2018, The Control Systems Handbook.

[2] Pierre Alquier, et al. Model selection for weakly dependent time series forecasting, 2009, 0902.2924.

[3] Pierre Alquier, et al. On the properties of variational approximations of Gibbs posteriors, 2015, J. Mach. Learn. Res.

[4] Leslie G. Valiant, et al. A theory of the learnable, 1984, CACM.

[5] O. Catoni. PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning, 2007, 0712.0248.

[6] Gintare Karolina Dziugaite, et al. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data, 2017, UAI.

[7] Tong Zhang, et al. Information-theoretic upper and lower bounds for statistical estimation, 2006, IEEE Transactions on Information Theory.

[8] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.

[9] E. Hannan, et al. The Statistical Theory of Linear Systems, 1990.

[10] François Laviolette, et al. Risk bounds for the majority vote: from a PAC-Bayesian analysis to a learning algorithm, 2015, J. Mach. Learn. Res.

[11] Patrick Billingsley, et al. Probability and Measure, 1986.

[12] Roni Khardon, et al. Excess Risk Bounds for the Bayes Risk using Variational Inference in Latent Gaussian Models, 2017, NIPS.

[13] Mehryar Mohri, et al. Generalization bounds for non-stationary mixing processes, 2016, Machine Learning.

[14] Thomas Hofmann, et al. Tighter PAC-Bayes Bounds, 2007.

[15] Peter Grünwald, et al. The Safe Bayesian - Learning the Learning Rate via the Mixability Gap, 2012, ALT.

[16] Mehryar Mohri, et al. Theory and Algorithms for Forecasting Time Series, 2018, ArXiv.

[17] Kevin P. Murphy, et al. Machine learning - a probabilistic perspective, 2012, Adaptive computation and machine learning series.

[18] Alexandre Lacoste, et al. PAC-Bayesian Theory Meets Bayesian Inference, 2016, NIPS.

[19] David A. McAllester. Simplified PAC-Bayesian Margin Bounds, 2003, COLT.

[20] Alex Simpkins, et al. System Identification: Theory for the User, 2nd Edition (Ljung, L.; 1999) [On the Shelf], 2012, IEEE Robotics & Automation Magazine.

[21] Gene H. Golub, et al. Matrix computations, 1983.

[22] Arnak S. Dalalyan, et al. Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity, 2008, Machine Learning.

[23] P. Spreij. Probability and Measure, 1996.

[24] P. Billingsley, et al. Probability and Measure, 1980.

[25] Biao Huang, et al. System Identification, 2000, Control Theory for Physicists.

[26] Lennart Ljung, et al. System Identification: Theory for the User, 1987.

[27] Benjamin Guedj, et al. A Primer on PAC-Bayesian Learning, 2019, ICML 2019.

[28] David A. McAllester. Some PAC-Bayesian Theorems, 1998, COLT '98.

[29] Christopher M. Bishop, et al. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.