Distribution-Dependent Analysis of Gibbs-ERM Principle
Ilja Kuzborskij | Csaba Szepesvári | Nicolò Cesa-Bianchi