Relative Flatness and Generalization in the Interpolation Regime.
Henning Petzka | Michael Kamp | Linara Adilova | Cristian Sminchisescu | Mario Boley