Understanding deep learning (still) requires rethinking generalization
Chiyuan Zhang | Samy Bengio | Moritz Hardt | Benjamin Recht | Oriol Vinyals