Local Regularizer Improves Generalization
Dimitris N. Metaxas | Chao Chen | Hui Qu | Yikai Zhang