Implicit Bias of Gradient Descent on Linear Convolutional Networks
Suriya Gunasekar | Jason D. Lee | Daniel Soudry | Nathan Srebro
[1] Nathan Srebro, et al. Implicit Regularization in Matrix Factorization, 2017, 2018 Information Theory and Applications Workshop (ITA).
[2] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[3] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[4] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.
[5] Elad Hoffer, et al. Train longer, generalize better: closing the generalization gap in large batch training of neural networks, 2017, NIPS.
[6] M. Muresan. A concrete approach to classical analysis, 2009.
[7] Nathan Srebro, et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, 2017, NIPS.
[8] Nathan Srebro, et al. Convergence of Gradient Descent on Separable Data, 2018, AISTATS.
[9] Matus Telgarsky, et al. Margins, Shrinkage, and Boosting, 2013, ICML.
[10] Yinyu Ye, et al. A note on the complexity of Lp minimization, 2011, Math. Program..
[11] Francis R. Bach, et al. Low-Rank Optimization on the Cone of Positive Semidefinite Matrices, 2008, SIAM J. Optim..
[12] Nathan Srebro, et al. Characterizing Implicit Bias in Terms of Optimization Geometry, 2018, ICML.
[13] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res..
[14] Jürgen Schmidhuber, et al. Flat Minima, 1997, Neural Computation.
[15] Ambuj Tewari, et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization, 2008, NIPS.
[16] Ruslan Salakhutdinov, et al. Geometry of Optimization and Implicit Regularization in Deep Learning, 2017, ArXiv.
[17] Matthias Hein, et al. The Loss Surface of Deep and Wide Neural Networks, 2017, ICML.
[18] Kenji Kawaguchi, et al. Deep Learning without Poor Local Minima, 2016, NIPS.
[19] R. Rockafellar. Directionally Lipschitzian Functions and Subdifferential Calculus, 1979.
[20] Quoc V. Le, et al. Don't Decay the Learning Rate, Increase the Batch Size, 2017, ICLR.
[21] Stefano Soatto, et al. Entropy-SGD: biasing gradient descent into wide valleys, 2016, ICLR.
[22] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[23] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.
[24] Yuanzhi Li, et al. Algorithmic Regularization in Over-parameterized Matrix Recovery, 2017, ArXiv.
[25] Michael I. Jordan, et al. Gradient Descent Only Converges to Minimizers, 2016, COLT.
[26] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res..
[27] Ruslan Salakhutdinov, et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks, 2015, NIPS.
[28] Matus Telgarsky, et al. Risk and parameter convergence of logistic regression, 2018, ArXiv.
[29] Renato D. C. Monteiro, et al. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization, 2003, Math. Program..