Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
Mor Shpigel Nacson | Suriya Gunasekar | Jason D. Lee | Nathan Srebro | Daniel Soudry