Complexity control by gradient descent in deep networks

[1] Tomaso Poggio, et al. Double descent in the condition number, 2019, ArXiv.

[2] Kaifeng Lyu, et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2019, ICLR.

[3] Nathan Srebro, et al. Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models, 2019, ICML.

[4] Lorenzo Rosasco, et al. Theory III: Dynamics and Generalization in Deep Networks, 2019, ArXiv.

[5] Alexander Rakhlin, et al. Consistency of Interpolation with Laplace Kernels is a High-Dimensional Phenomenon, 2018, COLT.

[6] Tengyuan Liang, et al. Just Interpolate: Kernel "Ridgeless" Regression Can Generalize, 2018, The Annals of Statistics.

[7] Nathan Srebro, et al. Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate, 2018, AISTATS.

[8] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.

[9] Ruslan Salakhutdinov, et al. Geometry of Optimization and Implicit Regularization in Deep Learning, 2017, ArXiv.

[10] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.

[11] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.

[12] Lorenzo Rosasco, et al. Learning with Incremental Iterative Regularization, 2014, NIPS.

[13] Sun-Yuan Kung, et al. On gradient adaptation with unit-norm constraints, 2000, IEEE Trans. Signal Process.

[14] Heinz W. Engl, et al. Regularization of Inverse Problems, 1996.