On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent
Nathan Srebro | Daniel Soudry | Amir Globerson | Mor Shpigel Nacson | Blake E. Woodworth | Edward Moroshko | Shahar Azulay
[1] Matus Telgarsky, et al. Gradient descent aligns the layers of deep linear networks, 2018, ICLR.
[2] Nathan Srebro, et al. Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy, 2020, NeurIPS.
[3] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[4] Francis Bach, et al. On Lazy Training in Differentiable Programming, 2018, NeurIPS.
[5] Varun Kanade, et al. Implicit Regularization for Optimal Sparse Recovery, 2019, NeurIPS.
[6] Wei Hu, et al. Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced, 2018, NeurIPS.
[7] Francis Bach, et al. Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss, 2020, COLT.
[8] Hongyang Zhang, et al. Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations, 2017, COLT.
[9] Hossein Mobahi, et al. A Unifying View on Implicit Bias in Training Linear Neural Networks, 2021, ICLR.
[10] Kaifeng Lyu, et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2019, ICLR.
[11] Nathan Srebro, et al. Kernel and Rich Regimes in Overparametrized Models, 2019, COLT.
[12] Manfred K. Warmuth, et al. Reparameterizing Mirror Descent as Gradient Descent, 2020, NeurIPS.
[13] Nadav Cohen, et al. Implicit Regularization in Deep Learning May Not Be Explainable by Norms, 2020, NeurIPS.
[14] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[15] Nathan Srebro, et al. Implicit Regularization in Matrix Factorization, 2017, 2018 Information Theory and Applications Workshop (ITA).
[16] Kalyanmoy Deb, et al. Approximate KKT points and a proximity measure for termination, 2013, Journal of Global Optimization.
[17] Ohad Shamir, et al. Implicit Regularization in ReLU Networks with the Square Loss, 2020, COLT.
[18] Nathan Srebro, et al. Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models, 2019, ICML.
[19] Kaifeng Lyu, et al. Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning, 2021, ICLR.
[20] Nathan Srebro, et al. Characterizing Implicit Bias in Terms of Optimization Geometry, 2018, ICML.
[21] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[22] Quynh Nguyen, et al. On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths, 2021, ICML.
[23] Manfred K. Warmuth, et al. Winnowing with Gradient Descent, 2020, COLT.
[24] Nathan Srebro, et al. Implicit Bias of Gradient Descent on Linear Convolutional Networks, 2018, NeurIPS.
[25] Nathan Srebro, et al. Mirrorless Mirror Descent: A More Natural Discretization of Riemannian Gradient Flow, 2020, arXiv.