Implicit Regularization and Convergence for Weight Normalization
Suriya Gunasekar | Rachel A. Ward | Qiang Liu | Shanshan Wu | Edgar Dobriban | Xiaoxia Wu | Tongzheng Ren | Zhiyuan Li