Memo No. 90, July 18, 2019. Theory III: Dynamics and Generalization in Deep Networks
T. Poggio, L. Rosasco, B. Miranda, Q. Liao, J. Hidary, Andrzej Banburski, Kenji Kawaguchi, Sasha Rakhlin