Convergence Properties of Deep Neural Networks on Separable Data
Rémi Tachet des Combes | Mohammad Pezeshki | Samira Shabanian | Aaron C. Courville | Yoshua Bengio
[1] Morris Tenenbaum, et al. Ordinary differential equations: an elementary textbook for students of mathematics, engineering, and the sciences, 1963.
[2] Kurt Hornik, et al. Neural networks and principal component analysis: Learning from examples without local minima, 1989, Neural Networks.
[3] Hilbert J. Kappen, et al. On-line learning processes in artificial neural networks, 1993.
[4] Vladimir Vapnik, et al. Statistical learning theory, 1998.
[5] T. Poggio, et al. General conditions for predictivity in learning theory, 2004, Nature.
[6] Lorenzo Rosasco, et al. Are Loss Functions All the Same?, 2004, Neural Computation.
[7] Surya Ganguli, et al. Learning hierarchical categories in deep neural networks, 2013, CogSci.
[8] Yoshua Bengio, et al. How transferable are features in deep neural networks?, 2014, NIPS.
[9] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[10] Yann LeCun, et al. The Loss Surface of Multilayer Networks, 2014, ArXiv.
[11] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[12] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.
[13] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[14] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[15] Jascha Sohl-Dickstein, et al. SVCCA: Singular Vector Canonical Correlation Analysis for Deep Understanding and Improvement, 2017, ArXiv.
[16] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[17] Christopher Joseph Pal, et al. On orthogonality and learning recurrent networks with long term dependencies, 2017, ICML.
[18] Surya Ganguli, et al. Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice, 2017, NIPS.
[19] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[20] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res..
[21] Yuanzhi Li, et al. An Alternative View: When Does SGD Escape Local Minima?, 2018, ICML.
[22] Yi Zhou, et al. Convergence of SGD in Learning ReLU Models with Separable Data, 2018, ArXiv.
[23] Yuichi Yoshida, et al. Spectral Normalization for Generative Adversarial Networks, 2018, ICLR.
[24] Zhenyu Liao, et al. The Dynamics of Learning: A Random Matrix Approach, 2018, ICML.
[25] Colin Raffel, et al. Is Generator Conditioning Causally Related to GAN Performance?, 2018, ICML.
[26] Sanjeev Arora, et al. On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization, 2018, ICML.
[27] Nathan Srebro, et al. Convergence of Gradient Descent on Separable Data, 2018, AISTATS.
[28] Chico Q. Camargo, et al. Deep learning generalizes because the parameter-function map is biased towards simple functions, 2018, ICLR.
[29] Andrew M. Saxe, et al. High-dimensional dynamics of generalization error in neural networks, 2017, Neural Networks.