Yoshua Bengio | Aaron C. Courville | Mohammad Pezeshki | Samira Shabanian | Rémi Tachet des Combes
[1] Hilbert J. Kappen, et al. On-line learning processes in artificial neural networks, 1993.
[2] Christopher Joseph Pal, et al. On orthogonality and learning recurrent networks with long term dependencies, 2017, ICML.
[3] Vladimir Vapnik, et al. Statistical learning theory, 1998.
[4] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, ArXiv.
[5] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[6] Jascha Sohl-Dickstein, et al. SVCCA: Singular Vector Canonical Correlation Analysis for Deep Understanding and Improvement, 2017, ArXiv.
[7] Nathan Srebro, et al. Convergence of Gradient Descent on Separable Data, 2018, AISTATS.
[8] Yi Zhou, et al. Convergence of SGD in Learning ReLU Models with Separable Data, 2018, ArXiv.
[9] Surya Ganguli, et al. Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice, 2017, NIPS.
[10] Yuichi Yoshida, et al. Spectral Normalization for Generative Adversarial Networks, 2018, ICLR.
[11] T. Poggio, et al. General conditions for predictivity in learning theory, 2004, Nature.
[12] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[13] Kurt Hornik, et al. Neural networks and principal component analysis: Learning from examples without local minima, 1989, Neural Networks.
[14] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[15] Zhenyu Liao, et al. The Dynamics of Learning: A Random Matrix Approach, 2018, ICML.
[16] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[17] Yuanzhi Li, et al. An Alternative View: When Does SGD Escape Local Minima?, 2018, ICML.
[18] Morris Tenenbaum, et al. Ordinary differential equations: an elementary textbook for students of mathematics, engineering, and the sciences, 1963.
[19] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[20] Yoshua Bengio, et al. How transferable are features in deep neural networks?, 2014, NIPS.
[21] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.
[22] Colin Raffel, et al. Is Generator Conditioning Causally Related to GAN Performance?, 2018, ICML.
[23] Andrew M. Saxe, et al. High-dimensional dynamics of generalization error in neural networks, 2017, Neural Networks.
[24] Surya Ganguli, et al. Learning hierarchical categories in deep neural networks, 2013, CogSci.
[25] Chico Q. Camargo, et al. Deep learning generalizes because the parameter-function map is biased towards simple functions, 2018, ICLR.
[26] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[27] Jascha Sohl-Dickstein, et al. SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability, 2017, NIPS.
[28] Lorenzo Rosasco, et al. Are Loss Functions All the Same?, 2004, Neural Computation.
[29] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[30] Sanjeev Arora, et al. On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization, 2018, ICML.