暂无分享,去创建一个
[1] Sompolinsky,et al. Spin-glass models of neural networks. , 1985, Physical review. A, General physics.
[2] Oriol Vinyals,et al. Qualitatively characterizing neural network optimization problems , 2014, ICLR.
[3] Surya Ganguli,et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.
[4] Michael I. Jordan,et al. Gradient Descent Can Take Exponential Time to Escape Saddle Points , 2017, NIPS.
[5] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[6] Tengyu Ma,et al. Identity Matters in Deep Learning , 2016, ICLR.
[7] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[8] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[9] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[11] Kenji Kawaguchi,et al. Deep Learning without Poor Local Minima , 2016, NIPS.
[12] Thomas Hofmann,et al. Towards a Theoretical Understanding of Batch Normalization , 2018, ArXiv.
[13] Yann LeCun,et al. The Loss Surfaces of Multilayer Networks , 2014, AISTATS.
[14] Amir Globerson,et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs , 2017, ICML.
[15] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[16] Bolei Zhou,et al. Network Dissection: Quantifying Interpretability of Deep Visual Representations , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Joan Bruna,et al. Neural Networks with Finite Intrinsic Dimension have no Spurious Valleys , 2018, ArXiv.
[18] Thomas Hofmann,et al. Exponential convergence rates for Batch Normalization: The power of length-direction decoupling in non-convex optimization , 2018, AISTATS.
[19] Surya Ganguli,et al. Exponential expressivity in deep neural networks through transient chaos , 2016, NIPS.
[20] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.
[21] Yuandong Tian,et al. An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis , 2017, ICML.
[22] Hao Li,et al. Visualizing the Loss Landscape of Neural Nets , 2017, NeurIPS.
[23] Yuandong Tian,et al. Better Computer Go Player with Neural Network and Long-term Prediction , 2016, ICLR.
[24] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[25] Ohad Shamir,et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks , 2017, ICML.
[26] Stephen P. Boyd,et al. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights , 2014, J. Mach. Learn. Res..
[27] Yann LeCun,et al. Open Problem: The landscape of the loss surfaces of multilayer networks , 2015, COLT.
[28] David J. Schwab,et al. An exact mapping between the Variational Renormalization Group and Deep Learning , 2014, ArXiv.
[29] Yuandong Tian,et al. Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima , 2017, ICML.
[30] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[31] A. Montanari,et al. The landscape of empirical risk for nonconvex losses , 2016, The Annals of Statistics.