Yuandong Tian | Ari S. Morcos | Qucheng Gong | Tina Jiang
[1] Thomas Hofmann,et al. Exponential convergence rates for Batch Normalization: The power of length-direction decoupling in non-convex optimization , 2018, AISTATS.
[2] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[3] Yann LeCun,et al. Optimal Brain Damage , 1989, NIPS.
[4] Thomas Hofmann,et al. Towards a Theoretical Understanding of Batch Normalization , 2018, ArXiv.
[5] Jason Yosinski,et al. Measuring the Intrinsic Dimension of Objective Landscapes , 2018, ICLR.
[6] Gintare Karolina Dziugaite,et al. Stabilizing the Lottery Ticket Hypothesis , 2019 .
[7] Samy Bengio,et al. Identity Crisis: Memorization and Generalization under Extreme Overparameterization , 2019, ICLR.
[8] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[9] Nir Shavit,et al. Deep Learning is Robust to Massive Label Noise , 2017, ArXiv.
[10] Gintare Karolina Dziugaite,et al. The Lottery Ticket Hypothesis at Scale , 2019, ArXiv.
[11] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[12] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[13] Gregory J. Wolff,et al. Optimal Brain Surgeon and general network pruning , 1993, IEEE International Conference on Neural Networks.
[14] Tegan Maharaj,et al. Deep Nets Don't Learn via Memorization , 2017, ICLR.
[15] Yuandong Tian,et al. A theoretical framework for deep locally connected ReLU network , 2018, ArXiv.
[16] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[17] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[18] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[19] Michael Carbin,et al. The Lottery Ticket Hypothesis: Training Pruned Neural Networks , 2018, ArXiv.
[20] Yann LeCun,et al. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks , 2018, ArXiv.
[21] Jascha Sohl-Dickstein,et al. A Mean Field Theory of Batch Normalization , 2019, ICLR.
[22] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[23] Zachary Chase Lipton. Stuck in a What? Adventures in Weight Space , 2016, ArXiv.
[24] Song Han,et al. Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.
[25] Nathan Srebro,et al. Exploring Generalization in Deep Learning , 2017, NIPS.
[26] Yuandong Tian,et al. Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima , 2017, ICML.
[27] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2016, CVPR.
[28] Razvan Pascanu,et al. Sharp Minima Can Generalize For Deep Nets , 2017, ICML.
[29] Zhanxing Zhu,et al. Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes , 2017, ArXiv.
[30] Hao Li,et al. Visualizing the Loss Landscape of Neural Nets , 2017, NeurIPS.
[31] Yuandong Tian,et al. Better Computer Go Player with Neural Network and Long-term Prediction , 2016, ICLR.
[32] Jason Yosinski,et al. Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask , 2019, NeurIPS.
[33] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[34] Sanjeev Arora,et al. Theoretical Analysis of Auto Rate-Tuning by Batch Normalization , 2018, ICLR.
[35] Chong Wang,et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[36] Ryota Tomioka,et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning , 2014, ICLR.
[37] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .
[38] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[39] Yann LeCun,et al. Comparing dynamics: deep neural networks versus glassy systems , 2018, ICML.
[40] Levent Sagun,et al. A jamming transition from under- to over-parametrization affects generalization in deep learning , 2018, Journal of Physics A: Mathematical and Theoretical.
[41] Yann Dauphin,et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks , 2017, ICLR.
[42] Yann LeCun,et al. Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond , 2016, ArXiv.