Sharp Minima Can Generalize For Deep Nets
Laurent Dinh | Razvan Pascanu | Samy Bengio | Yoshua Bengio
[1] Geoffrey E. Hinton, et al. Keeping the neural networks simple by minimizing the description length of the weights, 1993, COLT '93.
[2] Jürgen Schmidhuber, et al. Flat Minima, 1997, Neural Computation.
[3] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[4] Aapo Hyvärinen, et al. Nonlinear independent component analysis: Existence and uniqueness results, 1999, Neural Networks.
[5] A. Klyachko. Random walks on symmetric spaces and inequalities for matrix spectra, 2000.
[6] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.
[7] Amos Storkey, et al. Advances in Neural Information Processing Systems 20, 2007.
[8] Léon Bottou, et al. On-line learning for very large data sets, 2005.
[9] Léon Bottou, et al. The Tradeoffs of Large Scale Learning, 2007, NIPS.
[10] Yurii Nesterov, et al. Confidence level solutions for stochastic programming, 2000, Autom.
[11] Yann LeCun, et al. What is the best multi-stage architecture for object recognition?, 2009, IEEE 12th International Conference on Computer Vision (ICCV).
[12] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[13] Geoffrey E. Hinton, et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.
[14] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.
[15] Yoshua Bengio, et al. Deep Sparse Rectifier Neural Networks, 2011, AISTATS.
[16] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[17] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Yoshua Bengio, et al. Maxout Networks, 2013, ICML.
[19] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[20] Erich Elsen, et al. Deep Speech: Scaling up end-to-end speech recognition, 2014, ArXiv.
[21] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[22] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[23] Razvan Pascanu, et al. On the Number of Linear Regions of Deep Neural Networks, 2014, NIPS.
[24] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[25] Joan Bruna, et al. Intriguing properties of neural networks, 2013, ICLR.
[26] Razvan Pascanu, et al. Revisiting Natural Gradient for Deep Networks, 2013, ICLR.
[27] Shakir Mohamed, et al. Variational Inference with Normalizing Flows, 2015, ICML.
[28] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[29] Yann LeCun, et al. Explorations on high dimensional landscapes, 2014, ICLR.
[30] Roberto Cipolla, et al. Understanding symmetries in deep networks, 2015, ArXiv.
[31] Ruslan Salakhutdinov, et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks, 2015, NIPS.
[32] Razvan Pascanu, et al. Natural Neural Networks, 2015, NIPS.
[33] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[34] Yoshua Bengio, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.
[35] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.
[36] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, IEEE International Conference on Computer Vision (ICCV).
[37] Yoshua Bengio, et al. NICE: Non-linear Independent Components Estimation, 2014, ICLR.
[38] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[40] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[41] Yoram Singer, et al. Train faster, generalize better: Stability of stochastic gradient descent, 2015, ICML.
[42] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Quoc V. Le, et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[44] Daniel Jiwoong Im, et al. An empirical analysis of the optimization of deep network loss surfaces, 2016, ArXiv:1612.04010.
[45] Matthias Bethge, et al. A note on the evaluation of generative models, 2015, ICLR.
[46] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.
[47] Wojciech Zaremba, et al. Improved Techniques for Training GANs, 2016, NIPS.
[48] Yann LeCun, et al. Singularity of the Hessian in Deep Learning, 2016, ArXiv.
[49] Razvan Pascanu, et al. Local minima in training of deep networks, 2017, ArXiv.
[50] Yann LeCun, et al. Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond, 2016, ArXiv:1611.07476.
[51] Daniel Jiwoong Im, et al. An Empirical Analysis of Deep Network Loss Surfaces, 2016, ArXiv.
[52] Gabriel Synnaeve, et al. Wav2Letter: an End-to-End ConvNet-based Speech Recognition System, 2016, ArXiv.
[53] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[54] Ole Winther, et al. Autoencoding beyond pixels using a learned similarity metric, 2015, ICML.
[55] Venu Govindaraju, et al. Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks, 2016, ICML.
[56] Surya Ganguli, et al. On the Expressive Power of Deep Neural Networks, 2016, ICML.
[57] Samy Bengio, et al. Density estimation using Real NVP, 2016, ICLR.
[58] Shai Shalev-Shwartz, et al. Fast Rates for Empirical Risk Minimization of Strict Saddle Problems, 2017, COLT.
[59] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[60] Max Welling, et al. Improved Variational Inference with Inverse Autoregressive Flow, 2016, NIPS.
[61] Yoshua Bengio, et al. Sharp Minima Can Generalize For Deep Nets: Supplementary Material, 2017.
[62] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[63] Stefano Soatto, et al. Entropy-SGD: biasing gradient descent into wide valleys, 2016, ICLR.
[64] Yann Dauphin, et al. A Convolutional Encoder Model for Neural Machine Translation, 2016, ACL.
[65] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.