The Effect of Gradient Noise on the Energy Landscape of Deep Networks
[1] J. J. Hopfield, et al. Neural networks and physical systems with emergent collective computational abilities, 1982, Proceedings of the National Academy of Sciences of the United States of America.
[2] M. Mézard, et al. Spin Glass Theory and Beyond, 1987.
[3] Anders Krogh, et al. A Simple Weight Decay Can Improve Generalization, 1991, NIPS.
[4] C. Lee Giles, et al. An analysis of noise in recurrent neural networks: convergence and generalization, 1996, IEEE Trans. Neural Networks.
[5] S. Kak. Information, physics, and computation, 1996.
[6] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[7] Yoshua Bengio, et al. A Neural Probabilistic Language Model, 2003, J. Mach. Learn. Res.
[8] M. Talagrand. Spin glasses: a challenge for mathematicians: cavity and mean field models, 2003.
[9] R. Tibshirani, et al. Least angle regression, 2004, arXiv:math/0406456.
[10] Riccardo Zecchina, et al. Survey propagation: An algorithm for satisfiability, 2002, Random Struct. Algorithms.
[11] Yan V. Fyodorov, et al. Replica Symmetry Breaking Condition Exposed by Random Matrix Calculation of Landscape Complexity, 2007, arXiv:cond-mat/0702601.
[12] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.
[13] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[14] Geoffrey E. Hinton, et al. Deep Boltzmann Machines, 2009, AISTATS.
[15] Richard E. Neapolitan. Learning Bayesian Network Structure, 2009.
[16] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[17] Antonio Auffinger, et al. Random Matrices and Complexity of Spin Glasses, 2010, arXiv:1003.1129.
[18] Tommi S. Jaakkola, et al. Learning Bayesian Network Structure using LP Relaxations, 2010, AISTATS.
[19] Florent Krzakala, et al. Statistical physics-based reconstruction in compressed sensing, 2011, arXiv.
[20] Nitish Srivastava, et al. Improving neural networks by preventing co-adaptation of feature detectors, 2012, arXiv.
[21] Yoshua Bengio, et al. Random Search for Hyper-Parameter Optimization, 2012, J. Mach. Learn. Res.
[22] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.
[23] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[24] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine.
[25] Yoshua Bengio, et al. Practical Recommendations for Gradient-Based Training of Deep Architectures, 2012, Neural Networks: Tricks of the Trade.
[26] T. Tao. Topics in Random Matrix Theory, 2012.
[27] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[28] Y. Fyodorov. High-Dimensional Random Fields and Random Matrix Theory, 2013, arXiv:1307.2379.
[29] Yann LeCun, et al. Regularization of Neural Networks using DropConnect, 2013, ICML.
[30] Rina Panigrahy, et al. Sparse Matrix Factorization, 2013, arXiv.
[31] D. Panchenko. The Sherrington-Kirkpatrick Model, 2013.
[32] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, arXiv.
[33] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[34] Neil D. Lawrence, et al. Deep Gaussian Processes, 2012, AISTATS.
[35] Yoshua Bengio, et al. Maxout Networks, 2013, ICML.
[36] Misha Denil, et al. Predicting Parameters in Deep Learning, 2014.
[37] Andrew Zisserman, et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets, 2014, BMVC.
[38] Aditya Bhaskara, et al. Provable Bounds for Learning Some Deep Representations, 2013, ICML.
[39] S. Majumdar, et al. Top eigenvalue of a random matrix: large deviations and third order phase transition, 2013, arXiv:1311.0580.
[40] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[41] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[42] Qiang Chen, et al. Network In Network, 2013, ICLR.
[43] Yann LeCun, et al. The Loss Surface of Multilayer Networks, 2014, arXiv.
[44] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[45] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[46] Ryan P. Adams, et al. Avoiding pathologies in very deep networks, 2014, AISTATS.
[47] Trevor Darrell, et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition, 2013, ICML.
[48] Zhaoran Wang, et al. Tighten after Relax: Minimax-Optimal Sparse PCA in Polynomial Time, 2014, NIPS.
[49] Stefano Soatto, et al. Visual Representations: Defining Properties and Deep Approximations, 2016, ICLR.
[50] Thomas Brox, et al. Striving for Simplicity: The All Convolutional Net, 2014, ICLR.
[51] O. Zeitouni, et al. The extremal process of critical points of the pure p-spin spherical spin glass model, 2015, arXiv:1509.03098.
[52] Jeff Johnson, et al. Fast Convolutional Nets With fbfft: A GPU Performance Evaluation, 2014, ICLR.
[53] René Vidal, et al. Global Optimality in Tensor Factorization, Deep Learning, and Beyond, 2015, arXiv.
[54] Jason Yosinski, et al. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[56] Yann LeCun, et al. Explorations on high dimensional landscapes, 2014, ICLR.
[57] Quoc V. Le, et al. Adding Gradient Noise Improves Learning for Very Deep Networks, 2015, arXiv.
[58] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[59] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[60] Richard G. Baraniuk, et al. A Probabilistic Theory of Deep Learning, 2015, arXiv.
[61] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[62] Eliran Subag, et al. The complexity of spherical p-spin models - a second moment approach, 2015, arXiv:1504.02251.
[63] Stefano Soatto, et al. Visual Scene Representations: Sufficiency, Minimality, Invariance and Deep Approximations, 2014, ICLR.
[64] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[65] Dumitru Erhan, et al. Going deeper with convolutions, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[66] Yixin Chen, et al. Compressing Neural Networks with the Hashing Trick, 2015, ICML.
[67] A. Bovier. Metastability: A Potential-Theoretic Approach, 2016.
[68] Christian Borgs, et al. Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes, 2016, Proceedings of the National Academy of Sciences.
[69] Hossein Mobahi, et al. Training Recurrent Neural Networks by Diffusion, 2016, arXiv.
[70] Max Welling, et al. Group Equivariant Convolutional Networks, 2016, ICML.
[71] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[72] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[73] Michael I. Jordan, et al. Gradient Descent Only Converges to Minimizers, 2016, COLT.
[74] Rogério Schmidt Feris, et al. A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection, 2016, ECCV.
[75] Sepp Hochreiter, et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, ICLR.
[76] Anima Anandkumar, et al. Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods, 2017.
[77] Eric T. Nalisnick, et al. A Scale Mixture Perspective of Multiplicative Noise in Neural Networks, 2015, arXiv:1506.03208.