Stanislaw Jastrzebski | Zachary Kenton | Devansh Arpit | Nicolas Ballas | Asja Fischer | Yoshua Bengio | Amos J. Storkey
[1] P. Kloeden, et al. Numerical Solution of Stochastic Differential Equations, 1992.
[2] N. G. van Kampen. Stochastic Processes in Physics and Chemistry, 1983.
[3] L. Bottou. Stochastic Gradient Learning in Neural Networks, 1991.
[4] David J. C. MacKay, et al. A Practical Bayesian Framework for Backpropagation Networks, 1992, Neural Computation.
[5] D. MacKay, et al. Bayesian methods for adaptive models, 1992.
[6] Hilbert J. Kappen, et al. On-line learning processes in artificial neural networks, 1993.
[7] Jürgen Schmidhuber, et al. Flat Minima, 1997, Neural Computation.
[8] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[9] Léon Bottou. Online Learning and Stochastic Approximations, 1998.
[10] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[11] Nicolas Le Roux, et al. Topmoumoute Online Natural Gradient Algorithm, 2007, NIPS.
[12] C. Gardiner. Stochastic Methods: A Handbook for the Natural and Social Sciences, 2009.
[13] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.
[14] Hiroshi Nakagawa, et al. Approximation Analysis of Stochastic Gradient Langevin Dynamics by using Fokker-Planck Equation and Ito Process, 2014, ICML.
[15] Ryan Babbush, et al. Bayesian Sampling Using Stochastic Gradient Thermostats, 2014, NIPS.
[16] Alex Krizhevsky, et al. One weird trick for parallelizing convolutional neural networks, 2014, ArXiv.
[17] Tianqi Chen, et al. Stochastic Gradient Hamiltonian Monte Carlo, 2014, ICML.
[18] Oriol Vinyals, et al. Qualitatively characterizing neural network optimization problems, 2014, ICLR.
[19] Quoc V. Le, et al. Adding Gradient Noise Improves Learning for Very Deep Networks, 2015, ArXiv.
[20] K. Zygalakis, et al. (Non-) asymptotic properties of Stochastic Gradient Langevin Dynamics, 2015, 1501.00438.
[21] Zhanxing Zhu, et al. Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling, 2015, NIPS.
[22] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[23] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Chong Wang, et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, 2015, ICML.
[25] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[26] Zhanxing Zhu, et al. Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes, 2017, ArXiv.
[27] Naftali Tishby, et al. Opening the Black Box of Deep Neural Networks via Information, 2017, ArXiv.
[28] E Weinan, et al. Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms, 2015, ICML.
[29] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[30] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[31] Yoshua Bengio, et al. A Closer Look at Memorization in Deep Networks, 2017, ICML.
[32] Elad Hoffer, et al. Train longer, generalize better: closing the generalization gap in large batch training of neural networks, 2017, NIPS.
[33] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[34] Quoc V. Le, et al. Understanding Generalization and Stochastic Gradient Descent, 2017.
[35] David M. Blei, et al. Stochastic Gradient Descent as Approximate Bayesian Inference, 2017, J. Mach. Learn. Res..
[36] Jian-Guo Liu, et al. Batch Size Matters: A Diffusion Approximation Framework on Nonconvex Stochastic Gradient Descent, 2017, ArXiv.
[37] Roland Vollgraf, et al. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, 2017, ArXiv.
[38] Lorenzo Rosasco, et al. Theory of Deep Learning III: explaining the non-overfitting puzzle, 2017, ArXiv.
[39] Yann Dauphin, et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks, 2017, ICLR.
[40] David D. Cox, et al. On the information bottleneck theory of deep learning, 2018, ICLR.
[41] Yao Zhang, et al. Energy–entropy competition and the effectiveness of stochastic gradient descent in machine learning, 2018, Molecular Physics.
[42] Stefano Soatto, et al. Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks, 2017, 2018 Information Theory and Applications Workshop (ITA).
[43] Lei Wu, et al. The Regularization Effects of Anisotropic Noise in Stochastic Gradient Descent, 2018, ArXiv.
[44] Stefano Soatto, et al. Deep relaxation: partial differential equations for optimizing deep neural networks, 2017, Research in the Mathematical Sciences.
[45] Quoc V. Le, et al. A Bayesian Perspective on Generalization and Stochastic Gradient Descent, 2017, ICLR.