Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks