Stochastic Gradient and Langevin Processes
[1] A. Eberle. Couplings, distances and contractivity for diffusion processes revisited, 2013.
[2] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[3] Ohad Shamir, et al. Global Non-convex Optimization with Discretized Diffusions, 2018, NeurIPS.
[4] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[5] Stefano Soatto, et al. Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks, 2017, 2018 Information Theory and Applications Workshop (ITA).
[6] G. Parisi. Brownian motion, 2005, Nature.
[7] Michael I. Jordan, et al. Quantitative Central Limit Theorems for Discrete Stochastic Processes, 2019, arXiv.
[8] David M. Blei, et al. A Variational Analysis of Stochastic Gradient Algorithms, 2016, ICML.
[9] Xin T. Tong, et al. Statistical inference for model parameters in stochastic gradient descent, 2016, The Annals of Statistics.
[10] Ioannis Chatzigeorgiou, et al. Bounds on the Lambert Function and Their Application to the Outage Analysis of User Cooperation, 2013, IEEE Communications Letters.
[11] Sashank J. Reddi, et al. Why ADAM Beats SGD for Attention Models, 2019, arXiv.
[12] Lester W. Mackey, et al. Measuring Sample Quality with Diffusions, 2016, The Annals of Applied Probability.
[13] Alain Durmus, et al. High-dimensional Bayesian inference via the unadjusted Langevin algorithm, 2016, Bernoulli.
[14] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[15] Murat A. Erdogdu, et al. Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond, 2019, NeurIPS.
[16] Dacheng Tao, et al. Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence, 2019, NeurIPS.
[17] Alex Zhai, et al. The CLT in high dimensions: Quantitative bounds via martingale embedding, 2018, The Annals of Probability.
[18] Levent Sagun, et al. A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks, 2019, ICML.
[19] Elad Hoffer, et al. Train longer, generalize better: closing the generalization gap in large batch training of neural networks, 2017, NIPS.
[20] A. Eberle. Reflection coupling and Wasserstein contractivity without convexity, 2011.
[21] Michael I. Jordan, et al. Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting, 2018, arXiv.
[22] Krishnakumar Balasubramanian, et al. Normal Approximation for Stochastic Gradient Descent via Non-Asymptotic Rates of Martingale CLT, 2019, COLT.
[23] Yuanzhi Li, et al. An Alternative View: When Does SGD Escape Local Minima?, 2018, ICML.
[24] Yoshua Bengio, et al. Three Factors Influencing Minima in SGD, 2017, arXiv.
[25] Michael I. Jordan, et al. Sampling can be faster than optimization, 2018, Proceedings of the National Academy of Sciences.
[26] A. Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities, 2014, arXiv:1412.7392.