Stochastic Gradient and Langevin Processes
[1] Sashank J. Reddi, et al. Why ADAM Beats SGD for Attention Models, 2019, ArXiv.
[2] Murat A. Erdogdu, et al. Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond, 2019, NeurIPS.
[3] Krishnakumar Balasubramanian, et al. Normal Approximation for Stochastic Gradient Descent via Non-Asymptotic Rates of Martingale CLT, 2019, COLT.
[4] Michael I. Jordan, et al. Quantitative Central Limit Theorems for Discrete Stochastic Processes, 2019, ArXiv.
[5] Levent Sagun, et al. A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks, 2019, ICML.
[6] Michael I. Jordan, et al. Sampling Can Be Faster than Optimization, 2018, Proceedings of the National Academy of Sciences.
[7] Lester W. Mackey, et al. Measuring Sample Quality with Diffusions, 2016, The Annals of Applied Probability.
[8] Alain Durmus, et al. High-Dimensional Bayesian Inference via the Unadjusted Langevin Algorithm, 2016, Bernoulli.
[9] Dacheng Tao, et al. Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence, 2019, NeurIPS.
[10] Ohad Shamir, et al. Global Non-convex Optimization with Discretized Diffusions, 2018, NeurIPS.
[11] Alex Zhai, et al. The CLT in High Dimensions: Quantitative Bounds via Martingale Embedding, 2018, The Annals of Probability.
[12] Michael I. Jordan, et al. Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting, 2018, ArXiv.
[13] Yuanzhi Li, et al. An Alternative View: When Does SGD Escape Local Minima?, 2018, ICML.
[14] Stefano Soatto, et al. Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks, 2017, 2018 Information Theory and Applications Workshop (ITA).
[15] Yoshua Bengio, et al. Three Factors Influencing Minima in SGD, 2017, ArXiv.
[16] Elad Hoffer, et al. Train Longer, Generalize Better: Closing the Generalization Gap in Large Batch Training of Neural Networks, 2017, NIPS.
[17] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[18] Xin T. Tong, et al. Statistical Inference for Model Parameters in Stochastic Gradient Descent, 2016, The Annals of Statistics.
[19] David M. Blei, et al. A Variational Analysis of Stochastic Gradient Algorithms, 2016, ICML.
[20] A. Eberle. Couplings, Distances and Contractivity for Diffusion Processes Revisited, 2013.
[21] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[22] A. Dalalyan. Theoretical Guarantees for Approximate Sampling from Smooth and Log-Concave Densities, 2014, arXiv:1412.7392.
[23] Ioannis Chatzigeorgiou, et al. Bounds on the Lambert Function and Their Application to the Outage Analysis of User Cooperation, 2013, IEEE Communications Letters.
[24] A. Eberle. Reflection Coupling and Wasserstein Contractivity without Convexity, 2011.
[25] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[26] G. Parisi. Brownian Motion, 2005, Nature.