Jian-Guo Liu | Yuanyuan Feng | Tingran Gao | Lei Li | Yulong Lu
[1] Persi Diaconis, et al. Iterated Random Functions, 1999, SIAM Rev.
[2] B. Øksendal. Stochastic differential equations: an introduction with applications, 1987.
[3] F. Bach, et al. Bridging the gap between constant step size stochastic gradient descent and Markov chains, 2017, The Annals of Statistics.
[4] Erwan Faou, et al. Weak Backward Error Analysis for SDEs, 2011, SIAM J. Numer. Anal.
[5] C. Villani. Topics in Optimal Transportation, 2003.
[6] Lei Li, et al. Semigroups of stochastic gradient descent and online principal component analysis: properties and diffusion approximations, 2017, 1712.06509.
[7] Asuman E. Ozdaglar, et al. A globally convergent incremental Newton method, 2014, Math. Program.
[8] Boris Polyak, et al. Acceleration of stochastic approximation by averaging, 1992.
[9] Richard Socher, et al. Improving Generalization Performance by Switching from Adam to SGD, 2017, ArXiv.
[10] R. Srikant, et al. Adding One Neuron Can Eliminate All Bad Local Minima, 2018, NeurIPS.
[11] Prateek Jain, et al. A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares), 2017, FSTTCS.
[12] Philippe von Wurstemberger, et al. Strong error analysis for stochastic gradient descent optimization algorithms, 2018, 1801.09324.
[13] A. Stuart, et al. Gaussian Approximations for Probability Measures on $\mathbf{R}^d$, 2016, 1611.08642.
[14] E Weinan, et al. Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations, 2018, J. Mach. Learn. Res.
[15] Ohad Shamir, et al. Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes, 2012, ICML.
[16] Stefano Soatto, et al. Deep relaxation: partial differential equations for optimizing deep neural networks, 2017, Research in the Mathematical Sciences.
[17] E Weinan, et al. Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms, 2015, ICML.
[18] S. Osher, et al. Sparse Recovery via Differential Inclusions, 2014, 1406.7728.
[19] Francis Bach, et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, 2014, NIPS.
[20] M. Kopec. Weak backward error analysis for Langevin process, 2013, 1310.2599.
[21] Konstantinos Spiliopoulos, et al. Stochastic Gradient Descent in Continuous Time, 2016, SIAM J. Financial Math.
[22] H. Robbins. A Stochastic Approximation Method, 1951.
[23] Wenqing Hu, et al. On the diffusion approximation of nonconvex stochastic gradient descent, 2017, Annals of Mathematical Sciences and Applications.
[24] Alexander J. Smola, et al. On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants, 2015, NIPS.
[25] Ohad Shamir, et al. Stochastic Convex Optimization, 2009, COLT.
[26] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[27] Justin A. Sirignano, et al. Stochastic Gradient Descent in Continuous Time: A Central Limit Theorem, 2017, Stochastic Systems.
[28] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[29] Stephen P. Boyd, et al. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights, 2014, J. Mach. Learn. Res.
[30] M. Kopec. Weak backward error analysis for overdamped Langevin processes, 2013, 1310.2404.
[31] Ohad Shamir, et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization, 2011, ICML.
[32] Tong Zhang, et al. Solving large scale linear prediction problems using stochastic gradient descent algorithms, 2004, ICML.
[33] Simone G. O. Fiori, et al. Quasi-Geodesic Neural Learning Algorithms Over the Orthogonal Group: A Tutorial, 2005, J. Mach. Learn. Res.
[34] Hans-Bernd Dürr, et al. A smooth vector field for quadratic programming, 2012, 51st IEEE Conference on Decision and Control (CDC).
[35] Tony Shardlow, et al. Modified Equations for Stochastic Differential Equations, 2006.
[36] Stefano Soatto, et al. Entropy-SGD: biasing gradient descent into wide valleys, 2016, ICLR.
[37] Eric Moulines, et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, 2011, NIPS.
[38] Assyr Abdulle, et al. High Order Numerical Approximation of the Invariant Measure of Ergodic SDEs, 2014, SIAM J. Numer. Anal.
[39] U. Helmke, et al. Optimization and Dynamical Systems, 1994, Proceedings of the IEEE.
[40] Assyr Abdulle, et al. High Weak Order Methods for Stochastic Differential Equations Based on Modified Equations, 2012, SIAM J. Sci. Comput.
[41] Nathan Srebro, et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, 2017, NIPS.
[42] David M. Blei, et al. Stochastic Gradient Descent as Approximate Bayesian Inference, 2017, J. Mach. Learn. Res.
[43] Eric Moulines, et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), 2013, NIPS.