暂无分享,去创建一个
[1] Prateek Jain,et al. Accelerating Stochastic Gradient Descent , 2017, COLT.
[2] Aurélien Lucchi,et al. Continuous-time Models for Stochastic Optimization Algorithms , 2018, NeurIPS.
[3] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[4] Quanquan Gu,et al. Continuous and Discrete-time Accelerated Stochastic Mirror Descent for Strongly Convex Functions , 2018, ICML.
[5] G. N. Mil’shtejn. Approximate Integration of Stochastic Differential Equations , 1975 .
[6] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[7] Aryan Mokhtari,et al. Direct Runge-Kutta Discretization Achieves Acceleration , 2018, NeurIPS.
[8] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[9] Jorge Nocedal,et al. Optimization Methods for Large-Scale Machine Learning , 2016, SIAM Rev..
[10] P. Olver. Nonlinear Systems , 2013 .
[11] Tianbao Yang,et al. Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization , 2016, 1604.03257.
[12] Michael I. Jordan,et al. A Lyapunov Analysis of Momentum Methods in Optimization , 2016, ArXiv.
[13] Mathias Staudigl,et al. On the convergence of gradient-like flows with noisy gradient input , 2016, SIAM J. Optim..
[14] Michael I. Jordan,et al. Acceleration via Symplectic Discretization of High-Resolution Differential Equations , 2019, NeurIPS.
[15] Euhanna Ghadimi,et al. Global convergence of the Heavy-ball method for convex optimization , 2014, 2015 European Control Conference (ECC).
[16] Prateek Jain,et al. On the Insufficiency of Existing Momentum Schemes for Stochastic Optimization , 2018, 2018 Information Theory and Applications Workshop (ITA).
[17] V. Arnold. Mathematical Methods of Classical Mechanics , 1974 .
[18] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[19] Alexandre M. Bayen,et al. Accelerated Mirror Descent in Continuous and Discrete Time , 2015, NIPS.
[20] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[21] Mark W. Schmidt,et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition , 2016, ECML/PKDD.
[22] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2) , 1983 .
[23] Sébastien Gadat,et al. Long time behaviour and stationary regime of memory gradient diffusions , 2014 .
[24] Ali H. Sayed,et al. On the influence of momentum acceleration on online learning , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] E Weinan,et al. Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms , 2015, ICML.
[26] Zeyuan Allen-Zhu,et al. Katyusha: the first direct acceleration of stochastic gradient methods , 2016, J. Mach. Learn. Res..
[27] Guanghui Lan,et al. An optimal method for stochastic composite optimization , 2011, Mathematical Programming.
[28] Vladimir Braverman,et al. The Physical Systems Behind Optimization Algorithms , 2018, NeurIPS.
[29] Michael I. Jordan,et al. Understanding the acceleration phenomenon via high-resolution differential equations , 2018, Mathematical Programming.
[30] Saeed Ghadimi,et al. Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmic Framework , 2012, SIAM J. Optim..
[31] Peter L. Bartlett,et al. Acceleration and Averaging in Stochastic Descent Dynamics , 2017, NIPS.
[32] Yurii Nesterov,et al. Lectures on Convex Optimization , 2018 .
[33] Jessica Fuerst,et al. Stochastic Differential Equations And Applications , 2016 .
[34] E. Hairer,et al. Geometric Numerical Integration: Structure Preserving Algorithms for Ordinary Differential Equations , 2004 .
[35] S. Shreve,et al. Stochastic differential equations , 1955, Mathematical Proceedings of the Cambridge Philosophical Society.
[36] S. Gadat,et al. On the long time behavior of second order differential equations with asymptotically small dissipation , 2007, 0710.1107.
[37] Jerry Ma,et al. Quasi-hyperbolic momentum and Adam for deep learning , 2018, ICLR.
[38] Tengyu Ma,et al. Gradient Descent Learns Linear Dynamical Systems , 2016, J. Mach. Learn. Res..
[39] Boris Polyak. Some methods of speeding up the convergence of iteration methods , 1964 .
[40] Michael I. Jordan,et al. On Symplectic Optimization , 2018, 1802.03653.
[41] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.
[42] H. Robbins. A Stochastic Approximation Method , 1951 .
[43] Sanjiv Kumar,et al. On the Convergence of Adam and Beyond , 2018 .
[44] Andre Wibisono,et al. A variational perspective on accelerated methods in optimization , 2016, Proceedings of the National Academy of Sciences.
[45] Quanquan Gu,et al. Accelerated Stochastic Mirror Descent: From Continuous-time Dynamics to Discrete-time Algorithms , 2018, AISTATS.
[46] Stephen P. Boyd,et al. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights , 2014, J. Mach. Learn. Res..