Rethinking the Variational Interpretation of Nesterov's Accelerated Method
Antonio Orvieto | Hadi Daneshmand | Peiyuan Zhang