Accelerated First-Order Optimization Algorithms for Machine Learning
Cong Fang | Zhouchen Lin | Huan Li
[1] Wei Shi,et al. Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs , 2016, SIAM J. Optim..
[2] Ohad Shamir,et al. On the Iteration Complexity of Oblivious First-Order Optimization Algorithms , 2016, ICML.
[3] Zeyuan Allen-Zhu,et al. Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent , 2014, ITCS.
[4] Nathan Srebro,et al. Tight Complexity Bounds for Optimizing Composite Objectives , 2016, NIPS.
[5] Yi Zhou,et al. SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms , 2018 .
[6] Huan Li,et al. On the Complexity Analysis of the Primal Solutions for the Accelerated Randomized Dual Coordinate Ascent , 2018, J. Mach. Learn. Res..
[7] Marten van Dijk,et al. Finite-sum smooth optimization with SARAH , 2019, Computational Optimization and Applications.
[8] Antonin Chambolle,et al. Stochastic Primal-Dual Hybrid Gradient Algorithm with Arbitrary Sampling and Imaging Applications , 2017, SIAM J. Optim..
[9] Antonin Chambolle,et al. A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging , 2011, Journal of Mathematical Imaging and Vision.
[10] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2) , 1983 .
[11] Donghwan Kim,et al. Optimized first-order methods for smooth convex minimization , 2014, Mathematical Programming.
[12] Quanquan Gu,et al. Stochastic Nested Variance Reduction for Nonconvex Optimization , 2018, J. Mach. Learn. Res..
[13] Jie Liu,et al. SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient , 2017, ICML.
[14] A. Nemirovsky,et al. Problem Complexity and Method Efficiency in Optimization , 1983 .
[15] Lin Xiao,et al. An Accelerated Randomized Proximal Coordinate Gradient Method and its Application to Regularized Empirical Risk Minimization , 2015, SIAM J. Optim..
[16] Lin Xiao,et al. A Proximal Stochastic Gradient Method with Progressive Variance Reduction , 2014, SIAM J. Optim..
[17] Zeyuan Allen-Zhu,et al. Natasha 2: Faster Non-Convex Optimization Than SGD , 2017, NeurIPS.
[18] Julien Mairal,et al. Optimization with First-Order Surrogate Functions , 2013, ICML.
[19] Yurii Nesterov,et al. Gradient methods for minimizing composite functions , 2012, Mathematical Programming.
[20] Zheng Qu,et al. Restarting the accelerated coordinate descent method with a rough strong convexity estimate , 2018, Comput. Optim. Appl..
[21] Michael I. Jordan,et al. How to Escape Saddle Points Efficiently , 2017, ICML.
[22] Alexander Gasnikov,et al. Primal–dual accelerated gradient methods with small-dimensional relaxation oracle , 2018, Optim. Methods Softw..
[23] Kenji Kawaguchi,et al. Deep Learning without Poor Local Minima , 2016, NIPS.
[24] Tong Zhang,et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator , 2018, NeurIPS.
[25] Michael I. Jordan,et al. Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent , 2017, COLT.
[26] Asuman E. Ozdaglar,et al. Distributed Subgradient Methods for Multi-Agent Optimization , 2009, IEEE Transactions on Automatic Control.
[27] Zhouchen Lin,et al. A Sharp Convergence Rate Analysis for Distributed Accelerated Gradient Methods , 2018, arXiv:1810.01053.
[28] Yuejie Chi,et al. Communication-Efficient Distributed Optimization in Networks with Gradient Tracking , 2019, AISTATS.
[29] Zeyuan Allen-Zhu,et al. Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives , 2015, ICML.
[30] Na Li,et al. Harnessing smoothness to accelerate distributed optimization , 2016, 2016 IEEE 55th Conference on Decision and Control (CDC).
[31] Saeed Ghadimi,et al. Accelerated gradient methods for nonconvex nonlinear and stochastic programming , 2013, Mathematical Programming.
[32] Marc Teboulle,et al. Fast Gradient-Based Algorithms for Constrained Total Variation Image Denoising and Deblurring Problems , 2009, IEEE Transactions on Image Processing.
[33] Boris Polyak. Some methods of speeding up the convergence of iteration methods , 1964 .
[34] Xiaoming Yuan,et al. On the acceleration of augmented Lagrangian method for linearly constrained optimization , 2010 .
[35] J. Berkson. Application of the Logistic Function to Bio-Assay , 1944 .
[36] Marc Teboulle,et al. Performance of first-order methods for smooth convex minimization: a novel approach , 2012, Mathematical Programming.
[37] Qing Ling,et al. EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization , 2014, arXiv:1404.6264.
[38] Yangyang Xu,et al. Accelerated First-Order Primal-Dual Proximal Methods for Linearly Constrained Composite Convex Programming , 2016, SIAM J. Optim..
[39] Michael I. Jordan,et al. Non-convex Finite-Sum Optimization Via SCSG Methods , 2017, NIPS.
[40] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996 .
[41] Michael I. Jordan,et al. Stochastic Cubic Regularization for Fast Nonconvex Optimization , 2017, NeurIPS.
[42] Yunmei Chen,et al. Optimal Primal-Dual Methods for a Class of Saddle Point Problems , 2013, SIAM J. Optim..
[43] Alexander J. Smola,et al. Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization , 2016, NIPS.
[44] Peter Richtárik,et al. Don't Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop , 2019, ALT.
[45] Michael I. Jordan,et al. Gradient Descent Only Converges to Minimizers , 2016, COLT.
[46] Dmitry Kovalev,et al. Optimal and Practical Algorithms for Smooth and Strongly Convex Decentralized Optimization , 2020, NeurIPS.
[47] Yuanzhi Li,et al. Neon2: Finding Local Minima via First-Order Oracles , 2017, NeurIPS.
[48] Justin Domke,et al. Finito: A faster, permutable incremental gradient method for big data problems , 2014, ICML.
[49] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.
[50] Peter Richtárik,et al. Quartz: Randomized Dual Coordinate Ascent with Arbitrary Sampling , 2015, NIPS.
[51] Yurii Nesterov,et al. Linear convergence of first order methods for non-strongly convex optimization , 2015, Math. Program..
[52] Yurii Nesterov,et al. Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems , 2012, SIAM J. Optim..
[53] Thomas Brox,et al. iPiano: Inertial Proximal Algorithm for Nonconvex Optimization , 2014, SIAM J. Imaging Sci..
[54] Zhouchen Lin,et al. Accelerating Asynchronous Algorithms for Convex Optimization by Momentum Compensation , 2018, ArXiv.
[55] Na Li,et al. Accelerated Distributed Nesterov Gradient Descent , 2017, IEEE Transactions on Automatic Control.
[56] Mohit Singh,et al. A geometric alternative to Nesterov's accelerated gradient descent , 2015, ArXiv.
[57] Tengyu Ma,et al. Matrix Completion has No Spurious Local Minimum , 2016, NIPS.
[58] Lihua Xie,et al. Augmented distributed gradient methods for multi-agent optimization under uncoordinated constant stepsizes , 2015, 2015 54th IEEE Conference on Decision and Control (CDC).
[59] Usman A. Khan,et al. A Linear Algorithm for Optimization Over Directed Graphs With Geometric Convergence , 2018, IEEE Control Systems Letters.
[60] Mikael Johansson,et al. Convergence Analysis of Approximate Primal Solutions in Dual First-Order Methods , 2015, SIAM J. Optim..
[61] Alexander J. Smola,et al. Stochastic Variance Reduction for Nonconvex Optimization , 2016, ICML.
[62] Nathan Srebro,et al. Global Optimality of Local Search for Low Rank Matrix Recovery , 2016, NIPS.
[63] Zeyuan Allen-Zhu,et al. Variance Reduction for Faster Non-Convex Optimization , 2016, ICML.
[64] Richard G. Baraniuk,et al. Fast Alternating Direction Optimization Methods , 2014, SIAM J. Imaging Sci..
[65] Soummya Kar,et al. Decentralized Stochastic Optimization and Machine Learning: A Unified Variance-Reduction Framework for Robust Performance and Fast Convergence , 2020, IEEE Signal Processing Magazine.
[66] Renato D. C. Monteiro,et al. Iteration-complexity of first-order penalty methods for convex programming , 2013, Math. Program..
[67] Furong Huang,et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.
[68] Zeyuan Allen-Zhu,et al. Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization , 2018, ICML.
[69] Andre Wibisono,et al. A variational perspective on accelerated methods in optimization , 2016, Proceedings of the National Academy of Sciences.
[70] Huan Li,et al. Accelerated Optimization for Machine Learning: First-Order Algorithms , 2020 .
[71] Tengyu Ma,et al. Finding approximate local minima faster than gradient descent , 2016, STOC.
[72] Tong Zhang,et al. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization , 2013, Mathematical Programming.
[73] Emmanuel J. Candès,et al. Adaptive Restart for Accelerated Gradient Schemes , 2012, Foundations of Computational Mathematics.
[74] Nathan Srebro,et al. Lower Bounds for Non-Convex Stochastic Optimization , 2019, ArXiv.
[75] Yi Zheng,et al. No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis , 2017, ICML.
[76] Benjamin Recht,et al. Analysis and Design of Optimization Algorithms via Integral Quadratic Constraints , 2014, SIAM J. Optim..
[77] Ohad Shamir,et al. Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes , 2012, ICML.
[78] Thomas Brox,et al. iPiasco: Inertial Proximal Algorithm for Strongly Convex Optimization , 2015, Journal of Mathematical Imaging and Vision.
[79] Zhouchen Lin,et al. Accelerated Alternating Direction Method of Multipliers: An Optimal O(1/K) Nonergodic Analysis , 2016, Journal of Scientific Computing.
[80] Yin Tat Lee,et al. Efficient Accelerated Coordinate Descent Methods and Faster Algorithms for Solving Linear Systems , 2013, 2013 IEEE 54th Annual Symposium on Foundations of Computer Science.
[81] Ohad Shamir,et al. On Lower and Upper Bounds in Smooth and Strongly Convex Optimization , 2016, J. Mach. Learn. Res..
[82] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[83] Antonin Chambolle,et al. On the ergodic convergence rates of a first-order primal–dual algorithm , 2016, Math. Program..
[84] Euhanna Ghadimi,et al. Global convergence of the Heavy-ball method for convex optimization , 2014, 2015 European Control Conference (ECC).
[85] Dmitriy Drusvyatskiy,et al. An Optimal First Order Method Based on Optimal Quadratic Averaging , 2016, SIAM J. Optim..
[86] Bikash Joshi,et al. An Explicit Convergence Rate for Nesterov's Method from SDP , 2018, 2018 IEEE International Symposium on Information Theory (ISIT).
[87] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.
[88] Guanghui Lan,et al. Gradient sliding for composite optimization , 2014, Mathematical Programming.
[89] Yann LeCun,et al. The Loss Surfaces of Multilayer Networks , 2014, AISTATS.
[90] Peter Richtárik,et al. Accelerated, Parallel, and Proximal Coordinate Descent , 2013, SIAM J. Optim..
[91] Shai Shalev-Shwartz,et al. SDCA without Duality, Regularization, and Individual Convexity , 2016, ICML.
[92] Angelia Nedic,et al. Distributed Stochastic Subgradient Projection Algorithms for Convex Optimization , 2008, J. Optim. Theory Appl..
[93] Huan Li,et al. Accelerated Proximal Gradient Methods for Nonconvex Programming , 2015, NIPS.
[94] Zaïd Harchaoui,et al. Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice , 2017, J. Mach. Learn. Res..
[95] Yunmei Chen,et al. An Accelerated Linearized Alternating Direction Method of Multipliers , 2014, SIAM J. Imaging Sci..
[96] Rong Jin,et al. Linear Convergence with Condition Number Independent Access of Full Gradients , 2013, NIPS.
[97] Zeyuan Allen-Zhu,et al. Optimal Black-Box Reductions Between Optimization Objectives , 2016, NIPS.
[98] Michael I. Jordan,et al. On Nonconvex Optimization for Machine Learning , 2019, J. ACM.
[99] Guilherme França,et al. An explicit rate bound for over-relaxed ADMM , 2015, 2016 IEEE International Symposium on Information Theory (ISIT).
[100] Kilian Q. Weinberger,et al. Optimal Convergence Rates for Convex Distributed Optimization in Networks , 2019, J. Mach. Learn. Res..
[101] Marc Teboulle,et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..
[102] Michael I. Jordan,et al. Stochastic Gradient Descent Escapes Saddle Points Efficiently , 2019, ArXiv.
[103] Soummya Kar,et al. Variance-Reduced Decentralized Stochastic Optimization With Accelerated Convergence , 2020, IEEE Transactions on Signal Processing.
[104] Yurii Nesterov,et al. Cubic regularization of Newton method and its global performance , 2006, Math. Program..
[105] Yi Zhou,et al. An optimal randomized incremental gradient method , 2015, Mathematical Programming.
[106] Mark W. Schmidt,et al. Minimizing finite sums with the stochastic average gradient , 2013, Mathematical Programming.
[107] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.
[108] Chih-Jen Lin,et al. Iteration complexity of feasible descent methods for convex optimization , 2014, J. Mach. Learn. Res..
[109] Yuchen Zhang,et al. Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization , 2014, ICML.
[110] Martin Jaggi,et al. An accelerated communication-efficient primal-dual optimization framework for structured machine learning , 2017, ArXiv.
[111] Francis Bach,et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives , 2014, NIPS.
[112] Ali H. Sayed,et al. Diffusion Adaptation Strategies for Distributed Optimization and Learning Over Networks , 2011, IEEE Transactions on Signal Processing.
[113] Usman A. Khan,et al. Distributed Nesterov Gradient Methods Over Arbitrary Graphs , 2019, IEEE Signal Processing Letters.
[114] José M. F. Moura,et al. Fast Distributed Gradient Methods , 2011, IEEE Transactions on Automatic Control.
[115] Zheng Qu,et al. Adaptive restart of accelerated gradient methods under local quadratic growth condition , 2017, IMA Journal of Numerical Analysis.
[116] Guanghui Lan,et al. A unified variance-reduced accelerated gradient method for convex optimization , 2019, NeurIPS.
[117] Tianbao Yang,et al. First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time , 2017, NeurIPS.
[118] Yurii Nesterov,et al. Smooth minimization of non-smooth functions , 2005, Math. Program..
[119] Yair Carmon,et al. "Convex Until Proven Guilty": Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions , 2017, ICML.
[120] Laurent Massoulié,et al. An Accelerated Decentralized Stochastic Proximal Algorithm for Finite Sums , 2019, NeurIPS.
[121] Zhouchen Lin,et al. Revisiting EXTRA for Smooth Distributed Optimization , 2020, SIAM J. Optim..
[122] Aaron Defazio,et al. A Simple Practical Accelerated Method for Finite Sums , 2016, NIPS.
[123] Yurii Nesterov,et al. Efficiency of the Accelerated Coordinate Descent Method on Structured Optimization Problems , 2017, SIAM J. Optim..
[124] S. Haykin. Neural Networks: A Comprehensive Foundation , 1994 .
[125] Laurent Massoulié,et al. Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks , 2017, ICML.
[126] Stephen P. Boyd,et al. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights , 2014, J. Mach. Learn. Res..
[127] Lin Xiao,et al. On the complexity analysis of randomized block-coordinate descent methods , 2013, Mathematical Programming.
[128] Guanghui Lan,et al. Accelerated gradient sliding for structured convex optimization , 2016, Computational Optimization and Applications.
[129] Zeyuan Allen-Zhu,et al. Katyusha: the first direct acceleration of stochastic gradient methods , 2016, J. Mach. Learn. Res..
[130] Zeyuan Allen-Zhu,et al. Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling , 2015, ICML.
[131] Fanhua Shang,et al. A Simple Stochastic Variance Reduced Algorithm with Fast Convergence Rates , 2018, ICML.
[132] Yi Zhou,et al. Conditional Gradient Sliding for Convex Optimization , 2016, SIAM J. Optim..
[133] Thomas Hofmann,et al. Escaping Saddles with Stochastic Gradients , 2018, ICML.
[134] Aryan Mokhtari,et al. DSA: Decentralized Double Stochastic Averaging Gradient Algorithm , 2015, J. Mach. Learn. Res..
[135] Yair Carmon,et al. Accelerated Methods for NonConvex Optimization , 2018, SIAM J. Optim..
[136] Zhouchen Lin,et al. Sharp Analysis for Nonconvex SGD Escaping from Saddle Points , 2019, COLT.
[137] Jorge Nocedal,et al. Optimization Methods for Large-Scale Machine Learning , 2016, SIAM Rev..
[138] Angelia Nedic,et al. A Dual Approach for Optimal Algorithms in Distributed Optimization over Networks , 2018, 2020 Information Theory and Applications Workshop (ITA).
[139] Michael I. Jordan,et al. Gradient Descent Can Take Exponential Time to Escape Saddle Points , 2017, NIPS.
[140] Shai Shalev-Shwartz,et al. Stochastic dual coordinate ascent methods for regularized loss , 2012, J. Mach. Learn. Res..