The Approximate Duality Gap Technique: A Unified Theory of First-Order Methods

We present a general technique for the analysis of first-order methods. The technique relies on the construction of a duality gap for an appropriate approximation of the objective function, where the approximation improves as the algorithm converges. We show that, in continuous time, enforcing the invariant that this approximate duality gap decreases at a certain rate exactly recovers a wide range of first-order continuous-time methods. We characterize the discretization errors incurred by different discretization schemes, and show how iteration-complexity-optimal methods for various classes of problems cancel out the discretization error. The technique is illustrated on several problem classes -- including solving variational inequalities with smooth monotone operators, convex minimization with Lipschitz-continuous objectives, smooth convex minimization, composite minimization, and smooth and strongly convex minimization -- and naturally extends to other settings.
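To make the construction concrete, here is a minimal sketch of one natural instantiation for unconstrained minimization of a differentiable convex function f with minimizer x^*. The symbols U_t, L_t, G_t, a_\tau, and A_t are our own illustrative notation (they need not match the paper's, which may also include a regularization term in the lower bound to control the initial gap):
\[
U_t = f(x_t), \qquad
L_t = \frac{1}{A_t}\int_{t_0}^{t} a_\tau \bigl( f(x_\tau) + \langle \nabla f(x_\tau),\, x^* - x_\tau \rangle \bigr)\, d\tau \;\le\; f(x^*), \qquad
A_t = \int_{t_0}^{t} a_\tau\, d\tau,
\]
where the bound on L_t follows from convexity, so that the approximate duality gap
\[
G_t = U_t - L_t \;\ge\; f(x_t) - f(x^*).
\]
Enforcing the invariant \(\tfrac{d}{dt}\bigl(A_t G_t\bigr) \le 0\) then immediately yields
\[
f(x_t) - f(x^*) \;\le\; G_t \;\le\; \frac{A_{t_0}\, G_{t_0}}{A_t},
\]
so the convergence rate is governed by the growth of A_t, and discretization error appears as the defect in this invariant.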
