Convex interpolation and performance estimation of first-order methods for convex optimization

The goal of this thesis is to show how to derive, in a completely automated way, exact and global worst-case guarantees for first-order methods in convex optimization. To this end, we formulate a generic optimization problem that searches for the worst-case scenarios. These worst-case computation problems, referred to as performance estimation problems (PEPs), are intrinsically infinite-dimensional optimization problems formulated over a given class of objective functions. To render them tractable, we develop a (smooth and non-smooth) convex interpolation framework, which provides necessary and sufficient conditions for interpolating the objective functions. With this idea, we transform PEPs into solvable finite-dimensional semidefinite programs, from which one obtains worst-case guarantees and worst-case functions, along with the corresponding explicit proofs. PEPs have already proved very useful as a tool for developing convergence analyses of first-order optimization methods. Among other results, PEPs yield exact guarantees for gradient methods, along with their inexact, projected, proximal, conditional, decentralized and accelerated variants.

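To illustrate the idea, here is a minimal sketch (not taken from the thesis) of a performance estimation problem written as a semidefinite program, under the following assumptions: the class of L-smooth convex functions, the fixed-step gradient method x_{k+1} = x_k - g(x_k)/L, the performance measure f(x_N) - f(x_*), and the initial condition ||x_0 - x_*||^2 <= R^2. The smooth convex interpolation conditions are what make the problem finite-dimensional; the variable names and the use of CVXPY are illustrative choices only.

import cvxpy as cp
import numpy as np

L, R, N = 1.0, 1.0, 3            # smoothness constant, initial radius, iteration count

# Gram-matrix basis: [x_0 - x_*, g_0, g_1, ..., g_N]
dim = N + 2
G = cp.Variable((dim, dim), PSD=True)    # Gram matrix of the basis vectors
f = cp.Variable(N + 1)                   # f(x_k) - f(x_*) for k = 0, ..., N
OPT = None                               # label for the optimal point x_*

def x(k):
    """Coefficients of x_k - x_* in the basis (gradient steps unrolled)."""
    c = np.zeros(dim)
    if k is OPT:
        return c
    c[0] = 1.0
    c[1:1 + k] = -1.0 / L                # x_k = x_0 - (1/L) * (g_0 + ... + g_{k-1})
    return c

def g(k):
    """Coefficients of the gradient g_k; the optimal point has g_* = 0."""
    c = np.zeros(dim)
    if k is not OPT:
        c[1 + k] = 1.0
    return c

def fval(k):
    """Function value f(x_k) - f(x_*); zero at the optimal point."""
    return 0.0 if k is OPT else f[k]

# Smooth convex interpolation conditions between all pairs of points:
#   f_i >= f_j + <g_j, x_i - x_j> + ||g_i - g_j||^2 / (2L)
points = list(range(N + 1)) + [OPT]
constraints = []
for i in points:
    for j in points:
        if i == j:
            continue
        lin = g(j) @ G @ (x(i) - x(j))            # <g_j, x_i - x_j>
        quad = (g(i) - g(j)) @ G @ (g(i) - g(j))  # ||g_i - g_j||^2
        constraints.append(fval(i) >= fval(j) + lin + quad / (2 * L))

constraints.append(x(0) @ G @ x(0) <= R ** 2)     # ||x_0 - x_*||^2 <= R^2

# The worst case of f(x_N) - f(x_*) over the whole function class is the
# optimal value of this finite-dimensional semidefinite program.
problem = cp.Problem(cp.Maximize(f[N]), constraints)
problem.solve()
print(problem.value)    # close to L * R^2 / (4 * N + 2) for this method and class

Lifting the iterates and gradients into a Gram matrix is the step that turns the infinite-dimensional worst-case problem into a finite-dimensional semidefinite program; factorizing the resulting positive semidefinite matrix then recovers an explicit worst-case function.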