One-sided Frank-Wolfe algorithms for saddle problems

We study a class of convex-concave saddle-point problems of the form $\min_x \max_y \langle Kx, y\rangle + f_P(x) - h^*(y)$, where $K$ is a linear operator, $f_P$ is the sum of a convex function $f$ with a Lipschitz-continuous gradient and the indicator function of a bounded convex polytope $P$, and $h^*$ is a convex (possibly nonsmooth) function. Such problems arise, for example, as Lagrangian relaxations of various discrete optimization problems. Our main assumptions are the existence of an efficient linear minimization oracle (lmo) for $f_P$ and an efficient proximal map (prox) for $h^*$, which motivate a solution via a blend of proximal primal-dual algorithms and Frank-Wolfe algorithms. When $h^*$ is the indicator function of a linear constraint and $f$ is quadratic, we show an $O(1/n)$ convergence rate on the dual objective, requiring $O(n \log n)$ calls to the lmo. If the problem comes from the constrained optimization problem $\min_{x \in \mathbb{R}^d} \{ f_P(x) \mid Ax - b = 0 \}$, then we additionally obtain an $O(1/n)$ bound on both the primal gap and the infeasibility gap. In the most general case, we show an $O(1/n)$ convergence rate of the primal-dual gap, again requiring $O(n \log n)$ calls to the lmo. To the best of our knowledge, this improves on the known convergence rates for the considered class of saddle-point problems. We show applications to labeling problems frequently appearing in machine learning and computer vision.
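To make the two oracle assumptions concrete, here is a minimal Python sketch of what an lmo and a prox can look like in a toy setting. It is not the algorithm analyzed in the paper. The choice of the probability simplex as the polytope, the linear function $h^*(y) = \langle b, y\rangle$ (which is what the Lagrangian of the constraint $Ax - b = 0$ produces), the classical step size $2/(k+2)$, and all function names are assumptions made purely for illustration.

```python
import numpy as np


def lmo_simplex(gradient):
    """Linear minimization oracle over the probability simplex (assumed polytope).

    Returns a vertex s minimizing <gradient, s>, i.e. the unit vector
    at the smallest gradient coordinate.
    """
    vertex = np.zeros_like(gradient, dtype=float)
    vertex[np.argmin(gradient)] = 1.0
    return vertex


def prox_linear(v, b, tau):
    """Proximal map of tau * h* for the assumed choice h*(y) = <b, y>.

    Solves argmin_y <b, y> + ||y - v||^2 / (2 * tau), whose closed form is v - tau * b.
    """
    return v - tau * b


def frank_wolfe_step(x, gradient, k):
    """One classical Frank-Wolfe update with step size 2 / (k + 2)."""
    s = lmo_simplex(gradient)
    gamma = 2.0 / (k + 2.0)
    # Convex combination keeps the iterate inside the simplex.
    return (1.0 - gamma) * x + gamma * s
```

The point of the sketch is that the primal side only ever needs a vertex of the polytope (no projection), while the dual side needs only a cheap closed-form prox; this is the asymmetry that the "one-sided" combination described above relies on.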
