Dualize, Split, Randomize: Fast Nonsmooth Optimization Algorithms

We introduce new primal-dual algorithms to minimize the sum of three convex functions, each accessed through its own oracle: the first is differentiable, smooth, and possibly stochastic; the second is proximable; and the third is the composition of a proximable function with a linear map. By leveraging variance reduction, we prove convergence to an exact solution at sublinear or linear rates, depending on strong convexity properties. The proposed theory is simple and unified under the umbrella of stochastic Davis-Yin splitting, which we design in this work. Our theory covers several settings not tackled by any existing algorithm; we illustrate their importance with real-world applications, and we demonstrate the efficiency of our algorithms in numerical experiments.
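To make the three-function template concrete, consider minimizing f(x) + g(x) + h(Kx), where f is smooth with a stochastic gradient oracle, g is proximable, and h is proximable and composed with a linear map K. The sketch below is a minimal Condat-Vu-style primal-dual loop in which a plain minibatch gradient stands in for a variance-reduced estimator; the problem instance (least squares plus two l1 terms), the step sizes, and all parameter names are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def stochastic_primal_dual(A, b, K, lam1, lam2, tau, sigma, batch, iters, rng):
    """Sketch of a stochastic primal-dual iteration for
       min_x f(x) + g(x) + h(Kx), with
         f(x) = (1/(2n)) * ||Ax - b||^2   (smooth; minibatch stochastic gradient),
         g(x) = lam1 * ||x||_1            (proximable),
         h(z) = lam2 * ||z||_1            (proximable, composed with K).
    tau and sigma are primal/dual step sizes; a Condat-Vu-type condition
    such as tau * (L/2 + sigma * ||K||^2) <= 1 is assumed to hold."""
    n, d = A.shape
    x = np.zeros(d)
    y = np.zeros(K.shape[0])  # dual variable attached to h(Kx)
    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)
        # Unbiased minibatch estimate of grad f(x) = A^T (Ax - b) / n.
        grad = A[idx].T @ (A[idx] @ x - b[idx]) / batch
        # Primal step: forward step on f, prox (backward) step on g.
        x_new = soft_threshold(x - tau * (grad + K.T @ y), tau * lam1)
        # Dual step: prox of sigma * h^* via the Moreau identity,
        # prox_{sigma h*}(u) = u - sigma * prox_{h/sigma}(u / sigma).
        u = y + sigma * K @ (2 * x_new - x)
        y = u - sigma * soft_threshold(u / sigma, lam2 / sigma)
        x = x_new
    return x
```

With a full gradient (batch = n), this reduces to a deterministic primal-dual splitting method; with minibatches, the plain gradient estimate converges only to a neighborhood of the solution, which is precisely the gap that the variance-reduced estimators studied in this work close.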
