Bregman Finito/MISO for nonconvex regularized finite sum minimization without Lipschitz gradient continuity

We introduce two algorithms for nonconvex regularized finite sum minimization, where the typical Lipschitz differentiability assumption is relaxed to the notion of relative smoothness [7]. The first is a Bregman extension of Finito/MISO [28, 42], analyzed for fully nonconvex problems when the sampling is random, and under convexity of the nonsmooth term when the sampling is essentially cyclic. The second algorithm is a low-memory variant, in the spirit of SVRG [34] and SARAH [48], that likewise allows for fully nonconvex formulations. Our analysis is made remarkably simple by employing a Bregman Moreau envelope as a Lyapunov function. In the randomized case, linear convergence is established when the cost function is strongly convex, yet with no convexity requirements on the individual functions in the sum. For the essentially cyclic and low-memory variants, global and linear convergence results are established when the cost function satisfies the Kurdyka-Łojasiewicz property.
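
For readers new to these notions, the two ingredients of the analysis can be stated compactly. The following LaTeX sketch records the standard definitions from the relative-smoothness literature (cf. [30], [38] below) and the Bregman Moreau envelope (cf. [13]); the kernel h, the distance D_h, the modulus L, the stepsize γ, and the cost φ are generic notation assumed here for illustration, not necessarily the paper's exact conventions.

% Bregman distance generated by a differentiable convex kernel h:
\[
  D_h(x, y) \;=\; h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle .
\]
% A differentiable f is L-smooth relative to h when L h - f is convex,
% which gives the non-Euclidean descent inequality
\[
  f(x) \;\le\; f(y) + \langle \nabla f(y),\, x - y \rangle + L\, D_h(x, y)
  \quad \text{for all } x, y;
\]
% taking h = \tfrac{1}{2}\|\cdot\|^2 recovers the classical Lipschitz-gradient bound.
% Bregman Moreau envelope of the cost \varphi with stepsize \gamma > 0,
% used as the Lyapunov function of the analysis:
\[
  \varphi_{\gamma}^{h}(x) \;=\; \inf_{z}\, \Big\{ \varphi(z) + \tfrac{1}{\gamma}\, D_h(z, x) \Big\} .
\]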

[1] Zeyuan Allen-Zhu, et al. Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives, 2015, ICML.

[2] Wotao Yin, et al. A Globally Convergent Algorithm for Nonconvex Optimization Based on Block Coordinate Update, 2014, J. Sci. Comput.

[3] Panagiotis Patrinos, et al. Block-coordinate and incremental aggregated proximal gradient methods for nonsmooth nonconvex problems, 2019, Mathematical Programming.

[4] Marc Teboulle, et al. Proximal alternating linearized minimization for nonconvex and nonsmooth problems, 2013, Mathematical Programming.

[5] Marc Teboulle, et al. On Linear Convergence of Non-Euclidean Gradient Methods without Strong Convexity and Lipschitz Gradient Continuity, 2019, J. Optim. Theory Appl.

[6] Ting Kei Pong, et al. Deducing Kurdyka-Łojasiewicz exponent via inf-projection, 2019, arXiv:1902.03635.

[7] Feng Ruan, et al. Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval, 2017, Information and Inference: A Journal of the IMA.

[8] Jie Liu, et al. SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient, 2017, ICML.

[9] Yonina C. Eldar, et al. Phase Retrieval with Application to Optical Imaging: A contemporary overview, 2015, IEEE Signal Processing Magazine.

[10] Justin Domke, et al. Finito: A faster, permutable incremental gradient method for big data problems, 2014, ICML.

[11] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.

[12] Niao He, et al. On the Convergence Rate of Stochastic Mirror Descent for Nonsmooth Nonconvex Optimization, 2018, arXiv:1806.04781.

[13] Wen Song, et al. The Moreau envelope function and proximal mapping in the sense of the Bregman distance, 2012.

[14] Amir Beck, et al. On the Convergence of Block Coordinate Descent Type Methods, 2013, SIAM J. Optim.

[15] Wei Peng, et al. Proximal-Like Incremental Aggregated Gradient Method with Linear Convergence Under Bregman Distance Growth Conditions, 2017, Math. Oper. Res.

[16] Jérôme Bolte, et al. Proximal Alternating Minimization and Projection Methods for Nonconvex Problems, 2010.

[17] Marc Teboulle, et al. Convergence Analysis of a Proximal-Like Minimization Algorithm Using Bregman Functions, 1993, SIAM J. Optim.

[18] Yurii Nesterov, et al. Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems, 2012, SIAM J. Optim.

[19] Julien Mairal, et al. Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning, 2014, SIAM J. Optim.

[20] Alfred O. Hero, et al. A Convergent Incremental Gradient Method with a Constant Step Size, 2007, SIAM J. Optim.

[21] Francis Bach, et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, 2014, NIPS.

[22] Yuxin Chen, et al. Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems, 2015, NIPS.

[23] Panos M. Pardalos, et al. Convex optimization theory, 2010, Optim. Methods Softw.

[24] Mark W. Schmidt, et al. Minimizing finite sums with the stochastic average gradient, 2013, Mathematical Programming.

[25] James V. Burke, et al. Optical Wavefront Reconstruction: Theory and Numerical Methods, 2002, SIAM Rev.

[26] Peter Richtárik, et al. Fastest rates for stochastic mirror descent methods, 2018, Computational Optimization and Applications.

[27] Heinz H. Bauschke, et al. Legendre functions and the method of random Bregman projections, 1997.

[28] Hédy Attouch, et al. Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Łojasiewicz Inequality, 2008, Math. Oper. Res.

[29] Asuman E. Ozdaglar, et al. Global Convergence Rate of Proximal Incremental Aggregated Gradient Methods, 2016, SIAM J. Optim.

[30] Marc Teboulle, et al. A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications, 2017, Math. Oper. Res.

[31] Zhi-Quan Luo, et al. Iteration complexity analysis of block coordinate descent methods, 2013, Mathematical Programming.

[32] Benar Fux Svaiter, et al. An Inexact Hybrid Generalized Proximal Point Algorithm and Some New Results on the Theory of Bregman Functions, 2000, Math. Oper. Res.

[33] Mohamed-Jalal Fadili, et al. Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms, 2017, Journal of Optimization Theory and Applications.

[34] Dmitriy Drusvyatskiy, et al. Stochastic model-based minimization under high-order growth, 2018, arXiv.

[35] Hédy Attouch, et al. On the convergence of the proximal algorithm for nonsmooth functions involving analytic features, 2008, Math. Program.

[36] Marc Teboulle, et al. A simplified view of first order methods for optimization, 2018, Math. Program.

[37] Richard G. Baraniuk, et al. Coherent inverse scattering via transmission matrices: Efficient phase retrieval algorithms and a public dataset, 2017, IEEE International Conference on Computational Photography (ICCP).

[38] Yurii Nesterov, et al. Relatively Smooth Convex Optimization by First-Order Methods, and Applications, 2016, SIAM J. Optim.

[39] Bastian Goldlücke, et al. Variational Analysis, 2014, Computer Vision, A Reference Guide.

[40] Angelia Nedic, et al. On Stochastic Subgradient Mirror-Descent Algorithm with Weighted Averaging, 2013, SIAM J. Optim.

[41] Alexander Shapiro, et al. Stochastic Approximation approach to Stochastic Programming, 2013.

[42] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.

[43] Aryan Mokhtari, et al. Surpassing Gradient Descent Provably: A Cyclic Incremental Method with Linear Convergence Rate, 2016, SIAM J. Optim.

[44] Heinz H. Bauschke, et al. Essential Smoothness, Essential Strict Convexity, and Legendre Functions in Banach Spaces, 2001.

[45] Jia Liu, et al. Randomized Bregman Coordinate Descent Methods for Non-Lipschitz Optimization, 2020, arXiv.

[46] Eric R. Ziegel, et al. The Elements of Statistical Learning, 2003, Technometrics.

[47] Lin Xiao, et al. A Proximal Stochastic Gradient Method with Progressive Variance Reduction, 2014, SIAM J. Optim.

[48] Adrian S. Lewis, et al. The Łojasiewicz Inequality for Nonsmooth Subanalytic Functions with Applications to Subgradient Dynamical Systems, 2006, SIAM J. Optim.

[49] Paul Tseng, et al. Relaxation methods for problems with strictly convex separable costs and linear constraints, 1987, Math. Program.

[50] Yingbin Liang, et al. Provable Non-convex Phase Retrieval with Outliers: Median Truncated Wirtinger Flow, 2016, ICML.

[51] Peter Richtárik, et al. MISO is Making a Comeback With Better Proofs and Rates, 2019, arXiv:1906.01474.

[52] Adrian S. Lewis, et al. Clarke Subgradients of Stratifiable Functions, 2006, SIAM J. Optim.

[53] Dimitri P. Bertsekas, et al. Incremental proximal methods for large scale convex optimization, 2011, Math. Program.

[54] Haihao Lu, “Relative Continuity” for Non-Lipschitz Nonsmooth Convex Optimization Using Stochastic (or Deterministic) Mirror Descent, 2017, INFORMS Journal on Optimization.

[55] Marc Teboulle, et al. First Order Methods beyond Convexity and Lipschitz Gradient Continuity with Applications to Quadratic Inverse Problems, 2017, SIAM J. Optim.

[56] Xiaodong Li, et al. Phase Retrieval via Wirtinger Flow: Theory and Algorithms, 2014, IEEE Transactions on Information Theory.

[57] Paul Tseng, et al. A coordinate gradient descent method for nonsmooth separable minimization, 2008, Math. Program.

[58] Amir Beck, First-Order Methods in Optimization, 2017.

[59] Wotao Yin, et al. Cyclic Coordinate-Update Algorithms for Fixed-Point Problems: Analysis and Applications, 2016, SIAM J. Sci. Comput.

[60] S. Łojasiewicz, Sur la géométrie semi- et sous-analytique, 1993.

[61] H. Robbins, et al. A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications, 1985.

[62] K. Kurdyka, On gradients of functions definable in o-minimal structures, 1998.

[63] Paul Tseng, et al. Incrementally Updated Gradient Methods for Constrained and Regularized Optimization, 2013, Journal of Optimization Theory and Applications.

[64] Benar Fux Svaiter, et al. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods, 2013, Math. Program.

[65] John N. Tsitsiklis, et al. Gradient Convergence in Gradient Methods with Errors, 1999, SIAM J. Optim.

[66] Masoud Ahookhosh, et al. A Bregman Forward-Backward Linesearch Algorithm for Nonconvex Composite Optimization: Superlinear Convergence to Nonisolated Local Minima, 2020, SIAM J. Optim.

[67] John Wright, et al. A Geometric Analysis of Phase Retrieval, 2016, IEEE International Symposium on Information Theory (ISIT).

[68] P. Tseng, Convergence of a Block Coordinate Descent Method for Nondifferentiable Minimization, 2001.

[69] Damek Davis, et al. The nonsmooth landscape of phase retrieval, 2017, IMA Journal of Numerical Analysis.

[70] 丸山 徹, On a few developments in Convex Analysis (in Japanese), 1977.

[71] M. R. Spiegel, Mathematical Handbook of Formulas and Tables, 1968.