Block Coordinate Proximal Gradient Method for Nonconvex Optimization Problems: Convergence Analysis

We propose a block coordinate proximal gradient method for a composite minimization problem whose objective is the sum of two nonconvex functions, only one of which is assumed to be differentiable. Under per-block Lipschitz-like conditions based on the Bregman distance, but without global Lipschitz continuity of the gradient of the differentiable function, we prove that every accumulation point of the generated sequence is a stationary point of the model. We further show that this stationarity is the strongest ("best") notion when global Lipschitz continuity is additionally assumed, and that in some special cases the accumulation point is even a local minimizer. The convergence analysis without global Lipschitz continuity and the sharpened stationarity analysis distinguish our results from existing ones in both the convex and nonconvex settings.
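For orientation, here is a minimal sketch of the kind of model and update the abstract describes, in standard notation; the paper's exact formulation, block selection rule, and assumptions may differ. Consider

\[
\min_{x=(x_1,\dots,x_s)} \; \Phi(x) := f(x_1,\dots,x_s) + \sum_{i=1}^{s} g_i(x_i),
\]

where \(f\) is differentiable but possibly nonconvex and each \(g_i\) is possibly nonconvex and nonsmooth. With \(D_{h_i}(u,v) = h_i(u) - h_i(v) - \langle \nabla h_i(v),\, u - v \rangle\) the Bregman distance generated by a convex kernel \(h_i\), a block coordinate proximal gradient step updates block \(i\) via

\[
x_i^{k+1} \in \operatorname*{argmin}_{u} \Big\{ \big\langle \nabla_i f(x_1^{k+1},\dots,x_{i-1}^{k+1},x_i^{k},\dots,x_s^{k}),\, u - x_i^{k} \big\rangle + g_i(u) + \tfrac{1}{\lambda_i} D_{h_i}(u, x_i^{k}) \Big\}.
\]

A per-block Lipschitz-like condition of the kind mentioned in the abstract replaces the usual descent lemma: instead of requiring \(\nabla f\) to be globally Lipschitz, one assumes for each block an upper bound of the form

\[
f(\dots,u,\dots) \le f(\dots,v,\dots) + \langle \nabla_i f(\dots,v,\dots),\, u - v \rangle + L_i\, D_{h_i}(u,v),
\]

which recovers the classical quadratic upper bound when \(h_i = \tfrac{1}{2}\|\cdot\|^2\).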
