Inertial Block Proximal Methods for Non-Convex Non-Smooth Optimization

We propose inertial versions of block coordinate descent methods for solving non-convex non-smooth composite optimization problems. Our methods possess three main advantages compared to current state-of-the-art accelerated first-order methods: (1) they allow using two different extrapolation points to evaluate the gradients and to add the inertial force (we will empirically show that it is more efficient than using a single extrapolation point), (2) they allow to randomly picking the block of variables to update, and (3) they do not require a restarting step. We prove the subsequential convergence of the generated sequence under mild assumptions, prove the global convergence under some additional assumptions, and provide convergence rates. We deploy the proposed methods to solve non-negative matrix factorization (NMF) and show that they compete favorably with the state-of-the-art NMF algorithms. Additional experiments on non-negative approximate canonical polyadic decomposition, also known as non-negative tensor factorization, are also provided.

[1]  Clifford Hildreth,et al.  A quadratic programming procedure , 1957 .

[2]  BolteJérôme,et al.  Proximal Alternating Minimization and Projection Methods for Nonconvex Problems , 2010 .

[3]  Marc Teboulle,et al.  First Order Methods beyond Convexity and Lipschitz Gradient Continuity with Applications to Quadratic Inverse Problems , 2017, SIAM J. Optim..

[4]  Benar Fux Svaiter,et al.  Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods , 2013, Math. Program..

[5]  Heinz H. Bauschke,et al.  Convex Analysis and Monotone Operator Theory in Hilbert Spaces , 2011, CMS Books in Mathematics.

[6]  Amir Beck,et al.  On the Convergence of Block Coordinate Descent Type Methods , 2013, SIAM J. Optim..

[7]  Nicolas Gillis,et al.  Hierarchical Clustering of Hyperspectral Images Using Rank-Two Nonnegative Matrix Factorization , 2013, IEEE Transactions on Geoscience and Remote Sensing.

[8]  Wotao Yin,et al.  A Globally Convergent Algorithm for Nonconvex Optimization Based on Block Coordinate Update , 2014, J. Sci. Comput..

[9]  Nicolas Gillis,et al.  Accelerating Nonnegative Matrix Factorization Algorithms Using Extrapolation , 2018, Neural Computation.

[10]  Marc Teboulle,et al.  Convergence Analysis of a Proximal-Like Minimization Algorithm Using Bregman Functions , 1993, SIAM J. Optim..

[11]  Haesun Park,et al.  Algorithms for nonnegative matrix and tensor factorizations: a unified view based on block coordinate descent framework , 2014, J. Glob. Optim..

[12]  Tamir Hazan,et al.  Non-negative tensor factorization with applications to statistics and computer vision , 2005, ICML.

[13]  Wotao Yin,et al.  A fast patch-dictionary method for whole image recovery , 2014, ArXiv.

[14]  Yurii Nesterov,et al.  Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems , 2012, SIAM J. Optim..

[15]  Thomas Brox,et al.  iPiano: Inertial Proximal Algorithm for Nonconvex Optimization , 2014, SIAM J. Imaging Sci..

[16]  Mike E. Davies,et al.  Iterative Hard Thresholding for Compressed Sensing , 2008, ArXiv.

[17]  Alvaro R. De Pierro,et al.  Re-examination of Bregman functions and new properties of their divergences , 2018, Optimization.

[18]  Marc Teboulle,et al.  Proximal alternating linearized minimization for nonconvex and nonsmooth problems , 2013, Mathematical Programming.

[19]  Peter Ochs,et al.  Unifying Abstract Inexact Convergence Theorems and Block Coordinate Variable Metric iPiano , 2016, SIAM J. Optim..

[20]  Stephen P. Boyd,et al.  Proximal Algorithms , 2013, Found. Trends Optim..

[21]  Y. Nesterov A method for solving the convex programming problem with convergence rate O(1/k^2) , 1983 .

[22]  Nicolas Gillis,et al.  Accelerated Multiplicative Updates and Hierarchical ALS Algorithms for Nonnegative Matrix Factorization , 2011, Neural Computation.

[23]  Adrian S. Lewis,et al.  The [barred L]ojasiewicz Inequality for Nonsmooth Subanalytic Functions with Applications to Subgradient Dynamical Systems , 2006, SIAM J. Optim..

[24]  A. Bruckstein,et al.  K-SVD : An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation , 2005 .

[25]  Masoud Ahookhosh,et al.  A Bregman Forward-Backward Linesearch Algorithm for Nonconvex Composite Optimization: Superlinear Convergence to Nonisolated Local Minima , 2020, SIAM J. Optim..

[26]  Radu Ioan Bot,et al.  An Inertial Tseng’s Type Proximal Algorithm for Nonsmooth and Nonconvex Optimization Problems , 2014, J. Optim. Theory Appl..

[27]  P. Tseng Convergence of a Block Coordinate Descent Method for Nondifferentiable Minimization , 2001 .

[28]  Yurii Nesterov,et al.  Smooth minimization of non-smooth functions , 2005, Math. Program..

[29]  Marc Teboulle,et al.  A simplified view of first order methods for optimization , 2018, Math. Program..

[30]  M. Elad,et al.  $rm K$-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation , 2006, IEEE Transactions on Signal Processing.

[31]  Balas K. Natarajan,et al.  Sparse Approximate Solutions to Linear Systems , 1995, SIAM J. Comput..

[32]  Nicolas Gillis,et al.  The Why and How of Nonnegative Matrix Factorization , 2014, ArXiv.

[33]  Marc Teboulle,et al.  Interior Gradient and Proximal Methods for Convex and Conic Optimization , 2006, SIAM J. Optim..

[34]  Thomas Pock,et al.  Inertial Proximal Alternating Linearized Minimization (iPALM) for Nonconvex and Nonsmooth Problems , 2016, SIAM J. Imaging Sci..

[35]  Hédy Attouch,et al.  Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Lojasiewicz Inequality , 2008, Math. Oper. Res..

[36]  Zhi-Quan Luo,et al.  A Unified Convergence Analysis of Block Successive Minimization Methods for Nonsmooth Optimization , 2012, SIAM J. Optim..

[37]  Jonathan Eckstein,et al.  Nonlinear Proximal Point Algorithms Using Bregman Functions, with Applications to Convex Programming , 1993, Math. Oper. Res..

[38]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[39]  Andrzej Cichocki,et al.  Nonnegative Matrix and Tensor Factorization T , 2007 .

[40]  Marc Teboulle,et al.  A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications , 2017, Math. Oper. Res..

[41]  Nikos D. Sidiropoulos,et al.  A Flexible and Efficient Algorithmic Framework for Constrained Matrix and Tensor Factorization , 2015, IEEE Transactions on Signal Processing.

[42]  Boris Polyak Some methods of speeding up the convergence of iteration methods , 1964 .

[43]  Paul Tseng,et al.  A coordinate gradient descent method for nonsmooth separable minimization , 2008, Math. Program..

[44]  Marie-Françoise Roy,et al.  Real algebraic geometry , 1992 .

[45]  A. Auslender Asymptotic properties of the fenchel dual functional and applications to decomposition problems , 1992 .

[46]  Peter Richtárik,et al.  Accelerated, Parallel, and Proximal Coordinate Descent , 2013, SIAM J. Optim..

[47]  Luigi Grippo,et al.  On the convergence of the block nonlinear Gauss-Seidel method under convex constraints , 2000, Oper. Res. Lett..

[48]  Wing-Kin Ma,et al.  Nonnegative Matrix Factorization for Signal and Data Analytics: Identifiability, Algorithms, and Applications , 2018, IEEE Signal Processing Magazine.

[49]  M. J. D. Powell,et al.  On search directions for minimization algorithms , 1973, Math. Program..

[50]  S. K. Zavriev,et al.  Heavy-ball method in nonconvex optimization problems , 1993 .

[51]  Marc Teboulle,et al.  Convergence of Proximal-Like Algorithms , 1997, SIAM J. Optim..

[52]  Hédy Attouch,et al.  On the convergence of the proximal algorithm for nonsmooth functions involving analytic features , 2008, Math. Program..

[53]  Wotao Yin,et al.  A Block Coordinate Descent Method for Regularized Multiconvex Optimization with Applications to Nonnegative Tensor Factorization and Completion , 2013, SIAM J. Imaging Sci..