An Inertial Block Majorization Minimization Framework for Nonsmooth Nonconvex Optimization

In this paper, we introduce TITAN, a novel inerTial block majorIzation minimization framework for non-smooth non-convex opTimizAtioN problems. TITAN is a block coordinate method (BCM) that embeds an inertial force into each majorization-minimization step of the block updates. The inertial force is obtained via an extrapolation operator that subsumes heavy-ball and Nesterov-type accelerations for block proximal gradient methods as special cases. By choosing among various surrogate functions (proximal, Lipschitz gradient, Bregman, quadratic, and composite surrogates) and by varying the extrapolation operator, TITAN produces a rich set of inertial BCMs. We study subsequential convergence as well as global convergence of the sequence generated by TITAN. We illustrate the effectiveness of TITAN on two important machine learning problems, namely sparse non-negative matrix factorization and matrix completion.
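To make the idea of an inertial block update concrete, the following is a minimal sketch of one possible instance: a Nesterov-type extrapolation followed by a step on a Lipschitz gradient surrogate, applied block by block. It is not the paper's algorithm or parameter rule. The test problem (plain non-negative matrix factorization without a sparsity term), the function name `inertial_block_prox_grad_nmf`, and the fixed extrapolation weight `beta` are all assumptions chosen for illustration.

```python
import numpy as np


def inertial_block_prox_grad_nmf(X, rank, n_iters=200, beta=0.3, seed=0):
    """Illustrative inertial block proximal gradient loop (hypothetical helper).

    Minimizes 0.5 * ||X - U V||_F^2 subject to U >= 0 and V >= 0.
    Each block update extrapolates the block, then minimizes the Lipschitz
    gradient surrogate at the extrapolated point, which reduces to a
    projected gradient step. `beta` is an assumed fixed inertial weight.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, rank))
    V = rng.random((rank, n))
    U_prev, V_prev = U.copy(), V.copy()
    history = []

    for _ in range(n_iters):
        # Block U: Lipschitz constant of the partial gradient is ||V V^T||_2.
        L_U = max(np.linalg.norm(V @ V.T, 2), 1e-12)
        U_ex = U + beta * (U - U_prev)            # inertial extrapolation
        grad_U = (U_ex @ V - X) @ V.T             # gradient at extrapolated point
        U_prev, U = U, np.maximum(U_ex - grad_U / L_U, 0.0)

        # Block V: same step, using the freshly updated U.
        L_V = max(np.linalg.norm(U.T @ U, 2), 1e-12)
        V_ex = V + beta * (V - V_prev)
        grad_V = U.T @ (U @ V_ex - X)
        V_prev, V = V, np.maximum(V_ex - grad_V / L_V, 0.0)

        history.append(0.5 * np.linalg.norm(X - U @ V, "fro") ** 2)

    return U, V, history
```

Setting `beta=0` recovers a plain block projected gradient method, which makes it easy to compare runs with and without inertia on the same data. Other surrogates (for example Bregman or quadratic) would change only the minimization step inside each block update.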
