Frank-Wolfe Optimization for Symmetric-NMF under Simplicial Constraint

Symmetric nonnegative matrix factorization has found abundant applications in various domains by providing a symmetric low-rank decomposition of nonnegative matrices. In this paper we propose a Frank-Wolfe (FW) solver to optimize the symmetric nonnegative matrix factorization problem under a simplicial constraint, which has recently been proposed for probabilistic clustering. Compared with existing solutions, this algorithm is simple to implement, and has no hyperparameters to be tuned. Building on the recent advances of FW algorithms in nonconvex optimization, we prove an $O(1/\varepsilon^2)$ convergence rate to $\varepsilon$-approximate KKT points, via a tight bound $\Theta(n^2)$ on the curvature constant, which matches the best known result in unconstrained nonconvex setting using gradient methods. Numerical results demonstrate the effectiveness of our algorithm. As a side contribution, we construct a simple nonsmooth convex problem where the FW algorithm fails to converge to the optimum. This result raises an interesting question about necessary conditions of the success of the FW algorithm on convex problems.

[1]  Boris Polyak,et al.  Constrained minimization methods , 1966 .

[2]  Peter J. C. Dickinson,et al.  On the computational complexity of membership problems for the completely positive cone and its dual , 2014, Comput. Optim. Appl..

[3]  Tu Bao Ho,et al.  Simplicial nonnegative matrix factorization , 2013, The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF).

[4]  Elad Hazan,et al.  Faster Rates for the Frank-Wolfe Method over Strongly-Convex Sets , 2014, ICML.

[5]  J. Magnus,et al.  Matrix differential calculus with applications to simple, Hadamard, and Kronecker products. , 1985 .

[6]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[7]  Simon Lacoste-Julien,et al.  Convergence Rate of Frank-Wolfe for Non-Convex Objectives , 2016, ArXiv.

[8]  Kenneth L. Clarkson,et al.  Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm , 2008, SODA '08.

[9]  J. Dunn,et al.  Conditional gradient algorithms with open loop step size rules , 1978 .

[10]  J. Magnus,et al.  Matrix Differential Calculus with Applications in Statistics and Econometrics (Revised Edition) , 1999 .

[11]  Stephen A. Vavasis,et al.  On the Complexity of Nonnegative Matrix Factorization , 2007, SIAM J. Optim..

[12]  Yurii Nesterov,et al.  Complexity bounds for primal-dual methods minimizing the model of objective function , 2017, Mathematical Programming.

[13]  Paul Grigas,et al.  New analysis and results for the Frank–Wolfe method , 2013, Math. Program..

[14]  Philip Wolfe,et al.  An algorithm for quadratic programming , 1956 .

[15]  Nicholas I. M. Gould,et al.  On the Complexity of Steepest Descent, Newton's and Regularized Newton's Methods for Nonconvex Unconstrained Optimization Problems , 2010, SIAM J. Optim..

[16]  Saeed Ghadimi,et al.  Accelerated gradient methods for nonconvex nonlinear and stochastic programming , 2013, Math. Program..

[17]  Jan R. Magnus On the concept of matrix derivative , 2010, J. Multivar. Anal..

[18]  Jure Leskovec,et al.  Overlapping community detection at scale: a nonnegative matrix factorization approach , 2013, WSDM.

[19]  Saeed Ghadimi,et al.  Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization , 2013, Mathematical Programming.

[20]  Xin Liu,et al.  Document clustering based on non-negative matrix factorization , 2003, SIGIR.

[21]  Nanning Zheng,et al.  Non-negative matrix factorization for visual coding , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[22]  Martin Jaggi,et al.  Sparse Convex Optimization Methods for Machine Learning , 2011 .

[23]  Zhaoshui He,et al.  Symmetric Nonnegative Matrix Factorization: Algorithms and Applications to Probabilistic Clustering , 2011, IEEE Transactions on Neural Networks.

[24]  HarchaouiZaid,et al.  Conditional gradient algorithms for norm-regularized smooth convex optimization , 2015 .

[25]  Nikos D. Sidiropoulos,et al.  Non-Negative Matrix Factorization Revisited: Uniqueness and Algorithm for Symmetric Decomposition , 2014, IEEE Transactions on Signal Processing.

[26]  Mark W. Schmidt,et al.  Block-Coordinate Frank-Wolfe Optimization for Structural SVMs , 2012, ICML.

[27]  Katta G. Murty,et al.  Some NP-complete problems in quadratic and nonlinear programming , 1987, Math. Program..

[28]  Inderjit S. Dhillon,et al.  Efficient and Non-Convex Coordinate Descent for Symmetric Nonnegative Matrix Factorization , 2016, IEEE Transactions on Signal Processing.

[29]  Guanghui Lan The Complexity of Large-scale Convex Programming under a Linear Optimization Oracle , 2013, 1309.5550.

[30]  Peter L. Bartlett,et al.  Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks , 2008, J. Mach. Learn. Res..

[31]  Han Zhao,et al.  SoF: Soft-Cluster Matrix Factorization for Probabilistic Clustering , 2015, AAAI.

[32]  Martin Jaggi,et al.  Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization , 2013, ICML.

[33]  Chris H. Q. Ding,et al.  Symmetric Nonnegative Matrix Factorization for Graph Clustering , 2012, SDM.

[34]  Zaïd Harchaoui,et al.  Conditional gradient algorithms for norm-regularized smooth convex optimization , 2013, Math. Program..

[35]  Peter J. C. Dickinson The copositive cone, the completely positive cone and their generalisations , 2013 .

[36]  Robert J. Plemmons,et al.  Nonnegative Matrices in the Mathematical Sciences , 1979, Classics in Applied Mathematics.

[37]  Volkan Cevher,et al.  Frank-Wolfe works for non-Lipschitz continuous gradient objectives: Scalable poisson phase retrieval , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[38]  A. Berman,et al.  Completely Positive Matrices , 2003 .

[39]  Haesun Park,et al.  SymNMF: nonnegative low-rank approximation of a similarity matrix for graph clustering , 2015, J. Glob. Optim..