Clustering by Orthogonal NMF Model and Non-Convex Penalty Optimization

The non-negative matrix factorization (NMF) model with an additional orthogonality constraint on one of the factor matrices, called the orthogonal NMF (ONMF), has been found to be a promising clustering model that can outperform the classical K-means. However, solving the ONMF model is a challenging optimization problem: coupling the orthogonality and non-negativity constraints introduces a mixed combinatorial aspect, because the correct status of each variable (positive or zero) must be determined. Most existing methods handle the orthogonality constraint directly in its original form via various optimization techniques, but they do not scale to large problems. In this paper, we propose a new ONMF-based clustering formulation that equivalently transforms the orthogonality constraint into a set of norm-based non-convex equality constraints. We then apply a non-convex penalty (NCP) approach that adds these constraints to the objective as penalty terms, leading to a problem that is efficiently solvable. We study two penalty formulations, one smooth and one non-smooth. We establish theoretical conditions under which the penalized problems yield feasible stationary solutions to the ONMF-based clustering problem, and we propose efficient algorithms for solving the penalized problems of the two NCP methods. Experimental results on both synthetic and real datasets show that the proposed NCP methods are computationally efficient and either match or outperform the existing K-means and ONMF-based methods in clustering performance.
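To make the idea of a norm-based reformulation concrete, the following Python sketch illustrates one way an orthogonality constraint on a non-negative factor can be expressed through column norms. It rests on a standard fact: a non-negative matrix H has mutually orthogonal rows exactly when each column has at most one nonzero entry, which in turn holds exactly when the l1 norm of each column equals its l2 norm (or its infinity norm). The specific penalty forms, the function names (`smooth_penalty`, `nonsmooth_penalty`, `rows_orthogonal`, `labels_from_H`), and the example matrices are assumptions chosen for illustration and are not taken from the text above; the formulation studied in the paper may differ in its details.

```python
import numpy as np

def smooth_penalty(H):
    """Assumed smooth penalty: sum over columns of ||h_j||_1^2 - ||h_j||_2^2.

    For H >= 0 this is non-negative and equals zero exactly when every
    column has at most one nonzero entry.
    """
    col_l1 = H.sum(axis=0)              # equals the l1 norm because H >= 0
    col_l2_sq = (H ** 2).sum(axis=0)
    return float(np.sum(col_l1 ** 2 - col_l2_sq))

def nonsmooth_penalty(H):
    """Assumed non-smooth penalty: sum over columns of ||h_j||_1 - ||h_j||_inf."""
    col_l1 = H.sum(axis=0)
    col_max = H.max(axis=0)             # infinity norm because H >= 0
    return float(np.sum(col_l1 - col_max))

def rows_orthogonal(H, tol=1e-12):
    """Check the original constraint: off-diagonal entries of H H^T are zero."""
    G = H @ H.T
    off_diag = G - np.diag(np.diag(G))
    return bool(np.all(np.abs(off_diag) <= tol))

def labels_from_H(H):
    """Read cluster labels from H: each column (sample) joins its largest row."""
    return np.argmax(H, axis=0)

# Hypothetical 2-cluster, 4-sample factors.
H_feasible = np.array([[2.0, 0.0, 0.5, 0.0],
                       [0.0, 1.0, 0.0, 3.0]])
H_infeasible = np.array([[2.0, 0.3, 0.5, 0.0],
                         [0.0, 1.0, 0.0, 3.0]])

for name, H in [("feasible", H_feasible), ("infeasible", H_infeasible)]:
    print(name,
          "orthogonal rows:", rows_orthogonal(H),
          "smooth penalty:", smooth_penalty(H),
          "non-smooth penalty:", nonsmooth_penalty(H))
```

In this sketch both penalties vanish on `H_feasible` and are strictly positive on `H_infeasible`, so driving either penalty to zero while keeping H non-negative recovers the orthogonality constraint without handling it in its original form.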
