Relational co-clustering via manifold ensemble learning

Co-clustering targets on grouping the samples and features simultaneously. It takes advantage of the duality between the samples and features. In many real-world applications, the data points or features usually reside on a submanifold of the ambient Euclidean space, but it is nontrivial to estimate the intrinsic manifolds in a principled way. In this study, we focus on improving the co-clustering performance via manifold ensemble learning, which aims to maximally approximate the intrinsic manifolds of both the sample and feature spaces. To achieve this, we develop a novel co-clustering algorithm called Relational Multi-manifold Co-clustering (RMC) based on symmetric nonnegative matrix tri-factorization, which decomposes the relational data matrix into three matrices. This method considers the inter-type relationship revealed by the relational data matrix and the intra-type information reflected by the affinity matrices. Specifically, we assume the intrinsic manifold of the sample or feature space lies in a convex hull of a group of pre-defined candidate manifolds. We hope to learn an appropriate convex combination of them to approach the desired intrinsic manifold. To optimize the objective, the multiplicative rules are utilized to update the factorized matrices and the entropic mirror descent algorithm is exploited to automatically learn the manifold coefficients. Experimental results demonstrate the superiority of the proposed algorithm.

[1]  Mikhail Belkin,et al.  Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples , 2006, J. Mach. Learn. Res..

[2]  Farshad Fotouhi,et al.  Co-clustering Documents and Words Using Bipartite Isoperimetric Graph Partitioning , 2006, Sixth International Conference on Data Mining (ICDM'06).

[3]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[4]  Quanquan Gu,et al.  Co-clustering on manifolds , 2009, KDD.

[5]  Edward Y. Chang,et al.  Parallel Spectral Clustering in Distributed Systems , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Feiping Nie,et al.  Nonnegative Matrix Tri-factorization Based High-Order Co-clustering and Its Fast Implementation , 2011, 2011 IEEE 11th International Conference on Data Mining.

[7]  Xian-Sheng Hua,et al.  Ensemble Manifold Regularization , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Chris H. Q. Ding,et al.  Convex and Semi-Nonnegative Matrix Factorizations , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Jiawei Han,et al.  Non-negative Matrix Factorization on Manifold , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[10]  Chris H. Q. Ding,et al.  Simultaneous clustering of multi-type relational data via symmetric nonnegative matrix tri-factorization , 2011, CIKM '11.

[11]  Philip S. Yu,et al.  Co-clustering by block value decomposition , 2005, KDD '05.

[12]  Jiawei Han,et al.  Document clustering using locality preserving indexing , 2005, IEEE Transactions on Knowledge and Data Engineering.

[13]  Hujun Bao,et al.  Understanding the Power of Clause Learning , 2009, IJCAI.

[14]  Jiawei Han,et al.  Locally Consistent Concept Factorization for Document Clustering , 2011, IEEE Transactions on Knowledge and Data Engineering.

[15]  Mikhail Belkin,et al.  Manifold Regularization : A Geometric Framework for Learning from Examples , 2004 .

[16]  Kehong Yuan,et al.  On alpha-divergence based nonnegative matrix factorization for clustering cancer gene expression data , 2008, Artif. Intell. Medicine.

[17]  Deng Cai,et al.  Probabilistic dyadic data analysis with local and global consistency , 2009, ICML '09.

[18]  Inderjit S. Dhillon,et al.  Co-clustering documents and words using bipartite spectral graph partitioning , 2001, KDD '01.

[19]  Chris H. Q. Ding,et al.  Orthogonal nonnegative matrix t-factorizations for clustering , 2006, KDD '06.

[20]  Marc Teboulle,et al.  Mirror descent and nonlinear projected subgradient methods for convex optimization , 2003, Oper. Res. Lett..

[21]  Constantin F. Aliferis,et al.  A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis , 2004, Bioinform..

[22]  Philip S. Yu,et al.  Spectral clustering for multi-type relational data , 2006, ICML.

[23]  Xiaojun Wu,et al.  Graph Regularized Nonnegative Matrix Factorization for Data Representation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.