Learning Inter- and Intra-Manifolds for Matrix Factorization-Based Multi-Aspect Data Clustering

Clustering data with multiple aspects, such as multi-view or multi-type relational data, has become popular in recent years because of its wide applicability. Approaches that combine manifold learning with the Non-negative Matrix Factorization (NMF) framework, which learn an accurate low-rank representation of multi-dimensional data, have proven effective. We propose to include the inter-manifold in the NMF framework, using the distance information between data points of different data types (or views) to learn the diverse manifold for data clustering. Empirical analysis reveals that the proposed method can find partial representations of various interrelated types and select useful features during clustering. Results on several datasets demonstrate that the proposed method outperforms state-of-the-art multi-aspect data clustering methods in both accuracy and efficiency.
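As a rough illustration of what manifold learning inside an NMF framework can look like, the sketch below implements standard graph-regularized NMF with multiplicative updates for a single view. It is not the method proposed here: it covers only an intra-manifold term built from a k-nearest-neighbour graph, and it omits the inter-manifold term between data types. The function names `knn_affinity` and `gnmf`, the binary kNN weighting, and all parameter defaults are assumptions chosen for the example.

```python
import numpy as np

def knn_affinity(X, k=5):
    """Symmetric binary kNN affinity over the columns (samples) of X.
    Assumption: binary weights; heat-kernel weights are another common choice."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                      # exclude self-neighbours
    idx = np.argsort(d2, axis=1)[:, :k]               # k nearest samples per row
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)                         # symmetrize

def gnmf(X, rank, lam=1.0, k=5, iters=200, eps=1e-10, seed=0):
    """Approximately minimize ||X - U V^T||_F^2 + lam * Tr(V^T L V), with L = D - W.
    X is (features x samples) and non-negative; U is the basis, V the sample embedding."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, rank))
    V = rng.random((n, rank))
    W = knn_affinity(X, k)
    D = np.diag(W.sum(axis=1))
    for _ in range(iters):
        # Multiplicative updates keep U and V non-negative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V

if __name__ == "__main__":
    X = np.random.default_rng(1).random((20, 30))     # toy non-negative data
    U, V = gnmf(X, rank=3, lam=0.5, k=4, iters=100)
    labels = V.argmax(axis=1)                         # simple cluster assignment
    print(labels.shape)
```

Taking the arg-max over each row of `V` is only one simple way to read off clusters; running k-means on the rows of `V` is a common alternative.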
