Multiple graph semi-supervised clustering with automatic calculation of graph associations

Abstract Multiple graph clustering is an important tool in data integration and data mining for graph-based data. Prediction and classification accuracy can be significantly improved by integrating information from multiple sources and data sets. The aim of multiple graph clustering is to partition objects into several clusters such that the clusters in each graph are well separated and the clusters across different graphs are consistent. Existing methods assume that the degrees of association among different graphs are the same. However, a graph may be strongly or weakly associated with the other graphs, depending on how highly their cluster structures are correlated: when two cluster structures are similar, the degree of association should be high, and when they are dissimilar, it should be low. More accurate clustering results can be obtained by integrating this association information across multiple graphs. The main aim of this paper is to study multiple graph semi-supervised clustering, in which a large amount of multiple graph data is available together with a small amount of labeled data. We propose a constrained optimization problem that determines cluster structures and degrees of association simultaneously. In our formulation, we use orthogonality for the cluster structure indicator and cosine correlation for the degree of association. To handle the orthogonality constraint in the clustering process, we develop a gradient flow method to solve the resulting optimization problem, and we show the convergence of the proposed iterative method. Numerical experiments on synthetic data and on real data sets with few known labels show the efficiency and effectiveness of the proposed method compared with existing methods in the literature.
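To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the formulation used in the paper, whose exact objective is not given here. It assumes that each graph's cluster structure is encoded by an n-by-k indicator matrix with orthonormal columns, and that the degree of association between two graphs is measured by the cosine of the angle between the corresponding projection matrices. The function names `orthonormal_indicator` and `cosine_association` are hypothetical and introduced only for illustration.

```python
import numpy as np


def orthonormal_indicator(labels, k):
    """Encode a hard partition as an n-by-k matrix U with U.T @ U = I.

    Assumption: every cluster 0..k-1 contains at least one object.
    """
    labels = np.asarray(labels)
    U = np.zeros((labels.size, k))
    U[np.arange(labels.size), labels] = 1.0
    # Scale each column by 1/sqrt(cluster size) so the columns have unit norm.
    return U / np.sqrt(U.sum(axis=0))


def cosine_association(U_a, U_b):
    """Cosine correlation between the projections U_a U_a^T and U_b U_b^T.

    Illustrative choice only: the value is 1 when the two partitions agree
    (up to relabeling) and decreases as they diverge.
    """
    P_a = U_a @ U_a.T
    P_b = U_b @ U_b.T
    inner = np.sum(P_a * P_b)  # Frobenius inner product
    return float(inner / (np.linalg.norm(P_a) * np.linalg.norm(P_b)))


# Three toy "graphs" over the same four objects, each with two clusters.
U1 = orthonormal_indicator([0, 0, 1, 1], k=2)
U2 = orthonormal_indicator([1, 1, 0, 0], k=2)  # same partition, labels swapped
U3 = orthonormal_indicator([0, 1, 0, 1], k=2)  # a different partition

assoc_same = cosine_association(U1, U2)
assoc_diff = cosine_association(U1, U3)
```

Under these assumptions, the measure does not depend on how clusters are labeled, so two graphs with the same partition receive the maximal degree of association, while graphs with conflicting partitions receive a lower one.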
