Higher-Order Clustering in Heterogeneous Information Networks

As a type of complex network widely seen in real-world applications, heterogeneous information networks (HINs) often encapsulate higher-order interactions that reflect the complex relationships among nodes and edges in real-world data. Modeling higher-order interactions in HINs facilitates user-guided clustering by providing an informative collection of signals. Meanwhile, network motifs have been used extensively to reveal higher-order interactions and network semantics in homogeneous networks. It is therefore natural to extend the use of motifs to HINs, and we tackle the problem of user-guided clustering in HINs by using motifs. We highlight the benefits of comprehensively modeling higher-order interactions instead of decomposing the complex relationships into pairwise interactions. We propose the MoCHIN model, which is applicable to arbitrary forms of HIN motifs; this flexibility is often necessary in HIN applications because of the rich and diverse semantics encapsulated in their heterogeneity. Because the tensor size in our model grows exponentially as the number of nodes increases, we propose an efficient inference algorithm for MoCHIN to overcome the curse of dimensionality. In our experiments, MoCHIN outperforms all baselines on three evaluation tasks under different metrics. Additional experiments further discuss the advantage of our model when supervision is weak.
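To make the dimensionality concern concrete, here is a minimal sketch under stated assumptions; it is not MoCHIN's inference algorithm. It assumes, as is typical of motif-based tensor models, that each node position in a motif corresponds to one tensor mode indexed by the network nodes of that type. The toy author/paper/venue network, the co-authorship motif, and the helper names `motif_instances` and `dense_size` are all hypothetical. The dense tensor has one entry per combination of typed nodes, so its size is a product over the motif's positions, whereas only matched motif instances are nonzero and can be stored as sparse coordinates.

```python
from itertools import product
from math import prod

# Hypothetical toy HIN: typed nodes and undirected author-paper / paper-venue edges.
nodes = {
    "author": ["a1", "a2", "a3"],
    "paper": ["p1", "p2"],
    "venue": ["v1"],
}
edges = {("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2"),
         ("p1", "v1"), ("p2", "v1")}

# Hypothetical motif: two distinct authors of one paper published at a venue.
# Assumption: each motif position is one tensor mode.
motif_types = ["author", "author", "paper", "venue"]
motif_edges = [(0, 2), (1, 2), (2, 3)]


def has_edge(edges, u, v):
    """Treat edges as undirected."""
    return (u, v) in edges or (v, u) in edges


def motif_instances(nodes, edges, motif_types, motif_edges):
    """Brute-force motif matching; returns coordinates of nonzero tensor entries."""
    index = {t: {n: i for i, n in enumerate(ns)} for t, ns in nodes.items()}
    coords = []
    for combo in product(*(nodes[t] for t in motif_types)):
        if len(set(combo)) < len(combo):
            continue  # motif positions must map to distinct network nodes
        if all(has_edge(edges, combo[i], combo[j]) for i, j in motif_edges):
            coords.append(tuple(index[t][n] for t, n in zip(motif_types, combo)))
    return coords


def dense_size(nodes, motif_types):
    """Entries in the dense tensor: product of node counts over motif positions."""
    return prod(len(nodes[t]) for t in motif_types)


coords = motif_instances(nodes, edges, motif_types, motif_edges)
print(len(coords), dense_size(nodes, motif_types))  # 4 18

# Hypothetical scale: 10,000 nodes per mode, so the dense size is 10,000 ** order.
for order in range(2, 6):
    print(order, 10_000 ** order)
```

The brute-force enumeration is itself exponential in the motif size and only serves to show which entries are nonzero; a practical implementation would need a dedicated motif-matching procedure together with sparse tensor storage.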
