User-Guided Clustering in Heterogeneous Information Networks via Motif-Based Comprehensive Transcription

Heterogeneous information networks (HINs) with rich semantics are ubiquitous in real-world applications. For a given HIN, many reasonable clusterings with distinct semantic meanings can coexist. User-guided clustering, in which users provide labels for a small portion of nodes, is hence of great practical value for HINs. To cater to the broad spectrum of user guidance, reflected in the different clustering results users may expect, it is potentially useful to carefully exploit the signals residing in the data. Meanwhile, as one type of complex network, HINs often encapsulate higher-order interactions that reflect the interlocked nature of nodes and edges. Network motifs, sometimes referred to as meta-graphs, have been used as tools to capture such higher-order interactions and to reveal the many different semantics they carry. We therefore approach the problem of user-guided clustering in HINs with network motifs. In doing so, we identify the utility and importance of directly modeling higher-order interactions without collapsing them into pairwise interactions. To achieve this, we comprehensively transcribe the higher-order interaction signals into a series of tensors via motifs and propose the MoCHIN model based on joint non-negative tensor factorization. The approach applies to arbitrarily many HIN motifs of arbitrary form. We also propose an inference algorithm with speed-up methods to address the challenge that the tensor size grows exponentially with the number of nodes in a motif. We validate the effectiveness of the proposed method on two real-world datasets, where MoCHIN outperforms all baselines on three evaluation tasks under three different metrics. Additional experiments demonstrate the utility of motifs and the benefit of directly modeling higher-order information, especially when user guidance is limited. (The code and the data are available at https://github.com/NoSegfault/MoCHIN.)
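
The transcription step and the kind of factorization it feeds can be sketched as follows. This is a minimal, unsupervised simplification in Python with NumPy, assuming a single 3-node motif (author, paper, venue) and dense tensors; it is not MoCHIN itself. Each motif instance adds one to the tensor entry indexed by its nodes, so a motif with k nodes yields a k-way tensor whose dense size is the product of the mode sizes, which is the exponential growth mentioned above. The factorization is a plain non-negative CP decomposition fitted with Lee and Seung style multiplicative updates, swapped in for illustration; MoCHIN's joint factorization across several motif tensors, its user-guidance terms, and its speed-up methods are not modeled. The function names transcribe_motif, ntf_cp, and reconstruct, the toy data, and the argmax rule for reading off clusters are hypothetical.

```python
import math

import numpy as np

# Hypothetical helpers: a simplified single-motif sketch, not the MoCHIN implementation.


def transcribe_motif(instances, dims):
    """Transcribe motif instances into a dense count tensor.

    Each instance is a tuple with one node index per motif node and adds 1 to
    that entry, so a k-node motif gives a k-way tensor with math.prod(dims)
    cells: the dense size grows exponentially with k.
    """
    X = np.zeros(dims)
    for idx in instances:
        X[tuple(idx)] += 1.0
    return X


def reconstruct(factors):
    """Rebuild the tensor sum_r a_r (outer) b_r (outer) ... from CP factors."""
    letters = "abcdefghij"[: len(factors)]  # one einsum letter per mode (max 10)
    operands = ",".join(letter + "z" for letter in letters)  # 'z' indexes the rank
    return np.einsum(f"{operands}->{letters}", *factors)


def ntf_cp(X, rank, n_iter=300, seed=0, eps=1e-12):
    """Non-negative CP decomposition with multiplicative updates (squared error)."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((n, rank)) + 0.1 for n in X.shape]  # strictly positive start
    letters = "abcdefghij"[: X.ndim]
    history = []
    for _ in range(n_iter):
        for m in range(X.ndim):
            others = [f for k, f in enumerate(factors) if k != m]
            # MTTKRP: contract X with every factor except the one for mode m.
            operands = ",".join(letters[k] + "z" for k in range(X.ndim) if k != m)
            num = np.einsum(f"{letters},{operands}->{letters[m]}z", X, *others)
            # Gram of the Khatri-Rao product = Hadamard product of the small Grams.
            gram = np.ones((rank, rank))
            for f in others:
                gram *= f.T @ f
            # Multiplicative update keeps entries non-negative.
            factors[m] *= num / (factors[m] @ gram + eps)
        history.append(float(np.sum((X - reconstruct(factors)) ** 2)))
    return factors, history


# Toy author-paper-venue motif: two groups that never interact with each other.
instances = [(a, p, 0) for a in range(3) for p in range(3)]
instances += [(a, p, 1) for a in range(3, 6) for p in range(3, 6)]
X = transcribe_motif(instances, (6, 6, 2))
dense_cells = math.prod(X.shape)  # 6 * 6 * 2 = 72; n**k for k modes of size n
factors, history = ntf_cp(X, rank=2)
author_labels = factors[0].argmax(axis=1)  # hard cluster per author (assumed convention)
```

On this toy tensor the two author groups occupy disjoint blocks, so the argmax labels would typically separate authors 0 to 2 from authors 3 to 5, although multiplicative updates can stall at poor local optima and results may depend on the seed. Each mode update is the standard non-negative least-squares multiplicative step on the unfolded tensor, so the recorded squared error should not increase from one sweep to the next.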
