Meta-Graph Based HIN Spectral Embedding: Methods, Analyses, and Insights

Heterogeneous information networks (HINs) have drawn significant research attention recently, owing to their power in modeling multi-typed, multi-relational data and in facilitating various downstream applications. Over the past decade, many algorithms have been developed for HIN modeling, including traditional similarity measures and more recent embedding techniques. Most algorithms on HINs leverage meta-graphs, or meta-paths (a special case of meta-graphs), to capture various semantics. Given an arbitrary set of meta-graphs, existing algorithms either treat them as equally important or learn their relative importance through supervised learning, so their performance relies heavily on prior knowledge and labeled data. While unsupervised embedding has been shown to be a fundamental solution for various homogeneous network mining tasks, it is a much harder problem for HINs because of the presence of many different meta-graphs. In this work, we study the utility of different meta-graphs, as well as how to leverage multiple meta-graphs simultaneously for HIN embedding in an unsupervised manner. Motivated by the prolific research on homogeneous networks, especially spectral graph theory, we first conduct a systematic empirical study of the spectrum and embedding quality of different meta-graphs on multiple HINs, which leads to an efficient method of meta-graph assessment. The study also yields valuable insight into the higher-order organization of HINs and indicates a practical way of selecting useful embedding dimensions. Further, we explore the challenges of combining multiple meta-graphs to capture the multi-dimensional semantics in HINs through geometric reasoning, and arrive at an embedding compression method based on an autoencoder with ℓ2,1-loss, which finds the most informative meta-graphs and embeddings in an end-to-end unsupervised manner. Finally, empirical analysis suggests a unified workflow that closes the gap between our meta-graph assessment and combination methods.
To the best of our knowledge, this is the first research effort to provide rich theoretical and empirical analyses of the utility of meta-graphs and their combinations, especially with regard to HIN embedding. Extensive experimental comparisons with various state-of-the-art neural-network-based embedding methods on multiple real-world HINs demonstrate the effectiveness and efficiency of our framework in finding useful meta-graphs and generating high-quality HIN embeddings.
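
As a rough illustration of the two ingredients named above, the following minimal sketch shows (a) a spectral embedding of a single meta-path and (b) a row-wise ℓ2,1 norm. It is not the paper's implementation. The function names, the toy author-paper incidence matrix, the choice of the author-paper-author meta-path, and the use of the symmetric normalized Laplacian are all assumptions made for illustration; the abstract does not specify these details, and the autoencoder itself is omitted.

```python
import numpy as np


def metapath_adjacency(B):
    """Hypothetical helper: author-author weights induced by the
    author-paper-author meta-path, W = B B^T with the diagonal zeroed.
    B[i, j] = 1 if author i wrote paper j."""
    W = (B @ B.T).astype(float)
    np.fill_diagonal(W, 0.0)
    return W


def spectral_embedding(W, k):
    """Return the k smallest eigenvalues and eigenvectors of the
    symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    # Guard against isolated nodes (degree 0) to avoid division by zero.
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    L = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vals[:k], vecs[:, :k]


def l21_norm(X):
    """Row-wise l2,1 norm: the sum over rows of each row's Euclidean norm.
    (Some texts define it over columns instead; rows are assumed here.)"""
    return float(np.sqrt((X ** 2).sum(axis=1)).sum())


# Toy data (assumed): authors 0 and 1 share paper 0, authors 2 and 3 share paper 1.
B = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])
W = metapath_adjacency(B)
eigenvalues, embedding = spectral_embedding(W, k=2)
```

In this toy network the meta-path graph has two disconnected author groups, so the two smallest Laplacian eigenvalues are numerically zero. This is the kind of spectral signal that an assessment of meta-graph quality could inspect. Penalizing the ℓ2,1 norm encourages entire rows to shrink to zero together, which is why it is a common choice for selecting whole groups of features.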
