Distance-Aware DAG Embedding for Proximity Search on Heterogeneous Graphs

Proximity search on heterogeneous graphs aims to measure the proximity between two nodes on a graph w.r.t. some semantic relation for ranking. Pioneer work often tries to measure such proximity by paths connecting the two nodes. However, paths as linear sequences have limited expressiveness for the complex network connections. In this paper, we explore a more expressive DAG (directed acyclic graph) data structure for modeling the connections between two nodes. Particularly, we are interested in learning a representation for the DAGs to encode the proximity between two nodes. We face two challenges to use DAGs, including how to efficiently generate DAGs and how to effectively learn DAG embedding for proximity search. We find distance-awareness as important for proximity search and the key to solve the above challenges. Thus we develop a novel Distance-aware DAG Embedding (D2AGE) model. We evaluate D2AGE on three benchmark data sets with six semantic relations, and we show that D2AGE outperforms the state-of-the-art baselines. We release the code on https://github.com/shuaiOKshuai.

[1]  Kevin Chen-Chuan Chang,et al.  A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications , 2017, IEEE Transactions on Knowledge and Data Engineering.

[2]  Hongyu Guo,et al.  DAG-Structured Long Short-Term Memory for Semantic Compositionality , 2016, NAACL.

[3]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[4]  Kevin Chen-Chuan Chang,et al.  Semantic Proximity Search on Heterogeneous Graph by Proximity Embedding , 2017, AAAI.

[5]  Pierre Baldi,et al.  The Principled Design of Large-Scale Recursive Neural Network Architectures--DAG-RNNs and the Protein Structure Prediction Problem , 2003, J. Mach. Learn. Res..

[6]  Jaana Kekäläinen,et al.  Cumulated gain-based evaluation of IR techniques , 2002, TOIS.

[7]  Xuelong Li,et al.  Unsupervised Large Graph Embedding , 2017, AAAI.

[8]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[9]  Jennifer Widom,et al.  Scaling personalized web search , 2003, WWW '03.

[10]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[11]  Dik Lun Lee,et al.  Meta-Graph Based Recommendation Fusion over Heterogeneous Information Networks , 2017, KDD.

[12]  Franco Scarselli,et al.  Recursive neural networks for processing graphs with labelled edges: theory and applications , 2005, Neural Networks.

[13]  Gang Wang,et al.  DAG-Recurrent Neural Networks for Scene Labeling , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Jiawei Han,et al.  Mining advisor-advisee relationships from research publication networks , 2010, KDD.

[15]  Mingzhe Wang,et al.  LINE: Large-scale Information Network Embedding , 2015, WWW.

[16]  Yizhou Sun,et al.  Semi-supervised Learning over Heterogeneous Information Networks by Ensemble of Meta-graph Guided Random Walks , 2017, IJCAI.

[17]  Philip S. Yu,et al.  PathSim , 2011, Proc. VLDB Endow..

[18]  Kevin Chen-Chuan Chang,et al.  From Node Embedding To Community Embedding , 2016, ArXiv.

[19]  Jure Leskovec,et al.  Supervised random walks: predicting and recommending links in social networks , 2010, WSDM '11.

[20]  Jure Leskovec,et al.  node2vec: Scalable Feature Learning for Networks , 2016, KDD.

[21]  Noah A. Smith,et al.  Recurrent Neural Network Grammars , 2016, NAACL.

[22]  Julian R. Ullmann,et al.  An Algorithm for Subgraph Isomorphism , 1976, J. ACM.

[23]  Kevin Chen-Chuan Chang,et al.  User profiling in an ego network: co-profiling attributes and relationships , 2014, WWW.

[24]  Nitesh V. Chawla,et al.  metapath2vec: Scalable Representation Learning for Heterogeneous Networks , 2017, KDD.

[25]  John Salvatier,et al.  Theano: A Python framework for fast computation of mathematical expressions , 2016, ArXiv.

[26]  Kevin Chen-Chuan Chang,et al.  Semantic proximity search on graphs with metagraph-based learning , 2016, 2016 IEEE 32nd International Conference on Data Engineering (ICDE).

[27]  Michalis Vazirgiannis,et al.  Matching Node Embeddings for Graph Similarity , 2017, AAAI.

[28]  Yu Shi,et al.  PReP: Path-Based Relevance from a Probabilistic Perspective in Heterogeneous Information Networks , 2017, KDD.

[29]  Jennifer Widom,et al.  SimRank: a measure of structural-context similarity , 2002, KDD.

[30]  Quoc V. Le,et al.  Grounded Compositional Semantics for Finding and Describing Images with Sentences , 2014, TACL.

[31]  Nir Friedman,et al.  Probabilistic Graphical Models: Principles and Techniques - Adaptive Computation and Machine Learning , 2009 .

[32]  Christopher D. Manning,et al.  Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks , 2015, ACL.