Similarity Modeling on Heterogeneous Networks via Automatic Path Discovery

Heterogeneous networks are widely used to model real-world semi-structured data. The key challenge of learning over such networks is modeling node similarity in terms of both network structure and node content. To handle network structure, most existing works assume a given or enumerable set of meta-paths and use them to compute meta-path-based proximities or network embeddings. However, the expert knowledge needed to specify meta-paths is not always available, and the number of possible meta-paths grows exponentially with the path length considered, which makes path search very costly. Meanwhile, although network nodes are often accompanied by rich content, this content has hardly been leveraged to improve similarity modeling. In this work, to properly model node similarity in content-rich heterogeneous networks, we propose to automatically discover useful paths for pairs of nodes using both structural and content information. To this end, we combine continuous reinforcement learning and deep content embedding into a novel semi-supervised joint learning framework. Specifically, the supervised reinforcement learning component explores useful paths between a small set of example pairs of similar nodes, while the unsupervised deep embedding component captures node content and enables inductive learning on the whole network. The two components are jointly trained in a closed loop so that they mutually enhance each other. Extensive experiments on three real-world heterogeneous networks demonstrate the advantages of our algorithm. Code related to this paper is available at: https://github.com/yangji9181/AutoPath.
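To make the cost of enumerating meta-paths concrete, the following minimal sketch counts the distinct meta-paths (sequences of node types) of a given length over a toy bibliographic schema. The schema, the type names, and the function `count_meta_paths` are hypothetical illustrations and are not taken from the paper or its released code.

```python
from typing import Dict, List

# Hypothetical toy schema (an assumption for illustration only):
# each node type maps to the node types it can link to.
SCHEMA: Dict[str, List[str]] = {
    "author": ["paper"],
    "paper": ["author", "venue", "term", "paper"],  # paper-paper = citation
    "venue": ["paper"],
    "term": ["paper"],
}


def count_meta_paths(schema: Dict[str, List[str]], length: int) -> int:
    """Count type sequences with `length` edges, over all start types."""
    # counts[t] = number of meta-paths of the current length ending at type t
    counts = {t: 1 for t in schema}
    for _ in range(length):
        nxt = {t: 0 for t in schema}
        for src, n in counts.items():
            for dst in schema[src]:
                nxt[dst] += n  # extend every path ending at src by one edge
        counts = nxt
    return sum(counts.values())


if __name__ == "__main__":
    for k in range(1, 9):
        print(k, count_meta_paths(SCHEMA, k))
```

Even with only four node types, the count in this toy setting rises from 7 at length 1 to 97 at length 4, which suggests why exhaustive enumeration becomes impractical on schema-rich networks.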
