HIN2Vec: Explore Meta-paths in Heterogeneous Information Networks for Representation Learning

In this paper, we propose a novel representation learning framework, namely HIN2Vec, for heterogeneous information networks (HINs). The core of the proposed framework is a neural network model, also called HIN2Vec, designed to capture the rich semantics embedded in HINs by exploiting different types of relationships among nodes. Given a set of relationships specified in forms of meta-paths in an HIN, HIN2Vec carries out multiple prediction training tasks jointly based on a target set of relationships to learn latent vectors of nodes and meta-paths in the HIN. In addition to model design, several issues unique to HIN2Vec, including regularization of meta-path vectors, node type selection in negative sampling, and cycles in random walks, are examined. To validate our ideas, we learn latent vectors of nodes using four large-scale real HIN datasets, including Blogcatalog, Yelp, DBLP and U.S. Patents, and use them as features for multi-label node classification and link prediction applications on those networks. Empirical results show that HIN2Vec soundly outperforms the state-of-the-art representation learning models for network data, including DeepWalk, LINE, node2vec, PTE, HINE and ESim, by 6.6% to 23.8% of $micro$-$f_1$ in multi-label node classification and 5% to 70.8% of $MAP$ in link prediction.

[1]  David Liben-Nowell,et al.  The link-prediction problem for social networks , 2007 .

[2]  Kevin Zhou Navigation in a small world , 2017 .

[3]  Jiawei Han,et al.  Large-Scale Embedding Learning in Heterogeneous Event Data , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).

[4]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[5]  Qiaozhu Mei,et al.  PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks , 2015, KDD.

[6]  Charu C. Aggarwal,et al.  Co-author Relationship Prediction in Heterogeneous Bibliographic Networks , 2011, 2011 International Conference on Advances in Social Networks Analysis and Mining.

[7]  Jure Leskovec,et al.  node2vec: Scalable Feature Learning for Networks , 2016, KDD.

[8]  R. Kronmal,et al.  On the Alias Method for Generating Random Variables From a Discrete Distribution , 1979 .

[9]  Joachim H. Ahrens,et al.  An alias method for sampling from the normal distribution , 1989, Computing.

[10]  Charu C. Aggarwal,et al.  Heterogeneous Network Embedding via Deep Architectures , 2015, KDD.

[11]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[12]  Jon Kleinberg,et al.  The link prediction problem for social networks , 2003, CIKM '03.

[13]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[14]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Mingzhe Wang,et al.  LINE: Large-scale Information Network Embedding , 2015, WWW.

[16]  Jiawei Han,et al.  Ranking-based classification of heterogeneous information networks , 2011, KDD.

[17]  Geoffrey E. Hinton,et al.  Acoustic Modeling Using Deep Belief Networks , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[18]  Tore Opsahl,et al.  Clustering in weighted networks , 2009, Soc. Networks.

[19]  Jon M. Kleinberg,et al.  The link-prediction problem for social networks , 2007, J. Assoc. Inf. Sci. Technol..

[20]  Ni Lao,et al.  Relational retrieval using a combination of path-constrained random walks , 2010, Machine Learning.

[21]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[22]  Albert,et al.  Emergence of scaling in random networks , 1999, Science.

[23]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[24]  Jiawei Han,et al.  Meta-Path Guided Embedding for Similarity Search in Large-Scale Heterogeneous Information Networks , 2016, ArXiv.

[25]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.