Collective Representation Learning on Spatiotemporal Heterogeneous Information Networks

Representation learning captures the underlying latent features of complex data. On networks, it is widely used to learn network structure and embed it in a low-dimensional vector space. In recent years, network embedding has attracted increasing attention, and many deep architectures have been proposed. However, existing network embedding techniques ignore the multi-class spatial and temporal relationships that reflect the complex interactions among vertices and links in spatiotemporal heterogeneous information networks (SHINs). To address this problem, we present two collective representation learning models for spatiotemporal heterogeneous information network embedding (SHNE). 1) Multilingual SHNE (M-SHNE) combines random walks with the multilingual word embedding technique from natural language processing (NLP) to collectively learn the spatiotemporal proximity measures between vertices in SHINs and preserve them in a low-dimensional vector space. 2) Meta path Constrained Random walk SHNE (MCR-SHNE) combines a meta path counting algorithm, path constrained random walks, and a word embedding technique to generate low-dimensional embeddings that preserve the spatiotemporal proximity measures in SHINs. Experimental results on real-world datasets demonstrate the effectiveness of both proposed models over state-of-the-art algorithms.
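
To make the idea of a path constrained random walk concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy heterogeneous graph with hypothetical node types (`user`, `place`, `time`) and a hypothetical helper `meta_path_walk` that only steps to neighbors whose type matches the next type in a repeating meta path. The resulting walks form a "corpus" of node sequences that could then be passed to a skip-gram style word embedding model (that step is not shown).

```python
import random

# Hypothetical toy graph: node types and adjacency lists are illustrative only.
NODE_TYPE = {
    "u1": "user", "u2": "user",
    "p1": "place", "p2": "place",
    "t1": "time", "t2": "time",
}
EDGES = {
    "u1": ["p1", "p2"],
    "u2": ["p2"],
    "p1": ["u1", "t1"],
    "p2": ["u1", "u2", "t2"],
    "t1": ["p1"],
    "t2": ["p2"],
}


def meta_path_walk(start, type_cycle, length, rng):
    """Walk from `start`, choosing uniformly among neighbors whose type
    matches the next entry of the repeating meta path `type_cycle`."""
    if NODE_TYPE[start] != type_cycle[0]:
        raise ValueError("start node type must match the first meta path type")
    walk = [start]
    while len(walk) < length:
        wanted = type_cycle[len(walk) % len(type_cycle)]
        candidates = [n for n in EDGES[walk[-1]] if NODE_TYPE[n] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk


def build_corpus(type_cycle, walks_per_node, length, seed=0):
    """Collect several constrained walks from every node of the starting type."""
    rng = random.Random(seed)
    corpus = []
    for node in sorted(EDGES):
        if NODE_TYPE[node] != type_cycle[0]:
            continue
        for _ in range(walks_per_node):
            corpus.append(meta_path_walk(node, type_cycle, length, rng))
    return corpus


# Walks that alternate user -> place -> user -> place -> user.
corpus = build_corpus(("user", "place"), walks_per_node=2, length=5)
for walk in corpus:
    print(" ".join(walk))
```

Each printed line plays the role of a sentence, and each node the role of a word, which is what lets word embedding techniques from NLP be reused for network embedding. The uniform neighbor choice here is a simplifying assumption; a spatiotemporal model would presumably weight transitions by spatial or temporal proximity.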
