Paper information - Heterogeneous Network Embedding via Deep Architectures

Heterogeneous Network Embedding via Deep Architectures

Data embedding is used in many machine learning applications to create low-dimensional feature representations that preserve the structure of data points in their original space. In this paper, we examine the scenario of a heterogeneous network with nodes and content of various types. Such networks are notoriously difficult to mine because of the bewildering combination of heterogeneous contents and structures. Creating a multidimensional embedding of such data opens the door to a wide variety of off-the-shelf mining techniques for multidimensional data. Despite the importance of this problem, limited effort has been made to embed a network of scalable, dynamic and heterogeneous data. In such cases, both the content and the linkage structure provide important cues for creating a unified feature representation of the underlying network. We design a deep embedding algorithm for networked data. A highly nonlinear multi-layered embedding function is used to capture the complex interactions between the heterogeneous data in a network. Our goal is to create a multi-resolution deep embedding function that reflects both the local and global network structures and makes the resulting embedding useful for a variety of data mining tasks. In particular, we demonstrate that the rich content and linkage information in a heterogeneous network can be captured by such an approach, so that similarities among cross-modal data can be measured directly in a common embedding space. Once this goal has been achieved, a wide variety of data mining problems can be solved by applying off-the-shelf algorithms designed for handling vector representations. Our experiments on real-world network datasets show the effectiveness and scalability of the proposed algorithm compared with state-of-the-art embedding methods.
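The idea of measuring cross-modal similarity in a common embedding space can be illustrated with a minimal sketch. This is not the paper's architecture or training procedure: the layer sizes, the tanh nonlinearity, the inner-product similarity, and the names `mlp_embed`, `img_weights`, and `txt_weights` are all assumptions chosen for illustration, and the weights are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_embed(x, weights):
    """Apply a stack of tanh layers (a stand-in for a learned deep embedding function)."""
    h = x
    for w in weights:
        h = np.tanh(h @ w)
    return h

# Hypothetical sizes: 64-d image features, 32-d text features, 8-d common space.
img_weights = [rng.normal(scale=0.1, size=(64, 16)),
               rng.normal(scale=0.1, size=(16, 8))]
txt_weights = [rng.normal(scale=0.1, size=(32, 16)),
               rng.normal(scale=0.1, size=(16, 8))]

# Toy inputs: 5 image nodes and 3 text nodes with random features.
images = rng.normal(size=(5, 64))
texts = rng.normal(size=(3, 32))

# Each modality has its own nonlinear map, but both land in the same 8-d space.
img_emb = mlp_embed(images, img_weights)
txt_emb = mlp_embed(texts, txt_weights)

# Cross-modal similarity taken as an inner product in the shared space.
similarity = img_emb @ txt_emb.T
print(similarity.shape)
```

In a trained model the weights would be fit so that linked nodes score high and unlinked nodes score low. Once both modalities share one vector space, standard vector-based tools such as nearest-neighbour search or clustering can be applied to the embeddings directly.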
