Large-Scale Embedding Learning in Heterogeneous Event Data

Heterogeneous events, defined as events that connect strongly typed objects, are ubiquitous in the real world. We propose a HyperEdge-Based Embedding (Hebe) framework for heterogeneous event data, in which a hyperedge represents the interaction among the set of objects participating in an event. Hebe models the proximity among the objects in an event by predicting a target object given the other participating objects in the same event (hyperedge). Because each hyperedge encapsulates an event as a whole, and therefore more information about it, Hebe is robust to data sparseness. Hebe also remains scalable as the data size grows. Extensive experiments on large-scale real-world datasets demonstrate the efficacy and robustness of Hebe.
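The prediction task described above can be illustrated with a small sketch. This is not the Hebe objective itself, which the abstract does not spell out. It assumes that the context of an event is the average of the embeddings of its non-target objects, that candidate targets are scored by a dot product followed by a full softmax, and that training uses plain stochastic gradient descent. All function names, the toy events, and the hyperparameters are hypothetical.

```python
import math
import random

def init_embeddings(objects, dim=8, seed=0):
    """Assign each object a small random vector (hypothetical initialisation)."""
    rng = random.Random(seed)
    return {o: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for o in objects}

def context_vector(emb, context):
    """Average the embeddings of the participating (non-target) objects."""
    dim = len(emb[context[0]])
    return [sum(emb[o][k] for o in context) / len(context) for k in range(dim)]

def target_probabilities(emb, context, candidates):
    """Softmax over candidate targets, scored by dot product with the context."""
    ctx = context_vector(emb, context)
    scores = [sum(a * b for a, b in zip(emb[c], ctx)) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss(emb, event, target, candidates):
    """Negative log-likelihood of the true target given the rest of the event."""
    context = [o for o in event if o != target]
    probs = target_probabilities(emb, context, candidates)
    return -math.log(probs[candidates.index(target)])

def sgd_step(emb, event, target, candidates, lr=0.1):
    """One gradient step on the loss above for a single event (hyperedge)."""
    context = [o for o in event if o != target]
    ctx = context_vector(emb, context)
    probs = target_probabilities(emb, context, candidates)
    grad_ctx = [0.0] * len(ctx)
    for c, p in zip(candidates, probs):
        g = p - (1.0 if c == target else 0.0)  # d(loss) / d(score of c)
        for k in range(len(ctx)):
            grad_ctx[k] += g * emb[c][k]   # accumulate with the old value
            emb[c][k] -= lr * g * ctx[k]   # then update the candidate
    for o in context:
        for k in range(len(ctx)):
            emb[o][k] -= lr * grad_ctx[k] / len(context)

# Toy events of the form (author, venue, term); the venue is the target.
events = [
    ("a1", "kdd", "mining"),
    ("a2", "kdd", "mining"),
    ("a3", "nips", "learning"),
    ("a4", "nips", "learning"),
]
venues = ["kdd", "nips"]
objects = sorted({o for e in events for o in e})
emb = init_embeddings(objects)

loss_before = sum(loss(emb, e, e[1], venues) for e in events) / len(events)
for _ in range(200):
    for e in events:
        sgd_step(emb, e, e[1], venues)
loss_after = sum(loss(emb, e, e[1], venues) for e in events) / len(events)

# Probability of each venue given the other objects of the first event.
probs = target_probabilities(emb, ["a1", "mining"], venues)
print(loss_before, loss_after, dict(zip(venues, probs)))
```

A full softmax is only practical here because the toy candidate set has two venues. On large-scale data the normalisation would have to be approximated, for example by sampling negative candidates, and that choice is outside what this sketch shows.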
