Topical network embedding

Networked data carry information from multiple channels, including topology, node content, and node labels, where structure and content are often correlated but not always consistent. A typical example is citation in scholarly publications: a paper is cited by others not because they have the same content, but because they share one or more subject matters. Although many network embedding methods take node content into consideration, they all treat it as a simple flat set of words or attributes and assume that connected nodes depend on each other with respect to all words or attributes. In this paper, we argue that considering topic-level semantic interactions between nodes is crucial for learning discriminative node embedding vectors. To model pairwise topic relevance between linked text nodes, we propose topical network embedding, in which interactions between nodes are built on their shared latent topics. Accordingly, we propose a unified optimization framework that simultaneously learns topic and node representations from the network's text content and structure, respectively. The structure modeling takes the learned topic representations as conditional context, under the principle that two nodes can infer each other contingent on their shared latent topics. Experiments on three real-world datasets demonstrate that our approach learns significantly better network representations, e.g., a 4.1% improvement in Micro-F1 over state-of-the-art methods on the Cora dataset. (The source code of the proposed method is available on GitHub: https://github.com/codeshareabc/TopicalNE .)
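
The idea that two nodes infer each other contingent on shared latent topics can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that); it only assumes one plausible reading of the abstract. The function names, the elementwise-product definition of "shared topics", the additive context vector, and the toy vectors are all hypothetical choices made for illustration.

```python
import math

def shared_topic_weights(theta_i, theta_j):
    """Assumed notion of 'shared topics': the elementwise product of the two
    nodes' topic distributions, renormalized to sum to 1."""
    prod = [a * b for a, b in zip(theta_i, theta_j)]
    total = sum(prod)
    if total == 0.0:
        # No overlapping topic mass: fall back to a uniform weighting.
        return [1.0 / len(prod)] * len(prod)
    return [p / total for p in prod]

def topic_conditioned_probs(i, j, node_vecs, topic_vecs, thetas):
    """Softmax over candidate neighbors of node i, where the context is
    node i's vector plus the topic vectors weighted by the topics that
    i shares with j (an illustrative assumption, not the paper's objective)."""
    weights = shared_topic_weights(thetas[i], thetas[j])
    dim = len(node_vecs[i])
    context = [
        node_vecs[i][d] + sum(w * topic_vecs[t][d] for t, w in enumerate(weights))
        for d in range(dim)
    ]
    # Score every node except i against the topic-conditioned context.
    scores = {
        k: sum(c * v for c, v in zip(context, node_vecs[k]))
        for k in range(len(node_vecs))
        if k != i
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(s - top) for k, s in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

# Toy data (hypothetical): 3 nodes, 2 topics, 2-dimensional embeddings.
node_vecs = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
topic_vecs = [[1.0, 0.0], [0.0, 1.0]]
thetas = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]

probs = topic_conditioned_probs(0, 1, node_vecs, topic_vecs, thetas)
```

In this toy setting, nodes 0 and 1 concentrate on the same topic, so the topic-conditioned context pushes probability toward node 1 and away from node 2. In a trained model the node and topic vectors would be learned jointly rather than fixed by hand.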
