Multi-Modal Bayesian Embeddings for Learning Social Knowledge Graphs

We study the extent to which online social networks can be connected to open knowledge bases, a problem we refer to as learning social knowledge graphs. We propose GenVector, a multi-modal Bayesian embedding model that learns latent topics that generate word and network embeddings. GenVector leverages large-scale unlabeled data through embeddings and represents data of two modalities, social network users and knowledge concepts, in a shared latent topic space. Experiments on three datasets show that the proposed method clearly outperforms state-of-the-art methods. We then deploy the method on AMiner, a large-scale online academic search system with a network of 38,049,189 researchers and a knowledge base of 35,415,011 concepts. Our method significantly decreases the error rate in an online A/B test with live users.
