Incorporating Knowledge Graph Embeddings into Topic Modeling

Probabilistic topic models can be used to extract low-dimensional topics from document collections. However, without any human knowledge, such models often produce topics that are not interpretable. In recent years, a number of knowledge-based topic models have been proposed, but they cannot process the fact-oriented triple knowledge stored in knowledge graphs. Knowledge graph embeddings, on the other hand, automatically capture relations between entities in knowledge graphs. In this paper, we propose a novel knowledge-based topic model that incorporates knowledge graph embeddings into topic modeling. By combining latent Dirichlet allocation, a widely used topic model, with knowledge encoded by entity vectors, we significantly improve semantic coherence and capture a better representation of a document in the topic space. Our evaluation results demonstrate the effectiveness of our method.
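To make concrete what it means for entity vectors to capture relations, the following is a minimal sketch of a translation-based scoring function in the style of TransE, in which a triple (head, relation, tail) is considered plausible when head + relation lies close to tail. The abstract does not say which embedding model is used, so the choice of TransE here is an assumption, and the entity names, relation name, and 2-dimensional vectors are invented purely for illustration.

```python
import math


def transe_score(head, relation, tail):
    """Return the negative L2 distance ||head + relation - tail||.

    A score closer to zero means the triple is more plausible
    under the translation assumption head + relation ≈ tail.
    """
    squared = sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    return -math.sqrt(squared)


# Hypothetical toy embeddings (not taken from any trained model).
entities = {
    "Paris": [1.0, 0.0],
    "France": [1.0, 1.0],
    "Tokyo": [3.0, 0.0],
}
relations = {"capital_of": [0.0, 1.0]}

# Score one triple that fits the translation and one that does not.
true_score = transe_score(
    entities["Paris"], relations["capital_of"], entities["France"]
)
false_score = transe_score(
    entities["Tokyo"], relations["capital_of"], entities["France"]
)

print(true_score > false_score)
```

In this toy setup the triple that fits the translation scores higher than the one that does not. Entity vectors learned this way are the kind of knowledge a topic model could consume alongside word counts.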
