Enriching Topic Models with DBpedia

Traditional topic modeling approaches consider only the words in a document. By adopting an entity-topic modeling approach and including background knowledge about the entities, such as a person's occupation, an organization's location, or the band a musician belongs to, we can cluster related documents together more effectively and produce semantic topic models that can be represented in a knowledge base. In our approach, we first reduce each text document to a set of entities and then enrich this set with background knowledge from DBpedia. Topic modeling is performed on the enriched entity sets, and various feature combinations are evaluated to determine which achieves the best classification precision or perplexity compared to purely word-based topic models.
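As a minimal sketch of the pipeline the abstract describes, the following Python code extracts entities, enriches them, and fits a topic model over the enriched "bags of entities". It assumes DBpedia Spotlight for entity extraction, the public DBpedia SPARQL endpoint for enrichment, and gensim's LDA implementation; the endpoint URLs, the confidence threshold, and the choice of rdf:type and dct:subject as enrichment features are illustrative assumptions, not the authors' exact configuration.

```python
import requests
from gensim import corpora, models

# Public endpoints (assumed for illustration; the paper does not fix a setup).
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"
SPARQL_URL = "https://dbpedia.org/sparql"

def extract_entities(text, confidence=0.5):
    """Reduce a document to a set of DBpedia entity URIs via DBpedia Spotlight."""
    resp = requests.get(
        SPOTLIGHT_URL,
        params={"text": text, "confidence": confidence},
        headers={"Accept": "application/json"},
    )
    resp.raise_for_status()
    return [r["@URI"] for r in resp.json().get("Resources", [])]

def enrich_entity(uri):
    """Fetch background-knowledge features for one entity. Types and
    categories stand in here for the features named in the abstract
    (occupation, location, band, ...), which follow the same pattern."""
    query = f"""
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX dct: <http://purl.org/dc/terms/>
        SELECT ?o WHERE {{
            <{uri}> ?p ?o .
            FILTER (?p IN (rdf:type, dct:subject))
        }}"""
    resp = requests.get(
        SPARQL_URL,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
    )
    resp.raise_for_status()
    return [b["o"]["value"] for b in resp.json()["results"]["bindings"]]

def entity_topic_model(documents, num_topics=20):
    """Fit LDA over bags of entities enriched with DBpedia features,
    instead of the usual bags of words."""
    enriched = []
    for text in documents:
        entities = extract_entities(text)
        features = list(entities)
        for uri in entities:
            features.extend(enrich_entity(uri))
        enriched.append(features)
    dictionary = corpora.Dictionary(enriched)
    corpus = [dictionary.doc2bow(doc) for doc in enriched]
    return models.LdaModel(corpus, num_topics=num_topics, id2word=dictionary)
```

Under these assumptions, evaluating feature combinations amounts to varying which properties the SPARQL filter admits (entities only, entities plus types, entities plus categories, and so on) and comparing the resulting models on classification precision or held-out perplexity.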
