It All Starts with Entities: A Salient Entity Topic Model

Entities play an essential role in understanding textual documents, whether the documents are short, such as tweets, or long, such as news articles. In short documents, all mentioned entities are usually considered equally important because the amount of information is limited. In long documents, however, not all entities are equally important: some are salient and others are not. Traditional entity topic models (ETMs) focus on incorporating entity information into topic models to better explain the generative process of documents, but they usually treat all entities equally, without considering whether they are salient. In this work, we propose a novel ETM, the Salient Entity Topic Model, which takes salient entities into account in the document generation process. In particular, we model salient entities as a source of topics used to generate the words in a document, in addition to the document topic distribution used in traditional topic models. We analyze the proposed model both qualitatively and quantitatively. An application to entity salience detection demonstrates the effectiveness of our model compared with state-of-the-art topic model baselines.
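The idea of salient entities acting as a second source of topics can be illustrated with a toy generative sketch. This is not the model from the paper, whose exact formulation is not given in the abstract. It is a minimal illustration under stated assumptions: each word picks its topic either from the document's topic distribution or from the topic distribution of one of the document's salient entities. The mixing probability `entity_source_prob`, the function names, and the example entities and vocabularies are all hypothetical.

```python
import random


def sample_categorical(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(doc_topics, entity_topics, topic_words, salient_entities,
                      n_words, entity_source_prob, rng):
    """Generate (word, topic, source) triples for one document.

    Assumption (not from the paper): for each word, a coin flip with
    probability `entity_source_prob` decides whether the topic is drawn from
    a uniformly chosen salient entity or from the document itself.
    """
    words = []
    for _ in range(n_words):
        use_entity = bool(salient_entities) and rng.random() < entity_source_prob
        if use_entity:
            # Topic comes from the topic distribution of a salient entity.
            source = rng.choice(salient_entities)
            topic = sample_categorical(entity_topics[source], rng)
        else:
            # Topic comes from the document topic distribution, as in standard LDA.
            source = "document"
            topic = sample_categorical(doc_topics, rng)
        vocab, probs = topic_words[topic]
        words.append((vocab[sample_categorical(probs, rng)], topic, source))
    return words


# Hypothetical toy data: two topics, two entities.
topic_words = [
    (["match", "goal", "team"], [0.4, 0.3, 0.3]),
    (["election", "vote", "party"], [0.5, 0.3, 0.2]),
]
doc_topics = [0.7, 0.3]
entity_topics = {"FIFA": [1.0, 0.0], "Parliament": [0.0, 1.0]}

rng = random.Random(0)
doc = generate_document(doc_topics, entity_topics, topic_words,
                        ["Parliament"], 8, 0.5, rng)
for word, topic, source in doc:
    print(word, topic, source)
```

In this sketch a non-salient entity simply never appears in `salient_entities`, so it contributes no topics to the words. A real model would infer the distributions and the salience of each entity from data rather than fix them by hand.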
