Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling

This paper presents a Kernel Entity Salience Model (KESM) that improves text understanding and retrieval by better estimating entity salience (importance) in documents. KESM represents entities with knowledge-enriched distributed representations, models the interactions between entities and words with kernels, and combines the kernel scores to estimate entity salience. The whole model is learned end-to-end from entity salience labels. The salience model also improves ad hoc search accuracy by providing effective ranking features that model the salience of query entities in candidate documents. Our experiments on two entity salience corpora and two TREC ad hoc search datasets demonstrate that KESM outperforms frequency-based and feature-based methods. We also provide examples showing how KESM transfers the text understanding ability it learns from entity salience to search.
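To make the kernel idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes K-NRM-style RBF kernels over cosine similarities: each kernel softly counts the document entities or words whose similarity to a candidate entity lies near a level mu_k, the counts are log-pooled, and a linear layer combines them into a salience score. The kernel layout (`MUS`, `SIGMAS`), all function names, and the hand-set weights are hypothetical; in the model described above the weights would be learned end-to-end from salience labels, and the entity embeddings would be knowledge-enriched rather than toy vectors. `query_salience_feature` is a deliberately simplified ranking feature (summed salience of query entities), not the paper's ranking model.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

# Hypothetical kernel layout (K-NRM style): one near-exact-match kernel at
# mu = 1.0 with a tiny sigma, plus ten soft-match kernels over [-0.9, 0.9].
MUS = [1.0, 0.9, 0.7, 0.5, 0.3, 0.1, -0.1, -0.3, -0.5, -0.7, -0.9]
SIGMAS = [1e-3] + [0.1] * 10


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def kernel_features(target: Vector, doc_terms: Sequence[Vector]) -> List[float]:
    """Log-pooled RBF kernel scores of one target embedding against a document.

    Each kernel softly counts document terms whose cosine similarity to the
    target lies near mu_k; the log keeps large counts from dominating.
    """
    sims = [cosine(target, t) for t in doc_terms]
    feats = []
    for mu, sigma in zip(MUS, SIGMAS):
        soft_count = sum(math.exp(-((s - mu) ** 2) / (2 * sigma ** 2)) for s in sims)
        feats.append(math.log(max(soft_count, 1e-10)))  # clamp avoids log(0)
    return feats


def salience_score(
    entity: Vector,
    doc_entities: Sequence[Vector],
    doc_words: Sequence[Vector],
    w_ent: Sequence[float],
    w_word: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Linear combination of entity-entity and entity-word kernel features.

    w_ent, w_word and bias stand in for weights that would be learned
    end-to-end; here they are supplied by hand (an assumption of this sketch).
    """
    f_ent = kernel_features(entity, doc_entities)
    f_word = kernel_features(entity, doc_words)
    return (
        bias
        + sum(w * f for w, f in zip(w_ent, f_ent))
        + sum(w * f for w, f in zip(w_word, f_word))
    )


def query_salience_feature(
    query_entities: Sequence[Vector],
    doc_entities: Sequence[Vector],
    doc_words: Sequence[Vector],
    w_ent: Sequence[float],
    w_word: Sequence[float],
) -> float:
    """Simplified ranking feature: total salience of the query entities in a doc."""
    return sum(
        salience_score(q, doc_entities, doc_words, w_ent, w_word)
        for q in query_entities
    )
```

With weight only on the exact-match kernel (`w_ent = [1.0] + [0.0] * 10`), an entity that appears in the document scores higher than one that does not, and a document containing a query entity receives a larger `query_salience_feature` than one that lacks it. The soft-match kernels are what let related but non-identical entities and words contribute to salience.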
