Dynamic Collective Entity Representations for Entity Ranking

Entity ranking, i.e., successfully positioning a relevant entity at the top of the ranking for a given query, is inherently difficult due to the potential mismatch between the entity's description in a knowledge base and the way people refer to the entity when searching for it. To counter this issue, we propose a method for constructing dynamic collective entity representations. We collect entity descriptions from a variety of sources and combine them into a single entity representation by learning to weight the content from the different sources associated with an entity for optimal retrieval effectiveness. Our method can add new descriptions in real time and learn the best representation as time evolves, so as to capture the dynamics of how people search for entities. Incorporating dynamic description sources into dynamic collective entity representations improves retrieval effectiveness by 7% over a state-of-the-art learning to rank baseline. Periodic retraining of the ranker leads to higher ranking effectiveness for dynamic collective entity representations.
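The core idea is to keep one field per description source, append new descriptions to those fields as they arrive, and learn how much each source should count. The sketch below shows that idea in a minimal form under stated assumptions. It assumes a bag of words per field, a log-scaled term-count feature per source, a linear scoring function, and a perceptron-style pairwise weight update. The class and function names (`EntityRepresentation`, `field_features`, `pairwise_update`), the source names, and the example entities are all hypothetical. None of this is the paper's actual feature set or learning to rank model, which the abstract does not specify.

```python
import math
from collections import Counter, defaultdict


class EntityRepresentation:
    """Hypothetical collective representation: one bag of words per description source."""

    def __init__(self, entity_id):
        self.entity_id = entity_id
        # source name -> term counts; an unseen source behaves as an empty field
        self.fields = defaultdict(Counter)

    def add_description(self, source, text):
        # New descriptions (e.g. a tweet mentioning the entity) can be appended
        # at any time without touching the other fields.
        self.fields[source].update(text.lower().split())


def field_features(entity, query, sources):
    """Assumed feature: log(1 + number of query-term occurrences) in each source's field."""
    terms = query.lower().split()
    return [math.log1p(sum(entity.fields[s][t] for t in terms)) for s in sources]


def score(entity, query, weights, sources):
    """Linear combination of per-source features; the weights set each source's influence."""
    return sum(w * f for w, f in zip(weights, field_features(entity, query, sources)))


def pairwise_update(weights, pos, neg, query, sources, lr=0.1):
    """Perceptron-style step (an assumed stand-in for a real learning to rank method):
    if the relevant entity does not outrank the non-relevant one, shift the
    weights toward the sources whose features separate the two."""
    if score(pos, query, weights, sources) > score(neg, query, weights, sources):
        return weights
    fp = field_features(pos, query, sources)
    fn = field_features(neg, query, sources)
    return [w + lr * (p - n) for w, p, n in zip(weights, fp, fn)]


# Usage with made-up sources and descriptions.
SOURCES = ["kb", "anchors", "queries", "tweets"]
weights = [1.0, 0.0, 0.0, 0.0]  # start by trusting only the knowledge-base text

a = EntityRepresentation("Barack_Obama")
a.add_description("kb", "44th president of the united states")
b = EntityRepresentation("Michelle_Obama")
b.add_description("kb", "first lady of the united states")

query = "potus"
before = (score(a, query, weights, SOURCES), score(b, query, weights, SOURCES))

# A newly observed tweet adds the vocabulary that searchers actually use.
a.add_description("tweets", "potus gives speech")
weights = pairwise_update(weights, a, b, query, SOURCES)
after = (score(a, query, weights, SOURCES), score(b, query, weights, SOURCES))
print(before, after, weights)
```

With knowledge-base weights alone, the query "potus" matches neither entity. After a tweet is added and one update step runs, the tweet field gains a positive weight and the relevant entity moves ahead. Rerunning such updates periodically as new descriptions arrive is one simple way to picture the retraining that the abstract mentions.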
