Query modeling for entity search based on terms, categories, and examples

Users often search for entities rather than documents and, in this setting, are willing to provide extra input beyond a series of query terms, such as category information and example entities. We propose a general probabilistic framework for entity search that lets us evaluate, and gain insight into, the many ways of using these types of input for query modeling. We focus on the use of category information and show the advantage of a category-based representation over a term-based representation; we also demonstrate the effectiveness of category-based expansion using example entities. Our best-performing model achieves very competitive performance on the INEX-XER entity ranking and list completion tasks.
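As a rough illustration of the idea, and not the estimation used in the paper, the sketch below builds a category-based query model from target categories, expands it with the categories of example entities, and interpolates a category-based score with a term-based score. The function names, the uniform distribution over an entity's categories, and the weights `mix` and `lam` are all assumptions made for this example; the term-based scores are taken as given.

```python
from collections import Counter


def category_query_model(target_categories, example_entities, entity_categories, mix=0.5):
    """Distribution over categories: target categories mixed with the
    categories of the example entities (hypothetical expansion scheme)."""
    base = Counter(target_categories)
    expansion = Counter()
    for entity in example_entities:
        expansion.update(entity_categories.get(entity, ()))
    base_total = sum(base.values())
    exp_total = sum(expansion.values())
    # Fall back to a single component when the other one is empty.
    if exp_total == 0:
        mix = 0.0
    elif base_total == 0:
        mix = 1.0
    model = {}
    for cat in set(base) | set(expansion):
        p_base = base[cat] / base_total if base_total else 0.0
        p_exp = expansion[cat] / exp_total if exp_total else 0.0
        model[cat] = (1 - mix) * p_base + mix * p_exp
    return model


def category_score(query_model, categories):
    """Sum of query-model mass over the entity's categories, assuming the
    entity spreads probability uniformly over its categories."""
    cats = set(categories)
    if not cats:
        return 0.0
    return sum(query_model.get(cat, 0.0) for cat in cats) / len(cats)


def rank_entities(term_scores, query_model, entity_categories, lam=0.5):
    """Rank entities by a linear mix of term-based and category-based scores."""
    scored = {
        entity: lam * term_scores.get(entity, 0.0)
        + (1 - lam) * category_score(query_model, cats)
        for entity, cats in entity_categories.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Toy data: three entities, one target category, one example entity.
entity_categories = {"A": ["x", "y"], "B": ["z"], "C": ["x"]}
term_scores = {"A": 0.2, "B": 0.5, "C": 0.2}
model = category_query_model(["x"], ["A"], entity_categories)
ranking = rank_entities(term_scores, model, entity_categories)
```

In this toy setting, entity `B` has the highest term-based score but matches none of the query's categories, so the category component moves `C` and `A` ahead of it.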
