Automatic query expansion based on tag recommendation

We propose a new method for expanding entity-related queries that automatically filters, weights, and ranks candidate expansion terms extracted from Wikipedia articles related to the original query. Our method builds on state-of-the-art tag recommendation methods, which exploit heuristic metrics to estimate the descriptive capacity of a given term. Although these recommendation methods were originally proposed for tags, we apply them to weight and rank terms extracted from multiple fields of Wikipedia articles according to their relevance to the article. We evaluate our method against three state-of-the-art baselines on three collections. Our results indicate that our method outperforms all baselines on all collections, with relative gains in mean average precision (MAP) of up to 14% over the best-performing baselines.
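The general idea of weighting and ranking candidate terms drawn from multiple article fields can be illustrated with a minimal sketch. This is not the method evaluated here: the field names, the field weights, the stopword list, and the scoring heuristic (weighted term frequency multiplied by the number of fields containing the term) are all assumptions chosen for illustration only.

```python
from collections import Counter

# Hypothetical field weights; the abstract does not specify any.
FIELD_WEIGHTS = {"title": 3.0, "summary": 2.0, "body": 1.0}
# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "to"}


def rank_expansion_terms(article_fields, query, top_k=5):
    """Rank candidate expansion terms from the fields of one article.

    Score = (field-weighted term frequency) * (number of fields that
    contain the term). Both factors are illustrative heuristics.
    """
    query_terms = set(query.lower().split())
    weighted_tf = Counter()
    spread = Counter()
    for field, text in article_fields.items():
        # Filter step: drop stopwords and terms already in the query.
        tokens = [
            t for t in text.lower().split()
            if t not in STOPWORDS and t not in query_terms
        ]
        for term, count in Counter(tokens).items():
            weighted_tf[term] += FIELD_WEIGHTS.get(field, 1.0) * count
            spread[term] += 1
    scores = {t: weighted_tf[t] * spread[t] for t in weighted_tf}
    # Rank step: highest score first, ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


fields = {
    "title": "Jaguar car",
    "summary": "jaguar is a british car maker",
    "body": "the maker builds luxury car models",
}
result = rank_expansion_terms(fields, "jaguar", top_k=3)
```

In this toy input, a term that appears in every field is ranked above terms confined to a single field, which is the intuition behind treating cross-field occurrence as a signal of descriptive capacity.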
