A query term re-weighting approach using document similarity

A query term re-weighting method to reformulate textual queries is proposed. Our approach is a local query modification method. We use the information carried by the top documents in relation to each other. Query term re-weighting can be applied to short queries too. Queries that use a general vocabulary show the least improvement.

Pseudo-relevance feedback is the basis of a category of automatic query modification techniques. Pseudo-relevance feedback methods assume that the initially retrieved set of documents is relevant. They then use these documents either to extract additional relevant terms for the query or simply to re-weight the terms of the user's original query. In this paper, we propose a straightforward yet effective use of pseudo-relevance feedback to detect the more informative query terms and re-weight them. Our main idea is that some of the top documents may be closer in context to the user's information need than others. Therefore, re-examining the similarity of those top documents to each other and weighting the set according to their context can help in identifying and re-weighting informative query terms. The query-by-query analysis of our results indicates that our method is capable of identifying the most important keywords even in short queries. Our experimental results on standard English and Persian test collections show that our method improves retrieval performance by up to 7% in mean average precision (MAP) over traditional query term re-weighting methods.
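The general idea can be illustrated with a minimal sketch. It is not the method evaluated in the paper, whose exact formulas are not given here. It assumes cosine similarity over raw term counts, a document weight equal to the mean similarity to the other top documents, and a term score equal to the weighted relative frequency of the term. The function names `cosine` and `reweight_query` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reweight_query(query_terms, top_docs):
    """Re-weight query terms using the top retrieved documents.

    ASSUMPTION: a document that is similar to the other top documents
    is taken to be closer to the shared context, so it gets more weight.
    top_docs is a list of token lists (the pseudo-relevant set).
    """
    vectors = [Counter(doc) for doc in top_docs]

    # Weight each document by its mean similarity to the other top documents.
    doc_weights = []
    for i, v in enumerate(vectors):
        sims = [cosine(v, u) for j, u in enumerate(vectors) if j != i]
        doc_weights.append(sum(sims) / len(sims) if sims else 1.0)
    total = sum(doc_weights) or 1.0
    doc_weights = [w / total for w in doc_weights]

    # Score each query term by its weighted relative frequency.
    scores = {}
    for term in query_terms:
        scores[term] = sum(
            w * v[term] / sum(v.values())
            for w, v in zip(doc_weights, vectors)
            if v  # skip empty documents
        )

    # Normalize so the weights sum to 1; fall back to uniform weights
    # if no query term occurs in any top document.
    norm = sum(scores.values())
    if norm == 0:
        return {t: 1.0 / len(query_terms) for t in query_terms}
    return {t: s / norm for t, s in scores.items()}


if __name__ == "__main__":
    docs = [
        ["persian", "text", "retrieval", "persian"],
        ["persian", "retrieval", "model"],
        ["football", "match", "score"],
    ]
    print(reweight_query(["persian", "football"], docs))
```

In this toy example the third document shares no terms with the other two, so it receives zero weight and the term that occurs only there is down-weighted. A scheme this simple would also favor frequent general-vocabulary terms, so a practical variant would likely need an inverse document frequency or similar correction.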
