Query Aspect Based Term Weighting Regularization in Information Retrieval

Traditional retrieval models assume that query terms are independent and rank documents primarily based on various term weighting strategies, including TF-IDF and document length normalization. However, query terms are related, and groups of semantically related query terms may form query aspects. Intuitively, the relations among query terms could be used to identify hidden query aspects and to promote the ranking of documents that cover more query aspects. Despite its importance, the use of semantic relations among query terms for term weighting regularization has been under-explored in information retrieval. In this paper, we study the incorporation of query term relations into existing retrieval models and focus on one challenge: how to regularize the weights of terms in different query aspects to improve retrieval performance. Specifically, we first develop a general strategy that can systematically integrate a term weighting regularization function into existing retrieval functions, and then propose two specific regularization functions based on the guidance provided by constraint analysis. Experiments on eight standard TREC data sets show that the proposed methods are effective in improving retrieval accuracy.
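The intuition of favoring documents that cover more query aspects can be illustrated with a small sketch. This is not the regularization function proposed in the paper, whose form the abstract does not give. It assumes, purely for illustration, that each term's TF-IDF weight is divided by the size of its aspect, so that an aspect with many related terms cannot dominate the score. The names `plain_score` and `regularized_score`, the toy aspects, and the uniform IDF values are all hypothetical.

```python
import math


def term_weight(tf, idf):
    """Log-scaled TF times IDF; zero when the term is absent from the document."""
    return (1.0 + math.log(tf)) * idf if tf > 0 else 0.0


def plain_score(doc_tf, aspects, idf):
    """Baseline: sum TF-IDF weights, treating all query terms as independent."""
    return sum(
        term_weight(doc_tf.get(term, 0), idf.get(term, 0.0))
        for aspect in aspects
        for term in aspect
    )


def regularized_score(doc_tf, aspects, idf):
    """Illustrative regularization (an assumption, not the paper's function):
    each term's weight is divided by the number of terms in its aspect."""
    return sum(
        term_weight(doc_tf.get(term, 0), idf.get(term, 0.0)) / len(aspect)
        for aspect in aspects
        for term in aspect
    )


# Toy query with two hypothetical aspects: {stolen, theft} and {car}.
aspects = [["stolen", "theft"], ["car"]]
idf = {"stolen": 1.0, "theft": 1.0, "car": 1.0}

doc_a = {"stolen": 1, "theft": 1}  # covers only the first aspect
doc_b = {"stolen": 1, "car": 1}    # covers both aspects

print(plain_score(doc_a, aspects, idf), plain_score(doc_b, aspects, idf))
print(regularized_score(doc_a, aspects, idf), regularized_score(doc_b, aspects, idf))
```

In this toy setting the baseline cannot separate the two documents, because each matches two query terms, while the regularized score ranks `doc_b` higher because it covers both aspects.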

Wei Zheng | Hui Fang
