Relation Based Term Weighting Regularization

Traditional retrieval models compute term weights from information about individual terms only, such as term frequency (TF) and inverse document frequency (IDF). However, query terms are related, and intuitively these relations can provide useful information about the importance of a term in the context of the other query terms. For example, the query "perl tutorial" specifies that the user is looking for information relevant to both "perl" and "tutorial". Thus, a document containing both terms should have a higher relevance score than documents containing only one of them. However, if the IDF value of "tutorial" is much smaller than that of "perl", existing retrieval models may assign such a document a lower score than documents containing multiple occurrences of "perl". The importance of a term should therefore depend not only on collection statistics but also on its relations with the other query terms. In this work, we study how to use semantic relations among query terms to regularize term weighting. Experimental results on TREC collections show that the proposed strategy effectively improves retrieval performance.
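The "perl tutorial" problem described above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration, not the scoring function or the regularization method studied in this work: it assumes a bare TF times IDF sum, and the IDF values, the helper name `tf_idf_score`, and the two toy documents are all hypothetical.

```python
from collections import Counter

# Hypothetical IDF values (assumption): "tutorial" is much more common
# in the collection than "perl", so its IDF is much smaller.
IDF = {"perl": 5.0, "tutorial": 1.0}


def tf_idf_score(query_terms, doc_tokens, idf):
    """Score a document as the sum of raw TF * IDF over the query terms."""
    tf = Counter(doc_tokens)
    return sum(tf[term] * idf.get(term, 0.0) for term in query_terms)


query = ["perl", "tutorial"]

# Toy documents (assumption): one covers both query terms once,
# the other repeats "perl" and never mentions "tutorial".
doc_both = "a perl tutorial for beginners".split()
doc_perl_only = "perl perl perl scripts".split()

score_both = tf_idf_score(query, doc_both, IDF)
score_perl_only = tf_idf_score(query, doc_perl_only, IDF)

print(f"both terms: {score_both}")
print(f"perl only:  {score_perl_only}")
```

Under these assumed values, the document that matches only "perl" receives the higher score even though it covers just one of the two query terms, which is the behavior that motivates weighting terms in the context of the other query terms.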
