On the Necessity of Term Dependence in a Query Space for Weighted Retrieval

In recent years, the view held by many researchers working with the vector space model, that documents, queries, terms, etc. are all elements of a common space, has been challenged (Bollmann-Sdorra and Raghavan, 1993). In particular, it was noted that term independence has to be investigated in the context of user preferences, and it was shown through counterexamples that term independence can hold in the document space but not in the query space, and vice versa. In this paper, we continue the investigation of query and document spaces with respect to the property of term independence. We prove, under realistic assumptions, that requiring term independence to hold in the query space is inconsistent with the goal of achieving better performance by means of weighted retrieval. This result, that term independence in the query space is undesirable, is obtained without any assumption about whether term independence holds in the document space. The results of this paper reinforce our position that the properties of the document and query spaces must be investigated separately, since the two spaces do not necessarily have the same properties.

[1]  Stephen Robertson, et al. Theories and Models in Information Retrieval, 1977.

[2]  Stephen E. Robertson, et al. Relevance weighting of search terms, 1976, J. Am. Soc. Inf. Sci.

[3]  S. K. Michael Wong, et al. Adaptive linear information retrieval models, 1987, SIGIR '87.

[4]  Clement T. Yu, et al. Precision Weighting: An Effective Automatic Indexing Method, 1976, J. ACM.

[5]  S. K. Michael Wong, et al. Linear structure in information retrieval, 1988, SIGIR '88.

[6]  Clement T. Yu. A clustering algorithm based on user queries, 1974, J. Am. Soc. Inf. Sci.

[7]  Vijay V. Raghavan, et al. Extended Boolean query processing in the generalized vector space model, 1989, Inf. Syst.

[8]  Yiyu Yao. Measuring retrieval effectiveness based on user preference of documents, 1995.

[9]  Peter H. A. Sneath, et al. Numerical Taxonomy: The Principles and Practice of Numerical Classification, 1973.

[10]  Friedbert Jochum, et al. The LIVE-project: retrieval experiments based on evaluation viewpoints, 1985, SIGIR '85.

[11]  Vijay V. Raghavan, et al. On the Delusiveness of Adopting a Common Space for Modeling IR Objects: Are Queries Documents?, 1993, J. Am. Soc. Inf. Sci.

[12]  Vijay V. Raghavan, et al. On the reuse of past optimal queries, 1995, SIGIR '95.

[13]  William Cooper, et al. A General Mathematical Model for Information Retrieval Systems, 1976, The Library Quarterly.

[14]  Gerald Salton, et al. Automatic text processing, 1988.

[15]  John A. Swets, et al. Effectiveness of information retrieval methods, 1969.

[16]  Clement T. Yu, et al. Effective Automatic Indexing Using Term Addition and Deletion, 1978, J. ACM.

[17]  Vijay V. Raghavan, et al. A critical analysis of vector space model for information retrieval, 1986, J. Am. Soc. Inf. Sci.