Paper information - Ranking in context using vector spaces

Ranking in context using vector spaces

This paper presents a principled approach to the problem of retrieval in context. The notion of a basis of a vector space was introduced previously to describe context. Each basis vector represents a distinct piece of context, such as time, space, or word meaning. A vector is generated by a basis just as an informative object or an information need is generated in a context. As a consequence, a different basis generates a different vector, just as a different context would generate different information needs or informative objects. The Vector Space Model (VSM) also describes information needs and informative objects as query vectors and document vectors, respectively. However, the VSM assumes that there is a unique basis, which is the set of versors and always generates the same vector given the same coefficients. Thus a drawback of the VSM is that the vectors are insensitive to context, i.e. they are generated and ranked in the same way regardless of the context in which the information need and the informative objects occur. This paper also proposes a function to rank documents in context. Since a basis spans a subspace, which includes all the vectors of objects in the same context, the ranking function is a distance measure between the document vector and the subspace. Even though ranking is still based on an inner product between two vectors, the basic difference is that projection and distance depend on the basis, i.e. on the pieces of context and thus ultimately on context. Since an informative object can be produced by different contexts, different bases can arise and the ranking can change accordingly. Mathematically, object vector generation is given by $x = p_1 b_1 + \cdots + p_k b_k = B \cdot p$, where $B$ is an $n \times k$ ($k \le n$) complex matrix and $p$ is a $k \times 1$ real vector. The $b$'s are independent vectors and as such form a basis $B$ of a subspace $L(B)$ of $\mathbb{C}^n$. The basis generates all the vectors in $L(B)$, and every vector in it describes an informative object produced within the context described by $B$. It should be clear that every vector $x$ in $L(B)$ is entirely contained in the subspace spanned by $B$. Any other vector of the vector space may not be entirely contained in the subspace $L(B)$ and may be more or less close to it. A vector y …
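The generation rule and the subspace-distance ranking described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not give the exact form of the ranking function, so the sketch assumes the Euclidean distance between a document vector and its orthogonal projection onto $L(B)$, with no normalization of document vectors. The function names, the two example bases, and the document vectors are all hypothetical.

```python
from math import sqrt


def inner(u, v):
    """Hermitian inner product: sum of conj(u_i) * v_i."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm induced by the Hermitian inner product."""
    return sqrt(inner(u, u).real)


def generate(basis, coeffs):
    """x = p_1 b_1 + ... + p_k b_k, i.e. x = B p with the b's as columns of B."""
    n = len(basis[0])
    return [sum(p * b[i] for p, b in zip(coeffs, basis)) for i in range(n)]


def orthonormalize(basis):
    """Gram-Schmidt: orthonormal vectors spanning the same subspace L(B)."""
    ortho = []
    for b in basis:
        w = list(b)
        for q in ortho:
            c = inner(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        length = norm(w)
        if length < 1e-12:
            raise ValueError("basis vectors must be linearly independent")
        ortho.append([wi / length for wi in w])
    return ortho


def project(basis, y):
    """Orthogonal projection of y onto L(B)."""
    proj = [0j] * len(y)
    for q in orthonormalize(basis):
        c = inner(q, y)
        proj = [pi + c * qi for pi, qi in zip(proj, q)]
    return proj


def distance_to_context(basis, y):
    """Assumed ranking score: distance between y and its projection onto L(B)."""
    proj = project(basis, y)
    return norm([yi - pi for yi, pi in zip(y, proj)])


def rank_in_context(basis, docs):
    """Document ids sorted by increasing distance from L(B)."""
    return sorted(docs, key=lambda name: distance_to_context(basis, docs[name]))


# Two hypothetical contexts in a 3-dimensional space (k = 2, n = 3).
basis_a = [[1, 0, 0], [0, 1, 0]]
basis_b = [[1, 0, 0], [0, 0, 1]]

# Hypothetical document vectors.
docs = {"d1": [1, 1, 0], "d2": [1, 0, 1], "d3": [1, 0.5, 0.5]}

x = generate(basis_a, [2, 3])              # a vector generated in context A
ranking_a = rank_in_context(basis_a, docs)
ranking_b = rank_in_context(basis_b, docs)
```

A vector generated by `basis_a` lies at distance zero from $L(B)$ for that basis, while the same three documents are ordered differently under `basis_a` and `basis_b`, which is the context sensitivity the abstract contrasts with the single fixed basis of the VSM.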

Massimo Melucci
