From Entities to Geometry: Towards Exploiting Multiple Sources to Predict Relevance

The goal of an Information Retrieval (IR) system is to predict which information objects can help users satisfy their information needs, that is, to predict relevance. Different sources of evidence can be exploited for this purpose. These sources are the properties of the different entities involved in retrieving and accessing information; examples of entities include the information objects, the task, the user, and the location. The main hypothesis of this paper is that, to exploit this variety of entities and sources, it is necessary to model both the relationships between the entities and the relationships between their properties. Such relationships are themselves possible sources of evidence for predicting relevance. This paper proposes a methodology that supports the design of an IR system able to model, in a uniform way, the properties of the entities involved, the properties of their relationships, and the relationships between the different properties. The methodology is structured in four steps, which support, respectively, the selection of the sources, the collection of the evidence, the modeling of the sources and their relationships, and the use of sources and relationships to predict relevance. Sources and relationships are modeled and then exploited through a previously proposed geometric framework, which provides a uniform and concrete representation in terms of vector subspaces.
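The abstract does not spell out how a vector-subspace representation turns evidence into a relevance prediction. As a minimal sketch, assuming one common reading in geometric IR models (not necessarily the exact formulation used in this paper), an information object is a unit vector, the evidence collected from one source spans a subspace, and the squared length of the object's projection onto that subspace serves as a score in [0, 1]. The term space, the toy vectors, and the function names `orthonormal_basis` and `projection_score` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit length (raises on the zero vector)."""
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / n for x in v]


def orthonormal_basis(vectors: Sequence[Sequence[float]],
                      tol: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt: orthonormal basis of the subspace spanned by `vectors`.

    Hypothetical helper: each evidence vector from one source (e.g. clicked
    documents) contributes a direction; linearly dependent ones are skipped.
    """
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        # Remove the components already covered by the current basis.
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def projection_score(obj: Sequence[float],
                     basis: Sequence[Sequence[float]]) -> float:
    """Squared length of the projection of the normalized object vector
    onto the subspace with the given orthonormal basis (a value in [0, 1])."""
    u = normalize(obj)
    return sum(dot(u, b) ** 2 for b in basis)


if __name__ == "__main__":
    # Hypothetical 3-term space (t1, t2, t3); one source's evidence spans
    # the t1-t2 plane.
    source = orthonormal_basis([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
    print(projection_score([0.0, 0.0, 1.0], source))  # orthogonal object
    print(projection_score([1.0, 1.0, 0.0], source))  # inside the subspace
    print(projection_score([1.0, 0.0, 1.0], source))  # halfway
```

In this toy run the first object scores 0, the second scores approximately 1, and the third approximately 0.5. Each source would get its own subspace; how those subspaces and their relationships are combined is exactly what the proposed methodology addresses, so the sketch deliberately stops at a single source.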