From Entities to Geometry: Towards Exploiting Multiple Sources to Support Relevance Prediction

The goal of an Information Retrieval (IR) system is to predict which information objects can help users satisfy their information needs, that is, to predict relevance. Different sources of evidence can be exploited for this purpose. These sources are the properties of the entities involved in retrieving and accessing information; examples of entities include the information objects, the task, the user, and the location. The main hypothesis of this paper is that exploiting the variety of entities and sources requires modeling both the relationships between the entities and the relationships between the properties of the entities. Such relationships are themselves possible sources of evidence for predicting relevance. This paper proposes a methodology that supports the design of an IR system able to model in a uniform way the properties of the entities involved, the properties of their relationships, and the relationships between the different properties. The methodology is structured in four steps that support, respectively, selecting the sources, collecting the evidence, modeling the sources and their relationships, and using the modeled sources and relationships to predict relevance. Sources and relationships are modeled and then exploited through a previously proposed geometric framework, which provides a uniform and concrete representation in terms of vector subspaces.
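As a rough illustration of what a representation "in terms of vector subspaces" can look like, the sketch below builds an orthonormal basis from a few example vectors and scores a new vector by the squared length of its projection onto the spanned subspace. This is only one plausible reading, not the framework's actual formulation: the feature vectors, the function names (`orthonormal_basis`, `subspace_score`), and the scoring rule are all assumptions made for the example.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis of the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis

def subspace_score(x, basis):
    """Squared projection length of the unit-normalized x; lies in [0, 1]."""
    norm = math.sqrt(dot(x, x))
    if norm == 0:
        return 0.0
    unit = [xi / norm for xi in x]
    return sum(dot(unit, b) ** 2 for b in basis)

# Hypothetical evidence: feature vectors observed for a source
# (e.g. properties of objects judged relevant). Values are invented.
examples = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(examples)

inside = subspace_score([2.0, 3.0, 0.0], basis)   # lies in the subspace
outside = subspace_score([0.0, 0.0, 5.0], basis)  # orthogonal to it
mixed = subspace_score([1.0, 0.0, 1.0], basis)    # partly inside
```

Under these assumptions, a vector that lies entirely in the subspace scores close to 1, an orthogonal one scores 0, and the others fall in between, which gives a single scale on which evidence from different sources could be compared.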
