Negations and document length in logical retrieval

An unsolved problem in logic-based information retrieval is how to automatically obtain logical representations for documents and queries. This problem limits the impact of logical models for information retrieval, because their full expressive power cannot be harnessed. In this paper we propose a method for producing logical document representations that goes beyond simplistic "bag-of-words" approaches. The procedure adopts popular information retrieval heuristics, such as document length corrections and global term distribution. We report several experiments that apply partial document representations within a propositional model of information retrieval. The evaluation shows the benefits of this expressive framework when it is powered by the new logical indexing approach.
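The abstract does not spell out the indexing procedure. As a rough illustration only, the Python sketch below shows one way a partial propositional representation could combine the two heuristics it names. Terms present in a document become positive literals. Only some absent vocabulary terms are negated. The number of negations scales with document length, and the absent terms chosen for negation are those that are most widespread in the collection (lowest idf). These specific rules, the `base_negations` parameter, and the function names `partial_representation` and `to_formula` are assumptions made for this sketch. They are not the method proposed in the paper.

```python
import math
from collections import Counter


def idf(term, doc_freq, n_docs):
    # Classic inverse document frequency log(N / df); assumes df >= 1.
    return math.log(n_docs / doc_freq[term])


def partial_representation(doc_terms, doc_freq, n_docs, avg_len, base_negations=2):
    """Hypothetical partial representation: (positive literals, negated literals).

    ASSUMPTIONS (illustrative, not the paper's rules):
    - the number of negated terms grows with len(doc) / avg_len;
    - absent terms with the lowest idf (most common in the collection) are negated.
    """
    present = set(doc_terms)
    absent = [t for t in doc_freq if t not in present]
    # Document length correction: longer documents receive more negations.
    k = round(base_negations * len(doc_terms) / avg_len)
    k = max(0, min(k, len(absent)))
    # Global term distribution: rank absent terms by idf, ties broken alphabetically.
    ranked = sorted(absent, key=lambda t: (idf(t, doc_freq, n_docs), t))
    return sorted(present), sorted(ranked[:k])


def to_formula(positives, negatives):
    # Render the representation as a conjunction of propositional literals.
    return " & ".join(list(positives) + ["~" + t for t in negatives])


if __name__ == "__main__":
    docs = [
        ["logic", "retrieval", "logic"],
        ["ranking", "retrieval"],
        ["ranking", "belief"],
        ["ranking", "logic", "index", "retrieval", "belief", "revision"],
    ]
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(t for d in docs for t in set(d))
    avg_len = sum(len(d) for d in docs) / len(docs)
    pos, neg = partial_representation(docs[0], doc_freq, len(docs), avg_len)
    print(to_formula(pos, neg))  # logic & retrieval & ~belief & ~ranking
```

In this toy collection, the first document negates only two of its four absent terms. A complete representation in the same style would negate every absent term. A document that contains the whole vocabulary receives no negations at all.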
