Inferring query models by computing information flow

The language modelling approach to information retrieval can also be used to compute query models. A query model can be envisaged as an expansion of an initial query. The more prominent query models in the literature have a probabilistic basis. This paper introduces an alternative, non-probabilistic approach to query modelling in which the strength of information flow is computed between a query Q and a term w. Information flow reflects how strongly w is informationally contained within the query Q. The information flow model is based on Hyperspace Analogue to Language (HAL) vector representations, which reflect the lexical co-occurrence information of terms. Research in cognitive science has demonstrated the compatibility of HAL representations with human cognitive processing. Query models computed from TREC queries by HAL-based information flow are compared experimentally with two probabilistic query language models. The experimental results show that the HAL-based information flow model is superior to query models computed via Markov chains and appears to be as effective as a probabilistically motivated relevance model.
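To make the notion of a HAL representation concrete, the following is a minimal sketch of how lexical co-occurrence vectors might be accumulated with a sliding window. It is an illustration under stated assumptions, not the paper's implementation: the function name `build_hal`, the window size of 5, the linear distance weighting (`window - d + 1`), and the choice to ignore word order are all assumptions made here. The information flow computation itself is not sketched, since its definition is not given in this abstract.

```python
from collections import defaultdict


def build_hal(tokens, window=5):
    """Accumulate direction-insensitive HAL-style co-occurrence weights.

    Returns a dict mapping each term to a dict of {context_term: weight}.
    The window size and the linear weighting scheme are assumptions.
    """
    hal = defaultdict(lambda: defaultdict(float))
    for i, term in enumerate(tokens):
        # Look back at up to `window` preceding tokens.
        for d in range(1, window + 1):
            j = i - d
            if j < 0:
                break
            # Closer neighbours receive larger weights.
            weight = window - d + 1
            neighbour = tokens[j]
            # Ignore direction: credit the pair in both vectors.
            hal[term][neighbour] += weight
            hal[neighbour][term] += weight
    return {t: dict(v) for t, v in hal.items()}


if __name__ == "__main__":
    text = "space shuttle program launches space probe".split()
    vectors = build_hal(text, window=2)
    for term, vec in vectors.items():
        print(term, vec)
```

Each resulting vector records how strongly a term co-occurs with its neighbours, with nearer neighbours weighted more heavily. A query could then be represented by combining the vectors of its terms, though how that combination is performed is left open here.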
