Processing Conjunctive and Phrase Queries with the Set-Based Model

This paper presents an extension of the set-based model (SBM), an effective technique for computing term weights from co-occurrence patterns, to the processing of conjunctive and phrase queries. The extension takes into account the intuition that semantically related terms often occur close to each other. The novelty is that all known approaches that account for co-occurrence patterns were initially designed for processing disjunctive (OR) queries, whereas our extension provides a simple, effective, and efficient way to process conjunctive (AND) and phrase queries. The technique is time efficient and still improves retrieval effectiveness. Experimental results show that our extension improves the average precision of the answer set for all collections evaluated, while keeping computational cost small. For the TREC-8 collection, our extension led to gains in average precision, relative to the standard vector space model, of 23.32% and 18.98% for conjunctive and phrase queries, respectively.
