Efficient processing of vague queries using a data stream approach

In this paper, we consider vague queries in text and fact databases. A vague query can be formulated as a combination of vague criteria. A single database object can meet a vague criterion to a certain degree. We confine ourselves to queries for which the answer can be computed efficiently by (perhaps repetitive) combination of rankings to new rankings. Since users usually will inspect some of the best answer objects only, the corresponding rankings need to be computed just as far as necessary to generate these first answer objects. In this contribution we describe an approach for estimating the number of elements needed from the basic rankings to compute a given number of elements of the resulting ranking. Experiments with a large text database prove the applicability of our approach.
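
To make the idea of computing a combined ranking "just as far as necessary" concrete, the following minimal sketch merges several basic rankings into the first k elements of a result ranking and reports how deep it had to read. It is an illustration under stated assumptions, not the estimation method of this paper: the combination function is assumed to be a plain sum of degrees, degrees are assumed to be non-negative, an object missing from a ranking is assumed to have degree 0 there, and the stopping rule is a generic threshold test swapped in for illustration. The names `Ranking` and `top_k_sum` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# A basic ranking: (object id, degree) pairs sorted by degree, best first.
Ranking = List[Tuple[str, float]]


def top_k_sum(rankings: List[Ranking], k: int) -> Tuple[List[Tuple[str, float]], int]:
    """Return the k best objects under summed degrees and the depth read.

    Assumes at least one ranking, non-negative degrees, and degree 0 for
    objects that do not occur in a ranking (illustrative assumptions).
    """
    # Random-access view of each ranking, used to complete an object's score.
    lookup: List[Dict[str, float]] = [dict(r) for r in rankings]
    scores: Dict[str, float] = {}
    max_depth = max(len(r) for r in rankings)
    depth = 0
    while depth < max_depth:
        # Upper bound on the score of any object not seen so far.
        threshold = 0.0
        for r in rankings:
            if depth < len(r):
                obj, degree = r[depth]
                threshold += degree
                if obj not in scores:
                    scores[obj] = sum(t.get(obj, 0.0) for t in lookup)
        depth += 1
        best = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        # Stop once k seen objects are at least as good as any unseen one.
        if len(best) == k and best[-1][1] >= threshold:
            break
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1]), depth
```

The returned depth is the quantity of interest here: it is the number of elements that had to be consumed from each basic ranking to produce the first k result objects. The sketch only observes this number after the fact, whereas the abstract describes estimating it in advance.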
