MindReader: Querying Databases Through Multiple Examples

Users often cannot easily express their queries. For example, in a multimedia/image-by-content setting, the user might want photographs with sunsets; in current systems, like QBIC, the user has to give a sample query and specify the relative importance of color, shape and texture. Even worse, the user might want correlations between attributes: for example, in a traditional medical record database, a medical researcher might want to find “mildly overweight patients”, where the implied query would be “weight/height ≈ 4 lb/inch”. Our goal is to provide a user-friendly but theoretically solid method to handle such queries. We allow the user to give several examples and, optionally, their ‘goodness’ scores, and we propose a novel method to “guess” which attributes are important, which correlations are important, and with what weight. Our contributions are twofold: (a) we formalize the problem as a minimization problem and show how to solve for the optimal solution, completely avoiding the ad-hoc heuristics of the past; (b) moreover, we are the first that can handle ‘diagonal’ queries (like the ‘overweight’ query above).
Experiments on synthetic and real datasets show that our method quickly and accurately estimates the ‘hidden’ distance function in the user’s mind.

Part of this work was done while this author was visiting University of Maryland and Carnegie Mellon University. This work was supported by NSF IRI-9625428, and also by the National Science Foundation, ARPA and NASA under NSF Cooperative Agreement No. IRI-9411299.
Proceedings of the 24th VLDB Conference, New York, USA, 1998.
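
To make the idea of "guessing" attribute weights and correlations from examples more concrete, here is a minimal sketch in Python. The abstract does not spell out the formulation, so everything below is an illustrative assumption: the distance is taken to be a quadratic form D(x) = (x - q)^T M (x - q) with det(M) = 1, the query point q is the score-weighted mean of the examples, and M is proportional to the inverse of the score-weighted covariance matrix. The function names (`fit_ellipsoid_distance`, `distance`) and the sample patient data are hypothetical, and the code is restricted to two attributes to stay dependency-free.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def fit_ellipsoid_distance(
    examples: Sequence[Point], scores: Sequence[float]
) -> Tuple[Point, Matrix2]:
    """Estimate (q, M) for D(x) = (x - q)^T M (x - q), normalized so det(M) = 1.

    Assumed formulation (not stated in the abstract): q is the weighted mean,
    M is the inverse weighted covariance scaled to unit determinant.
    """
    total = sum(scores)
    # Score-weighted mean of the examples serves as the query point.
    qx = sum(s * x for s, (x, _) in zip(scores, examples)) / total
    qy = sum(s * y for s, (_, y) in zip(scores, examples)) / total
    # Score-weighted scatter matrix C = [[cxx, cxy], [cxy, cyy]].
    cxx = sum(s * (x - qx) ** 2 for s, (x, _) in zip(scores, examples))
    cyy = sum(s * (y - qy) ** 2 for s, (_, y) in zip(scores, examples))
    cxy = sum(s * (x - qx) * (y - qy) for s, (x, y) in zip(scores, examples))
    det_c = cxx * cyy - cxy ** 2
    if det_c <= 0:
        raise ValueError("examples are degenerate (e.g. collinear)")
    # In 2-D, M = sqrt(det C) * C^{-1}, which gives det(M) = 1.
    scale = 1.0 / det_c ** 0.5
    m = ((cyy * scale, -cxy * scale), (-cxy * scale, cxx * scale))
    return (qx, qy), m


def distance(x: Point, q: Point, m: Matrix2) -> float:
    """Quadratic-form distance of x from the query point q under matrix m."""
    dx, dy = x[0] - q[0], x[1] - q[1]
    return m[0][0] * dx * dx + 2 * m[0][1] * dx * dy + m[1][1] * dy * dy


# Hypothetical "mildly overweight" examples: (height in inches, weight in lb),
# all with weight/height close to 4 lb/inch and equal goodness scores.
patients = [(60.0, 241.0), (65.0, 259.0), (70.0, 281.0), (75.0, 299.0)]
q, m = fit_ellipsoid_distance(patients, [1.0, 1.0, 1.0, 1.0])

along = (q[0] + 2.0, q[1] + 8.0)    # moves along the weight/height ~ 4 diagonal
across = (q[0] + 2.0, q[1] - 8.0)   # same Euclidean offset, off the diagonal
print(distance(along, q, m), distance(across, q, m))
```

In this sketch the off-diagonal entry of M is what captures the ‘diagonal’ query: a patient who is taller and proportionally heavier stays close to the examples, while one with the same Euclidean offset but a different weight/height ratio ends up far away. A distance that only weights attributes individually (a diagonal M) could not express that preference.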
