Query Processing for Distance Metrics

In applications such as vision and molecular biology, a common problem is to find the objects in a large database that are similar to a given target, according to some distance measure. This paper presents a scheme for query processing in such situations. The basic strategy is to precompute some or all of the inter-object distances; using this distance information together with the triangle inequality, we avoid calculating certain object distances while evaluating queries. We propose several heuristics that may speed up query evaluation. A series of experiments is then performed to evaluate the effectiveness of our scheme and the relative performance of the heuristics for different queries. Finally, we use simulation to investigate the possibility of parallelizing our scheme. Our results show that parallelism is best applied in the later stages of query evaluation.
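The pruning idea can be illustrated with a minimal sketch. It is not the scheme evaluated in this paper. It assumes a fully precomputed distance table, a range query ("all objects within `radius` of the target"), and a plain index-order scan in place of the proposed heuristics. The names `precompute`, `range_query`, `table`, and `radius` are hypothetical. For a known distance d(q, p) from target q to object p, the triangle inequality gives |d(q, p) - d(p, o)| <= d(q, o) <= d(q, p) + d(p, o) for every other object o, so o can sometimes be accepted or rejected without computing d(q, o).

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def precompute(objects: Sequence[T], dist: Callable[[T, T], float]) -> List[List[float]]:
    """Build the full inter-object distance table (assumes dist is a metric)."""
    n = len(objects)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            table[i][j] = table[j][i] = dist(objects[i], objects[j])
    return table


def range_query(
    objects: Sequence[T],
    table: List[List[float]],
    dist: Callable[[T, T], float],
    target: T,
    radius: float,
) -> Tuple[List[int], int]:
    """Return (indices of objects within radius of target, number of dist calls)."""
    n = len(objects)
    lower = [0.0] * n            # best known lower bound on d(target, objects[j])
    upper = [float("inf")] * n   # best known upper bound on d(target, objects[j])
    calls = 0
    for i in range(n):
        # Already decided by the bounds: no distance computation needed.
        if lower[i] > radius or upper[i] <= radius:
            continue
        d = dist(target, objects[i])
        calls += 1
        # Tighten the bounds of every object via the triangle inequality.
        # For j == i, table[i][i] is 0, so both bounds become exactly d.
        for j in range(n):
            lower[j] = max(lower[j], abs(d - table[i][j]))
            upper[j] = min(upper[j], d + table[i][j])
    matches = [j for j in range(n) if upper[j] <= radius]
    return matches, calls


if __name__ == "__main__":
    points = [0, 1, 2, 10, 11, 12, 50]
    metric = lambda a, b: abs(a - b)
    table = precompute(points, metric)
    print(range_query(points, table, metric, target=1, radius=2))
```

In this sketch the order in which objects are examined is arbitrary. Choosing which distance to compute next is the kind of decision a heuristic could make, since a well-chosen object tightens the bounds of many others at once.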
