Distance browsing in spatial databases

We compare two techniques for browsing through a collection of spatial objects stored in an R-tree spatial data structure on the basis of their distances from an arbitrary spatial query object. The conventional approach uses a k-nearest neighbor algorithm in which k is known prior to the invocation of the algorithm. Thus, if m > k neighbors are needed, the k-nearest neighbor algorithm has to be reinvoked for m neighbors, thereby possibly performing some redundant computations. The second approach is incremental in the sense that, having obtained the k nearest neighbors, the (k + 1)st neighbor can be obtained without having to calculate the k + 1 nearest neighbors from scratch. The incremental approach is useful when processing complex queries where one of the conditions involves spatial proximity (e.g., the nearest city to Chicago with population greater than a million), in which case a query engine can make use of a pipelined strategy. We present a general incremental nearest neighbor algorithm that is applicable to a large class of hierarchical spatial data structures. This algorithm is adapted to the R-tree, and its performance is compared to an existing k-nearest neighbor algorithm for R-trees [Roussopoulos et al. 1995]. Experiments show that the incremental nearest neighbor algorithm significantly outperforms the k-nearest neighbor algorithm for distance browsing queries in a spatial database that uses the R-tree as a spatial index. Moreover, the incremental nearest neighbor algorithm usually outperforms the k-nearest neighbor algorithm when applied to the k-nearest neighbor problem for the R-tree, although the improvement is not nearly as large as for distance browsing queries. In fact, we prove informally that, at any step in its execution, the incremental nearest neighbor algorithm is optimal with respect to the spatial data structure that is employed. Furthermore, based on some simplifying assumptions, we prove that in two dimensions the number of distance computations and leaf node accesses made by the algorithm for finding k neighbors is O(k + √k).
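
The incremental idea described above can be illustrated with a best-first traversal driven by a priority queue. The following is a minimal Python sketch under stated assumptions, not the algorithm as specified in the paper: `Node`, `min_dist`, and `incremental_nearest` are hypothetical names, the tree is a hand-built two-level hierarchy of bounding boxes rather than a real R-tree, the objects are 2D points, and the city names and populations are invented toy data.

```python
import heapq
import itertools
import math


class Node:
    """Toy hierarchy node (hypothetical): a leaf holds points, an inner node holds children."""

    def __init__(self, children=(), points=()):
        self.children = list(children)
        self.points = list(points)  # each point is (x, y, payload)
        if self.points:
            xs = [p[0] for p in self.points]
            ys = [p[1] for p in self.points]
            self.box = (min(xs), min(ys), max(xs), max(ys))
        else:
            boxes = [c.box for c in self.children]
            self.box = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                        max(b[2] for b in boxes), max(b[3] for b in boxes))


def min_dist(q, box):
    """Smallest Euclidean distance from query point q to an axis-aligned box."""
    dx = max(box[0] - q[0], 0.0, q[0] - box[2])
    dy = max(box[1] - q[1], 0.0, q[1] - box[3])
    return math.hypot(dx, dy)


def incremental_nearest(root, q):
    """Yield (distance, point) pairs in nondecreasing distance from q, one at a time."""
    tie = itertools.count()  # unique tie-breaker so the heap never compares nodes
    heap = [(min_dist(q, root.box), next(tie), root, None)]
    while heap:
        dist, _, node, point = heapq.heappop(heap)
        if point is not None:
            # Nothing left in the queue can be closer, so report this object now.
            yield dist, point
            continue
        for p in node.points:
            d = math.hypot(p[0] - q[0], p[1] - q[1])
            heapq.heappush(heap, (d, next(tie), None, p))
        for child in node.children:
            heapq.heappush(heap, (min_dist(q, child.box), next(tie), child, None))


# Invented toy data: (x, y, (name, population)).
leaf1 = Node(points=[(1.0, 1.0, ("A", 200_000)), (2.0, 3.0, ("B", 1_500_000))])
leaf2 = Node(points=[(8.0, 8.0, ("C", 3_000_000)), (9.0, 6.0, ("D", 50_000))])
root = Node(children=[leaf1, leaf2])
query = (0.0, 0.0)

# Pipelined query: stop at the first neighbor whose population exceeds a million.
big_city = next(
    (pt for _, pt in incremental_nearest(root, query) if pt[2][1] > 1_000_000),
    None,
)
print(big_city[2][0])  # prints "B"
```

Because the generator keeps its queue between results, the consumer never has to guess k in advance: it simply stops pulling neighbors once the non-spatial condition is met, and in this example the second leaf is never expanded.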

[1]  Keinosuke Fukunaga,et al.  A Branch and Bound Algorithm for Computing k-Nearest Neighbors , 1975, IEEE Transactions on Computers.

[2]  Paul M. Aoki.  Generalizing "Search" in Generalized Search Trees (Extended Abstract) , 1998, ICDE 1998.

[3]  Shin'ichi Satoh,et al.  The SR-tree: an index structure for high-dimensional nearest neighbor queries , 1997, SIGMOD '97.

[4]  Christos Faloutsos,et al.  Hilbert R-tree: An Improved R-tree using Fractals , 1994, VLDB.

[5]  Hanan Samet,et al.  Ranking in Spatial Databases , 1995, SSD.

[6]  Caroline M. Eastman,et al.  Partially Specified Nearest Neighbor Searches Using k-d Trees , 1982, Inf. Process. Lett.

[7]  Ramesh C. Jain,et al.  Similarity indexing with the SS-tree , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[8]  Nick Roussopoulos,et al.  Direct spatial search on pictorial databases using packed R-trees , 1985, SIGMOD Conference.

[9]  Hanan Samet,et al.  Efficient Processing of Spatial Queries in Line Segment Databases , 1991, SSD.

[10]  Christos Faloutsos,et al.  On packing R-trees , 1993, CIKM '93.

[11]  Hans-Peter Kriegel,et al.  The X-tree: An Index Structure for High-Dimensional Data , 1996, VLDB.

[12]  Jon Louis Bentley,et al.  Multidimensional binary search trees used for associative searching , 1975, CACM.

[13]  Marshall W. Bern,et al.  Approximate Closest-Point Queries in High Dimensions , 1993, Inf. Process. Lett.

[14]  Peter Widmayer,et al.  The LSD tree: spatial access to multidimensional and non-point objects , 1989, VLDB 1989.

[15]  Walter A. Burkhard,et al.  Some approaches to best-match file searching , 1973, Commun. ACM.

[16]  Hanan Samet,et al.  The Design and Analysis of Spatial Data Structures , 1989.

[17]  Andrew U. Frank,et al.  The Fieldtree: A Data Structure for Geographic Information Systems , 1990, SSD.

[18]  Ambuj K. Singh,et al.  Dimensionality reduction for similarity searching in dynamic databases , 1998, SIGMOD '98.

[19]  Dennis Shasha,et al.  Query Processing for Distance Metrics , 1990, VLDB.

[20]  Paul M. Aoki.  Generalizing "search" in generalized search trees , 1998, Proceedings 14th International Conference on Data Engineering.

[21]  Behrooz Kamgar-Parsi,et al.  An improved branch and bound algorithm for computing k-nearest neighbors , 1985, Pattern Recognit. Lett.

[22]  Jon Louis Bentley,et al.  An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1977, TOMS.

[23]  Christos Faloutsos,et al.  The R+-Tree: A Dynamic Index for Multi-Dimensional Objects , 1987, VLDB.

[24]  Nick Roussopoulos,et al.  Nearest neighbor queries , 1995, SIGMOD '95.

[25]  Christos Faloutsos,et al.  FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets , 1995, SIGMOD '95.

[26]  J. T. Robinson,et al.  The K-D-B-tree: a search structure for large multidimensional dynamic indexes , 1981, SIGMOD '81.

[27]  Raj Jain,et al.  Algorithms and strategies for similarity retrieval , 1996.

[28]  Jeffrey K. Uhlmann,et al.  Satisfying General Proximity/Similarity Queries with Metric Trees , 1991, Inf. Process. Lett.

[29]  Hans-Peter Kriegel,et al.  3D Similarity Search by Shape Approximation , 1997, SSD.

[30]  Christos Faloutsos,et al.  Fast Nearest Neighbor Search in Medical Image Databases , 1996, VLDB.

[31]  Ralf Hartmut Güting,et al.  Rule-based optimization and query processing in an extensible geometric database system , 1992, TODS.

[32]  Oliver Günther,et al.  Spatial database indices for large extended objects , 1991, Proceedings of the Seventh International Conference on Data Engineering.

[33]  Christian Böhm,et al.  A cost model for nearest neighbor search in high-dimensional data space , 1997, PODS.

[34]  Patricia G. Selinger,et al.  Access path selection in a relational database management system , 1979, SIGMOD '79.

[35]  David J. DeWitt,et al.  Equi-depth multidimensional histograms , 1988, SIGMOD '88.

[36]  D. B. Lomet,et al.  A robust multi-attribute search structure , 1989, Proceedings of the Fifth International Conference on Data Engineering.

[37]  Pavel Zezula,et al.  M-tree: An Efficient Access Method for Similarity Search in Metric Spaces , 1997, VLDB.

[38]  Hans-Peter Kriegel,et al.  Optimal multi-step k-nearest neighbor search , 1998, SIGMOD '98.

[39]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[40]  Walid G. Aref,et al.  Estimating Selectivity Factors of Spatial Operations , 1993, FMLDO.

[41]  Hanan Samet,et al.  A consistent hierarchical representation for vector data , 1986, SIGGRAPH.

[42]  Stanley M. Selkow,et al.  The Efficiency of Using k-d Trees for Finding Nearest Neighbors in Discrete Space , 1986, Inf. Process. Lett.

[43]  James Lee Hafner,et al.  Efficient Color Histogram Indexing for Quadratic Form Distance Functions , 1995, IEEE Trans. Pattern Anal. Mach. Intell.

[44]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[45]  Andreas Henrich.  A Distance Scan Algorithm for Spatial Access Structures , 1994, ACM-GIS.

[46]  Hanan Samet,et al.  Orthogonal Polygons as Bounding Structures in Filter-Refine Query Processing Strategies , 1997, SSD.

[47]  Alan J. Broder.  Strategies for efficient incremental nearest neighbor search , 1990, Pattern Recognit.

[48]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[49]  Andreas Henrich,et al.  The LSD^h-tree: an access structure for feature vectors , 1998, Proceedings 14th International Conference on Data Engineering.

[50]  Douglas Comer,et al.  Ubiquitous B-Tree , 1979, CSUR.

[51]  Walid G. Aref,et al.  Uniquely reporting spatial objects: yet another operation for comparing spatial data structures , 1992.

[52]  Sergey Brin,et al.  Near Neighbor Search in Large Metric Spaces , 1995, VLDB.